Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Can you please suggest a way (with an example) of assigning this unique key to a
PDF file?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074588.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Okay. Can you please suggest a way (with an example) of assigning this unique
key to a PDF file? Say, a unique number for each PDF file. How do I achieve
this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074592.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key error while indexing pdf files

2013-07-02 Thread Shalin Shekhar Mangar
We can't tell you what the id of your own document should be. Isn't
there anything which is unique about your pdf files? How about the
file name or the absolute path?

On Tue, Jul 2, 2013 at 11:33 AM, archit2112 archit2...@gmail.com wrote:
 Okay. Can you please suggest a way (with an example) of assigning this unique
 key to a PDF file? Say, a unique number for each PDF file. How do I achieve
 this?



-- 
Regards,
Shalin Shekhar Mangar.


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Yes. The absolute path is unique.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074620.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key error while indexing pdf files

2013-07-02 Thread archit2112
Yes. The absolute path is unique. How do I implement it? Can you please
explain?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074638.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key error while indexing pdf files

2013-07-02 Thread Shalin Shekhar Mangar
See http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor

The implicit fields generated by the FileListEntityProcessor are
fileDir, file, fileAbsolutePath, fileSize and fileLastModified, and these
are available for use within the entity.
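For example, the posted data-config1.xml could copy fileAbsolutePath into the key field along the lines below. This is a sketch, not tested against 4.3.0: it assumes the schema's uniqueKey is a string field named `id` (the name in the error message), and it uses DIH's TemplateTransformer on the inner Tika entity, because with rootEntity="false" on the outer entity it is the inner entity that emits one document per PDF.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- outer entity: walks the directory, exposes f.fileAbsolutePath etc. -->
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo" fileName=".*pdf"
            recursive="true">
      <!-- inner entity: produces one Solr document per PDF -->
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text"
              transformer="TemplateTransformer">
        <!-- assumption: uniqueKey is a string field named "id";
             the file's absolute path becomes the key -->
        <field column="id" template="${f.fileAbsolutePath}"/>
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title1" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

After a full-import, each document's id should then be the file's path, so re-indexing the same file replaces its earlier version instead of adding a duplicate.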

On Tue, Jul 2, 2013 at 2:47 PM, archit2112 archit2...@gmail.com wrote:
 Yes. The absolute path is unique. How do I implement it? Can you please
 explain?



-- 
Regards,
Shalin Shekhar Mangar.


Unique key error while indexing pdf files

2013-07-01 Thread archit2112
Hi

I'm trying to index PDF files in Solr 4.3.0 using the DataImportHandler.

*My request handler:*

<requestHandler name="/dataimport1"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config1.xml</str>
  </lst>
</requestHandler>

*My data-config1.xml:*

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="C:\Users\aroraarc\Desktop\Impdo" fileName=".*pdf"
            recursive="true">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title1" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>


Now when I try to index the files, I get the following error:

org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id
    at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:88)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:517)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:396)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:70)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:500)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)


This problem is easy to solve when indexing from a database, but I don't
know how to handle the unique key of a document here. How do I define the id
field (unique key) for a PDF file? How do I solve this problem?

Thanks in advance




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky

It all depends on your data model - tell us more about your data model.

For example, how will users or applications query these documents and what 
will they expect to be able to do with the ID/key for the documents?


How are you expecting to identify documents in your data model?

-- Jack Krupansky

-Original Message- 
From: archit2112

Sent: Monday, July 01, 2013 7:17 AM
To: solr-user@lucene.apache.org
Subject: Unique key error while indexing pdf files




Re: Unique key error while indexing pdf files

2013-07-01 Thread archit2112
I'm new to Solr. I'm just trying to understand and explore the various
features Solr offers and how they are implemented. I would be very grateful
if you could solve my problem with any example of your choice. I just want to
learn how to index PDF documents using the DataImportHandler.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unique-key-error-while-indexing-pdf-files-tp4074314p4074327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
It's really 100% up to you how you want to come up with the unique key 
values for your documents. What would you like them to be? Just use that. 
Anything (within reason) - anything goes.


But it also comes back to your data model. You absolutely must come up with 
a data model for how you expect to index and query data in Solr before you 
just start throwing random data into Solr.


1. Design your data model.
2. Produce a Solr schema from that data model.
3. Map the raw data from your data sources (e.g., PDF files) to the fields 
in your Solr schema.
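As an illustration of step 2 for a minimal exploratory model (one document per PDF, keyed by the file's absolute path, as suggested elsewhere in this thread), the schema side could look like this. It is a sketch, assuming the `string` and `text_general` field types from the Solr 4.x example schema; `title1` simply mirrors the field name used in the posted data-config1.xml.

```xml
<!-- schema.xml fragment (sketch, not a complete schema) -->
<!-- the <field> entries belong inside <fields> -->
<!-- key: the PDF's absolute path, kept verbatim (string = not tokenized) -->
<field name="id" type="string" indexed="true" stored="true"
       required="true" multiValued="false"/>
<!-- target of the "title" mapping in data-config1.xml -->
<field name="title1" type="text_general" indexed="true" stored="true"/>

<!-- direct child of <schema>: tells Solr which field identifies a document -->
<uniqueKey>id</uniqueKey>
```

Step 3 is then just making sure the import fills `id` for every document, for instance from the FileListEntityProcessor's fileAbsolutePath.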


That last step includes the ID/key field, but your data model will imply any 
requirements for what the ID/key should be.


To be absolutely clear, it is 100% up to you to design the ID/key for every 
document; Solr does NOT do that for you.


Even if you are just exploring, at least come up with an exploratory 
data model - which includes what expectations you have about the unique 
ID/key for each document.


So, for that first PDF file, what expectation (according to your data model) 
do you have for what its ID/key should be?


-- Jack Krupansky

-Original Message- 
From: archit2112

Sent: Monday, July 01, 2013 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Unique key error while indexing pdf files

I'm new to Solr. I'm just trying to understand and explore the various
features Solr offers and how they are implemented. I would be very grateful
if you could solve my problem with any example of your choice. I just want to
learn how to index PDF documents using the DataImportHandler.