Re: dataconfig to index ZIP Files
Try setting dataSource=null for your top-level entity and use fileName=\.zip$ as the file name selector.
Re: dataconfig to index ZIP Files
To answer the previous post: I was not sure what dataSource=binaryFile does. I took it from a PDF sample, thinking that would help. After setting dataSource=null I'm still getting the same errors...

<dataConfig>
  <dataSource type="BinFileDataSource" user="svcSolr" password="SomePassword" />
  <document>
    <entity name="Archive" processor="FileListEntityProcessor" baseDir="E:\ArchiveRoot" fileName=".zip$" recursive="true" rootEntity="false" dataSource="null" onError="skip">
      <field column="fileSize" name="size"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>

The logs report this:

INFO - 2013-07-01 16:45:57.317; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import
WARN - 2013-07-01 16:45:57.333; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Unable to read: dataimport.properties

--
View this message in context: http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965p4074399.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataconfig to index ZIP Files
IIRC, ZIP files are not supported.

--
Noble Paul
Re: dataconfig to index ZIP Files
I'm using the Tika plugin to do so, and according to http://tika.apache.org/0.5/formats.html it does support them:

  ZIP archive (application/zip)
  Tika uses Java's built-in Zip classes to parse ZIP files. Support for ZIP was added in Tika 0.2.
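For anyone unfamiliar with the "built-in Zip classes" the Tika page refers to, here is a minimal sketch that uses java.util.zip directly to list the entries of an archive. This is not Tika's code and does not involve Solr at all; the class name ZipEntryLister, its methods, and the sample entry names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only (not part of Tika or Solr).
final class ZipEntryLister {

    /** Returns the entry names of a ZIP archive, in archive order. */
    public static List<String> listEntries(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a small in-memory ZIP so the example needs no files on disk. */
    public static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("readme.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/notes.txt"));
            zip.write("notes".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // prints [readme.txt, docs/notes.txt]
        System.out.println(listEntries(sampleZip()));
    }
}
```

Whether the import handler actually hands each archive to Tika is a separate question; the sketch only shows that reading ZIP entries needs nothing beyond the JDK.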
Re: dataconfig to index ZIP Files
Not sure if this will help, but here's the verbose log:

INFO - 2013-07-01 23:17:08.632; org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration: tika-data-config.xml
INFO - 2013-07-01 23:17:08.648; org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded successfully
INFO - 2013-07-01 23:17:08.663; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=false&verbose=true&entity=Archive&command=full-import&debug=false&wt=json} status=0 QTime=31
INFO - 2013-07-01 23:17:08.663; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import
INFO - 2013-07-01 23:17:08.679; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720628679&wt=json} status=0 QTime=0
INFO - 2013-07-01 23:17:08.679; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Read dataimport.properties
INFO - 2013-07-01 23:17:09.552; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720629552&wt=json} status=0 QTime=0
INFO - 2013-07-01 23:17:11.580; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720631577&wt=json} status=0 QTime=0
INFO - 2013-07-01 23:17:13.593; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720633593&wt=json} status=0 QTime=0
INFO - 2013-07-01 23:17:15.247; org.apache.solr.handler.dataimport.DocBuilder; Time taken = 0:0:6.553
INFO - 2013-07-01 23:17:15.247; org.apache.solr.update.processor.LogUpdateProcessor; [tika] webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=false&verbose=true&entity=Archive&command=full-import&debug=false&wt=json} status=0 QTime=31 {} 0 31
INFO - 2013-07-01 23:17:15.621; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720635621&wt=json} status=0 QTime=0
INFO - 2013-07-01 23:17:17.259; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720637256&wt=json} status=0 QTime=0
INFO - 2013-07-01 23:17:17.649; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720637645&wt=json} status=0 QTime=0
Re: dataconfig to index ZIP Files
Hi,

Maybe fileName=*.zip instead of .*zip?

Steve

On Jun 28, 2013, at 2:20 PM, ericrs22 <ericr...@yahoo.com> wrote:

> So I thought I had it set up correctly, but I'm receiving the following response to my data config:
>
> Last Update: 18:17:52 (Duration: 07s)
> Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
> Started: 13 minutes ago
>
> Here's my data config:
>
> <dataConfig>
>   <dataSource type="FileDataSource" />
>   <document>
>     <entity name="Archive" processor="FileListEntityProcessor" baseDir="E:\ArchiveRoot" fileName=".*zip" recursive="true" rootEntity="false" dataSource="binaryFile" onError="skip">
>       <field column="fileSize" name="size"/>
>       <field column="file" name="filename"/>
>     </entity>
>   </document>
> </dataConfig>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataconfig to index ZIP Files
Unfortunately not. I had tried that before, with the logs saying:

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0

With .*zip I get this:

WARN SimplePropertiesWriter Unable to read: dataimport.properties
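The PatternSyntaxException above is what java.util.regex throws when a shell-style glob such as *.zip is compiled as a regular expression, because * has nothing before it to repeat. Below is a small sketch that reproduces this outside Solr and compares a few candidate patterns. It assumes the fileName attribute is compiled as a Java regex and applied with find() semantics, which is worth checking against your Solr version; FileNamePatternCheck, its methods, and the sample file names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, for illustration only (not part of Solr).
final class FileNamePatternCheck {

    /** Returns true if the string compiles as a java.util.regex pattern. */
    public static boolean isValidRegex(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /**
     * Returns true if the regex is found anywhere in the file name.
     * Assumption: the pattern is applied with find(), not matches().
     */
    public static boolean selects(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).find();
    }

    public static void main(String[] args) {
        // A glob is not a regex: '*' at index 0 is a dangling meta character.
        System.out.println(isValidRegex("*.zip"));               // false

        // .*zip is valid but loose: it accepts any name containing "zip".
        System.out.println(selects(".*zip", "archive.zip"));     // true
        System.out.println(selects(".*zip", "unzipper.txt"));    // true

        // \.zip$ requires a literal dot and anchors at the end of the name.
        System.out.println(selects("\\.zip$", "archive.zip"));   // true
        System.out.println(selects("\\.zip$", "unzipper.txt"));  // false
    }
}
```

Note that the backslash is doubled only because it sits inside a Java string literal; in the XML attribute it would be written once, as in fileName="\.zip$".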
Re: dataconfig to index ZIP Files
What is dataSource=binaryFile? I don't see any such data source defined in your configuration.

--
Regards,
Shalin Shekhar Mangar.