Re: dataconfig to index ZIP Files

2013-07-01 Thread Bernd Fehling
Try setting dataSource="null" on your top-level entity and
use fileName="\.zip$" as the file name selector.
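
The difference between the patterns discussed in this thread can be checked outside Solr. The following is a minimal sketch, assuming the file name filter is applied as a Java regular expression with find() semantics (an assumption about the processor, not something confirmed in this thread). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class FileNameRegexDemo {
    // Assumption: the pattern is searched for anywhere in the file name
    // (find()), so it only constrains the end of the name if it uses "$".
    static boolean selects(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).find();
    }

    public static void main(String[] args) {
        String[] regexes = {"\\.zip$", ".zip$", ".*zip"};
        String[] names = {"backup.zip", "backup.zip.bak", "notazip", "report.pdf"};
        for (String regex : regexes) {
            for (String name : names) {
                System.out.println(regex + " vs " + name + " -> " + selects(regex, name));
            }
        }
    }
}
```

Under that assumption, only the escaped and anchored form restricts the selection to names that really end in ".zip"; the unescaped dot also accepts a name such as "notazip".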





Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
To answer the previous post:

I was not sure what dataSource=binaryFile does. I took it from a PDF sample,
thinking that would help.

After setting dataSource=null I'm still getting the same errors.

<dataConfig>
  <dataSource type="BinFileDataSource" user="svcSolr"
              password="SomePassword" />
  <document>
    <entity name="Archive" processor="FileListEntityProcessor"
            baseDir="E:\ArchiveRoot" fileName=".zip$" recursive="true"
            rootEntity="false" dataSource="null" onError="skip">
      <field column="fileSize" name="size"/>
      <field column="file" name="filename"/>
    </entity>
  </document>
</dataConfig>

The logs report this:

INFO  - 2013-07-01 16:45:57.317; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import
WARN  - 2013-07-01 16:45:57.333; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Unable to read: dataimport.properties
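
Since the log shows no error beyond the properties warning, it may help to confirm outside Solr that the base directory actually contains files the pattern would select. The following is a rough standalone check, assuming find() semantics for the file name regex. BaseDirCheck and matchingFiles are hypothetical names, not part of Solr.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BaseDirCheck {
    // Recursively lists regular files under baseDir whose file name
    // contains a match for the given regex.
    static List<Path> matchingFiles(Path baseDir, String regex) {
        Pattern pattern = Pattern.compile(regex);
        try (Stream<Path> paths = Files.walk(baseDir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> pattern.matcher(p.getFileName().toString()).find())
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the config in this message; override via arguments.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : "E:\\ArchiveRoot");
        String regex = args.length > 1 ? args[1] : "\\.zip$";
        matchingFiles(baseDir, regex).forEach(System.out::println);
    }
}
```

If this prints nothing when run as the same account that runs Solr, the problem is more likely the path or its permissions than the import configuration.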




--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataconfig-to-index-ZIP-Files-tp4073965p4074399.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataconfig to index ZIP Files

2013-07-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
IIRC, ZIP files are not supported.




-- 
-
Noble Paul


Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
I'm using the Tika plugin to do so, and according to
http://tika.apache.org/0.5/formats.html it does support ZIP:

"ZIP archive (application/zip): Tika uses Java's built-in Zip classes to
parse ZIP files. Support for ZIP was added in Tika 0.2."
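
For reference, "Java's built-in Zip classes" are the ones in java.util.zip. The following is a small sketch of reading archive entries with them. It is not Tika's code, only an illustration of the underlying JDK API, and it builds a tiny archive in memory so it needs no files on disk. The class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipListingDemo {
    // Builds a two-entry ZIP archive in memory.
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docs/readme.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("notes.txt"));
            zip.write("world".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns the entry names in archive order, using only java.util.zip.
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entryNames(sampleZip()));
    }
}
```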





Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
Not sure if this will help.

Here's the verbose log:

INFO  - 2013-07-01 23:17:08.632; org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration: tika-data-config.xml
INFO  - 2013-07-01 23:17:08.648; org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded successfully
INFO  - 2013-07-01 23:17:08.663; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=false&verbose=true&entity=Archive&command=full-import&debug=false&wt=json} status=0 QTime=31
INFO  - 2013-07-01 23:17:08.663; org.apache.solr.handler.dataimport.DataImporter; Starting Full Import
INFO  - 2013-07-01 23:17:08.679; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720628679&wt=json} status=0 QTime=0
INFO  - 2013-07-01 23:17:08.679; org.apache.solr.handler.dataimport.SimplePropertiesWriter; Read dataimport.properties
INFO  - 2013-07-01 23:17:09.552; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720629552&wt=json} status=0 QTime=0
INFO  - 2013-07-01 23:17:11.580; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720631577&wt=json} status=0 QTime=0
INFO  - 2013-07-01 23:17:13.593; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720633593&wt=json} status=0 QTime=0
INFO  - 2013-07-01 23:17:15.247; org.apache.solr.handler.dataimport.DocBuilder; Time taken = 0:0:6.553
INFO  - 2013-07-01 23:17:15.247; org.apache.solr.update.processor.LogUpdateProcessor; [tika] webapp=/solr path=/dataimport params={optimize=false&clean=false&indent=true&commit=false&verbose=true&entity=Archive&command=full-import&debug=false&wt=json} status=0 QTime=31 {} 0 31
INFO  - 2013-07-01 23:17:15.621; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720635621&wt=json} status=0 QTime=0
INFO  - 2013-07-01 23:17:17.259; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720637256&wt=json} status=0 QTime=0
INFO  - 2013-07-01 23:17:17.649; org.apache.solr.core.SolrCore; [tika] webapp=/solr path=/dataimport params={indent=true&command=status&_=1372720637645&wt=json} status=0 QTime=0






Re: dataconfig to index ZIP Files

2013-06-28 Thread Steve Rowe
Hi,

Maybe fileName=*.zip instead of .*zip ?

Steve

On Jun 28, 2013, at 2:20 PM, ericrs22 ericr...@yahoo.com wrote:

 So I thought I had it set up correctly, but I'm receiving the following
 response to my data config:
 
 Last Update: 18:17:52
 
 (Duration: 07s)
 
 Requests: 0 (0/s), Fetched: 0 (0/s), Skipped: 0, Processed: 0 (0/s)
 
 Started: 13 minutes ago
 
 Here's my Data config.
 
 <dataConfig>
   <dataSource type="FileDataSource" />
   <document>
     <entity name="Archive" processor="FileListEntityProcessor"
             baseDir="E:\ArchiveRoot" fileName=".*zip" recursive="true"
             rootEntity="false" dataSource="binaryFile" onError="skip">
       <field column="fileSize" name="size"/>
       <field column="file" name="filename"/>
     </entity>
   </document>
 </dataConfig>
 
 
 
 



Re: dataconfig to index ZIP Files

2013-06-28 Thread ericrs22
Unfortunately not. I had tried that before, and the logs said:

Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near
index 0

With .*zip I get this:

WARN  SimplePropertiesWriter  Unable to read: dataimport.properties
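
The exception above is what java.util.regex raises when a shell-style glob is handed to it as a regular expression: in a regex, '*' is a quantifier and needs something before it to repeat. A minimal reproduction outside Solr follows (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GlobVsRegexDemo {
    // Returns null if the regex compiles, otherwise the compiler's
    // description of the syntax error.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println("*.zip -> " + compileError("*.zip"));
        System.out.println(".*zip -> " + compileError(".*zip"));
    }
}
```

The glob form fails to compile, while ".*zip" is a valid regex, which is consistent with the two different outcomes reported in this message.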





Re: dataconfig to index ZIP Files

2013-06-28 Thread Shalin Shekhar Mangar
What is dataSource=binaryFile? I don't see any such data source
defined in your configuration.



-- 
Regards,
Shalin Shekhar Mangar.