[
https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-2416:
---------------------------
Affects Version/s: 1.4.1 (was: 4.0)
Fix Version/s: 3.2
Summary: Solr Cell fails to index Zip file contents (was: Solr Cell & DataImport Tika handler broken - fails to index Zip file contents)
I'm not sure what exactly Jayendra is referring to by "was addressed some time
back ... seems to have reappeared" (I couldn't find any issues that looked
similar), but I just tested and confirmed that in 1.4.1 SolrCell only indexed
the metadata about *.zip files, not the contents of the zip.

The behavior in the 3.1rc1 Solr release candidate is consistent with 1.4.1:
only info about the zip file itself is extracted, not the contents (although in
3.1 we actually extract more metadata than we did in 1.4.1), so this definitely
isn't a 3.1 blocker (some people were wondering on IRC).

I'm not personally even clear on whether this is really a bug, or whether it
should be driven by a request option. Perhaps some users only want the data
about the zip file, not its contents; and what should the behavior be if the
zip file contains multiple files and the request specifies a literal id?
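
To make the literal id question concrete, here is a rough sketch of one possible answer: treat each zip entry as its own document and derive its id from the id supplied in the request. This is not how Solr Cell or Tika behaves today; it uses only plain java.util.zip, and the class name, the method names and the "!" separator are all hypothetical choices made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Solr or Tika.
final class ZipEntryIds {

    // Returns one (id -> text) pair per file in the zip. The id is
    // baseId + "!" + entry name; the "!" separator is an assumption.
    static Map<String, String> entriesById(byte[] zipBytes, String baseId)
            throws IOException {
        Map<String, String> docs = new LinkedHashMap<>();
        try (ZipInputStream zin =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content to index
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zin.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumes UTF-8 text entries; real content would need parsing.
                String text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                docs.put(baseId + "!" + entry.getName(), text);
            }
        }
        return docs;
    }

    // Builds a small in-memory zip (one directory, two files) for trying this out.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("dir/"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("dir/b.txt"));
            zout.write("world".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return out.toByteArray();
    }
}
```

With a scheme like this, a request with a literal id of "doc1" would yield ids such as "doc1!a.txt" and "doc1!dir/b.txt". Whether that is preferable to folding all of the entry text into a single document under the original id is exactly the open question.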
> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
> Key: SOLR-2416
> URL: https://issues.apache.org/jira/browse/SOLR-2416
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler, contrib - Solr Cell (Tika
> extraction)
> Affects Versions: 1.4.1
> Reporter: Jayendra Patil
> Fix For: 3.2
>
> Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
>
> Working with the latest Solr trunk code, it seems the Tika handlers for Solr
> Cell (ExtractingDocumentLoader.java) and the Data Import handler
> (TikaEntityProcessor.java) fail to index the zip file contents again.
> They just index the file names again.
> This issue was addressed some time back, late last year, but seems to have
> reappeared with the latest code.
> Jira for the Data Import handler part, with the patch and the test case:
> https://issues.apache.org/jira/browse/SOLR-2332