[ 
https://issues.apache.org/jira/browse/TIKA-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096663#comment-15096663
 ] 

Uwe Schindler commented on TIKA-1830:
-------------------------------------

bq. Speaking of integration with Solr, would you have a chance/any interest in 
offering feedback on our initial restructuring of the parser bundles for Tika 
2.0 (TIKA-1824)? Or more generally, do you and your Solr colleagues have any 
wishes for the 2.0 roadmap?

As already stated in the past, we would like to bundle only parsers for text 
document formats, because images, class files, and the like are not really 
useful for indexing by default. Users who want those can still add the missing 
parser bundles, and SPI will do the rest. Currently we have disabled some 
parsers by removing the JAR files (like asm-all.jar, netcdf.jar), so Tika's SPI 
loading skips them automatically (because of a ClassNotFoundException). This 
was a bit crude, but it worked.
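
To illustrate the idea, here is a minimal sketch of "skip whatever cannot be 
loaded". It is not Tika's actual loader code: the class OptionalParserLoader, 
the method loadAvailable, and the missing class name are made up for the 
example, and java.lang.String merely stands in for a parser whose JAR is 
present.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Tika's real service loading is more involved.
class OptionalParserLoader {

    /**
     * Tries to load each named class and returns the names that could be loaded.
     * Names whose class cannot be loaded (for example because its JAR was
     * removed from the classpath) are silently skipped.
     */
    static List<String> loadAvailable(List<String> classNames) {
        List<String> available = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class.forName(name);
                available.add(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Class is missing or failed to link: treat it as "disabled".
            }
        }
        return available;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
                "java.lang.String",                  // stands in for a present parser
                "org.example.missing.ClassParser");  // made-up name, JAR not present
        System.out.println(loadAvailable(candidates));
    }
}
```

With these two candidates, only the first name should survive, which is the 
effect we get today by deleting the JAR files.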

The reason for this was partly also some version incompatibilities (Tika used 
an old ASM, while Lucene needs the newest one), but ASM is not really useful 
for indexing anyway!

In Solr we don't use transitive dependencies in Ivy; we decide for each JAR 
file whether it gets bundled, so we review every release during an update 
anyway.

> Upgrade to PDFBox 1.8.11 when available
> ---------------------------------------
>
>                 Key: TIKA-1830
>                 URL: https://issues.apache.org/jira/browse/TIKA-1830
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>         Attachments: reports_pdfbox_1_8_11-rc1.zip
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
