[ https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242378#comment-13242378 ]
Uwe Schindler commented on TIKA-888:
------------------------------------

Thanks Chris, we are already planning to remove all parsers that are not useful for Solr (including NetCDF). We don't use transitive dependencies at the moment, because we want to be sure which libraries are added, and for the binary distribution we need to add license notes (which cannot be generated by Ivy) for every single JAR. So we would simply remove the dependency on ucar.

The question is: the parser is still listed in META-INF, so when a Java 5 user tries to parse a NetCDF file, he gets a ClassNotFoundException from the NetCDF parser. What's the best way to handle that?

tika-config.xml is horrible for us to use. It would be good to be able to pass a META-INF-like list to the AutoDetectParser. (I implemented that for another, non-Solr project we use at PANGAEA: I took TIKA's META-INF list, deleted all unused parsers and passed the remaining ones to TIKA. This needed extra coding.) Being able to point e.g. AutoDetectParser to a custom parser list would be nice and easy to manage in Solr (and for me, too).

On the mailing list we already discussed better options for Solr. Solr is only interested in full text and mostly ignores the metadata, so parsing mp3 files is simply useless. A good idea for TIKA would be to have several tika-parsers packages, maybe one with "office document parsers", one with "images", and so on. Are there any plans to split the parser package? This would make it easier for users to download a subset with all of its transitive dependencies, instead of getting a ClassNotFoundException after removing the wrong JAR files by hand.

bq. We disable the tests for NetCDF java if the code isn't being compiled on a 1.6 platform, so all of the Tika code compiles just fine

I tried this a few weeks ago, and with JDK 1.5 the tests were failing.
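On the question of pointing AutoDetectParser at a custom parser list: a minimal sketch, in plain Java with no Tika dependency, that reads a META-INF/services-style list (one class name per line, `#` starts a comment) and skips entries whose classes cannot be loaded or instantiated on the running JVM. `AvailableParserList` and `loadAvailable` are hypothetical names, not Tika API. The sketch only catches failures that surface while loading and instantiating the listed class; since the JVM may resolve a class's own dependencies lazily, a missing or too-new dependency (such as the ucar classes) can still fail later, at parse time. The resulting list could then be handed to a varargs `AutoDetectParser(Parser...)` constructor, assuming the TIKA version in use provides one.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of TIKA): reads a META-INF/services-style list
 * (one fully qualified class name per line, '#' starts a comment) and returns
 * instances of only those classes that load and instantiate on the running JVM.
 */
public class AvailableParserList {

    public static <T> List<T> loadAvailable(Class<T> type, String serviceList, ClassLoader loader) {
        List<T> result = new ArrayList<T>();
        for (String line : serviceList.split("\r?\n")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (name.length() == 0) {
                continue; // blank or comment-only line
            }
            try {
                Class<?> clazz = Class.forName(name, true, loader);
                result.add(type.cast(clazz.getConstructor().newInstance()));
            } catch (ClassNotFoundException e) {
                // The listed class itself is missing (e.g. its JAR was removed).
            } catch (LinkageError e) {
                // NoClassDefFoundError for a missing dependency, or
                // UnsupportedClassVersionError for a class built for a newer Java.
            } catch (ClassCastException e) {
                // The class does not implement the requested type.
            } catch (NoSuchMethodException e) {
                // No public no-arg constructor.
            } catch (InstantiationException e) {
                // Abstract class or interface.
            } catch (IllegalAccessException e) {
                // Constructor not accessible.
            } catch (InvocationTargetException e) {
                // The constructor itself threw.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String list = "# hypothetical parser list\n"
                + "java.lang.StringBuilder\n"
                + "com.example.MissingParser   # JAR not on the classpath\n"
                + "java.lang.Object            # wrong type\n"
                + "java.lang.StringBuffer\n";
        List<CharSequence> available =
                loadAvailable(CharSequence.class, list, AvailableParserList.class.getClassLoader());
        System.out.println(available.size()); // prints 2
    }
}
```

In the demo, standard JDK classes stand in for parsers: `com.example.MissingParser` is skipped because it is not on the classpath, and `java.lang.Object` is skipped because it does not implement the requested type.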
> NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, although TIKA is Java 1.5
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-888
>                 URL: https://issues.apache.org/jira/browse/TIKA-888
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Uwe Schindler
>            Assignee: Chris A. Mattmann
>
> Lucene/Solr developers ran this tool before releasing Lucene/Solr 3.6 (Solr 3.6 is still required to run on Java 1.5, see SOLR-3295):
> http://code.google.com/p/versioncheck/
> {noformat}
> Major.Minor Version : 50.0 JAVA compatibility : Java 1.6 platform: 45.3-50.0
> Number of classes : 60
> Classes are:
> c:\Work\lucene-solr\.\solr\contrib\extraction\lib\netcdf-4.2-min.jar [:]
> ucar/unidata/geoloc/Bearing.class
> ...
> {noformat}
> TIKA should use a 1.5 version of this class and especially do some Java 5 tests before releasing (as its build dependencies say, its minimum is Java 5).
> I tried to compile and run TIKA tests with Java 1.5 -> crash (Invalid class file format).
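The versioncheck output above reports a class-file version of 50.0. A minimal sketch of that kind of check, assuming only the standard class-file layout: a class file starts with the magic number 0xCAFEBABE followed by `minor_version` and `major_version` (unsigned 16-bit values), and major 49 corresponds to Java 5 while 50 corresponds to Java 6, so a JAR whose classes are major 50 cannot be loaded on Java 1.5. `ClassFileVersion` and its methods are hypothetical illustrations, not the versioncheck tool's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ClassFileVersion {

    /** Reads {major, minor} from the header: u4 magic, u2 minor_version, u2 major_version. */
    public static int[] readVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        return new int[] {major, minor};
    }

    /** Oldest Java release that accepts this major version (45 = 1.0.2/1.1, 49 = 5, 50 = 6). */
    public static String minimumJava(int major) {
        if (major >= 49) return String.valueOf(major - 44); // 49 -> 5, 50 -> 6, 52 -> 8
        if (major >= 46) return "1." + (major - 44);       // 46 -> 1.2, 48 -> 1.4
        return "1.1";                                       // 45.x
    }

    /** Highest class-file major version found in a JAR (0 if it holds no classes). */
    public static int maxMajorVersion(String jarPath) throws IOException {
        JarFile jar = new JarFile(jarPath);
        try {
            int max = 0;
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) continue;
                InputStream in = jar.getInputStream(entry);
                try {
                    max = Math.max(max, readVersion(in)[0]);
                } finally {
                    in.close();
                }
            }
            return max;
        } finally {
            jar.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Scan a JAR given on the command line, e.g. netcdf-4.2-min.jar.
            int major = maxMajorVersion(args[0]);
            System.out.println(args[0] + ": highest major " + major + ", needs Java " + minimumJava(major));
            return;
        }
        // Demo header: magic, minor 0, major 50 (0x32), the version reported above.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x32};
        int[] v = readVersion(new ByteArrayInputStream(header));
        System.out.println(v[0] + "." + v[1] + " -> needs Java " + minimumJava(v[0]));
        // prints "50.0 -> needs Java 6"
    }
}
```

A check like this could run in a build before a release to flag any bundled JAR whose highest major version exceeds 49 when Java 5 is the declared minimum.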