[ 
https://issues.apache.org/jira/browse/TIKA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242378#comment-13242378
 ] 

Uwe Schindler commented on TIKA-888:
------------------------------------

Thanks Chris,

We are already planning to remove all parsers that are not useful for Solr
(including NetCDF). We don't use transitive dependencies at the moment because
we want to know exactly which libs are added, and for the binary distribution
we need to add license notes (which cannot be generated by Ivy) for every
single JAR. So we would simply remove the dependency on ucar.

The question is: the parser is still listed in META-INF, so when a Java 5 user
tries to parse a NetCDF file, he gets a ClassNotFoundException from the NetCDF
parser. What's the best way to handle that? tika-config.xml is horrible for us;
it would be good to be able to pass a META-INF-like list to the
AutoDetectParser (I implemented that for another non-Solr project we use at
PANGAEA, where I took the META-INF list of Tika, deleted all unused parsers and
passed the rest somehow to TIKA). This needed extra coding. Pointing e.g.
AutoDetectParser to a custom parser list would be nice and easy to manage in
Solr (for me, too).
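
A minimal sketch of the "custom parser list" idea above, in plain Java and
without any Tika API: it reads a META-INF/services style listing and keeps only
the entries whose class can actually be loaded. The class name
ParserListFilter, its methods and the example class names are hypothetical and
exist only for illustration; how the surviving names would be handed to Tika is
left open. Note the limit of this check: a parser class can load fine while
one of its own dependencies is missing, in which case the failure only shows up
later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika or Solr.
final class ParserListFilter {

    /** Parses a META-INF/services style listing: one class name per line, '#' starts a comment. */
    static List<String> readClassNames(String listing) {
        List<String> names = new ArrayList<String>();
        for (String line : listing.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.length() > 0) {
                names.add(line);
            }
        }
        return names;
    }

    /** Keeps only the names whose class can be loaded by the given class loader. */
    static List<String> loadable(List<String> names, ClassLoader loader) {
        List<String> result = new ArrayList<String>();
        for (String name : names) {
            try {
                // Load without initializing; we only want to know the class is there.
                Class.forName(name, false, loader);
                result.add(name);
            } catch (ClassNotFoundException e) {
                // Listed but not on the classpath: skip it.
            } catch (LinkageError e) {
                // Present but not linkable (e.g. wrong class file version): skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // java.util.ArrayList stands in for a parser that is present;
        // org.example.MissingParser is an assumed name for one that was removed.
        String listing = "# trimmed parser list\n"
                + "java.util.ArrayList   # present\n"
                + "org.example.MissingParser\n";
        List<String> names = readClassNames(listing);
        System.out.println(loadable(names, ParserListFilter.class.getClassLoader()));
    }
}
```

The sketch sticks to Java 5 syntax on purpose, since that is the platform under
discussion.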

On the mailing list we already discussed better options for Solr (Solr is only
interested in full text; the metadata is mostly ignored), so parsing mp3 files
is simply useless. A good idea for TIKA would be to have several tika-parsers
packages, maybe one with "office document parsers", one with "images", ... Are
there any plans to split the parser package? This would make it easier for
users to download a subset with all transitive dependencies and not get a
ClassNotFoundException after removing the wrong JAR files by hand.

bq. We disable the tests for NetCDF java if the code isn't being compiled on a 
1.6 platform, so all of the Tika code compiles just fine

I tried this a few weeks ago with JDK 1.5, and the tests were failing.
                
> NetCDF parser uses Java 6 JAR file and test/compilation fails with Java 1.5, 
> although TIKA is Java 1.5
> ------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-888
>                 URL: https://issues.apache.org/jira/browse/TIKA-888
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Uwe Schindler
>            Assignee: Chris A. Mattmann
>
> Lucene/Solr developers ran this tool before releasing Lucene/Solr 3.6 (Solr 
> 3.6 is still required to run on Java 1.5, see SOLR-3295): 
> http://code.google.com/p/versioncheck/
> {noformat}
> Major.Minor Version : 50.0             JAVA compatibility : Java 1.6 
> platform: 45.3-50.0
> Number of classes : 60
> Classes are: 
> c:\Work\lucene-solr\.\solr\contrib\extraction\lib\netcdf-4.2-min.jar [:] 
> ucar/unidata/geoloc/Bearing.class
> ...
> {noformat}
> TIKA should use a 1.5 version of this class and especially run some Java 5 
> tests before releasing (as its build dependencies say, the minimum is Java 5). 
> I tried to compile and run TIKA tests with Java 1.5 -> crash (Invalid class 
> file format).
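
The check reported in the {noformat} output above can be sketched in a few
lines: every class file starts with the magic number 0xCAFEBABE, followed by a
two-byte minor and a two-byte major version, and major version 50 corresponds
to Java 6 while 49 corresponds to Java 5. The following is an illustrative
sketch only, not the versioncheck tool itself; the class name ClassFileVersion
and its methods are hypothetical.

```java
// Hypothetical helper for illustration; not the versioncheck tool.
final class ClassFileVersion {

    /** Class file major version emitted by a Java 5 compiler. */
    static final int JAVA_5_MAJOR = 49;

    /** Returns the major version stored in bytes 6..7 of a class file. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8
                || (classBytes[0] & 0xFF) != 0xCA
                || (classBytes[1] & 0xFF) != 0xFE
                || (classBytes[2] & 0xFF) != 0xBA
                || (classBytes[3] & 0xFF) != 0xBE) {
            throw new IllegalArgumentException("not a class file");
        }
        // Bytes 4..5 hold the minor version, bytes 6..7 the major version (big-endian).
        return ((classBytes[6] & 0xFF) << 8) | (classBytes[7] & 0xFF);
    }

    /** True if a Java 5 JVM accepts this class file version. */
    static boolean loadsOnJava5(byte[] classBytes) {
        return majorVersion(classBytes) <= JAVA_5_MAJOR;
    }

    public static void main(String[] args) {
        // Header of a class compiled for Java 6: magic, minor 0, major 50.
        byte[] java6Header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50
        };
        System.out.println(majorVersion(java6Header) + " loads on Java 5: "
                + loadsOnJava5(java6Header));
    }
}
```

Running such a check over every class in the bundled JARs before a release
would flag a case like netcdf-4.2-min.jar.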
