[ https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639393#comment-14639393 ]
Sebastian Nagel commented on NUTCH-2048:
----------------------------------------

Hi,

it's not that trivial. There are more duplicates:
{noformat}
% perl -lne 'push @{$h{$2}}, $1 if /<library name="((.+?)-[.0-9]*\.jar)"/; END { for (keys %h) { print $_, ": ", join(", ", @{$h{$_}}) if @{$h{$_}} > 1 }}' src/plugin/parse-tika/plugin.xml
jhighlight: jhighlight-1.0.2.jar, jhighlight-1.0.jar
commons-compress: commons-compress-1.8.1.jar, commons-compress-1.9.jar
metadata-extractor: metadata-extractor-2.6.2.jar, metadata-extractor-2.8.0.jar
commons-codec: commons-codec-1.6.jar, commons-codec-1.9.jar
slf4j-api: slf4j-api-1.6.1.jar, slf4j-api-1.7.12.jar
fontbox: fontbox-1.8.8.jar, fontbox-1.8.9.jar
jempbox: jempbox-1.8.8.jar, jempbox-1.8.9.jar
pdfbox: pdfbox-1.8.8.jar, pdfbox-1.8.9.jar
tika-parsers: tika-parsers-1.7.jar, tika-parsers-1.8.jar
{noformat}

What exactly should be listed in the plugin.xml? All libs placed by ant/ivy in runtime/local/plugins/parse-tika? That's currently 66! If yes, there is even more to do.
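
For readers who prefer a script, the duplicate check done by the Perl one-liner above can be sketched in Python. This is only an illustration under stated assumptions: `find_duplicate_libraries` and the `sample` excerpt are hypothetical names and data, not part of Nutch, and the regular expression is copied unchanged from the Perl command.

```python
import re
from collections import defaultdict

# Same pattern as the Perl one-liner: group 1 is the full jar file name,
# group 2 is the artifact name without the trailing version.
LIBRARY_RE = re.compile(r'<library name="((.+?)-[.0-9]*\.jar)"')


def find_duplicate_libraries(plugin_xml_text):
    """Return {artifact: [jar, ...]} for artifacts listed in more than one version."""
    jars_by_artifact = defaultdict(list)
    for line in plugin_xml_text.splitlines():
        match = LIBRARY_RE.search(line)  # first match per line, like perl -n
        if match:
            jars_by_artifact[match.group(2)].append(match.group(1))
    return {name: jars for name, jars in jars_by_artifact.items() if len(jars) > 1}


# Hypothetical excerpt for illustration only, not the real plugin.xml.
sample = """
<library name="tika-parsers-1.7.jar"/>
<library name="tika-parsers-1.8.jar"/>
<library name="xz-1.5.jar"/>
"""

duplicates = find_duplicate_libraries(sample)
for artifact, jars in duplicates.items():
    print(f"{artifact}: {', '.join(jars)}")
```

To check the real file, the text of src/plugin/parse-tika/plugin.xml would be read and passed to the function in place of `sample`.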

This is the difference between jars listed only in plugin.xml (left), jars only in the plugin folder (middle), and jars common to both (right):
{noformat}
% ls runtime/local/plugins/parse-tika/ | grep -v '^parse-tika\.jar$' | grep -v plugin.xml | sort >/tmp/tika_jars.txt
% perl -lne 'print $1 if /<library name="((.+?)-[.0-9]*\.jar)"/' src/plugin/parse-tika/plugin.xml | sort | comm - /tmp/tika_jars.txt
apache-mime4j-core-0.7.2.jar
apache-mime4j-dom-0.7.2.jar
asm-debug-all-4.1.jar
aspectjrt-1.8.0.jar
bcmail-jdk15-1.45.jar
bcmail-jdk15on-1.52.jar
bcpkix-jdk15on-1.52.jar
bcprov-jdk15-1.45.jar
bcprov-jdk15on-1.52.jar
boilerpipe-1.1.0.jar
bzip2-0.9.1.jar
c3p0-0.9.1.1.jar
cdm-4.5.5.jar
commons-codec-1.6.jar
commons-codec-1.9.jar
commons-compress-1.8.1.jar
commons-compress-1.9.jar
commons-csv-1.0.jar
commons-httpclient-3.1.jar
commons-logging-1.1.1.jar
commons-logging-api-1.1.jar
commons-vfs2-2.0.jar
ehcache-core-2.6.2.jar
fontbox-1.8.8.jar
fontbox-1.8.9.jar
grib-4.5.5.jar
guava-10.0.1.jar
httpclient-4.2.6.jar
httpcore-4.2.5.jar
httpmime-4.2.6.jar
httpservices-4.5.5.jar
isoparser-1.0.2.jar
java-libpst-0.8.1.jar
jcip-annotations-1.0.jar
jcommander-1.35.jar
jdom-1.0.jar
jdom2-2.0.4.jar
jempbox-1.8.8.jar
jempbox-1.8.9.jar
jhighlight-1.0.2.jar
jhighlight-1.0.jar
jj2000-5.2.jar
jmatio-1.0.jar
jna-4.1.0.jar
joda-time-2.2.jar
jsoup-1.7.2.jar
jsr305-1.3.9.jar
juniversalchardet-1.0.3.jar
junrar-0.7.jar
maven-scm-api-1.4.jar
maven-scm-provider-svn-commons-1.4.jar
maven-scm-provider-svnexe-1.4.jar
metadata-extractor-2.6.2.jar
metadata-extractor-2.8.0.jar
netcdf-4.2.20.jar
netcdf4-4.5.5.jar
pdfbox-1.8.8.jar
pdfbox-1.8.9.jar
plexus-utils-1.5.6.jar
poi-3.11.jar
poi-3.12-beta1.jar
poi-ooxml-3.11.jar
poi-ooxml-3.12-beta1.jar
poi-ooxml-schemas-3.11.jar
poi-ooxml-schemas-3.12-beta1.jar
poi-scratchpad-3.11.jar
poi-scratchpad-3.12-beta1.jar
protobuf-java-2.5.0.jar
quartz-2.2.0.jar
regexp-1.3.jar
rome-0.9.jar
slf4j-api-1.6.1.jar
slf4j-api-1.7.12.jar
sqlite-jdbc-3.8.6.jar
tagsoup-1.2.1.jar
tika-parsers-1.7.jar
tika-parsers-1.8.jar
udunits-4.5.5.jar
unidataCommon-4.2.20.jar
vorbis-java-core-0.6.jar
vorbis-java-tika-0.6.jar
xercesImpl-2.8.1.jar
xml-apis-1.3.03.jar
xmlbeans-2.6.0.jar
xmpcore-5.1.2.jar
xz-1.5.jar
{noformat}

> parse-tika: fix dependencies in plugin.xml
> ------------------------------------------
>
>                 Key: NUTCH-2048
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2048
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.10
>            Reporter: Sebastian Nagel
>            Priority: Trivial
>             Fix For: 1.11
>
>         Attachments: NUTCH-2048_Joyce_20150723.patch
>
>
> Duplicate library dependencies listed in parse-tika's plugin.xml should be
> cleaned up. There are duplicates where only the version differs, e.g.:
> {noformat}
> tika-parsers-1.7.jar
> tika-parsers-1.8.jar
> {noformat}
> Not critical because libs which are not present should just be ignored.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)