[ https://issues.apache.org/jira/browse/TIKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074175#comment-16074175 ]
Gus Heck commented on TIKA-1367:
--------------------------------

So when the dust settles here, how will one build a coherent, workable one-jar application that supports code like this, which intends to make a best effort to parse any document that might be encountered:

{code}
Tika tika = new Tika();
tika.setMaxStringLength(document.getRawData().length);
Metadata metadata = new Metadata();
try (ByteArrayInputStream bais = new ByteArrayInputStream(rawData)) {
  String textContent = tika.parseToString(bais, metadata);
  document.setRawData(textContent.getBytes(Charset.forName("UTF-8")));
  for (String name : metadata.names()) {
    document.put(sanitize(name) + plusSuffix(), metadata.get(name));
  }
} catch (IOException | TikaException e) {
  log.warn("Tika processing failure!", e);
  // if tika can't parse it we certainly don't want random binary crap in the index
  document.setStatus(Status.DROPPED);
}
{code}

Although I notice that this is not marked as fixed yet, in 1.15 the above code no longer compiles... (and somehow there are no dependencies reported by Gradle for tika-parsers...)

{code}
compile - Dependencies for source set 'main'.
+--- org.apache.tika:tika-parsers:1.15
+--- org.apache.solr:solr-solrj:5.5.0
|    +--- commons-io:commons-io:2.4
{code}

vs

{code}
+--- org.apache.tika:tika-parsers:1.12
|    +--- org.apache.tika:tika-core:1.12
|    +--- org.gagravarr:vorbis-java-tika:0.6
|    |    \--- org.apache.tika:tika-core:1.5 -> 1.12
|    +--- com.healthmarketscience.jackcess:jackcess:2.1.2
{code}

Which seems very much like it's totally going to break everything...
if Gradle doesn't see the deps, one-jar won't package them (all I did to cause this was change a 1.12 to a 1.15 in the Gradle build).

> Tika documentation should list tika-parsers parser dependencies
> ---------------------------------------------------------------
>
>                 Key: TIKA-1367
>                 URL: https://issues.apache.org/jira/browse/TIKA-1367
>             Project: Tika
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Sergey Beryozkin
>             Fix For: 1.16
>
>
> The tika-parsers module has many strong transitive parser dependencies. Maven
> users of tika-parsers have to exclude all the transitive dependencies
> manually. Documenting the list of the existing transitive dependencies and
> keeping the list up to date will help developers exclude the libraries not
> needed for a given project.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)