[ https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034412#comment-16034412 ]
Sergey Beryozkin commented on BEAM-2328:
----------------------------------------

Hi JB, Tim

Re the org.json dependencies: FYI, at the moment the only strong Tika dependency is tika-core. tika-parsers is a test dependency and is not needed to compile. The current expectation is that users of the future Tika Input component will add a tika-parsers dependency themselves, so the Tika parsers (including those that may depend on org.json) will not make it into the Beam distro.

I reckon that can make it easier to align with the Tika 2.0-SNAPSHOT effort, where a number of mainstream parsers (PDF, etc.) are represented by individual modules.

I guess an option to ship all of tika-bundle with tika-io can also be considered, but for a start having only a tika-core dependency seems workable to me. In this (current) case, if tika-core itself is org.json free, then it should not be an issue.

> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing
> a variety of file formats. It is used in many projects, including Lucene and
> Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest
> to many users.
> A PR is to follow.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)