[ https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032843#comment-16032843 ]
Jean-Baptiste Onofré commented on BEAM-2328:
--------------------------------------------

Thanks [~talli...@mitre.org] for the update about the {{org.json}} dependency!

For the {{TikaReader}}, we should produce Strings and use the {{StringUtf8Coder}} to serialize the elements in the {{PCollection}}.

Let me take a first look.

Thanks!

> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing
> a wide variety of file formats. It is used in many projects, including Lucene
> and Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest
> to many users.
> A PR is to follow.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)