[ https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051839#comment-16051839 ]
ASF GitHub Bot commented on BEAM-2328:
--------------------------------------

GitHub user sberyozkin opened a pull request:

    https://github.com/apache/beam/pull/3378

    [BEAM-2328] Add TikaIO component

    R: @jbonofre

    Adding TikaSource and TikaReader tests
    Updating TikaReader to use TikaInputStream as suggested by Tim Allison
    Supporting the customization of TikaConfig
    Cleanup: Moving a 'tika' above 'xml' in io/pom.xml to keep the correct order
    Renaming TikaInput to TikaIO, adding Read.withOptions, throwing NoSuchElementException if the current is null
    Removing redundant test annotations
    Fixing TikaIO JavaDoc typo

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sberyozkin/beam tikaio

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3378

----
commit 8c63d91c0a088e2d90d5572051f736f24ea338b5
Author: Sergey Beryozkin <sberyoz...@gmail.com>
Date:   2017-05-25T15:47:59Z

    Adding TikaIO component

    Enforcing that start is called before advance

    Adding TikaSource and TikaReader tests

    Updating TikaReader to use TikaInputStream as suggested by Tim Allison

    Supporting the customization of TikaConfig

    Moving a 'tika' above 'xml' in io/pom.xml to keep the correct order

    Renaming TikaInput to TikaIO, adding Read.withOptions, throwing NoSuchElementException if the current is null

    Removing redundant test annotations

    Fixing TikaIO JavaDoc typo

----


> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing
> a wide variety of file formats. It is used in many projects, including Lucene
> and Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest
> to many users.
> PR is to follow



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
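The commit messages say that TikaReader now enforces that start is called before advance and throws NoSuchElementException when there is no current element. The class below is a rough, stdlib-only illustration of that contract. It is hypothetical (ListReader is not code from the PR and does not parse anything with Tika). It only mirrors the start()/advance()/getCurrent() shape of Beam's Source.Reader, reading from an in-memory list in place of parsed document content.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch of the start()/advance()/getCurrent() contract mentioned
 * in the commit messages. Not the actual TikaReader from the pull request.
 */
public class StartAdvanceReaderSketch {

  static final class ListReader {
    private final Iterator<String> it;
    private boolean started = false;
    private String current = null;

    // Assumes a list without null elements (List.of already rejects nulls).
    ListReader(List<String> items) {
      this.it = items.iterator();
    }

    /** Moves to the first element; returns false if there is none. */
    boolean start() {
      if (started) {
        throw new IllegalStateException("start() already called");
      }
      started = true;
      return advanceInternal();
    }

    /** Moves to the next element; start() must have been called first. */
    boolean advance() {
      if (!started) {
        throw new IllegalStateException("start() must be called before advance()");
      }
      return advanceInternal();
    }

    // Sets current to the next element, or to null once the input is exhausted.
    private boolean advanceInternal() {
      current = it.hasNext() ? it.next() : null;
      return current != null;
    }

    /** Returns the current element, or throws if there is none. */
    String getCurrent() {
      if (current == null) {
        throw new NoSuchElementException("No current element");
      }
      return current;
    }
  }

  public static void main(String[] args) {
    ListReader reader = new ListReader(List.of("first chunk", "second chunk"));
    // Typical reader loop: start() once, then advance() until it returns false.
    for (boolean more = reader.start(); more; more = reader.advance()) {
      System.out.println(reader.getCurrent());
    }
  }
}
```

Calling advance() before start() fails fast with an IllegalStateException, and getCurrent() throws NoSuchElementException once the reader is exhausted rather than returning null.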