[ 
https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049057#comment-16049057
 ] 

Sergey Beryozkin edited comment on BEAM-2328 at 6/14/17 11:09 AM:
------------------------------------------------------------------

Hi JB, All,
I'm now ready to create the initial PR. As I said earlier, I realize it won't 
be perfect from the start, and I have some follow-up tasks once the PR is 
accepted (making the commons-compress 1.14 dependency managed, and a couple of 
possible refactorings which would affect the outer Beam source and help 
minimize the duplication of FileBased-related utility code inside the Tika 
component). For now I'm just trying to keep this initial contribution as 
simple and self-contained as possible.
The only immediate question I have is what this artifact should be named. 
At the moment it is "beam-sdks-java-io-tika", but I wonder whether it should 
really be "beam-sdks-java-input-tika", given that output cannot be supported.

Thanks 


> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing 
> a variety of file formats. It is used in many projects, including Lucene and 
> Elasticsearch. 
> Supporting a Tika Input (Read) at the Beam level would be of major interest 
> to many users.
> A PR is to follow.



