[ https://issues.apache.org/jira/browse/BEAM-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032881#comment-16032881 ]
Sergey Beryozkin commented on BEAM-2328:
----------------------------------------

Hi JB, Tim

Yes, TikaReader returns Strings, but as JB just pointed out, the default coder is not used, so I'll fix it, thanks JB :-).

Tim, the reason I mentioned that I do not expect 'anything but Strings' is that in many cases, as far as I can see, Beam readers can be typed for different types, and custom Beam coders can support such conversions. But I agree that in the case of Tika it is really only about Strings, as it is impossible to predict at the generic Tika API level what a given format parser can produce, etc.

Tim - I also updated the reader to use TikaInputStream, thanks

> Introduce Apache Tika Input component
> -------------------------------------
>
>                 Key: BEAM-2328
>                 URL: https://issues.apache.org/jira/browse/BEAM-2328
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas, sdk-java-extensions
>            Reporter: Sergey Beryozkin
>            Assignee: Sergey Beryozkin
>             Fix For: 2.1.0
>
>
> Apache Tika is a popular project that offers extensive support for parsing
> a variety of file formats. It is used in many projects, including Lucene and
> Elasticsearch.
> Supporting a Tika Input (Read) at the Beam level would be of major interest
> to many users.
> PR is to follow

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)