[ https://issues.apache.org/jira/browse/STREAMS-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370242#comment-14370242 ]
ASF GitHub Bot commented on STREAMS-293:
----------------------------------------
Github user jfrazee commented on the pull request:
https://github.com/apache/incubator-streams/pull/195#issuecomment-83785169
:+1: This would be (and is) really helpful for processing document-only
streams on disk.
I would add, though, that it'd be cool if it were a little more flexible,
maybe with the option of a user-provided function or lambda that defines how
to process the files. That's a problem for another day, though.
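A rough sketch of what that could look like, not taken from the pull request: the reader accepts any `Function<String, ParsedLine>`, and a default parser infers the layout from the number of fields on the line. `ParsedLine`, `LineParsers`, and `parseAll` are hypothetical names, and the tab delimiter and the id, timestamp, metadata, json column order are assumptions rather than something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical holder for one parsed line; not a class from incubator-streams. */
final class ParsedLine {
    final String id;        // null when the line carries no id
    final String timestamp; // null when the line carries no timestamp
    final String metadata;  // null when the line carries no metadata
    final String json;      // the document itself, always present

    ParsedLine(String id, String timestamp, String metadata, String json) {
        this.id = id;
        this.timestamp = timestamp;
        this.metadata = metadata;
        this.json = json;
    }
}

final class LineParsers {
    private LineParsers() {}

    /** Default parser: picks the layout from the number of tab-separated fields (assumed delimiter). */
    static final Function<String, ParsedLine> DEFAULT = line -> {
        // Split into at most four fields; on a four-column line any further tabs stay in the json field.
        String[] f = line.split("\t", 4);
        switch (f.length) {
            case 1:  return new ParsedLine(null, null, null, f[0]);   // json only
            case 2:  return new ParsedLine(f[0], null, null, f[1]);   // id, json
            case 3:  return new ParsedLine(f[0], f[1], null, f[2]);   // id, timestamp, json
            default: return new ParsedLine(f[0], f[1], f[2], f[3]);   // id, timestamp, metadata, json
        }
    };

    /** Applies a caller-supplied parser to every line, so callers can swap in their own lambda. */
    static List<ParsedLine> parseAll(List<String> lines, Function<String, ParsedLine> parser) {
        List<ParsedLine> out = new ArrayList<>();
        for (String line : lines) {
            out.add(parser.apply(line));
        }
        return out;
    }
}
```

A caller who knows every line is a bare document could skip the default and pass `line -> new ParsedLine(null, null, null, line)` instead.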
> allow for missing metadata fields in streams-persist-hdfs
> ---------------------------------------------------------
>
> Key: STREAMS-293
> URL: https://issues.apache.org/jira/browse/STREAMS-293
> Project: Streams
> Issue Type: Improvement
> Reporter: Steve Blackmon
> Assignee: Steve Blackmon
>
> Currently the streams-persist-hdfs writer creates (and the reader expects)
> exactly four columns. This could be made much more flexible without too much
> effort.
>
> Update the reader to support additional use cases:
> a) file paths containing one json document per line
> b) file paths containing just id and json on each line
> c) file paths containing id, timestamp, and json document on each line
> Update the writer to support:
> a) ids only
> b) ids and timestamp only
> c) ids, timestamp, and json only
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)