[
https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182936#comment-15182936
]
Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------
[Yogi]
Ram,
The aim of this concrete operator is to write incoming tuples to HDFS files.
The main use-case is this: data is read from some source, processed
tuple-by-tuple by some operators, and then passed to the proposed concrete
operator for writing to HDFS.
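To make the idea concrete, here is a minimal sketch of what such an operator
could look like. It is only an illustration under stated assumptions:
FileOutputSketch is a hypothetical in-memory stand-in that mirrors the two
abstract methods listed in the issue description quoted below (it is not the
real AbstractFileOutputOperator and does not touch HDFS), and
StringFileOutputSketch, processTuple, contentsOf, fileName and tupleSeparator
are names invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an abstract file output operator.
// It only mirrors the two abstract methods from the issue description
// and buffers bytes in memory instead of writing to HDFS.
abstract class FileOutputSketch<INPUT> {
    // File name -> bytes written so far (in-memory substitute for files).
    private final Map<String, ByteArrayOutputStream> files = new LinkedHashMap<>();

    protected abstract String getFileName(INPUT tuple);

    protected abstract byte[] getBytesForTuple(INPUT tuple);

    // Called once per incoming tuple: append its bytes to the target file.
    public void processTuple(INPUT tuple) {
        byte[] bytes = getBytesForTuple(tuple);
        files.computeIfAbsent(getFileName(tuple), k -> new ByteArrayOutputStream())
             .write(bytes, 0, bytes.length);
    }

    // Returns everything written to the given file so far, decoded as UTF-8.
    public String contentsOf(String fileName) {
        ByteArrayOutputStream out = files.get(fileName);
        return out == null ? "" : new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}

// Hypothetical concrete operator covering the frequent case:
// String tuples, a single output file, and a configurable tuple separator.
class StringFileOutputSketch extends FileOutputSketch<String> {
    private final String fileName;
    private final String tupleSeparator;

    StringFileOutputSketch(String fileName, String tupleSeparator) {
        this.fileName = fileName;
        this.tupleSeparator = tupleSeparator;
    }

    @Override
    protected String getFileName(String tuple) {
        // Every tuple on the stream goes to the same file.
        return fileName;
    }

    @Override
    protected byte[] getBytesForTuple(String tuple) {
        // Append the separator (e.g. a newline) after each tuple.
        return (tuple + tupleSeparator).getBytes(StandardCharsets.UTF_8);
    }
}
```

With this sketch, an app developer would only set a file name and a
separator, for example new StringFileOutputSketch("out.txt", "\n"), and feed
tuples through processTuple without extending anything.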
As you pointed out, file operations are another common use-case, but we can
work out a separate mechanism that handles the complexities explained in your
post.
Priyanka has already posted a proposal for an HDFS input module consisting of
FileSplitter + BlockReader operators.
I will post another proposal for an HDFS file copy module, which would
seamlessly integrate with the HDFS input module to solve the file copy
use-case.
Question:
Is it acceptable if we have a concrete operator (the current proposal) for
tuple-by-tuple writing and a separate module to take care of file copy
use-cases?
~ Yogi
> concrete operator for writing to HDFS file
> ------------------------------------------
>
> Key: APEXMALHAR-2009
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
> Project: Apache Apex Malhar
> Issue Type: Task
> Reporter: Yogi Devendra
> Assignee: Yogi Devendra
>
> Currently, for writing to HDFS files we have AbstractFileOutputOperator in
> the Malhar library.
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to app developers.
> However, someone who is new to Apex would look for a ready-made
> implementation instead of extending the abstract implementation.
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to
> Malhar.
> The aim of this operator is to serve as a ready-to-use operator for the
> most frequent use-cases.
> Here are my key observations on the most frequent use-cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String.
> 2. All tuples on a particular stream end up in the same output file.
> 3. The app developer may want to add a custom tuple separator (e.g. a
> newline character) between tuples.
> Discussion thread on mailing list here:
> http://mail-archives.apache.org/mod_mbox/apex-dev/201603.mbox/%3CCAHekGF_6KovS4cjYXzCLdU9En0iPaKO%2BBv%3DEJXbrCuhe9%2BtdrA%40mail.gmail.com%3E