[jira] [Commented] (APEXMALHAR-2009) concrete operator for writing to HDFS file

Yogi Devendra (JIRA) Mon, 07 Mar 2016 04:12:19 -0800

    [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182941#comment-15182941
 ]


Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------


[Yogi]

Here is a summary of the discussion so far:
1. The proposed operator is a concrete implementation for writing tuples to 
HDFS.
2. All tuples will be written to the same file.
3. File copy will be handled by a dedicated file-copy component. (A proposal 
for that will go out on a separate email thread.)
4. File rotation is handled as follows:
   a. Based on file size.
   b. Based on time (every X windows).
   c. If both are specified, based on whichever happens first.
   d. If neither is specified, based on no new data arriving for one 
      application window.
5. Conversion to JSON, CSV, or Avro will not be the responsibility of this 
operator.
6. Allowed input types are byte[] and String.
7. Custom separators should be allowed. An empty string should be a valid 
separator.

Note that this is just a first-iteration implementation of this concrete 
operator. We can enhance it in subsequent iterations.
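
To make the rotation rules above concrete, here is a rough sketch of the 
decision logic in plain Java. RotationPolicy, its fields, and the 
shouldRotate() parameters are hypothetical names used only for illustration; 
they are not part of Malhar, and the real operator may well structure this 
differently. Treating a non-positive value as "not specified" and skipping 
rotation of an empty file are my own assumptions.

```java
/** Hypothetical helper illustrating the rotation rules; not a Malhar class. */
class RotationPolicy {
    // Assumption: a value <= 0 means the limit was not specified.
    private final long maxFileSizeBytes;
    private final int rotationWindows;

    RotationPolicy(long maxFileSizeBytes, int rotationWindows) {
        this.maxFileSizeBytes = maxFileSizeBytes;
        this.rotationWindows = rotationWindows;
    }

    /**
     * Decide at the end of an application window whether to rotate.
     *
     * @param currentFileSizeBytes bytes written to the current file so far
     * @param windowsSinceRotation windows elapsed since the last rotation
     * @param tuplesInLastWindow   tuples received in the window that just ended
     */
    boolean shouldRotate(long currentFileSizeBytes,
                         int windowsSinceRotation,
                         long tuplesInLastWindow) {
        boolean sizeSet = maxFileSizeBytes > 0;
        boolean timeSet = rotationWindows > 0;

        if (!sizeSet && !timeSet) {
            // Nothing specified: rotate when a window brought no new data.
            // Assumption: do not rotate a file that has nothing in it yet.
            return tuplesInLastWindow == 0 && currentFileSizeBytes > 0;
        }

        // One or both limits specified: whichever is reached first wins.
        boolean sizeHit = sizeSet && currentFileSizeBytes >= maxFileSizeBytes;
        boolean timeHit = timeSet && windowsSinceRotation >= rotationWindows;
        return sizeHit || timeHit;
    }
}
```

For example, with a 100-byte limit and a 5-window limit, a file that reaches 
100 bytes after one window is rotated on size, while a 10-byte file is rotated 
only once 5 windows have passed.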

Also, we expect things to become clearer once the first iteration of the 
other related components is ready. 

Thanks, all, for your valuable feedback.

> concrete operator for writing to HDFS file
> ------------------------------------------
>
>                 Key: APEXMALHAR-2009
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Yogi Devendra
>            Assignee: Yogi Devendra
>
> Currently, for writing to an HDFS file we have AbstractFileOutputOperator in 
> the malhar library.
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to app developers. 
> But someone who is new to Apex would look for a ready-made implementation 
> instead of extending the abstract implementation.
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to 
> malhar. The aim of this operator is to serve as a ready-to-use operator 
> for the most frequent use-cases.
> Here are my key observations on most frequent use-cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String. 
> 2. All tuples on a particular stream land up in the same output file.
> 3. App developer may want to add some custom tuple separator (e.g. newline 
> character) between tuples.
> Discussion thread on mailing list here:
> http://mail-archives.apache.org/mod_mbox/apex-dev/201603.mbox/%3CCAHekGF_6KovS4cjYXzCLdU9En0iPaKO%2BBv%3DEJXbrCuhe9%2BtdrA%40mail.gmail.com%3E
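
As an illustration of observations 1 and 3 in the description above, the 
following is a minimal sketch of turning a byte[] or String tuple into the 
bytes to be written, with a custom separator appended. SeparatedTupleEncoder 
is a hypothetical stand-alone class, not the proposed HDFSOutputOperator and 
not an existing Malhar API; the use of UTF-8 for String tuples and for the 
separator is an assumption.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical encoder: appends a separator to each tuple's bytes. */
class SeparatedTupleEncoder {
    private final byte[] separator;

    SeparatedTupleEncoder(String separator) {
        // An empty string is a valid separator and yields zero extra bytes.
        // Assumption: the separator is encoded as UTF-8.
        this.separator = separator.getBytes(StandardCharsets.UTF_8);
    }

    /** Encode a byte[] tuple: the tuple bytes followed by the separator. */
    byte[] encode(byte[] tuple) {
        byte[] out = new byte[tuple.length + separator.length];
        System.arraycopy(tuple, 0, out, 0, tuple.length);
        System.arraycopy(separator, 0, out, tuple.length, separator.length);
        return out;
    }

    /** Encode a String tuple (assumed UTF-8), then append the separator. */
    byte[] encode(String tuple) {
        return encode(tuple.getBytes(StandardCharsets.UTF_8));
    }
}
```

In an operator extending AbstractFileOutputOperator, logic along these lines 
would presumably sit behind getBytesForTuple(), with getFileName() returning 
the same name for every tuple.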



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
