[jira] [Commented] (APEXMALHAR-2009) concrete operator for writing to HDFS file

Yogi Devendra (JIRA) Mon, 07 Mar 2016 03:56:11 -0800

    [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182919#comment-15182919
 ]


Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------

[Chandni]

Hi Yogi,

Here is an example I wrote.
https://github.com/tweise/apex-samples/pulls

In the above example, the file is finalized when there are no more tuples
received in the window.

A file is normally finalized when it is rotated (based on size or time).
However, for example or demo purposes, we can finalize a file
if no input tuples are received in a window. If more tuples arrive
after some time, they need to be written to a different file.
Maybe this can be controlled by a property?
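
A minimal sketch of the idea above, assuming a boolean property decides
whether an idle window finalizes the current file. The class name
IdleWindowFinalizerSketch, the finalizeOnIdleWindow flag and the "output.N"
part naming are hypothetical and are not existing Malhar API; the
beginWindow/endWindow methods only mimic the shape of window callbacks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not a Malhar class: tracks tuples per window and
// finalizes the current file part when a whole window passes with no input.
class IdleWindowFinalizerSketch {
    private final boolean finalizeOnIdleWindow; // assumed property name
    private final List<String> finalized = new ArrayList<>();
    private int tuplesInWindow;
    private int filePart;
    private boolean fileOpen;

    IdleWindowFinalizerSketch(boolean finalizeOnIdleWindow) {
        this.finalizeOnIdleWindow = finalizeOnIdleWindow;
    }

    void beginWindow() {
        tuplesInWindow = 0;
    }

    void process(String tuple) {
        // A real operator would write the tuple to the current part here.
        fileOpen = true;
        tuplesInWindow++;
    }

    void endWindow() {
        // Finalize only if something was written earlier and this window was idle.
        if (finalizeOnIdleWindow && fileOpen && tuplesInWindow == 0) {
            finalized.add(currentPartName());
            filePart++;       // later tuples go to a different file
            fileOpen = false;
        }
    }

    String currentPartName() {
        return "output." + filePart;
    }

    List<String> finalizedFiles() {
        return finalized;
    }
}
```

With the flag set to false the sketch never finalizes on idle windows, which
leaves rotation by size or time as the only trigger.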

Let me know if you want me to put this in Malhar.

Thanks,
Chandni

> concrete operator for writing to HDFS file
> ------------------------------------------
>
>                 Key: APEXMALHAR-2009
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Yogi Devendra
>            Assignee: Yogi Devendra
>
> Currently, for writing to an HDFS file we have AbstractFileOutputOperator in the
> Malhar library.
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to app developers.
> But someone who is new to Apex would look for a ready-made implementation
> instead of extending the abstract implementation.
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar.
> The aim of this operator is to serve as a ready-to-use operator
> for the most frequent use cases.
> Here are my key observations on the most frequent use cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String. 
> 2. All tuples on a particular stream end up in the same output file.
> 3. The app developer may want to add a custom tuple separator (e.g. a newline
> character) between tuples.
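
The three observations above could be covered by a concrete subclass along
these lines. This is only a sketch: SketchFileOutput is a stand-in that
copies the two abstract method signatures quoted in the description so the
example compiles without Malhar, and SketchStringFileOutput, its
constructor arguments and the UTF-8 encoding are assumptions, not the
proposed HDFSOutputOperator itself.

```java
import java.nio.charset.StandardCharsets;

// Stand-in for the abstract base, reduced to the two methods quoted above.
// It is NOT the real AbstractFileOutputOperator.
abstract class SketchFileOutput<INPUT> {
    protected abstract String getFileName(INPUT tuple);

    protected abstract byte[] getBytesForTuple(INPUT tuple);
}

// Hypothetical concrete operator for String tuples.
class SketchStringFileOutput extends SketchFileOutput<String> {
    private final String fileName;       // one output file for the whole stream
    private final String tupleSeparator; // e.g. "\n"; empty string for none

    SketchStringFileOutput(String fileName, String tupleSeparator) {
        this.fileName = fileName;
        this.tupleSeparator = tupleSeparator;
    }

    @Override
    protected String getFileName(String tuple) {
        // Observation 2: every tuple maps to the same file.
        return fileName;
    }

    @Override
    protected byte[] getBytesForTuple(String tuple) {
        // Observations 1 and 3: String payload followed by the separator.
        return (tuple + tupleSeparator).getBytes(StandardCharsets.UTF_8);
    }
}
```

A byte[] variant would look the same except that getBytesForTuple would
concatenate the tuple bytes and the separator bytes directly.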



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
