[jira] [Commented] (APEXMALHAR-2009) concrete operator for writing to HDFS file

Yogi Devendra (JIRA) Mon, 07 Mar 2016 03:55:03 -0800

    [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182918#comment-15182918
 ]


Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------

[Yogi]

Hi,

Currently, for writing to an HDFS file, we have AbstractFileOutputOperator in the 
Malhar library.

It has the following abstract methods:
1. protected abstract String getFileName(INPUT tuple)
2. protected abstract byte[] getBytesForTuple(INPUT tuple)

These methods are kept generic to give flexibility to app developers. But 
someone who is new to Apex would look for a ready-made implementation instead of 
extending the abstract implementation.

Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar. 
The aim of this operator is to serve as a ready-to-use operator for the 
most frequent use-cases.

Here are my key observations on the most frequent use-cases:
------------------------------------------------------------------------------

1. Writing tuples of type byte[] or String. 
2. All tuples on a particular stream end up in the same output file.
3. The app developer may want to add a custom tuple separator (e.g. a newline 
character) between tuples.
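
The three observations above can be illustrated with a small conversion helper. 
This is only a sketch, not code from Malhar: the class name TupleBytes, the 
method name toBytes, and the choice of UTF-8 encoding are all assumptions made 
for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Malhar): turns one tuple into the bytes
// that would be appended to the single output file.
final class TupleBytes {
  private TupleBytes() {}

  // byte[] tuples are passed through unchanged; any other tuple is converted
  // with toString() and encoded as UTF-8 (an assumed encoding).
  // The separator (e.g. "\n") is appended after the tuple's bytes.
  static byte[] toBytes(Object tuple, String separator) {
    byte[] body = (tuple instanceof byte[])
        ? (byte[]) tuple
        : tuple.toString().getBytes(StandardCharsets.UTF_8);
    byte[] sep = separator.getBytes(StandardCharsets.UTF_8);

    byte[] out = new byte[body.length + sep.length];
    System.arraycopy(body, 0, out, 0, body.length);
    System.arraycopy(sep, 0, out, body.length, sep.length);
    return out;
  }
}
```

Note that this sketch appends the separator after every tuple, so the file ends 
with a trailing separator. Writing it strictly between tuples would require 
tracking whether a tuple has already been written to the file.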

Please share your comments on the following:
--------------------------------------------------------

1. Will it be useful to have such a concrete operator?

2. Can you think of any datatype other than byte[] and String that should be 
supported out of the box by this concrete operator? 
Currently, I am planning to accept byte[], String, and any other type with a 
valid toString() as input tuples.

3. Do you think the tuple separator should be configurable?

4. Any other feedback?


Proposed design:
----------------------

1. This concrete implementation will extend AbstractFileOutputOperator 
with default implementations of the abstract methods mentioned above. 

2. The file name and tuple separator will be exposed as operator properties.

3. All incoming tuples will be written to the same file, as specified in the 
file name property. 

4. This operator will be added to the Malhar library under the package 
com.datatorrent.lib.io.fs, where AbstractFileOutputOperator resides. 
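
A rough sketch of the shape this design could take is given here. To keep it 
compilable on its own, it uses a stand-in base class (FileOutputBase) that 
declares only the two abstract methods quoted earlier; the real 
AbstractFileOutputOperator has many more members. The class name 
HdfsOutputSketch, the property names, their defaults, and the UTF-8 encoding 
are assumptions, not the final implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Stand-in for Malhar's AbstractFileOutputOperator, reduced to the two
// abstract methods named in this proposal (assumption for illustration).
abstract class FileOutputBase<INPUT> {
  protected abstract String getFileName(INPUT tuple);
  protected abstract byte[] getBytesForTuple(INPUT tuple);
}

// Hypothetical concrete operator: one configurable file name and one
// configurable tuple separator, both exposed as bean-style properties.
class HdfsOutputSketch extends FileOutputBase<Object> {
  private String fileName = "output.txt";  // assumed default
  private String tupleSeparator = "\n";    // assumed default

  public String getFileName() { return fileName; }
  public void setFileName(String fileName) { this.fileName = fileName; }

  public String getTupleSeparator() { return tupleSeparator; }
  public void setTupleSeparator(String sep) { this.tupleSeparator = sep; }

  // Design point 3: every tuple goes to the same configured file.
  @Override
  protected String getFileName(Object tuple) {
    return fileName;
  }

  // byte[] is written as-is; other types go through toString().
  // The separator is appended after the tuple's bytes.
  @Override
  protected byte[] getBytesForTuple(Object tuple) {
    byte[] body = (tuple instanceof byte[])
        ? (byte[]) tuple
        : tuple.toString().getBytes(StandardCharsets.UTF_8);
    byte[] sep = tupleSeparator.getBytes(StandardCharsets.UTF_8);

    byte[] out = Arrays.copyOf(body, body.length + sep.length);
    System.arraycopy(sep, 0, out, body.length, sep.length);
    return out;
  }
}
```

With this shape, an app developer would only set the file name and separator 
properties and connect the input stream, without writing a subclass.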


> concrete operator for writing to HDFS file
> ------------------------------------------
>
>                 Key: APEXMALHAR-2009
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Yogi Devendra
>            Assignee: Yogi Devendra
>
> Currently, for writing to HDFS file we have AbstractFileOutputOperator in the 
> malhar library.
> It has following abstract methods :
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to the app developers. 
> But, someone who is new to apex; would look for ready-made implementation 
> instead of extending Abstract implementation.
> Thus, I am proposing to add concrete operator HDFSOutputOperator to malhar. 
> Aim of this operator would be to serve the purpose of ready to use operator 
> for most frequent use-cases.
> Here are my key observations on most frequent use-cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String. 
> 2. All tuples on a particular stream land up in the same output file.
> 3. App developer may want to add some custom tuple separator (e.g. newline 
> character) between tuples.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
