[
https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182932#comment-15182932
]
Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------
[Ashwin]
Yogi,
Replies inline.
On Mar 5, 2016 7:50 PM, "Yogi Devendra" <[email protected]>
wrote:
>
> Ashwin,
>
> Please see my replies inline:
>
> On 5 March 2016 at 22:42, Ashwin Chandra Putta <[email protected]>
> wrote:
>
> > I think the concrete implementation should contain the following to allow
> > for the most common use cases.
> >
> > 1. Take any java object as input and get the bytes of the string returned
> > from toString method on the object.
> >
>
> Yes. It would allow any java object and byte[] will be derived from the
> toString(). If input is byte[]; then it would be passed on without any
> conversion.
>
>
> > 2. The separator should be configurable. Null separator should also be
> > valid.
> >
>
> Implementation will allow any String separator. Default would be newline.
> Even empty string will be supported.
> Are you referring to no-separator case by Null separator? How about using
> empty string for no-separator instead of Null to avoid any special handling?
>
By null separator, I meant no separator. Basically, null value or empty
string value for the separator variable. However, we don't have to worry
about nulls if we make the variable @NotNull.
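
A minimal sketch of points 1 and 2 as discussed so far, not the actual operator. The class name, the plain setter standing in for @NotNull validation, the UTF-8 charset, and appending the separator after byte[] tuples as well are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical stand-alone class; the real operator would extend
// AbstractFileOutputOperator and override getBytesForTuple.
class TupleBytesSketch {
    // Default separator is newline, as suggested in the thread.
    private String separator = "\n";

    // Stand-in for @NotNull: null is rejected, empty string means "no separator".
    void setSeparator(String separator) {
        this.separator = Objects.requireNonNull(separator, "separator");
    }

    byte[] getBytesForTuple(Object tuple) {
        // byte[] input is used as-is; anything else goes through toString().
        // UTF-8 is an assumption, the thread does not name a charset.
        byte[] body = (tuple instanceof byte[])
            ? (byte[]) tuple
            : tuple.toString().getBytes(StandardCharsets.UTF_8);
        byte[] sep = separator.getBytes(StandardCharsets.UTF_8);

        // Append the separator (possibly empty) after the tuple bytes.
        byte[] out = new byte[body.length + sep.length];
        System.arraycopy(body, 0, out, 0, body.length);
        System.arraycopy(sep, 0, out, body.length, sep.length);
        return out;
    }
}
```

With the empty-string separator, the output is just the tuple bytes, so no special null handling is needed.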
>
> > 3. Should have one time configurable file path and name.
> >
>
> Yes. Filepath and name will be configurable as a property.
>
>
>
> > 4. Should have configurable time based and size based rotation policy.
> >
>
> Do you mean rotate based on whichever happens first?
>
If both are specified, then whichever happens first. If only one is
specified, it should be honored.
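
The "whichever happens first" rule could be sketched roughly as below. This is only an illustration: the class and field names are hypothetical, and treating a non-positive limit as "not specified" is an assumption, not something AbstractFileOutputOperator defines.

```java
// Hypothetical helper, independent of the Malhar classes.
class RotationPolicySketch {
    private final long maxBytes;      // <= 0 means size-based rotation is not specified
    private final long maxAgeMillis;  // <= 0 means time-based rotation is not specified
    private long bytesWritten;
    private long openedAtMillis;

    RotationPolicySketch(long maxBytes, long maxAgeMillis, long nowMillis) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
        this.openedAtMillis = nowMillis;
    }

    // Call after each write to the current file.
    void recordWrite(int numBytes) {
        bytesWritten += numBytes;
    }

    // True as soon as either configured limit is reached.
    boolean shouldRotate(long nowMillis) {
        boolean sizeHit = maxBytes > 0 && bytesWritten >= maxBytes;
        boolean timeHit = maxAgeMillis > 0 && nowMillis - openedAtMillis >= maxAgeMillis;
        return sizeHit || timeHit;
    }

    // Call once the file has been rotated to start a fresh part.
    void rotated(long nowMillis) {
        bytesWritten = 0;
        openedAtMillis = nowMillis;
    }
}
```

If only one limit is set, the other check is skipped, so the single configured policy is honored on its own.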
> Size based rotation policy will be inherited from
> AbstractFileOutputOperator.
>
> For time based rotation, are you referring to write one file for X windows?
> OR rotate if there is no new data for X windows?
>
I am referring to first scenario. Rotate once every few time units. Eg:
once every 3 minutes.
The second scenario is good to have; I think Chandni's finalization logic
solves this scenario.
> In either case, can we say that set appropriate value X for
> APPLICATION_WINDOW_COUNT for this operator?
> OR should we expose another property rotationWindowCount for this?
>
Number of Windows as a unit is fine but would prefer time specific units.
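
One way to reconcile the two, sketched under assumptions: expose the rotation interval in time units and convert it to a window count internally. The helper name is hypothetical, and the window length is passed in rather than read from the platform; 500 ms is used in the usage note only as an assumed streaming window size.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts a user-facing interval into a window count.
class RotationWindowsSketch {
    static long windowsFor(long interval, TimeUnit unit, long windowMillis) {
        long intervalMillis = unit.toMillis(interval);
        // Round up so rotation never happens earlier than requested,
        // and always wait for at least one window.
        long windows = (intervalMillis + windowMillis - 1) / windowMillis;
        return Math.max(1, windows);
    }
}
```

For example, assuming 500 ms windows, `windowsFor(3, TimeUnit.MINUTES, 500)` gives 360 windows for the "once every 3 minutes" case.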
>
> >
> > Regards,
> > Ashwin.
>
>
>
> ~ Yogi
> concrete operator for writing to HDFS file
> ------------------------------------------
>
> Key: APEXMALHAR-2009
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
> Project: Apache Apex Malhar
> Issue Type: Task
> Reporter: Yogi Devendra
> Assignee: Yogi Devendra
>
> Currently, for writing to HDFS file we have AbstractFileOutputOperator in the
> malhar library.
> It has following abstract methods :
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to the app developers.
> But, someone who is new to apex; would look for ready-made implementation
> instead of extending Abstract implementation.
> Thus, I am proposing to add concrete operator HDFSOutputOperator to malhar.
> Aim of this operator would be to serve the purpose of ready to use operator
> for most frequent use-cases.
> Here are my key observations on most frequent use-cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String.
> 2. All tuples on a particular stream land up in the same output file.
> 3. App developer may want to add some custom tuple separator (e.g. newline
> character) between tuples.
> Discussion thread on mailing list here:
> http://mail-archives.apache.org/mod_mbox/apex-dev/201603.mbox/%3CCAHekGF_6KovS4cjYXzCLdU9En0iPaKO%2BBv%3DEJXbrCuhe9%2BtdrA%40mail.gmail.com%3E
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)