Chandni,

I agree with your original assessment: if the new functionality falls under
the "functionality domain" of the original operator, there shouldn't be a
separate operator, and the features should just be added to the original
one. Based on your description, I agree with points 1, 2, and 3.

However, if you delete an operator that is useful in some use cases, what is
the substitute for that knowledge? For example, HDFSFileSplitter seems to
ignore some commonly present temporary files. Does everyone have to learn
this and figure it out for themselves?
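
To make the concern concrete, here is a minimal sketch of how those two
HDFS defaults could be carried over as a single ignore regex if the class
goes away. The class name, the combined pattern, and the choice to match
against the whole file name are my assumptions for illustration, not what
FileSplitterInput does today.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: not part of Malhar.
class IgnorePatternSketch {
  // Values quoted in this thread for HDFSFileSplitter.
  static final String COPYING_REGEX = ".*._COPYING_";
  static final String UNSUPPORTED_CHAR = ":";

  // Assumed combination: ignore in-flight copies OR any name containing ":".
  static final Pattern IGNORE = Pattern.compile(
      COPYING_REGEX + "|.*" + Pattern.quote(UNSUPPORTED_CHAR) + ".*");

  // Assumption: the pattern must match the entire file name.
  static boolean isIgnored(String fileName) {
    return IGNORE.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    System.out.println(isIgnored("data.csv._COPYING_")); // true
    System.out.println(isIgnored("log:2016.txt"));       // true
    System.out.println(isIgnored("data.csv"));           // false
  }
}
```

If something like this were documented as a recommended value for the
ignore pattern, the HDFS-specific knowledge would not be lost with the class.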

Thanks

On Fri, May 6, 2016 at 4:44 PM, Chandni Singh <[email protected]>
wrote:

> Just saw that there is *HDFSFileSplitter* in the library as well.
> This sets *ignoreFilePatternRegularExp* to ".*._COPYING_" and
> *unsupportedChar* to ":".
>
> IMO this class should be removed as well.
>
> Chandni
>
> On Fri, May 6, 2016 at 4:16 PM, Chandni Singh <[email protected]>
> wrote:
>
> > Hi,
> >
> > Recently FSFileSplitter was added to the Malhar library.
> > I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081 to
> > remove this operator and add its functionality to FileSplitterInput.
> >
> > The reason is that this extension adds just 3 trivial features, which
> > makes it difficult for the user to know which operator to use. It
> > adds more classes which essentially do the same thing.
> >
> > This operator adds 3 properties to FileSplitterInput.
> >
> > 1. ignoreFilePatternRegularExp: regular expression that specifies which
> > files to ignore.
> > This is useful to have in the FileSplitterInput.
> >
> > 2. unsupportedChar: first of all, this is a String. Files having this
> > String will be ignored.
> > IMO this is redundant. #1 can be used to accomplish this.
> > I think this should be removed.
> >
> > 3. sequentialFileReader: when this property is set, all block metadata of
> > the same file have the same hashcode. I think this may have been done so
> > that all the block metadata of a particular file go to the same block
> > reader.
> >
> > IMO this is a hacky way of accomplishing this. If an application needs
> > this, then it should be done using a StreamCodec.
> >
> > I think this should be removed.
> >
> > Thanks,
> > Chandni
> >
>
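
On point 3 above, the idea of routing all blocks of one file to the same
reader without touching hashCode can be sketched in plain Java. This is
deliberately not tied to the Apex API: BlockMeta, partitionFor, and
readerFor are hypothetical names, and in Apex the equivalent logic would
presumably sit in a StreamCodec's partitioning method.

```java
// Hypothetical sketch: plain Java, no Apex dependencies.
class FilePartitionSketch {
  // Stand-in for block metadata; only the fields needed here.
  static final class BlockMeta {
    final String filePath;
    final long blockId;

    BlockMeta(String filePath, long blockId) {
      this.filePath = filePath;
      this.blockId = blockId;
    }
  }

  // Partition key derived from the file path only, so every block of a
  // file yields the same key regardless of its block id.
  static int partitionFor(BlockMeta block) {
    return block.filePath.hashCode();
  }

  // Map the key onto one of numReaders readers; floorMod keeps the
  // result non-negative even when the hash is negative.
  static int readerFor(BlockMeta block, int numReaders) {
    return Math.floorMod(partitionFor(block), numReaders);
  }

  public static void main(String[] args) {
    BlockMeta first = new BlockMeta("/data/a.log", 0L);
    BlockMeta second = new BlockMeta("/data/a.log", 1L);
    // Both blocks of /data/a.log land on the same reader index.
    System.out.println(readerFor(first, 4) == readerFor(second, 4));
  }
}
```

Keeping the routing in a codec-like function leaves the metadata's own
hashCode alone, which seems to be the objection to the current property.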
