Hi,

FSFileSplitter was recently added to the Malhar library.
I have created https://issues.apache.org/jira/browse/APEXMALHAR-2081 to
remove this operator and add its functionality to FileSplitterInput.

The reason is that this extension adds only 3 trivial features, and having
a separate operator for them makes it difficult for users to know which
operator to use. It also adds more classes that essentially do the same thing.

This operator adds 3 properties to FileSplitterInput.

1. ignoreFilePatternRegularExp: regular expression that specifies which
files to ignore.
This is useful to have in the FileSplitterInput.

2. unsupportedChar: first of all, this is a String, not a char. Files having
this String will be ignored.
IMO this is redundant. #1 can be used to accomplish this.
I think this should be removed.

3. sequentialFileReader: when this property is set, all block metadata of
the same file have the same hashcode. I think this may have been done so
that all the block metadata of a particular file go to the same block
reader.

IMO this is a hacky way of accomplishing this. If an application needs
this, then it should be done using a StreamCodec.

I think this should be removed.
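
To make points 2 and 3 concrete, here is a rough plain-Java sketch. The
class and method names are made up for illustration and are not part of
Malhar. It assumes the ignore pattern is matched against the whole file
path, which may not be how FileSplitterInput would apply it. The
partitionFor logic is the kind of thing I would expect to live in a
StreamCodec's getPartition method rather than in the tuple's hashCode.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Malhar class.
final class SplitterSketch {

  // Point 2: an ignore regex that covers the unsupportedChar use case.
  // Pattern.quote makes the string literal, so regex metacharacters in it are safe.
  static Pattern ignorePatternFor(String unsupported) {
    return Pattern.compile(".*" + Pattern.quote(unsupported) + ".*");
  }

  // Point 3: a partition key derived only from the file path, so every
  // block of the same file yields the same key without overriding hashCode.
  static int partitionFor(String filePath) {
    return filePath.hashCode();
  }

  public static void main(String[] args) {
    Pattern ignore = ignorePatternFor("#");
    System.out.println(ignore.matcher("/data/in/part#1.txt").matches()); // true
    System.out.println(ignore.matcher("/data/in/part1.txt").matches());  // false
    // Two blocks of the same file map to the same partition key.
    System.out.println(partitionFor("/data/in/a.txt") == partitionFor("/data/in/a.txt")); // true
  }
}
```

With something like this, the ignore regex alone handles the unsupportedChar
case, and the routing decision stays in the stream rather than in the
block metadata class.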

Thanks,
Chandni
