Hi All,

  Thanks Priyanka and Yogi for your suggestions.

  @Yogi: The 1st option you suggested is not feasible, because later
versions of the Hadoop library may add append support for these file
systems, and hard-coded rules would then be out of date. I feel the 2nd
is the best option.

  If there are no comments/suggestions from the community, I will go
ahead with the 2nd option that Yogi suggested.
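
  A rough sketch of the probe idea (attempt an append on a temp file once,
catch the exception, and remember the result in a flag), purely for
illustration. MiniFs and InMemoryFs are hypothetical stand-ins and not
Hadoop or Malhar classes; in the operator the same calls would go to the
Hadoop FileSystem, whose exact exception type for an unsupported append I
am assuming here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

public class AppendProbeSketch {

  /** Hypothetical stand-in for the few file system calls the probe needs. */
  interface MiniFs {
    OutputStream create(String path) throws IOException;
    OutputStream append(String path) throws IOException; // may be unsupported
    boolean delete(String path) throws IOException;
  }

  /** In-memory file system, used only to exercise the sketch. */
  static class InMemoryFs implements MiniFs {
    private final Map<String, ByteArrayOutputStream> files = new HashMap<>();
    private final boolean appendSupported;

    InMemoryFs(boolean appendSupported) {
      this.appendSupported = appendSupported;
    }

    public OutputStream create(String path) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      files.put(path, out);
      return out;
    }

    public OutputStream append(String path) throws IOException {
      if (!appendSupported) {
        throw new IOException("Not supported");
      }
      ByteArrayOutputStream out = files.get(path);
      if (out == null) {
        throw new IOException("No such file: " + path);
      }
      return out;
    }

    public boolean delete(String path) {
      return files.remove(path) != null;
    }

    boolean exists(String path) {
      return files.containsKey(path);
    }
  }

  /**
   * Creates a scratch file, tries to append to it, and always removes it.
   * Any IOException from append is treated as "append not supported", which
   * is an assumption: a real probe may want to tell transient I/O failures
   * apart from a genuine lack of support.
   */
  static boolean probeAppendSupport(MiniFs fs, String probePath) throws IOException {
    fs.create(probePath).close();
    try {
      fs.append(probePath).close();
      return true;
    } catch (IOException | UnsupportedOperationException e) {
      return false;
    } finally {
      fs.delete(probePath);
    }
  }

  public static void main(String[] args) throws IOException {
    // The result would be stored once (e.g. in setup) as isFileSystemAppendSupported.
    System.out.println(probeAppendSupport(new InMemoryFs(true), "/tmp/_probe"));
    System.out.println(probeAppendSupport(new InMemoryFs(false), "/tmp/_probe"));
  }
}
```

  With the flag set once up front, openStream can choose between the
normal append and the copy-based fallback without relying on exception
handling on every call.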

Regards,
Chaitanya

On Fri, Aug 26, 2016 at 12:21 PM, Yogi Devendra <[email protected]>
wrote:

> I propose an alternate approach to the 3 options mentioned above:
>
> In AbstractFileOutputOperator we can introduce one flag saying
> isFileSystemAppendSupported.
> This flag should be set based on the filePath in setup or activate method.
>
> It can be done in 2 ways:
> 1. Adding if else rules based on filesystem (e.g. true for HDFS, false for
> S3 etc.)
> 2. Attempt an append to a temp file and catch the exception.
>
> This flag will decide openStream behavior. Advantage here is that the flow
> is predetermined rather than based on the exception handling.
>
>
> ~ Yogi
>
> On 25 August 2016 at 11:17, Priyanka Gugale <[email protected]>
> wrote:
>
> > I would suggest that we override "openStream" in
> > GenericFileOutputOperator, as suggested in option 2, and then handle
> > "append" in a different way for file systems which don't support
> > append. Or else create concrete classes for all file systems which
> > don't support append and override the required functions.
> >
> > -1 for modifying Abstract class to take care of unsupported operations.
> >
> > -Priyanka
> >
> > On Wed, Aug 24, 2016 at 6:21 PM, Chaitanya Chebolu <
> > [email protected]> wrote:
> >
> > > Hi All,
> > >
> > >     GenericFileOutputOperator, which is in the Malhar repository, works
> > > for only a few file systems. GenericFileOutputOperator extends
> > > AbstractFileOutputOperator.
> > >
> > > Reason: The openStream() method in AbstractFileOutputOperator calls the
> > > append operation, but not all file systems support append. File
> > > systems which do not support the append() operation include FTP and S3.
> > >
> > >   If GenericFileOutputOperator is used with a file system which does
> > > not support the append() operation, and the operator goes down and
> > > comes back up, then the file system throws a "Not Supported" exception.
> > >
> > > Solution: The following method needs to be called instead of fs.append():
> > >
> > >
> > > protected FSDataOutputStream openStreamForNonAppendFS(Path filepath)
> > >     throws IOException {
> > >   // Move the existing file aside so its contents can be copied back.
> > >   Path appendTmpFile = new Path(filepath + "_APPENDING");
> > >   rename(filepath, appendTmpFile);
> > >   FSDataOutputStream fsOut = fs.create(filepath);
> > >   // Copy the old contents, closing the input stream when done.
> > >   try (FSDataInputStream fsIn = fs.open(appendTmpFile)) {
> > >     IOUtils.copy(fsIn, fsOut);
> > >   }
> > >   flush(fsOut);
> > >   // Non-recursive delete of the temporary copy.
> > >   fs.delete(appendTmpFile, false);
> > >   // The returned stream is positioned after the old contents.
> > >   return fsOut;
> > > }
> > >
> > >
> > > Below are the options to fix this issue.
> > >
> > > (1) Fix it in AbstractFileOutputOperator - Catch the "Not Supported"
> > > exception and then call the openStreamForNonAppendFS() method.
> > >
> > > (2) Fix it in GenericFileOutputOperator (same as approach (1)).
> > >
> > > (3) Create a new operator which extends AbstractFileOutputOperator and
> > > overrides the openStream() method. This new operator would be used
> > > only for file systems which do not support the append operation.
> > >
> > > Please share your thoughts and vote on above approaches.
> > >
> > > Regards,
> > > Chaitanya
> > >
> >
>
