[ 
https://issues.apache.org/jira/browse/PIG-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979308#comment-14979308
 ] 

Daniel Dai commented on PIG-4704:
---------------------------------

bq. 2. The reason for having a config was to say even though the StoreFunc 
implements ErrorHandling I don't want to allow errors for this script/job
I don't have a strong opinion. If you think that helps, I am fine leaving it there.

bq. I left the configs in the implementation because another 
implementation of ErrorHandler may not use minErrors and threshold, but some 
other algorithm to allow errors
Even if it is in the implementation, we should still expose it. I believe it 
will be the default implementation in most cases.
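
For readers following the discussion, here is a minimal sketch of how a
minErrors/threshold check could sit inside a wrapper around a storer's
putNext(). It is not the code from the attached patches and does not use the
Pig API: RecordSink, ErrorTolerantSink, and the exact rule (fail once the
error count exceeds minErrors and the error ratio exceeds the threshold) are
assumptions made only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the record-writing part of a store function. */
interface RecordSink {
    void putNext(String record) throws IOException;
}

/** Hypothetical wrapper that tolerates a bounded number of write failures. */
final class ErrorTolerantSink implements RecordSink {
    private final RecordSink delegate;
    private final long minErrors;        // errors always tolerated (assumed meaning)
    private final double errorThreshold; // max tolerated errors/records ratio (assumed meaning)
    private long records;
    private long errors;

    ErrorTolerantSink(RecordSink delegate, long minErrors, double errorThreshold) {
        this.delegate = delegate;
        this.minErrors = minErrors;
        this.errorThreshold = errorThreshold;
    }

    @Override
    public void putNext(String record) throws IOException {
        records++;
        try {
            delegate.putNext(record);
        } catch (IOException e) {
            errors++;
            if (exceedsLimit()) {
                throw e; // too many failures: fail as an unwrapped storer would
            }
            // Otherwise the record is dropped; the counts below account for it.
        }
    }

    private boolean exceedsLimit() {
        return errors > minErrors && (double) errors / records > errorThreshold;
    }

    long getErrorCount() { return errors; }

    long getRecordCount() { return records; }
}

final class ErrorThresholdDemo {
    public static void main(String[] args) throws IOException {
        List<String> stored = new ArrayList<>();
        // A sink that rejects any record starting with "bad".
        RecordSink failing = r -> {
            if (r.startsWith("bad")) {
                throw new IOException("rejected: " + r);
            }
            stored.add(r);
        };
        ErrorTolerantSink sink = new ErrorTolerantSink(failing, 0, 0.5);
        for (String r : new String[] {"a", "b", "bad-1", "c"}) {
            sink.putNext(r);
        }
        System.out.println("stored=" + stored.size()
                + " errors=" + sink.getErrorCount()
                + " of " + sink.getRecordCount());
    }
}
```

In this sketch a threshold of 0.0 with minErrors of 0 makes the first failure
fatal, which is one way a per-script setting could disallow errors even when
the storer supports error handling.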

> Customizable Error Handling for Storers in Pig 
> -----------------------------------------------
>
>                 Key: PIG-4704
>                 URL: https://issues.apache.org/jira/browse/PIG-4704
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Siddhi Mehta
>            Assignee: Siddhi Mehta
>         Attachments: PIG-4704.patch, PIG-4704.patch, PIG-4704_3.patch
>
>
> On Thu, Oct 15, 2015 at 4:06 AM, Saggi Neumann <sa...@xplenty.com> wrote:
> You may also check these for ideas. It would be good to have them
> implemented:
> https://wiki.apache.org/pig/PigErrorHandlingInScripts
> https://issues.apache.org/jira/browse/PIG-2620
> --
> Saggi Neumann
> Co-founder and CTO, Xplenty
> M: +972-544-546102
> On Thu, Oct 15, 2015 at 12:17 AM, Siddhi Mehta <smehtau...@gmail.com> wrote:
> > Hello Everyone,
> >
> > Just wanted to follow up on my earlier post and see if there are any
> > thoughts on it.
> > I was planning to take a stab at implementing it.
> >
> > The approach I was planning to use is:
> > 1. Make the storer that wants error handling capability implement an
> > interface (ErrorHandlingStoreFunc).
> > 2. Using this interface, the storer can define the thresholds for
> > errors. Each store func can determine what the threshold should be. For
> > example, HBaseStorage can have a different threshold from ParquetStorage.
> > 3. Whenever the storer gets created in
> >
> > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> > we intercept the call and give it a wrappedStoreFunc.
> > 4. Every putNext() call now gets delegated to the actual storer via the
> > delegate, so we can listen for errors on putNext() and either allow
> > the error if it is within the threshold or rethrow it from there.
> > 5. The client can get information about the threshold value from the
> > counters to know if any data was dropped.
> >
> > Thoughts?
> >
> > Thanks,
> > Siddhi
> >
> >
> > On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta <smehtau...@gmail.com>
> > wrote:
> >
> > > Hey Guys,
> > >
> > > Currently a Pig job fails when one record out of billions of records
> > > fails on STORE.
> > > This is not always desirable behavior when you are dealing with millions
> > > of records and only a few fail.
> > > In certain use cases it's desirable to know how many such errors
> > > occurred and to have an accounting of them.
> > > Is there a configurable limit that we can set for Pig so that we can
> > > allow a threshold for bad records on STORE, along the lines of the
> > > JIRA for LOAD, PIG-3059 <https://issues.apache.org/jira/browse/PIG-3059>?
> > >
> > > Thanks,
> > > Siddhi
> > >
> >



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
