The proposed approach sounds good. If there are no objections, can you please go ahead and file a JIRA? I can take a look once you have a patch available.

On Wed, Oct 14, 2015 at 2:20 PM, Siddhi Mehta <[email protected]> wrote:

> Sending to the Pig developers group.
>
> On Wed, Oct 14, 2015 at 2:17 PM, Siddhi Mehta <[email protected]> wrote:
>
> > Hello Everyone,
> >
> > Just wanted to follow up on my earlier post and see if there are any
> > thoughts on it.
> > I was planning to take a stab at implementing it.
> >
> > The approach I was planning to use is:
> > 1. Make any storer that wants error-handling capability implement an
> >    interface (ErrorHandlingStoreFunc).
> > 2. Using this interface, the storer can define the thresholds for
> >    errors. Each store func can determine what its threshold should be.
> >    For example, HbaseStorage can have a different threshold from
> >    ParquetStorage.
> > 3. Whenever the storer gets created in
> >    org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc()
> >    we intercept the call and give it a wrappedStoreFunc.
> > 4. Every putNext() call now gets delegated to the actual storer via the
> >    delegate, so we can listen for errors on putNext() and either allow
> >    the error if it is within the threshold or rethrow it from there.
> > 5. The client can get information about the threshold value from the
> >    counters to know whether any data was dropped.
> >
> > Thoughts?
> >
> > Thanks,
> > Siddhi
> >
> > On Mon, Oct 12, 2015 at 1:49 PM, Siddhi Mehta <[email protected]> wrote:
> >
> >> Hey Guys,
> >>
> >> Currently a Pig job fails when one record out of billions of records
> >> fails on STORE.
> >> This is not always desirable behavior when you are dealing with
> >> millions of records and only a few fail.
> >> In certain use cases it's desirable to know how many such errors
> >> occurred and to have an accounting of them.
> >> Is there a configurable limit that we can set for Pig so that we can
> >> allow a threshold for bad records on STORE, along the lines of the
> >> JIRA for LOAD, PIG-3059 <https://issues.apache.org/jira/browse/PIG-3059>?
> >>
> >> Thanks,
> >> Siddhi
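
For reference, steps 1 to 5 of the proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the eventual patch: Storer is a simplified stand-in for Pig's StoreFunc, ErrorHandlingStorer stands in for the proposed ErrorHandlingStoreFunc, and maxErrors(), ThresholdStorer, wrapIfSupported() and FlakyStorer are hypothetical names. A plain field stands in for the job counters mentioned in step 5.

```java
import java.io.IOException;

class ErrorTolerantStoreSketch {

    /** Simplified stand-in for Pig's StoreFunc; only putNext() is modelled. */
    interface Storer {
        void putNext(String record) throws IOException;
    }

    /** Hypothetical shape of the proposed ErrorHandlingStoreFunc (steps 1 and 2). */
    interface ErrorHandlingStorer extends Storer {
        /** Number of failed records this storer is willing to tolerate. */
        long maxErrors();
    }

    /** Wrapper that delegates putNext() and tolerates errors up to the threshold (step 4). */
    static final class ThresholdStorer implements Storer {
        private final ErrorHandlingStorer delegate;
        private long errorCount; // stand-in for a job counter (step 5)

        ThresholdStorer(ErrorHandlingStorer delegate) {
            this.delegate = delegate;
        }

        @Override
        public void putNext(String record) throws IOException {
            try {
                delegate.putNext(record);
            } catch (IOException e) {
                errorCount++;
                // Swallow the error while within the threshold, otherwise rethrow.
                if (errorCount > delegate.maxErrors()) {
                    throw new IOException(
                        "Error threshold exceeded after " + errorCount + " failed records", e);
                }
            }
        }

        long errorCount() {
            return errorCount;
        }
    }

    /** Interception point, analogous to wrapping in POStore.getStoreFunc() (step 3). */
    static Storer wrapIfSupported(Storer storer) {
        if (storer instanceof ErrorHandlingStorer) {
            return new ThresholdStorer((ErrorHandlingStorer) storer);
        }
        return storer; // storers that did not opt in keep fail-fast behaviour
    }

    /** Demo storer that rejects any record starting with "bad". */
    static final class FlakyStorer implements ErrorHandlingStorer {
        private final long maxErrors;

        FlakyStorer(long maxErrors) {
            this.maxErrors = maxErrors;
        }

        @Override
        public long maxErrors() {
            return maxErrors;
        }

        @Override
        public void putNext(String record) throws IOException {
            if (record.startsWith("bad")) {
                throw new IOException("cannot store " + record);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Storer storer = wrapIfSupported(new FlakyStorer(1));
        storer.putNext("ok-1");
        storer.putNext("bad-1"); // first failure, within the threshold
        storer.putNext("ok-2");
        System.out.println("dropped so far: " + ((ThresholdStorer) storer).errorCount());
        try {
            storer.putNext("bad-2"); // second failure, over the threshold
        } catch (IOException e) {
            System.out.println("job would fail: " + e.getMessage());
        }
    }
}
```

The sketch uses an absolute error count as the threshold; a ratio of failed to total records would be an equally plausible choice, and the thread does not say which is intended.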
