[ https://issues.apache.org/jira/browse/PIG-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921870#action_12921870 ]
Ashutosh Chauhan commented on PIG-1684:
---------------------------------------

Actually, what I said above is not entirely correct. OutputCommitter runs as a separate task at the end of the job. This separate task can run on any node of the cluster, so the output committer needs its own instantiation of the storefunc, separate from the one used by the outputformat. As mandated by the mapreduce framework, Pig (and thus the user's storefunc) should not maintain state between format and committer. Mridul, if I understand your use case correctly, that is what you are trying to do. I don't see any straightforward workaround if that's the use case. It would help if you could briefly explain what state you want to maintain between the different storefunc functions.

> Inconsistent usage of store func.
> ---------------------------------
>
>                 Key: PIG-1684
>                 URL: https://issues.apache.org/jira/browse/PIG-1684
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>         Environment: A custom StoreFuncInterface used to store data at the
> reducer.
> (Output of a group )
>            Reporter: Mridul Muralidharan
>
> Pig seems to be using multiple instances of StoreFuncInterface in the reducer
> inconsistently.
> Some hadoop api calls are made to one instance and others made to other :
> which makes state management very inconsistent and is requiring hacks on our
> part to deal with it.
> The call snippet below should hopefully indicate the issue.
> The format is :
> Instance.toString() method_call.
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputCommitter
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 setupTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getRecordWriter
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 putNext()
> ...
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 needsTaskCommit
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 commitTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 finish()
> As is obvious, two instances are used for different purposes - one to get the
> record writer and do the actual write, and another to call the
> OutputCommitter and its methods.
> Since they are from different instances (StoreFuncInterface), the output
> committer is unable to gracefully commit and cleanup.
> I am not attaching the StoreFunc, but any user defined StoreFunc will exhibit
> this behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
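
The constraint described in the comment (no state shared between the instance behind the output format and the instance behind the committer) can be illustrated with a small sketch. This is not code from the issue and does not use Pig's or Hadoop's real classes: `SketchStoreFunc`, the `sketch.output.dir` key, the `tmp-` path layout, and the plain `Map` standing in for the job configuration are all hypothetical. It shows one possible way to cope, under the assumption that whatever the committer needs can be recomputed from values both instances receive, rather than kept in instance fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a user store function; NOT Pig's StoreFuncInterface.
final class SketchStoreFunc {
    // Made-up configuration key, used only in this sketch.
    static final String OUTPUT_DIR_KEY = "sketch.output.dir";

    // Record everything later calls will need in the shared job configuration
    // (a plain Map here, standing in for the real configuration object).
    static void configure(Map<String, String> jobConf, String outputDir) {
        jobConf.put(OUTPUT_DIR_KEY, outputDir);
    }

    // Pure function of its inputs: no instance fields are read or written,
    // so any instance computes the same answer.
    String taskPath(Map<String, String> jobConf, String taskAttemptId) {
        return jobConf.get(OUTPUT_DIR_KEY) + "/tmp-" + taskAttemptId;
    }

    // Writer side: where records for this task attempt would be written.
    String pathToWrite(Map<String, String> jobConf, String taskAttemptId) {
        return taskPath(jobConf, taskAttemptId);
    }

    // Committer side: where to look for this task attempt's output.
    String pathToCommit(Map<String, String> jobConf, String taskAttemptId) {
        return taskPath(jobConf, taskAttemptId);
    }
}

final class StatelessStoreSketch {
    // Mimics the call trace in the issue: one instance writes, another commits.
    static boolean writerAndCommitterAgree() {
        Map<String, String> jobConf = new HashMap<>();
        SketchStoreFunc.configure(jobConf, "/out");

        SketchStoreFunc writerSide = new SketchStoreFunc();    // like ...@1429cb2
        SketchStoreFunc committerSide = new SketchStoreFunc(); // like ...@1be4777

        String written = writerSide.pathToWrite(jobConf, "attempt_0");
        String committed = committerSide.pathToCommit(jobConf, "attempt_0");
        return written.equals(committed);
    }

    public static void main(String[] args) {
        System.out.println(writerAndCommitterAgree());
    }
}
```

Because both instances derive the path from the shared configuration and the task attempt id, it does not matter which instance receives which call. Whether this fits the reporter's use case depends on what state is actually being kept, which is the question the comment asks.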