[
https://issues.apache.org/jira/browse/PIG-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921870#action_12921870
]
Ashutosh Chauhan commented on PIG-1684:
---------------------------------------
Actually, what I said above is not entirely correct. The OutputCommitter runs as
a separate task at the end of the job. This separate task can run on any node of
the cluster, so the output committer needs its own instance of the storefunc,
separate from the one used by the OutputFormat. As mandated by the MapReduce
framework, Pig (and thus the user's storefunc) should not maintain state between
the format and the committer.
Mridul, if I understand your use case correctly, that is what you are trying to
do. I don't see any straightforward workaround if that is the use case. It would
help if you could briefly explain what state you want to maintain between the
different storefunc functions.
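
To illustrate the constraint described above, here is a minimal plain-Java
simulation. It is only a sketch: DemoStore, its method names, and the
"demo.store.location" key are hypothetical and are not Pig or Hadoop APIs, and
a java.util.Properties object stands in for the job configuration. It contrasts
state kept in an instance field, which a second instance never sees, with state
re-derived from the shared configuration.

```java
import java.util.Properties;

/** Simulation only: none of these types are Pig or Hadoop classes. */
public class StoreStateDemo {

    /** Hypothetical stand-in for a user-defined store function. */
    static class DemoStore {
        // Instance field: visible only to the object that set it.
        private String outputDir;

        /** Records a setting in the shared (simulated) job configuration. */
        void setLocation(String location, Properties jobConf) {
            jobConf.setProperty("demo.store.location", location);
        }

        /** Writer side: caches the location in a field, which is not shared. */
        void prepareToWrite(Properties jobConf) {
            outputDir = jobConf.getProperty("demo.store.location");
        }

        /** Fragile: relies on a field that a different instance populated. */
        String commitUsingField() {
            return outputDir;
        }

        /** Robust: re-derives what it needs from the job configuration. */
        String commitUsingConf(Properties jobConf) {
            return jobConf.getProperty("demo.store.location");
        }
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        new DemoStore().setLocation("/tmp/out", jobConf);

        // Instance that would back the record writer.
        DemoStore writerSide = new DemoStore();
        writerSide.prepareToWrite(jobConf);

        // Separate instance that would back the committer.
        DemoStore committerSide = new DemoStore();
        System.out.println("field: " + committerSide.commitUsingField());
        System.out.println("conf:  " + committerSide.commitUsingConf(jobConf));
    }
}
```

In this sketch the committer-side instance sees null from the field-based path,
because only the writer-side instance ever set that field, while the
configuration-based path still returns the location. Whether a real storefunc
can carry all of its state this way depends on what that state is.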
> Inconsistent usage of store func.
> ---------------------------------
>
> Key: PIG-1684
> URL: https://issues.apache.org/jira/browse/PIG-1684
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0
> Environment: A custom StoreFuncInterface used to store data at the
> reducer.
> (Output of a group )
> Reporter: Mridul Muralidharan
>
> Pig seems to use multiple instances of StoreFuncInterface in the reducer
> inconsistently.
> Some Hadoop API calls are made to one instance and others to another,
> which makes state management very inconsistent and requires hacks on our
> part to deal with it.
> The call trace below should indicate the issue.
> The format is:
> Instance.toString() method_call
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputCommitter
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 setupTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getRecordWriter
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 putNext()
> ...
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 needsTaskCommit
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 commitTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 finish()
> As is obvious, two instances are used for different purposes - one to get the
> record writer and do the actual write, and another to call the
> OutputCommitter and its methods.
> Since these calls go to different StoreFuncInterface instances, the output
> committer is unable to gracefully commit and clean up.
> I am not attaching the StoreFunc, but any user defined StoreFunc will exhibit
> this behavior.