[ 
https://issues.apache.org/jira/browse/PIG-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921870#action_12921870
 ] 

Ashutosh Chauhan commented on PIG-1684:
---------------------------------------

Actually, what I said above is not entirely correct. The OutputCommitter runs as 
a separate task at the end of the job. That task can run on any node of the 
cluster, so the storefunc has to be instantiated separately for the 
OutputCommitter and for the OutputFormat. As mandated by the MapReduce 
framework, Pig (and thus the user's storefunc) should not maintain state 
between the output format and the committer. Mridul, if I understand your use 
case correctly, that is what you are trying to do. I don't see any 
straightforward workaround if that is the use case. It would help if you could 
briefly explain what state you want to maintain between the different 
storefunc functions. 
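
To illustrate the point about separate instances, here is a minimal sketch. It 
assumes nothing about Pig's real API: SketchStoreFunc, the "sketch.output.dir" 
key and the plain Map standing in for the job configuration are all 
hypothetical. It only shows that a field set on the instance doing the writes 
is not visible on the instance backing the committer, while a value both 
instances read from shared job settings is.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class StoreFuncStateSketch {

    // Hypothetical stand-in for a user store function. This is NOT Pig's
    // StoreFuncInterface; it only models "one object per role".
    static class SketchStoreFunc {
        private int recordsWritten = 0;          // in-memory, per-instance state
        private final Map<String, String> conf;  // read-only job settings

        SketchStoreFunc(Map<String, String> conf) {
            this.conf = conf;
        }

        void putNext(String record) {
            recordsWritten++;                    // only this instance sees the change
        }

        int recordsWritten() {
            return recordsWritten;
        }

        String outputDir() {
            return conf.get("sketch.output.dir"); // same value for every instance
        }
    }

    // Assumption: these settings are fixed before any instance is created.
    static Map<String, String> jobConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("sketch.output.dir", "/tmp/sketch-out");
        return Collections.unmodifiableMap(conf);
    }

    // The writer instance records two tuples; the committer instance is asked
    // how many it knows about.
    static int committerSeesRecords() {
        Map<String, String> conf = jobConf();
        SketchStoreFunc writer = new SketchStoreFunc(conf);
        SketchStoreFunc committer = new SketchStoreFunc(conf);
        writer.putNext("a");
        writer.putNext("b");
        return committer.recordsWritten();
    }

    // Both instances can recover the output directory from the shared settings.
    static String committerSeesOutputDir() {
        return new SketchStoreFunc(jobConf()).outputDir();
    }

    public static void main(String[] args) {
        System.out.println("records seen by committer: " + committerSeesRecords());
        System.out.println("output dir seen by committer: " + committerSeesOutputDir());
    }
}
```

In this sketch, anything the committer needs has to be derivable from the 
shared settings (or from what is already on disk), not from fields populated 
while writing. 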

> Inconsistent usage of store func.
> ---------------------------------
>
>                 Key: PIG-1684
>                 URL: https://issues.apache.org/jira/browse/PIG-1684
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>         Environment: A custom StoreFuncInterface used to store data at the 
> reducer.
> (Output of a group )
>            Reporter: Mridul Muralidharan
>
> Pig seems to be using multiple instances of StoreFuncInterface in the reducer 
> inconsistently.
> Some Hadoop API calls are made to one instance and others to another, 
> which makes state management very inconsistent and requires hacks on our 
> part to deal with it.
> The call snippet below should hopefully indicate the issue.
> The format is :
> Instance.toString()   method_call.
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputCommitter
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 setupTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getRecordWriter
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 putNext()
> ... 
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 needsTaskCommit
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 commitTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 finish()
> As is obvious, two instances are used for different purposes - one to get the 
> record writer and do the actual write, and another to call the 
> OutputCommitter and its methods.
> Because the calls go to different StoreFuncInterface instances, the output 
> committer is unable to commit and clean up gracefully.
> I am not attaching the StoreFunc, but any user defined StoreFunc will exhibit 
> this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
