[ 
https://issues.apache.org/jira/browse/PIG-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921815#action_12921815
 ] 

Ashutosh Chauhan commented on PIG-1684:
---------------------------------------

I fixed this multiple-instantiation problem for loadfuncs in PIG-1363. It 
needs to be fixed for storefuncs as well. I see no reason why the same 
instance of a storeFunc can't be shared across PigOutputCommitter and 
PigOutputFormat. The ideal solution would be to instantiate both loadfunc and 
storefunc exactly once on the client side (during logical planning) and then 
ship that instance to the backend, where the same instance continues to be 
used. But that would require these interfaces to implement Serializable, 
which would break backward compatibility. At the very least, we need to make 
sure that they are instantiated exactly twice: once in the frontend and once 
in the backend. As is evident here, the storefunc is getting instantiated 
multiple times in the backend. 
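
As a rough sketch of the sharing idea on the backend, the following shows a 
per-JVM registry that hands the same object to every caller asking for the 
same store. This is only an illustration under assumptions: StoreFuncRegistry, 
getOrCreate, and the "signature" key are hypothetical names and not existing 
Pig classes or methods, and it is not the actual PIG-1363 change.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are Pig APIs.
final class StoreFuncRegistry {
    // At most one instance per store signature within a single task JVM.
    private static final Map<String, Object> INSTANCES = new ConcurrentHashMap<>();

    private StoreFuncRegistry() {}

    // Returns the instance already registered for this signature, or creates
    // and registers one via the factory if none exists yet.
    @SuppressWarnings("unchecked")
    static <T> T getOrCreate(String signature, Supplier<T> factory) {
        return (T) INSTANCES.computeIfAbsent(signature, key -> factory.get());
    }
}
```

With something along these lines, the output format side and the committer 
side would both call getOrCreate with the same key and so operate on one 
object, which keeps any state held by the store function consistent.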

Mridul, 
It would be great if you could provide a stripped-down version of your 
storeFunc; that would make it easier to write a unit test for it.
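
For the unit test, one possible shape is a small helper that a stripped-down 
store function calls at the top of each callback, so the test can count how 
many distinct instances received calls. This is a sketch only: CallTrace and 
its methods are hypothetical names, not part of Pig, and wiring it into a 
real store function is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Pig. A store function under test would
// call CallTrace.record(this, "putNext") and so on from each callback.
final class CallTrace {
    // Identity-based set, so two equal-looking instances still count as two.
    private static final Set<Object> INSTANCES =
            Collections.newSetFromMap(new IdentityHashMap<>());
    private static final List<String> CALLS = new ArrayList<>();

    private CallTrace() {}

    // Logs one call in an "instance method_call" format.
    static synchronized void record(Object instance, String method) {
        INSTANCES.add(instance);
        CALLS.add(instance.getClass().getSimpleName() + "@"
                + Integer.toHexString(System.identityHashCode(instance))
                + " " + method);
    }

    // Number of distinct instances that received at least one call.
    static synchronized int distinctInstances() {
        return INSTANCES.size();
    }

    // Snapshot of the recorded calls, in call order.
    static synchronized List<String> calls() {
        return new ArrayList<>(CALLS);
    }

    static synchronized void reset() {
        INSTANCES.clear();
        CALLS.clear();
    }
}
```

A test could then run a small job and assert that distinctInstances() in the 
backend is 1 once the fix is in place.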

> Inconsistent usage of store func.
> ---------------------------------
>
>                 Key: PIG-1684
>                 URL: https://issues.apache.org/jira/browse/PIG-1684
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>         Environment: A custom StoreFuncInterface used to store data at the 
> reducer.
> (Output of a group )
>            Reporter: Mridul Muralidharan
>
> Pig seems to be using multiple instances of StoreFuncInterface in the reducer 
> inconsistently.
> Some Hadoop API calls are made to one instance and others are made to 
> another, which makes state management very inconsistent and requires hacks 
> on our part to deal with it.
> The call trace below should hopefully indicate the issue.
> The format is:
> Instance.toString()   method_call
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 getOutputCommitter
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 setupTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getOutputFormat()
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 getRecordWriter
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 init
> com.yahoo.psox.fish.pig.indexjoinst...@1429cb2 putNext()
> ... 
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 needsTaskCommit
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 commitTask
> com.yahoo.psox.fish.pig.indexjoinst...@1be4777 finish()
> As is obvious, two instances are used for different purposes: one to get the 
> record writer and do the actual write, and another to call the 
> OutputCommitter and its methods.
> Since these calls go to different StoreFuncInterface instances, the output 
> committer is unable to gracefully commit and clean up.
> I am not attaching the StoreFunc, but any user-defined StoreFunc will exhibit 
> this behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
