[
https://issues.apache.org/jira/browse/PIG-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965357#comment-13965357
]
Taylor Finnell commented on PIG-2872:
-------------------------------------
I was working with [~wattsinabox] when we experienced the issue. Our script is
roughly as follows...
{code}
A = LOAD '...' USING CSVLoader(...);
STORE A INTO '/tmp/A-unused' USING DBStorage('org.postgresql.Driver', '...', 'INSERT INTO ...');
B = FOREACH A GENERATE X, Y, CONCAT(X, Y) AS Z;
STORE B INTO '/tmp/B-unused' USING DBStorage('org.postgresql.Driver', '...', 'INSERT INTO ...');
{code}
Both DBStorage calls insert into different tables in the same database. When
the script is run, both A and B are stored into their /tmp/ locations; however,
the data never makes it into the database. We found two ways to get the data
into the database: the first was to add a DUMP B statement after the assignment
of B, and the second was to execute the script with the -M flag, which disables
multi-query optimization.
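For reference, the first workaround amounts to adding a DUMP between the
assignment of B and the second STORE; a sketch with the same elisions as the
script above:
{code}
B = FOREACH A GENERATE X, Y, CONCAT(X, Y) AS Z;
DUMP B; -- with this DUMP in place, the DBStorage inserts reached the database
STORE B INTO '/tmp/B-unused' USING DBStorage('org.postgresql.Driver', '...', 'INSERT INTO ...');
{code}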
> StoreFuncInterface.setStoreLocation get's a copy of a Configuration object
> --------------------------------------------------------------------------
>
> Key: PIG-2872
> URL: https://issues.apache.org/jira/browse/PIG-2872
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11
> Environment: Pig trunk, Hadoop 0.20.205 with Kerberos, ElasticSearch
> trunk, Wonderdog trunk
> Reporter: Evert Lammerts
>
> When an implementation of StoreFuncInterface.setStoreLocation is called from
> JobControlCompiler.getJob, it is passed a copy of the Configuration that will
> be used for the Job that will be submitted:
> {code:title=JobControlCompiler.java}
> sFunc.setStoreLocation(st.getSFile().getFileName(), new org.apache.hadoop.mapreduce.Job(nwJob.getConfiguration()));
> {code}
> As far as I know, constructing a new org.apache.hadoop.mapreduce.Job makes a
> copy of the Configuration object it is given. Thus anything added to the
> Configuration in the implementation of setStoreLocation will not be included
> in the Configuration of nwJob in JobControlCompiler.getJob.
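> To illustrate those copy semantics, a minimal standalone sketch (the property
> value is invented; only the copy-on-construction behaviour matters):
> {code:title=ConfCopyDemo.java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
>
> public class ConfCopyDemo {
>     public static void main(String[] args) throws Exception {
>         Configuration original = new Configuration();
>
>         // Mirrors JobControlCompiler.getJob: setStoreLocation receives a Job
>         // that wraps a *copy* of nwJob's Configuration.
>         Job copy = new Job(original);
>
>         // A StoreFunc setting properties here only touches the copy...
>         copy.getConfiguration().set("mapred.cache.files",
>                 "/tmp/elasticsearch.yml#elasticsearch.yml");
>
>         // ...so the original Configuration, the one actually submitted,
>         // never sees the value.
>         System.out.println(original.get("mapred.cache.files"));                // null
>         System.out.println(copy.getConfiguration().get("mapred.cache.files")); // the path
>     }
> }
> {code}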
> I noticed this going wrong in Wonderdog, which needs to include the
> Elasticsearch configuration file in the DistributedCache. The file is added to
> mapred.cache.files through setStoreLocation, but that setting doesn't make it
> back into the Job returned by JobControlCompiler.getJob, so the file is never
> localized.
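> For illustration only (not Wonderdog's actual code), a hypothetical StoreFunc
> showing the pattern that breaks; the class name and file path are invented:
> {code:title=CacheFileStoreFuncSketch.java}
> import java.io.IOException;
>
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.OutputFormat;
> import org.apache.hadoop.mapreduce.RecordWriter;
> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> import org.apache.pig.StoreFunc;
> import org.apache.pig.data.Tuple;
>
> public class CacheFileStoreFuncSketch extends StoreFunc {
>
>     @Override
>     public void setStoreLocation(String location, Job job) throws IOException {
>         // Intended effect: register a config file in mapred.cache.files so it
>         // is localized on the task nodes. Because 'job' wraps a copy of the
>         // Configuration in JobControlCompiler.getJob, the setting never
>         // reaches the job that is actually submitted.
>         DistributedCache.addCacheFile(
>                 new Path("/tmp/elasticsearch.yml").toUri(),
>                 job.getConfiguration());
>     }
>
>     @Override
>     public OutputFormat getOutputFormat() throws IOException {
>         return new TextOutputFormat();  // placeholder output format
>     }
>
>     @Override
>     public void prepareToWrite(RecordWriter writer) throws IOException {
>         // no-op for this sketch
>     }
>
>     @Override
>     public void putNext(Tuple t) throws IOException {
>         // no-op for this sketch
>     }
> }
> {code}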
> This might be intentional semantics within Pig, but I'm not familiar enough
> with StoreFuncs to know whether it is.
--
This message was sent by Atlassian JIRA
(v6.2#6252)