[
https://issues.apache.org/jira/browse/PIG-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965357#comment-13965357
]
Taylor Finnell commented on PIG-2872:
-------------------------------------
I was working with [~wattsinabox] when we experienced the issue. Our script is
roughly as follows...
{code}
A = LOAD '...' USING CSVLoader(...);
STORE A INTO '/tmp/A-unused' USING DBStorage('org.postgresql.Driver', '...', 'INSERT INTO ...');
B = FOREACH A GENERATE X, Y, CONCAT(X, Y) AS Z;
STORE B INTO '/tmp/B-unused' USING DBStorage('org.postgresql.Driver', '...', 'INSERT INTO ...');
{code}
Both DBStorage calls insert into different tables in the same database. When
the script is run, both A and B are stored into their /tmp/ locations; however,
the data never makes it into the database. We found two ways to get the data
into the database: the first was to add a DUMP B statement after the assignment
of B, and the second was to execute the script with the -M flag, which disables
multi-query optimization.
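For reference, the first workaround amounts to adding a DUMP between the
assignment of B and the second STORE; a sketch with the same elisions as the
script above:
{code}
B = FOREACH A GENERATE X, Y, CONCAT(X, Y) AS Z;
DUMP B; -- with this DUMP in place, the DBStorage inserts reached the database
STORE B INTO '/tmp/B-unused' USING DBStorage('org.postgresql.Driver', '...', 'INSERT INTO ...');
{code}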
> StoreFuncInterface.setStoreLocation get's a copy of a Configuration object
> --------------------------------------------------------------------------
>
> Key: PIG-2872
> URL: https://issues.apache.org/jira/browse/PIG-2872
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.11
> Environment: Pig trunk, Hadoop 0.20.205 with Kerberos, ElasticSearch
> trunk, Wonderdog trunk
> Reporter: Evert Lammerts
>
> When an implementation of StoreFuncInterface.setStoreLocation is called from
> JobControlCompiler.getJob, it is passed a copy of the Configuration that will
> be used for the Job that will be submitted:
> {code:title=JobControlCompiler.java}
> sFunc.setStoreLocation(st.getSFile().getFileName(), new org.apache.hadoop.mapreduce.Job(nwJob.getConfiguration()));
> {code}
> As far as I know, constructing a new org.apache.hadoop.mapreduce.Job makes a
> copy of the Configuration object it is given. Thus anything added to the
> Configuration in the implementation of setStoreLocation will not be included
> in the Configuration of nwJob in JobControlCompiler.getJob.
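> To illustrate those copy semantics, a minimal standalone sketch (the property
> value is invented; only the copy-on-construction behaviour matters):
> {code:title=ConfCopyDemo.java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
>
> public class ConfCopyDemo {
>     public static void main(String[] args) throws Exception {
>         Configuration original = new Configuration();
>
>         // Mirrors JobControlCompiler.getJob: setStoreLocation receives a Job
>         // that wraps a *copy* of nwJob's Configuration.
>         Job copy = new Job(original);
>
>         // A StoreFunc setting properties here only touches the copy...
>         copy.getConfiguration().set("mapred.cache.files",
>                 "/tmp/elasticsearch.yml#elasticsearch.yml");
>
>         // ...so the original Configuration, the one actually submitted,
>         // never sees the value.
>         System.out.println(original.get("mapred.cache.files"));                // null
>         System.out.println(copy.getConfiguration().get("mapred.cache.files")); // the path
>     }
> }
> {code}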
> I noticed this going wrong in Wonderdog, which needs to include the
> Elasticsearch configuration file in the DistributedCache. The file is added to
> mapred.cache.files through setStoreLocation, but that setting doesn't make it
> back into the Job returned by JobControlCompiler.getJob, so the file is never
> localized.
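> For illustration only (not Wonderdog's actual code), a hypothetical StoreFunc
> showing the pattern that breaks; the class name and file path are invented:
> {code:title=CacheFileStoreFuncSketch.java}
> import java.io.IOException;
>
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.OutputFormat;
> import org.apache.hadoop.mapreduce.RecordWriter;
> import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
> import org.apache.pig.StoreFunc;
> import org.apache.pig.data.Tuple;
>
> public class CacheFileStoreFuncSketch extends StoreFunc {
>
>     @Override
>     public void setStoreLocation(String location, Job job) throws IOException {
>         // Intended effect: register a config file in mapred.cache.files so it
>         // is localized on the task nodes. Because 'job' wraps a copy of the
>         // Configuration in JobControlCompiler.getJob, the setting never
>         // reaches the job that is actually submitted.
>         DistributedCache.addCacheFile(
>                 new Path("/tmp/elasticsearch.yml").toUri(),
>                 job.getConfiguration());
>     }
>
>     @Override
>     public OutputFormat getOutputFormat() throws IOException {
>         return new TextOutputFormat();  // placeholder output format
>     }
>
>     @Override
>     public void prepareToWrite(RecordWriter writer) throws IOException {
>         // no-op for this sketch
>     }
>
>     @Override
>     public void putNext(Tuple t) throws IOException {
>         // no-op for this sketch
>     }
> }
> {code}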
> This might be intentional semantics within Pig, but I'm not familiar enough
> with StoreFuncs to know whether it is.
--
This message was sent by Atlassian JIRA
(v6.2#6252)