[ https://issues.apache.org/jira/browse/PIG-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965357#comment-13965357 ]
Taylor Finnell commented on PIG-2872:
-------------------------------------

I was working with [~wattsinabox] when we experienced the issue. Our script is roughly as follows:

{code}
A = LOAD '...' USING CSVLoader ...;
STORE A INTO '/tmp/A-unused' USING DBStorage (org.postgresql.Driver, ..., INSERT INTO ....);
B = FOREACH A GENERATE X, Y, CONCAT(X, Y) as Z;
STORE B INTO '/tmp/B-unused' USING DBStorage (org.postgresql.Driver, ..., INSERT INTO ....);
{code}

Both DBStorage calls insert into different tables in the same database. When the script is run, both A and B are stored into their /tmp/ locations, but the data never makes it into the database. We found two ways to get the data into the database: the first was to add a DUMP B command after the assignment of B; the second was to execute the script with the -M flag.

> StoreFuncInterface.setStoreLocation gets a copy of a Configuration object
> --------------------------------------------------------------------------
>
>                 Key: PIG-2872
>                 URL: https://issues.apache.org/jira/browse/PIG-2872
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11
>        Environment: Pig trunk, Hadoop 0.20.205 with Kerberos, ElasticSearch trunk, Wonderdog trunk
>            Reporter: Evert Lammerts
>
> When an implementation of StoreFuncInterface.setStoreLocation is called from JobControlCompiler.getJob, it is passed a copy of the Configuration that will be used for the Job that will be submitted:
>
> {code:title=JobControlCompiler.java}
> sFunc.setStoreLocation(st.getSFile().getFileName(), new org.apache.hadoop.mapreduce.Job(nwJob.getConfiguration()));
> {code}
>
> When a new org.apache.hadoop.mapreduce.Job is created, it creates a copy of the Configuration object, as far as I know. Thus anything added to the Configuration object in the implementation of setStoreLocation will not be included in the Configuration of nwJob in JobControlCompiler.getJob.
>
> I noticed this goes wrong in Wonderdog, which needs to include the Elasticsearch configuration file in the DistributedCache. It is added to mapred.cache.files through setStoreLocation, but this setting doesn't make it back into the Job returned by JobControlCompiler.getJob, and is therefore never localized.
>
> This might be intentional semantics within Pig, but I'm not familiar enough with StoreFuncs to know whether it is.
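To make the reported mechanism concrete, here is a minimal sketch (not taken from Pig or Wonderdog; the class name and the cached file path are hypothetical) of a StoreFunc that writes into the Configuration of the Job it receives in setStoreLocation. Because JobControlCompiler.getJob passes {{new Job(nwJob.getConfiguration())}}, and the Job constructor copies the Configuration, a setting made this way would stay in that copy and never reach the job that is actually submitted.

{code:title=ExampleStoreFunc.java}
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

// Hypothetical StoreFunc used only to illustrate the reported behaviour.
public class ExampleStoreFunc extends StoreFunc {

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        // This writes into the Configuration of the Job passed in by
        // JobControlCompiler.getJob. Since that Job wraps a *copy* of
        // nwJob's Configuration, the setting below never makes it back
        // into the Configuration of the job that gets submitted.
        job.getConfiguration().set("mapred.cache.files",
                location + "/example-config.yml");
    }

    @Override
    public OutputFormat getOutputFormat() throws IOException {
        // A real StoreFunc would return its OutputFormat here; omitted in this sketch.
        return null;
    }

    @Override
    public void prepareToWrite(RecordWriter writer) throws IOException {
        // No-op for this sketch.
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        // No-op for this sketch.
    }
}
{code}

If that reading of the code is correct, any StoreFunc that relies on setStoreLocation to stage files through mapred.cache.files, as Wonderdog does with the Elasticsearch configuration file, would see those settings silently dropped.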