[
https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204753#comment-13204753
]
Thomas Weise commented on PIG-2508:
-----------------------------------
Zebra unit tests fail with the following error:
{code}
Unable to open iterator for alias records
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias records
        at org.apache.pig.PigServer.openIterator(PigServer.java:901)
        at org.apache.hadoop.zebra.pig.TestMapTableLoader.testReader(TestMapTableLoader.java:136)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias records
        at org.apache.pig.PigServer.storeEx(PigServer.java:1000)
        at org.apache.pig.PigServer.store(PigServer.java:963)
        at org.apache.pig.PigServer.openIterator(PigServer.java:876)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1325)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
        at org.apache.pig.PigServer.storeEx(PigServer.java:996)
Caused by: java.lang.IllegalStateException: Variable substitution depth too large: 20 ${fs.default.name}
        at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:399)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:469)
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:131)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:242)
        at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:225)
        at org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:418)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:472)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:125)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
{code}
> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
> Key: PIG-2508
> URL: https://issues.apache.org/jira/browse/PIG-2508
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.10
> Reporter: Anupam Seth
> Assignee: Thomas Weise
> Priority: Blocker
> Fix For: 0.10, 0.9.3
>
> Attachments: PIG-2508.3.patch, PIG-2508.4.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, Pig can unpredictably
> ignore them and override them with the values provided in the defaults, due to
> a "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed
> as HADOOP-7993 so that it would land in the component whose code was actually
> being fixed. That JIRA fixed the Hadoop-side bug that caused older deprecated
> config options to be ignored when they were also specified in the defaults XML
> file under the newer config name, or vice versa.
> However, the problem persisted for Pig jobs, and HADOOP-8021 was filed to
> address it.
> A careful step-by-step execution of the code in a debugger reveals a second,
> overlapping bug in the way Pig handles the configs.
> Not sure how / why this was not seen earlier, but the code in
> HExecutionEngine.java#recomputeProperties currently mashes together the
> default Hadoop configs and the user-specified properties into a single
> Properties object. Because Properties stores its entries in a Hashtable, if
> we have a config called "old.config.name" that is now deprecated and replaced
> by "new.config.name", and one spelling is specified in the defaults and the
> other by the user, we end up with a repopulated Properties object containing,
> in an unpredictable order, the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object is converted into a Configuration object by the
> ConfigurationUtil#toConfiguration() routine, deprecation handling kicks in and
> tries to resolve all old configs. Because the iteration order is not
> guaranteed (and because, in the compress case, the hash function consistently
> yields the new config loaded from the defaults after the old one), the
> user-specified config is ignored in favor of the default config. From the
> point of view of the Hadoop Configuration object this is expected standard
> behavior: a later specification of a config value replaces an earlier one.
> The fix for this is probably straightforward, but will require a rewrite of a
> chunk of code in HExecutionEngine.java. Instead of mashing a JobConf object
> and a Properties object together into a Configuration object that is finally
> re-converted into a JobConf object, the code simply needs to consistently and
> correctly populate a JobConf / Configuration object that can handle
> deprecation, instead of a "dumb" Java Properties object.
> We recently saw another potential occurrence of this bug, where Pig seems to
> honor only the mapreduce.job.queuename parameter for specifying the queue
> name and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking
> this as a blocker.
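To make the ordering hazard described above concrete, here is a minimal, self-contained Java sketch with no Hadoop dependency. The key names and the single alias pair are hypothetical stand-ins for a real deprecated Hadoop key and its replacement; the `resolve` helper is not a Pig or Hadoop API. It mimics what effectively happens when the merged Properties object is copied key by key into a deprecation-resolving store with last-write-wins semantics: the surviving value depends on Hashtable iteration order, not on which value the user actually set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class DeprecationOrderDemo {
    // Hypothetical alias pair standing in for a deprecated key and its replacement.
    static final String OLD_KEY = "old.config.name";
    static final String NEW_KEY = "new.config.name";

    // Copy entries one by one, mapping the deprecated spelling to the canonical
    // one with last-write-wins semantics -- a simplified model of what the
    // Properties-to-Configuration conversion ends up doing.
    static String resolve(Properties props) {
        Map<String, String> canonical = new LinkedHashMap<>();
        for (String name : props.stringPropertyNames()) {
            String key = name.equals(OLD_KEY) ? NEW_KEY : name;
            canonical.put(key, props.getProperty(name));
        }
        return canonical.get(NEW_KEY);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(OLD_KEY, "user-value");    // user-specified, set first
        props.setProperty(NEW_KEY, "default-value"); // from the defaults, set second
        // Which value survives depends on Hashtable iteration order, not on the
        // order in which the values were set.
        System.out.println(resolve(props));
    }
}
```

Note that for any fixed pair of keys the hash order is consistent on a given JVM (which is why the compress case failed reproducibly), but it is arbitrary with respect to insertion order, so user intent is silently lost.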
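And a minimal sketch of the proposed direction, again with a single hypothetical alias pair and no Hadoop dependency: resolve the deprecated name at set() time, the way a deprecation-aware JobConf / Configuration would, so that whichever value is set last wins deterministically. Loading the defaults first and the user-specified properties second then guarantees the user value survives.

```java
import java.util.HashMap;
import java.util.Map;

// Toy deprecation-aware store; the real fix would populate a Hadoop
// JobConf, whose set() applies key deprecation at write time.
public class DeprecationAwareConf {
    // Hypothetical mapping from deprecated key to its replacement.
    private static final Map<String, String> DEPRECATED =
            Map.of("old.config.name", "new.config.name");

    private final Map<String, String> values = new HashMap<>();

    // Resolving the alias when the value is written means the last explicit
    // set() wins, independent of any later hash-ordered iteration.
    public void set(String key, String value) {
        values.put(DEPRECATED.getOrDefault(key, key), value);
    }

    public String get(String key) {
        return values.get(DEPRECATED.getOrDefault(key, key));
    }
}
```

With this shape, defaults and user properties can be applied in a well-defined order (defaults first, user overrides last) instead of being flattened into an unordered Properties object first.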
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira