[
https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204753#comment-13204753
]
Thomas Weise commented on PIG-2508:
-----------------------------------
Zebra unit tests fail with the following error:
{code}
Unable to open iterator for alias records
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias records
        at org.apache.pig.PigServer.openIterator(PigServer.java:901)
        at org.apache.hadoop.zebra.pig.TestMapTableLoader.testReader(TestMapTableLoader.java:136)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias records
        at org.apache.pig.PigServer.storeEx(PigServer.java:1000)
        at org.apache.pig.PigServer.store(PigServer.java:963)
        at org.apache.pig.PigServer.openIterator(PigServer.java:876)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1325)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
        at org.apache.pig.PigServer.storeEx(PigServer.java:996)
Caused by: java.lang.IllegalStateException: Variable substitution depth too large: 20 ${fs.default.name}
        at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:399)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:469)
        at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:131)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:242)
        at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:225)
        at org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:418)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:472)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:125)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
{code}
> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
> Key: PIG-2508
> URL: https://issues.apache.org/jira/browse/PIG-2508
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.10
> Reporter: Anupam Seth
> Assignee: Thomas Weise
> Priority: Blocker
> Fix For: 0.10, 0.9.3
>
> Attachments: PIG-2508.3.patch, PIG-2508.4.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, Pig can unpredictably
> ignore them and override them with the values provided in the defaults, due to
> a "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed
> as HADOOP-7993 so that it would land in the component whose code was actually
> being fixed. That JIRA fixed the Hadoop-side bug that caused older deprecated
> config options to be ignored when they were also specified in the defaults XML
> file under the newer config name, or vice versa.
> However, the problem persisted for Pig jobs, and HADOOP-8021 was filed to
> address it.
> A careful step-by-step execution of the code in a debugger reveals a second,
> overlapping bug in the way Pig handles the configs.
> Not sure how / why this was not seen earlier, but the code in
> HExecutionEngine.java#recomputeProperties currently mashes together the
> default Hadoop configs and the user-specified properties into a single
> Properties object. Because Properties stores its entries in a Hashtable, if
> we have a config called "old.config.name" that is now deprecated and replaced
> by "new.config.name", and one spelling is specified in the defaults and the
> other by the user, we end up with a repopulated Properties object containing,
> in an unpredictable order, the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object is converted into a Configuration object by the
> ConfigurationUtil#toConfiguration() routine, deprecation handling kicks in and
> tries to resolve all old configs. Because the iteration order is not
> guaranteed (and because, in the compress case, the hash function consistently
> yields the new config loaded from the defaults after the old one), the
> user-specified config is ignored in favor of the default config. From the
> point of view of the Hadoop Configuration object this is expected standard
> behavior: a later specification of a config value replaces an earlier one.
> The fix for this is probably straightforward, but will require a rewrite of a
> chunk of code in HExecutionEngine.java. Instead of mashing a JobConf object
> and a Properties object together into a Configuration object that is finally
> re-converted into a JobConf object, the code simply needs to consistently and
> correctly populate a JobConf / Configuration object that can handle
> deprecation, instead of a "dumb" Java Properties object.
> We recently saw another potential occurrence of this bug, where Pig seems to
> honor only the mapreduce.job.queuename parameter for specifying the queue
> name and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking
> this as a blocker.
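To make the ordering hazard described above concrete, here is a minimal, self-contained Java sketch with no Hadoop dependency. The key names and the single alias pair are hypothetical stand-ins for a real deprecated Hadoop key and its replacement; the `resolve` helper is not a Pig or Hadoop API. It mimics what effectively happens when the merged Properties object is copied key by key into a deprecation-resolving store with last-write-wins semantics: the surviving value depends on Hashtable iteration order, not on which value the user actually set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class DeprecationOrderDemo {
    // Hypothetical alias pair standing in for a deprecated key and its replacement.
    static final String OLD_KEY = "old.config.name";
    static final String NEW_KEY = "new.config.name";

    // Copy entries one by one, mapping the deprecated spelling to the canonical
    // one with last-write-wins semantics -- a simplified model of what the
    // Properties-to-Configuration conversion ends up doing.
    static String resolve(Properties props) {
        Map<String, String> canonical = new LinkedHashMap<>();
        for (String name : props.stringPropertyNames()) {
            String key = name.equals(OLD_KEY) ? NEW_KEY : name;
            canonical.put(key, props.getProperty(name));
        }
        return canonical.get(NEW_KEY);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(OLD_KEY, "user-value");    // user-specified, set first
        props.setProperty(NEW_KEY, "default-value"); // from the defaults, set second
        // Which value survives depends on Hashtable iteration order, not on the
        // order in which the values were set.
        System.out.println(resolve(props));
    }
}
```

Note that for any fixed pair of keys the hash order is consistent on a given JVM (which is why the compress case failed reproducibly), but it is arbitrary with respect to insertion order, so user intent is silently lost.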
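And a minimal sketch of the proposed direction, again with a single hypothetical alias pair and no Hadoop dependency: resolve the deprecated name at set() time, the way a deprecation-aware JobConf / Configuration would, so that whichever value is set last wins deterministically. Loading the defaults first and the user-specified properties second then guarantees the user value survives.

```java
import java.util.HashMap;
import java.util.Map;

// Toy deprecation-aware store; the real fix would populate a Hadoop
// JobConf, whose set() applies key deprecation at write time.
public class DeprecationAwareConf {
    // Hypothetical mapping from deprecated key to its replacement.
    private static final Map<String, String> DEPRECATED =
            Map.of("old.config.name", "new.config.name");

    private final Map<String, String> values = new HashMap<>();

    // Resolving the alias when the value is written means the last explicit
    // set() wins, independent of any later hash-ordered iteration.
    public void set(String key, String value) {
        values.put(DEPRECATED.getOrDefault(key, key), value);
    }

    public String get(String key) {
        return values.get(DEPRECATED.getOrDefault(key, key));
    }
}
```

With this shape, defaults and user properties can be applied in a well-defined order (defaults first, user overrides last) instead of being flattened into an unordered Properties object first.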
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira