[jira] [Commented] (PIG-2508) PIG can unpredictably ignore deprecated Hadoop config options

Anupam Seth (Commented) (JIRA) Tue, 07 Feb 2012 15:15:28 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202924#comment-13202924
 ]


Anupam Seth commented on PIG-2508:
----------------------------------

Tested Thomas' new patch on a 10-node cluster and see the following:

On 0.23:
========
In local mode while setting configuration with deprecated name in script:
-------------------------------------------------------------------------
Fails with Kerberos exception as follows

{code}
2012-02-07 22:00:08,696 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /homes/<user>/pig_1328652008690.log
2012-02-07 22:00:09,010 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.used.genericoptionsparser is deprecated. Instead, use 
mapreduce.client.genericoptionsparser.used
2012-02-07 22:00:09,011 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:00:09,011 [main] WARN  org.apache.hadoop.conf.Configuration - 
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:00:09,011 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:00:09,011 [main] WARN  org.apache.hadoop.conf.Configuration - 
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:00:09,011 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: file:///
2012-02-07 22:00:09,404 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.output.compress is deprecated. Instead, use 
mapreduce.output.fileoutputformat.compress
2012-02-07 22:00:09,405 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.output.compression.codec is deprecated. Instead, use 
mapreduce.output.fileoutputformat.compress.codec
2012-02-07 22:00:10,386 [main] INFO  org.apache.pig.tools.pigstats.ScriptState 
- Pig features used in the script: UNKNOWN
2012-02-07 22:00:10,540 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.textoutputformat.separator is deprecated. Instead, use 
mapreduce.output.textoutputformat.separator
2012-02-07 22:00:10,553 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
6000: 
<file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed 
for: 'file:///homes/ghimport/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
Details at logfile: /homes/<user>/pig_1328652008690.log
{code}

Contents of pig log file
{code}
Pig Stack Trace
---------------
ERROR 6000:
<file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed 
for: 'file:///homes/<user>/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer

org.apache.pig.impl.plan.VisitorException: ERROR 6000:
<file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed 
for: 'file:///homes/<user>/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
        at 
org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:95)
        at 
org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
        at 
org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
        at 
org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at 
org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
        at 
org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
        at 
org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
        at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
        at org.apache.pig.PigServer.compilePp(PigServer.java:1360)
        at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1297)
        at org.apache.pig.PigServer.execute(PigServer.java:1289)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
        at 
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:130)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:191)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:561)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
Caused by: java.io.IOException: Can't get Master Kerberos principal for use as 
renewer
        at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:104)
        at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:87)
        at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
        at 
org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)
        ... 23 more
================================================================================
{code}

In cluster mode while setting configuration with deprecated name in script:
---------------------------------------------------------------------------
Passes

In local mode while setting configuration with new name in script:
------------------------------------------------------------------
Same issue as with local mode above

In cluster mode while setting configuration with new name in script:
--------------------------------------------------------------------
Fails as below
{code}
2012-02-07 22:08:27,164 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /homes/<user>/pig_1328652507159.log
2012-02-07 22:08:27,663 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.used.genericoptionsparser is deprecated. Instead, use 
mapreduce.client.genericoptionsparser.used
2012-02-07 22:08:27,665 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:08:27,665 [main] WARN  org.apache.hadoop.conf.Configuration - 
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:08:27,665 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:08:27,665 [main] WARN  org.apache.hadoop.conf.Configuration - 
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:08:27,665 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: hdfs://<host>
2012-02-07 22:08:32,052 [main] INFO  org.apache.pig.tools.pigstats.ScriptState 
- Pig features used in the script: UNKNOWN
2012-02-07 22:08:32,349 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.textoutputformat.separator is deprecated. Instead, use 
mapreduce.output.textoutputformat.separator
2012-02-07 22:08:32,360 [main] INFO  org.apache.hadoop.hdfs.DFSClient - Created 
HDFS_DELEGATION_TOKEN token 28 for <user> on <host>
2012-02-07 22:08:32,360 [main] INFO  
org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://<host>
2012-02-07 22:08:32,787 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File 
concatenation threshold: 100 optimistic? false
2012-02-07 22:08:32,964 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - MR plan size before optimization: 1
2012-02-07 22:08:32,964 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - MR plan size after optimization: 1
2012-02-07 22:08:33,919 [main] INFO  org.apache.pig.tools.pigstats.ScriptState 
- Pig script settings are added to the job
2012-02-07 22:08:33,963 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use 
mapreduce.reduce.markreset.buffer.percent
2012-02-07 22:08:33,963 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-02-07 22:08:33,963 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use 
mapreduce.reduce.markreset.buffer.percent
2012-02-07 22:08:33,963 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.output.compress is deprecated. Instead, use 
mapreduce.output.fileoutputformat.compress
2012-02-07 22:08:33,964 [main] WARN  org.apache.hadoop.conf.Configuration - 
mapred.output.compression.codec is deprecated. Instead, use 
mapreduce.output.fileoutputformat.compress.codec
2012-02-07 22:08:33,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
0: 'mapred.output.compress' is set but no value is specified for 
'mapred.output.compression.codec'.
Details at logfile: /homes/<user>/pig_1328652507159.log
{code}

Contents of log file:
{code}
Pig Stack Trace
---------------
ERROR 0: 'mapred.output.compress' is set but no value is specified for 
'mapred.output.compression.codec'.

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
 ERROR 0: 'mapred.output.compress' is set but no value is specified for 
'mapred.output.compression.codec'.
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:365)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
        at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
        at org.apache.pig.PigServer.execute(PigServer.java:1289)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
        at 
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:130)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:191)
        at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:561)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
================================================================================
{code}

In local mode while setting configuration with deprecated name from cmd line:
-----------------------------------------------------------------------------
Same issue as with local mode above

In cluster mode while setting configuration with deprecated name from cmd line:
-------------------------------------------------------------------------------
Passes

In local mode while setting configuration with new name from cmd line:
----------------------------------------------------------------------
Same issue as with local mode above

In cluster mode while setting configuration with new name from cmd line:
------------------------------------------------------------------------
Passes


On 0.20.2xx:
============
Cannot get it to work at all (tried removing my ivy2 directory, doing ant 
clean, and then re-compiling the tarball for 0.20 - still, it smells like I 
have 0.23 libs being referenced somewhere!)

{code}
2012-02-07 23:08:46,237 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /homes/ghimport/pig_1328656126229.log
2012-02-07 23:08:46,716 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: file:///
2012-02-07 23:08:47,140 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /homes/<user>/pig_1328656126229.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize 
class org.apache.pig.tools.pigstats.PigStatsUtil
        at org.apache.pig.Main.run(Main.java:593)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}

Contents of pig log file:
{code}
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. 
org/apache/hadoop/mapreduce/task/JobContextImpl

java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
        at 
org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:54)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
        at org.apache.pig.Main.run(Main.java:561)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.mapreduce.task.JobContextImpl
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        ... 9 more
================================================================================
{code}
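
One quick way to check the suspicion above about mixed 0.20 / 0.23 libraries is to probe the runtime classpath for the class named in the trace. This is only a sketch: ClasspathProbe and isPresent are made-up names for illustration, and it assumes that org.apache.hadoop.mapreduce.task.JobContextImpl is present in the 0.23 jars and absent from the 0.20 jars, as the ClassNotFoundException above suggests.

```java
public class ClasspathProbe {
    /** Returns true if the named class can be located on this class's classpath. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above.
        String probe = "org.apache.hadoop.mapreduce.task.JobContextImpl";
        System.out.println(probe + " present: " + isPresent(probe));
    }
}
```

Run with the same classpath that Pig is launched with, it would indicate which Hadoop line is actually being picked up at runtime.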


                
> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
>                 Key: PIG-2508
>                 URL: https://issues.apache.org/jira/browse/PIG-2508
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.10
>            Reporter: Anupam Seth
>            Assignee: Thomas Weise
>            Priority: Blocker
>             Fix For: 0.10, 0.9.3
>
>         Attachments: PIG-2508.3.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, it can unpredictably 
> ignore them and override them with values provided in the defaults due to a 
> "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed 
> as HADOOP-7993 so that it falls under the component whose code is being 
> fixed. That JIRA fixed the bug on the Hadoop side of the code that 
> caused older deprecated config options to be ignored when they were also 
> specified in the defaults xml file with the newer config name or vice versa.
> However, the problem seemed to persist with Pig jobs and HADOOP-8021 was 
> filed to address the issue. 
> A careful step-by-step execution of the code in a debugger reveals a second 
> overlapping bug because of the way PIG is dealing with the configs.
> Not sure how / why this was not seen earlier, but the code in 
> HExecutionEngine.java#recomputeProperties currently mashes together the 
> default Hadoop configs and the user-specified properties into a Properties 
> object. Given that it uses a Hashtable to store the properties, if we have a 
> config called "old.config.name" which is now deprecated and replaced by 
> "new.config.name" and if one type is specified in the defaults and another by 
> the user, we get a strange condition in which the repopulated Properties 
> object has [in an unpredictable ordering] the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object gets converted into a Configuration object by the 
> ConfigurationUtil#toConfiguration() routine, the deprecation kicks in and 
> tries to resolve all old configs. Because the ordering is not guaranteed (and 
> because in the case of compress, the hash function consistently gives the new 
> config loaded from the defaults after the old one), the user-specified config 
> is ignored in favor of the default config (from the point of view of the 
> Hadoop Configuration object, replacing an earlier specification of a config 
> value with a later one is expected, standard behavior).
> The fix for this is probably straightforward, but will require a re-write of 
> a chunk of code in HExecutionEngine.java. Instead of mashing together a 
> JobConf object and a Properties object into a Configuration object that is 
> finally re-converted into a JobConf object, the code simply needs to 
> consistently and correctly populate a JobConf / Configuration object that can 
> handle deprecation instead of a "dumb" Java Properties object.
> We recently saw another potential occurrence of this bug where Pig seems to 
> honor only the mapreduce.job.queuename parameter for specifying the queue 
> name and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking 
> this as a blocker.
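
The ordering problem described in the quoted issue can be reproduced without Hadoop. The following is a minimal sketch, not the actual Configuration or HExecutionEngine code: DeprecationOrderSketch, its apply method, and the "true" / "false" values are hypothetical, and the one-entry deprecation table just reuses the key names from the warnings in the logs above. It models a config where a write to a deprecated key is redirected to the new key, so whichever entry is visited last wins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeprecationOrderSketch {
    static final String OLD_KEY = "mapred.output.compress";
    static final String NEW_KEY = "mapreduce.output.fileoutputformat.compress";

    // Hypothetical deprecation table: old key -> new key.
    static final Map<String, String> DEPRECATED = new HashMap<>();
    static {
        DEPRECATED.put(OLD_KEY, NEW_KEY);
    }

    // Simplified stand-in for a deprecation-aware config: each {key, value}
    // pair is applied in list order, and old keys are redirected to new ones.
    static Map<String, String> apply(List<String[]> entries) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String[] kv : entries) {
            String key = DEPRECATED.getOrDefault(kv[0], kv[0]);
            conf.put(key, kv[1]); // last write wins
        }
        return conf;
    }

    public static void main(String[] args) {
        String[] userSetting = {OLD_KEY, "true"};     // assumed user value
        String[] defaultSetting = {NEW_KEY, "false"}; // assumed default value

        // Merged bag of properties where the default happens to be visited last.
        Map<String, String> unlucky = apply(Arrays.asList(userSetting, defaultSetting));
        // Deterministic order: defaults first, user overrides last.
        Map<String, String> fixed = apply(Arrays.asList(defaultSetting, userSetting));

        System.out.println("merged, default visited last: " + unlucky.get(NEW_KEY));
        System.out.println("defaults first, user last:    " + fixed.get(NEW_KEY));
    }
}
```

In this sketch the first line reports false (the user's setting is lost) and the second reports true, which is the behavior the proposed rewrite is aiming for: apply defaults and user properties in a fixed order to a deprecation-aware object rather than iterating a merged Properties object.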

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
