[
https://issues.apache.org/jira/browse/PIG-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202924#comment-13202924
]
Anupam Seth commented on PIG-2508:
----------------------------------
Tested Thomas' new patch on a 10-node cluster and see the following:
On 0.23:
========
In local mode while setting configuration with deprecated name in script:
-------------------------------------------------------------------------
Fails with Kerberos exception as follows
{code}
2012-02-07 22:00:08,696 [main] INFO org.apache.pig.Main - Logging error
messages to: /homes/<user>/pig_1328652008690.log
2012-02-07 22:00:09,010 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration -
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:00:09,011 [main] WARN org.apache.hadoop.conf.Configuration -
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:00:09,011 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
2012-02-07 22:00:09,404 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.output.compress is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress
2012-02-07 22:00:09,405 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.output.compression.codec is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress.codec
2012-02-07 22:00:10,386 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig features used in the script: UNKNOWN
2012-02-07 22:00:10,540 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.textoutputformat.separator is deprecated. Instead, use
mapreduce.output.textoutputformat.separator
2012-02-07 22:00:10,553 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
6000:
<file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed
for: 'file:///homes/ghimport/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
Details at logfile: /homes/<user>/pig_1328652008690.log
{code}
Contents of pig log file
{code}
Pig Stack Trace
---------------
ERROR 6000:
<file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed
for: 'file:///homes/<user>/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
org.apache.pig.impl.plan.VisitorException: ERROR 6000:
<file script2-hadoop.pig, line 7, column 0> Output Location Validation Failed
for: 'file:///homes/<user>/script2-hadoop-results More info to follow:
Can't get Master Kerberos principal for use as renewer
at
org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:95)
at
org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
at
org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at
org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at
org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at
org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at
org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
at org.apache.pig.PigServer.compilePp(PigServer.java:1360)
at
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1297)
at org.apache.pig.PigServer.execute(PigServer.java:1289)
at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:130)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:191)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:561)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
Caused by: java.io.IOException: Can't get Master Kerberos principal for use as
renewer
at
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:104)
at
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:87)
at
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
at
org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:80)
... 23 more
================================================================================
{code}
In cluster mode while setting configuration with deprecated name in script:
---------------------------------------------------------------------------
Passes
In local mode while setting configuration with new name in script:
------------------------------------------------------------------
Same issue as with local mode above
In cluster mode while setting configuration with new name in script:
--------------------------------------------------------------------
Fails as below
{code}
2012-02-07 22:08:27,164 [main] INFO org.apache.pig.Main - Logging error
messages to: /homes/<user>/pig_1328652507159.log
2012-02-07 22:08:27,663 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration -
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2012-02-07 22:08:27,665 [main] WARN org.apache.hadoop.conf.Configuration -
fs.default.name is deprecated. Instead, use fs.defaultFS
2012-02-07 22:08:27,665 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: hdfs://<host>
2012-02-07 22:08:32,052 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig features used in the script: UNKNOWN
2012-02-07 22:08:32,349 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.textoutputformat.separator is deprecated. Instead, use
mapreduce.output.textoutputformat.separator
2012-02-07 22:08:32,360 [main] INFO org.apache.hadoop.hdfs.DFSClient - Created
HDFS_DELEGATION_TOKEN token 28 for <user> on <host>
2012-02-07 22:08:32,360 [main] INFO
org.apache.hadoop.mapreduce.security.TokenCache - Got dt for hdfs://<host>
2012-02-07 22:08:32,787 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File
concatenation threshold: 100 optimistic? false
2012-02-07 22:08:32,964 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2012-02-07 22:08:32,964 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2012-02-07 22:08:33,919 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig script settings are added to the job
2012-02-07 22:08:33,963 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use
mapreduce.reduce.markreset.buffer.percent
2012-02-07 22:08:33,963 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-02-07 22:08:33,963 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use
mapreduce.reduce.markreset.buffer.percent
2012-02-07 22:08:33,963 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.output.compress is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress
2012-02-07 22:08:33,964 [main] WARN org.apache.hadoop.conf.Configuration -
mapred.output.compression.codec is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress.codec
2012-02-07 22:08:33,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
0: 'mapred.output.compress' is set but no value is specified for
'mapred.output.compression.codec'.
Details at logfile: /homes/<user>/pig_1328652507159.log
{code}
Contents of log file:
{code}
Pig Stack Trace
---------------
ERROR 0: 'mapred.output.compress' is set but no value is specified for
'mapred.output.compression.codec'.
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
ERROR 0: 'mapred.output.compress' is set but no value is specified for
'mapred.output.compression.codec'.
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:365)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:150)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1314)
at
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1299)
at org.apache.pig.PigServer.execute(PigServer.java:1289)
at org.apache.pig.PigServer.executeBatch(PigServer.java:360)
at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:130)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:191)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:561)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
================================================================================
{code}
In local mode while setting configuration with deprecated name from cmd line:
-----------------------------------------------------------------------------
Same issue as with local mode above
In cluster mode while setting configuration with deprecated name from cmd line:
-------------------------------------------------------------------------------
Passes
In local mode while setting configuration with new name from cmd line:
----------------------------------------------------------------------
Same issue as with local mode above
In cluster mode while setting configuration with new name from cmd line:
------------------------------------------------------------------------
Passes
On 0.20.2xx:
============
Cannot get it to work at all. I tried removing my ivy2 directory, doing ant
clean, and then re-compiling the tarball for 0.20. Still, it smells like I
have 0.23 libs being referenced somewhere!
{code}
2012-02-07 23:08:46,237 [main] INFO org.apache.pig.Main - Logging error
messages to: /homes/ghimport/pig_1328656126229.log
2012-02-07 23:08:46,716 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
2012-02-07 23:08:47,140 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl
Details at logfile: /homes/<user>/pig_1328656126229.log
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize
class org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.Main.run(Main.java:593)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}
Contents of pig log file:
{code}
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error.
org/apache/hadoop/mapreduce/task/JobContextImpl
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at
org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:54)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
at org.apache.pig.Main.run(Main.java:561)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 9 more
================================================================================
{code}
> PIG can unpredictably ignore deprecated Hadoop config options
> -------------------------------------------------------------
>
> Key: PIG-2508
> URL: https://issues.apache.org/jira/browse/PIG-2508
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.10
> Reporter: Anupam Seth
> Assignee: Thomas Weise
> Priority: Blocker
> Fix For: 0.10, 0.9.3
>
> Attachments: PIG-2508.3.patch, PIG-2508.patch
>
>
> When deprecated config options are passed to a Pig job, it can unpredictably
> ignore them and override them with values provided in the defaults due to a
> "race condition"-like issue.
> This problem was first noticed as part of MAPREDUCE-3665, which was re-filed
> as HADOOP-7993 so that it would fall under the right component for the code
> being fixed. That JIRA fixed the bug on the Hadoop side of the code that
> caused older deprecated config options to be ignored when they were also
> specified in the defaults xml file with the newer config name or vice versa.
> However, the problem seemed to persist with Pig jobs and HADOOP-8021 was
> filed to address the issue.
> A careful step-by-step execution of the code in a debugger reveals a second
> overlapping bug because of the way PIG is dealing with the configs.
> Not sure how / why this was not seen earlier, but the code in
> HExecutionEngine.java#recomputeProperties currently mashes together the
> default Hadoop configs and the user-specified properties into a Properties
> object. Given that it uses a Hashtable to store the properties, if we have a
> config called "old.config.name" which is now deprecated and replaced by
> "new.config.name" and if one type is specified in the defaults and another by
> the user, we get a strange condition in which the repopulated Properties
> object has [in an unpredictable ordering] the following:
> {code}
> config1.name=config1.value
> config2.name=config2.value
> ...
> old.config.name=old.config.value
> ...
> new.config.name=new.config.value
> ...
> configx.name=configx.value
> {code}
> When this Properties object gets converted into a Configuration object by the
> ConfigurationUtil#toConfiguration() routine, the deprecation kicks in and
> tries to resolve all old configs. Because the ordering is not guaranteed (and
> because in the case of compress, the hash function consistently gives the new
> config loaded from the defaults after the old one), the user-specified config
> is ignored in favor of the default config (which from the point of view of
> the Hadoop Configuration object is expected standard behavior to replace an
> earlier specification of a config value with a later one).
> The fix for this is probably straightforward, but will require a re-write of
> a chunk of code in HExecutionEngine.java. Instead of mashing together a
> JobConf object and a Properties object into a Configuration object that is
> finally re-converted into a JobConf object, the code simply needs to
> consistently and correctly populate a JobConf / Configuration object that can
> handle deprecation instead of a "dumb" Java Properties object.
> We recently saw another potential occurrence of this bug where Pig seems to
> honor only the mapreduce.job.queuename parameter for specifying the queue name
> and ignores the mapred.job.queue.name parameter.
> Since this can break a lot of existing jobs that run fine on 0.20, marking
> this as a blocker.
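The ordering problem described in the issue can be illustrated with a small, self-contained sketch. This is a toy model only: ToyConf is a hypothetical stand-in for a deprecation-aware configuration object (it is not Hadoop's Configuration class), the key names are the placeholders used in the description, and the two LinkedHashMap orderings stand in for the unpredictable iteration order of the flattened Properties object. The layered() method is one possible reading of the proposed fix (defaults first, user settings last), not the actual patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: deprecated keys are rewritten to their replacement and the
// last write wins. None of this is Hadoop or Pig code.
class DeprecationOrderSketch {

    // Hypothetical deprecation table: old key -> new key.
    static final Map<String, String> DEPRECATED =
            Collections.singletonMap("old.config.name", "new.config.name");

    static final class ToyConf {
        private final Map<String, String> values = new LinkedHashMap<>();

        // Store under the replacement key, so old and new names share one slot.
        void set(String key, String value) {
            values.put(DEPRECATED.getOrDefault(key, key), value);
        }

        String get(String key) {
            return values.get(DEPRECATED.getOrDefault(key, key));
        }
    }

    // Models converting an already-flattened property set: whichever of the
    // old/new keys happens to be iterated last determines the final value.
    static ToyConf fromFlattened(Map<String, String> flattened) {
        ToyConf conf = new ToyConf();
        flattened.forEach(conf::set);
        return conf;
    }

    // Models the suggested direction: apply defaults first, then user
    // settings, directly on the deprecation-aware object.
    static ToyConf layered(Map<String, String> defaults, Map<String, String> user) {
        ToyConf conf = new ToyConf();
        defaults.forEach(conf::set);
        user.forEach(conf::set);
        return conf;
    }

    public static void main(String[] args) {
        // User set the old name; the defaults file carries the new name.
        Map<String, String> userSeenFirst = new LinkedHashMap<>();
        userSeenFirst.put("old.config.name", "user.value");
        userSeenFirst.put("new.config.name", "default.value");

        Map<String, String> defaultSeenFirst = new LinkedHashMap<>();
        defaultSeenFirst.put("new.config.name", "default.value");
        defaultSeenFirst.put("old.config.name", "user.value");

        // Same entries, different iteration order, different outcome.
        System.out.println(fromFlattened(userSeenFirst).get("new.config.name"));
        System.out.println(fromFlattened(defaultSeenFirst).get("new.config.name"));

        // Layering keeps the user's value regardless of which name was used.
        System.out.println(layered(
                Collections.singletonMap("new.config.name", "default.value"),
                Collections.singletonMap("old.config.name", "user.value"))
                .get("new.config.name"));
    }
}
```

In this sketch the flattened conversion returns the default value whenever the new-named default is iterated after the user's old-named setting, which matches the behavior reported for the compress options; applying the two sources in a fixed order removes the dependence on iteration order.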