[ https://issues.apache.org/jira/browse/PIG-3469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784580#comment-13784580 ]
Xuefu Zhang commented on PIG-3469:
----------------------------------

+1

Patch looks good. Will commit after running tests.

> Skewed join can cause unrecoverable NullPointerException when one of its inputs is missing.
> -------------------------------------------------------------------------------------------
>
>                 Key: PIG-3469
>                 URL: https://issues.apache.org/jira/browse/PIG-3469
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11
>         Environment: Apache Pig version 0.11.0-cdh4.4.0
> Happens in both local execution environment (os x) and cluster environment (linux)
>            Reporter: Christon DeWan
>            Assignee: Jarek Jarcec Cecho
>         Attachments: PIG-3469.patch, PIG-3469.patch, PIG-3469.patch
>
>
> Run this script in the local execution environment (affects cluster mode too):
> {noformat}
> %declare DATA_EXISTS /tmp/test_data_exists.tsv
> %declare DATA_MISSING /tmp/test_data_missing.tsv
> %declare DUMMY `bash -c '(for (( i=0; \$i < 10; i++ )); do echo \$i; done) > /tmp/test_data_exists.tsv; true'`
> exists = LOAD '$DATA_EXISTS' AS (a:long);
> missing = LOAD '$DATA_MISSING' AS (a:long);
> missing = FOREACH ( GROUP missing BY a ) GENERATE $0 AS a, COUNT_STAR($1);
> joined = JOIN exists BY a, missing BY a USING 'skewed';
> STORE joined INTO '/tmp/test_out.tsv';
> {noformat}
> Results in NullPointerException which halts entire pig execution, including unrelated jobs. Expected: only dependencies of the error'd LOAD statement should fail.
> Error:
> {noformat}
> 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
> 2013-09-18 11:42:31,518 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:848)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:294)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:177)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
>     at org.apache.pig.PigServer.execute(PigServer.java:1241)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:335)
>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:137)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>     at org.apache.pig.Main.run(Main.java:604)
>     at org.apache.pig.Main.main(Main.java:157)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.adjustNumReducers(JobControlCompiler.java:868)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:480)
>     ... 17 more
> {noformat}
> Script above is as small as I can make it while still reproducing the issue. Removing the group-foreach causes the join to fail harmlessly (not stopping pig execution), as does using the default join. Did not occur on 0.10.1.

-- This message was sent by Atlassian JIRA (v6.1#6144)