[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915037#action_12915037 ]

Daniel Dai commented on PIG-1643:
---------------------------------

     [exec] +1 overall.
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.

All tests pass.

join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
-------------------------------------------------------------------------------------------

                Key: PIG-1643
                URL: https://issues.apache.org/jira/browse/PIG-1643
            Project: Pig
         Issue Type: Bug
   Affects Versions: 0.8.0
           Reporter: Thejas M Nair
           Assignee: Thejas M Nair
            Fix For: 0.8.0
        Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch, PIG-1643.4.patch

{code}
l1 = load 'std.txt';
l2 = load 'std.txt';
f1 = foreach l1 generate $0 as abc, $1 as def;
-- j = join f1 by $0, l2 by $0 using 'replicated';
-- j = join l2 by $0, f1 by $0 using 'replicated';
j = join l2 by $0, f1 by $0 ;
dump j;
{code}

The error:
{code}
2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type
{code}

The MR plan from explain:
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-21
Map Plan
Union[tuple] - scope-22
|
|---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
|   |   |
|   |   Project[bytearray][0] - scope-12
|   |
|   |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
|
|---j: Local Rearrange[tuple]{NULL}(false) - scope-13
    |   |
    |   Project[NULL][0] - scope-14
    |
    |---f1: New For Each(false,false)[bag] - scope-6
        |   |
        |   Project[bytearray][0] - scope-2
        |   |
        |   Project[bytearray][1] - scope-4
        |
        |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1
Reduce Plan
j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
|
|---POJoinPackage(true,true)[tuple] - scope-23
Global sort: false
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
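In the explain output above, the Local Rearrange for `l2` is keyed on `bytearray`, while the one for `f1` is keyed on `NULL`. The fields were aliased (`abc`, `def`) without a type because `l1` was loaded without a schema. The Java sketch below illustrates only that mismatch. It is not the PIG-1643 patch, and it does not use Pig's own classes. The `FieldType` enum, `checkKeyType`, and `resolveKeyType` are hypothetical names. The "default an untyped key to bytearray" rule is an assumption about one way to avoid ERROR 2044, not a description of what the committed fix does.

```java
// Hypothetical illustration only: not Pig's DataType class or the PIG-1643 patch.
public class JoinKeyTypeSketch {

    // Hypothetical subset of field types, named after the types in the explain output.
    enum FieldType { BYTEARRAY, CHARARRAY, INTEGER, NULL }

    // Rejects a key type that cannot be shuffled, reusing the wording of ERROR 2044.
    static void checkKeyType(FieldType t) {
        if (t == FieldType.NULL) {
            throw new IllegalStateException("The type null cannot be collected as a Key type");
        }
    }

    // Assumed fix-up: a field projected from a schema-less load has no declared type,
    // so treat it as bytearray (the type the other join input already uses).
    static FieldType resolveKeyType(FieldType declared) {
        return declared == FieldType.NULL ? FieldType.BYTEARRAY : declared;
    }

    public static void main(String[] args) {
        FieldType l2Key = FieldType.BYTEARRAY; // like Project[bytearray][0] - scope-12
        FieldType f1Key = FieldType.NULL;      // like Project[NULL][0] - scope-14

        checkKeyType(l2Key); // accepted
        try {
            checkKeyType(f1Key);
            System.out.println("unexpected: NULL key accepted");
        } catch (IllegalStateException e) {
            System.out.println("without fix-up: " + e.getMessage());
        }

        FieldType fixed = resolveKeyType(f1Key);
        checkKeyType(fixed); // accepted after defaulting to bytearray
        System.out.println("with fix-up: " + fixed);
    }
}
```

Under these assumptions, both join inputs end up keyed on the same type, so the two Local Rearrange operators would agree on the key type.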
[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914126#action_12914126 ]

Daniel Dai commented on PIG-1643:
---------------------------------

+1 if tests pass.
[jira] Commented: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913842#action_12913842 ]

Thejas M Nair commented on PIG-1643:
------------------------------------

In the case of the replicated join, the error was:

java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.setUpHashMap(POFRJoin.java:343)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.getNext(POFRJoin.java:212)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)