[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1643: Attachment: PIG-1643.2.patch Attach a fix. join fails for a query with input having 'load using pigstorage without schema' + 'foreach' --- Key: PIG-1643 URL: https://issues.apache.org/jira/browse/PIG-1643 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1643.1.patch, PIG-1643.2.patch {code} l1 = load 'std.txt'; l2 = load 'std.txt'; f1 = foreach l1 generate $0 as abc, $1 as def; -- j = join f1 by $0, l2 by $0 using 'replicated'; -- j = join l2 by $0, f1 by $0 using 'replicated'; j = join l2 by $0, f1 by $0 ; dump j; {code} the error - {code} 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type {code} The MR plan from explain - {code} #-- # Map Reduce Plan #-- MapReduce node scope-21 Map Plan Union[tuple] - scope-22 | |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 | | | | | Project[bytearray][0] - scope-12 | | | |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0 | |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 | | | Project[NULL][0] - scope-14 | |---f1: New For Each(false,false)[bag] - scope-6 | | | Project[bytearray][0] - scope-2 | | | Project[bytearray][1] - scope-4 | |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1 Reduce Plan j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 | |---POJoinPackage(true,true)[tuple] - scope-23 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1643: Attachment: PIG-1643.3.patch PIG-1643.3.patch is more general than PIG-1643.2.patch. It solves this null schema issue for all expressions. join fails for a query with input having 'load using pigstorage without schema' + 'foreach' --- Key: PIG-1643 URL: https://issues.apache.org/jira/browse/PIG-1643 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch {code} l1 = load 'std.txt'; l2 = load 'std.txt'; f1 = foreach l1 generate $0 as abc, $1 as def; -- j = join f1 by $0, l2 by $0 using 'replicated'; -- j = join l2 by $0, f1 by $0 using 'replicated'; j = join l2 by $0, f1 by $0 ; dump j; {code} the error - {code} 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type {code} The MR plan from explain - {code} #-- # Map Reduce Plan #-- MapReduce node scope-21 Map Plan Union[tuple] - scope-22 | |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 | | | | | Project[bytearray][0] - scope-12 | | | |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0 | |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 | | | Project[NULL][0] - scope-14 | |---f1: New For Each(false,false)[bag] - scope-6 | | | Project[bytearray][0] - scope-2 | | | Project[bytearray][1] - scope-4 | |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1 Reduce Plan j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 | |---POJoinPackage(true,true)[tuple] - scope-23 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1643: --- Attachment: PIG-1643.4.patch PIG-1643.4.patch is PIG-1643.3.patch + test case join fails for a query with input having 'load using pigstorage without schema' + 'foreach' --- Key: PIG-1643 URL: https://issues.apache.org/jira/browse/PIG-1643 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1643.1.patch, PIG-1643.2.patch, PIG-1643.3.patch, PIG-1643.4.patch {code} l1 = load 'std.txt'; l2 = load 'std.txt'; f1 = foreach l1 generate $0 as abc, $1 as def; -- j = join f1 by $0, l2 by $0 using 'replicated'; -- j = join l2 by $0, f1 by $0 using 'replicated'; j = join l2 by $0, f1 by $0 ; dump j; {code} the error - {code} 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type {code} The MR plan from explain - {code} #-- # Map Reduce Plan #-- MapReduce node scope-21 Map Plan Union[tuple] - scope-22 | |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 | | | | | Project[bytearray][0] - scope-12 | | | |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0 | |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 | | | Project[NULL][0] - scope-14 | |---f1: New For Each(false,false)[bag] - scope-6 | | | Project[bytearray][0] - scope-2 | | | Project[bytearray][1] - scope-4 | |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1 Reduce Plan j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 | |---POJoinPackage(true,true)[tuple] - scope-23 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1643: --- Attachment: PIG-1643.1.patch PIG-1643.1.patch There was a code path that lead to fields having NULL datatype instead of the default datatype of BYTEARRAY. That was causing these failures. Test-patch has succeeded, unit tests are running. join fails for a query with input having 'load using pigstorage without schema' + 'foreach' --- Key: PIG-1643 URL: https://issues.apache.org/jira/browse/PIG-1643 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1643.1.patch {code} l1 = load 'std.txt'; l2 = load 'std.txt'; f1 = foreach l1 generate $0 as abc, $1 as def; -- j = join f1 by $0, l2 by $0 using 'replicated'; -- j = join l2 by $0, f1 by $0 using 'replicated'; j = join l2 by $0, f1 by $0 ; dump j; {code} the error - {code} 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type {code} The MR plan from explain - {code} #-- # Map Reduce Plan #-- MapReduce node scope-21 Map Plan Union[tuple] - scope-22 | |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 | | | | | Project[bytearray][0] - scope-12 | | | |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0 | |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 | | | Project[NULL][0] - scope-14 | |---f1: New For Each(false,false)[bag] - scope-6 | | | Project[bytearray][0] - scope-2 | | | Project[bytearray][1] - scope-4 | |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1 Reduce Plan j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 | |---POJoinPackage(true,true)[tuple] - scope-23 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1643: --- Status: Patch Available (was: Open) join fails for a query with input having 'load using pigstorage without schema' + 'foreach' --- Key: PIG-1643 URL: https://issues.apache.org/jira/browse/PIG-1643 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1643.1.patch {code} l1 = load 'std.txt'; l2 = load 'std.txt'; f1 = foreach l1 generate $0 as abc, $1 as def; -- j = join f1 by $0, l2 by $0 using 'replicated'; -- j = join l2 by $0, f1 by $0 using 'replicated'; j = join l2 by $0, f1 by $0 ; dump j; {code} the error - {code} 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type {code} The MR plan from explain - {code} #-- # Map Reduce Plan #-- MapReduce node scope-21 Map Plan Union[tuple] - scope-22 | |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 | | | | | Project[bytearray][0] - scope-12 | | | |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0 | |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 | | | Project[NULL][0] - scope-14 | |---f1: New For Each(false,false)[bag] - scope-6 | | | Project[bytearray][0] - scope-2 | | | Project[bytearray][1] - scope-4 | |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1 Reduce Plan j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 | |---POJoinPackage(true,true)[tuple] - scope-23 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1643) join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
[ https://issues.apache.org/jira/browse/PIG-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1643: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Tests passed. Patch committed to 0.8 branch and trunk. join fails for a query with input having 'load using pigstorage without schema' + 'foreach' --- Key: PIG-1643 URL: https://issues.apache.org/jira/browse/PIG-1643 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 Attachments: PIG-1643.1.patch {code} l1 = load 'std.txt'; l2 = load 'std.txt'; f1 = foreach l1 generate $0 as abc, $1 as def; -- j = join f1 by $0, l2 by $0 using 'replicated'; -- j = join l2 by $0, f1 by $0 using 'replicated'; j = join l2 by $0, f1 by $0 ; dump j; {code} the error - {code} 2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type {code} The MR plan from explain - {code} #-- # Map Reduce Plan #-- MapReduce node scope-21 Map Plan Union[tuple] - scope-22 | |---j: Local Rearrange[tuple]{bytearray}(false) - scope-11 | | | | | Project[bytearray][0] - scope-12 | | | |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0 | |---j: Local Rearrange[tuple]{NULL}(false) - scope-13 | | | Project[NULL][0] - scope-14 | |---f1: New For Each(false,false)[bag] - scope-6 | | | Project[bytearray][0] - scope-2 | | | Project[bytearray][1] - scope-4 | |---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1 Reduce Plan j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18 | |---POJoinPackage(true,true)[tuple] - scope-23 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.