join fails for a query with input having 'load using pigstorage without schema' + 'foreach'
-------------------------------------------------------------------------------------------
Key: PIG-1643
URL: https://issues.apache.org/jira/browse/PIG-1643
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0
{code}
l1 = load 'std.txt';
l2 = load 'std.txt';
f1 = foreach l1 generate $0 as abc, $1 as def;
-- j = join f1 by $0, l2 by $0 using 'replicated';
-- j = join l2 by $0, f1 by $0 using 'replicated';
j = join l2 by $0, f1 by $0 ;
dump j;
{code}
The error -
{code}
2010-09-22 16:24:48,584 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2044: The type null cannot be collected as a Key type
{code}
The MR plan from explain -
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-21
Map Plan
Union[tuple] - scope-22
|
|---j: Local Rearrange[tuple]{bytearray}(false) - scope-11
| | |
| | Project[bytearray][0] - scope-12
| |
| |---l2: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-0
|
|---j: Local Rearrange[tuple]{NULL}(false) - scope-13
| |
| Project[NULL][0] - scope-14
|
|---f1: New For Each(false,false)[bag] - scope-6
| |
| Project[bytearray][0] - scope-2
| |
| Project[bytearray][1] - scope-4
|
|---l1: Load(file:///Users/tejas/pig_obyfail/trunk/std.txt:org.apache.pig.builtin.PigStorage) - scope-1--------
Reduce Plan
j: Store(/tmp/x:org.apache.pig.builtin.PigStorage) - scope-18
|
|---POJoinPackage(true,true)[tuple] - scope-23--------
Global sort: false
----------------
{code}
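In the plan, the l2 branch builds its join key from Project[bytearray][0], while the f1 branch (the one that goes through the foreach) builds it from Project[NULL][0], which matches the "type null" in the error. For comparison, a variant of the script that declares a schema on the load feeding the foreach is sketched below. The field names a and b are hypothetical, and whether this variant avoids the error is an assumption that has not been verified here; running explain on it would show whether the f1 key is still typed NULL.
{code}
-- hypothetical variant: l1 declares a schema (field names a, b are made up)
-- so the fields projected by the foreach carry an explicit type
l1 = load 'std.txt' as (a:bytearray, b:bytearray);
l2 = load 'std.txt';
f1 = foreach l1 generate a as abc, b as def;
j = join l2 by $0, f1 by abc;
-- inspect the key type of the f1 Local Rearrange in the MR plan
explain j;
{code}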