Matthew Hayes created DATAFU-41:
-----------------------------------
Summary: BagGroup does not name bag field in some cases
Key: DATAFU-41
URL: https://issues.apache.org/jira/browse/DATAFU-41
Project: DataFu
Issue Type: Bug
Reporter: Matthew Hayes
For this test:
{code}
/**
define BagSum datafu.pig.bags.BagSum();
define BagGroup datafu.pig.bags.BagGroup();
data = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
describe data;
data2 = GROUP data BY id;
describe data2;
data3 = FOREACH data2 GENERATE group as id, BagGroup(data,data.key) as grouped;
describe data3;
data4 = FOREACH data3 {
  summed = FOREACH grouped GENERATE group as key, SUM($1.val) as total;
  ordered = ORDER summed BY key;
  GENERATE id, ordered;
}
describe data4;
STORE data4 INTO 'output';
*/
@Multiline
private String bagSumTest;
@Test
public void bagSumTest() throws Exception
{
  PigTest test = createPigTestFromString(bagSumTest);
  writeLinesToFile("input", "1,A,1","1,B,2","2,A,3","3,A,4","1,C,5","1,C,6",
                   "3,A,7","2,B,8","1,A,9","2,A,10");
  test.runScript();
  assertOutput(test, "data4",
               "(1,{(A,10),(B,2),(C,11)})",
               "(2,{(A,13),(B,8)})",
               "(3,{(A,11)})");
}
{code}
{{data3}} is described as:
{code}
data3: {id: int,grouped: {(group: chararray,data: {(id: int,key: chararray,val: int)})}}
{code}
However, if we change the first argument of {{BagGroup}} from {{data}} to
{{data.(key,val)}}, then {{data3}} is described as:
{code}
data3: {id: int,grouped: {(group: chararray,{(key: chararray,val: int)})}}
{code}
Note that the inner bag has no name, so you have to reference it by {{$1}}.
There is a separate issue, DATAFU-40, where even when the bag has the name
{{data}} you can run into problems later.
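For reference, a sketch of the variant described above. It assumes the DataFu jar is already registered and that only the first argument to {{BagGroup}} changes; everything else is taken from the test script:
{code}
define BagGroup datafu.pig.bags.BagGroup();

data = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
data2 = GROUP data BY id;

-- Only the first argument differs from the test: data.(key,val) instead of data.
data3 = FOREACH data2 GENERATE group as id, BagGroup(data.(key,val), data.key) as grouped;
describe data3;

-- The inner bag has no alias in this case, so it is referenced by position ($1).
data4 = FOREACH data3 {
  summed = FOREACH grouped GENERATE group as key, SUM($1.val) as total;
  ordered = ORDER summed BY key;
  GENERATE id, ordered;
}
describe data4;
{code}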