Matthew Hayes created DATAFU-41:
-----------------------------------

    Summary: BagGroup does not name bag field in some cases
    Key: DATAFU-41
    URL: https://issues.apache.org/jira/browse/DATAFU-41
    Project: DataFu
    Issue Type: Bug
    Reporter: Matthew Hayes

For this test:

{code}
  /**
  define BagSum datafu.pig.bags.BagSum();
  define BagGroup datafu.pig.bags.BagGroup();

  data = LOAD 'input' USING PigStorage(',') AS (id:int, key:chararray, val:int);
  describe data;

  data2 = GROUP data BY id;
  describe data2;

  data3 = FOREACH data2 GENERATE group as id, BagGroup(data,data.key) as grouped;
  describe data3;

  data4 = FOREACH data3 {
    summed = FOREACH grouped GENERATE group as key, SUM($1.val) as total;
    ordered = ORDER summed BY key;
    GENERATE id, ordered;
  }
  describe data4;

  STORE data4 INTO 'output';
  */
  @Multiline
  private String bagSumTest;

  @Test
  public void bagSumTest() throws Exception
  {
    PigTest test = createPigTestFromString(bagSumTest);

    writeLinesToFile("input",
                     "1,A,1","1,B,2","2,A,3","3,A,4","1,C,5","1,C,6",
                     "3,A,7","2,B,8","1,A,9","2,A,10");

    test.runScript();

    assertOutput(test, "data4",
                 "(1,{(A,10),(B,2),(C,11)})",
                 "(2,{(A,13),(B,8)})",
                 "(3,{(A,11)})");
  }
{code}

{{data3}} is described as:

{code}
data3: {id: int,grouped: {(group: chararray,data: {(id: int,key: chararray,val: int)})}}
{code}

However, if we change the first argument of {{BagGroup}} from {{data}} to {{data.(key,val)}}, then {{data3}} is described as:

{code}
data3: {id: int,grouped: {(group: chararray,{(key: chararray,val: int)})}}
{code}

Note that the inner bag now has no name, so you have to reference it by {{$1}}.

There is a separate issue, DATAFU-40, where even when the bag has the name {{data}} you can run into problems later.
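
The expected output asserted in the test can be checked independently of Pig. The following is a minimal plain-Java sketch that reproduces the arithmetic only (group by {{id}}, then by {{key}}, and sum {{val}}). The class and method names are hypothetical, and it is not how {{BagGroup}} is implemented; it uses no Pig or DataFu API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of DataFu: recomputes the totals that the
// test above expects, using only the JDK.
class BagGroupExpected {

    // Returns id -> (key -> sum of val). TreeMap keeps both levels sorted,
    // which mirrors the ORDER summed BY key step in the script.
    static Map<Integer, Map<String, Integer>> sumByIdAndKey(List<String> lines) {
        Map<Integer, Map<String, Integer>> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            int id = Integer.parseInt(fields[0]);
            String key = fields[1];
            int val = Integer.parseInt(fields[2]);
            result.computeIfAbsent(id, k -> new TreeMap<>())
                  .merge(key, val, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same rows that the test writes to 'input'.
        List<String> input = List.of(
            "1,A,1", "1,B,2", "2,A,3", "3,A,4", "1,C,5", "1,C,6",
            "3,A,7", "2,B,8", "1,A,9", "2,A,10");
        // prints {1={A=10, B=2, C=11}, 2={A=13, B=8}, 3={A=11}}
        System.out.println(sumByIdAndKey(input));
    }
}
```

The totals agree with the three tuples passed to {{assertOutput}}, so the expected values in the test are consistent with the input rows regardless of whether the inner bag is referenced by name or by {{$1}}.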