Sam created DATAFU-38: ------------------------- Summary: BagGroup merges rows Key: DATAFU-38 URL: https://issues.apache.org/jira/browse/DATAFU-38 Project: DataFu Issue Type: Bug Reporter: Sam
load {code} 1,a,A,1 1,b,A,2 1,a,B,3 2,c,C,4 2,b,B,5 2,b,C,6 {code} using {{tmp_datafu = load 'test' using PigStorage(',') as (id:chararray, domain:chararray, keyword:chararray, weight:int);}} and do {code} tmp_roll = foreach (group tmp_datafu by id) generate group as id, CountEach(tmp_datafu.domain) as domains, BagGroup(tmp_datafu.(keyword,weight),tmp_datafu.keyword) as keywords; {code} the result is {code} (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})}) (2,{(c,1),(b,2)},{(B,{(B,3),(B,5)}),(A,{(A,1),(A,2)}),(C,{(C,4),(C,6)})}) {code} instead of {code} (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})}) (2,{(c,1),(b,2)},{(B,{(B,5)}),(C,{(C,4),(C,6)})}) {code} see also http://stackoverflow.com/questions/22945236/how-do-i-accumulate-vectors-into-a-map -- This message was sent by Atlassian JIRA (v6.2#6252)