[ https://issues.apache.org/jira/browse/DATAFU-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983086#comment-13983086 ]

Matthew Hayes commented on DATAFU-42:
-------------------------------------

The current implementation mimics how GROUP works in Pig. When you group a
relation by a key, that key is available as {{group}}, but it is also kept in
each tuple of the grouped bag. I'll think about this some more, but I'm not sure
this change makes sense. If one is concerned about the additional space used by
the repeated key, it should be possible to project it out after BagGroup is
invoked.
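To illustrate the projection idea, here is a minimal sketch in plain Java collections. It is not the DataFu or Pig API: the class `BagGroupSketch`, the `Pair` record, and both helper methods are hypothetical. It groups a bag of (key, value) tuples the way Pig's GROUP does, keeping the key inside every grouped tuple, and then projects the key out in a second pass, which gives the shape proposed in the issue description (e.g. `(B,{3,5})`). The input values are made up to match the second example row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT the DataFu BagGroup implementation.
public class BagGroupSketch {

    // A (key, value) tuple such as (A,1).
    record Pair(String key, int value) {}

    // Mimics Pig GROUP semantics: each group keeps the full input tuples, so the
    // key appears once as the group label and again inside every tuple.
    static Map<String, List<Pair>> groupKeepingKey(List<Pair> bag) {
        Map<String, List<Pair>> groups = new LinkedHashMap<>(); // preserves first-seen key order
        for (Pair p : bag) {
            groups.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    // Projects the redundant key out of each grouped tuple, leaving only the values.
    static Map<String, List<Integer>> projectOutKey(Map<String, List<Pair>> groups) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Pair>> e : groups.entrySet()) {
            List<Integer> values = new ArrayList<>();
            for (Pair p : e.getValue()) {
                values.add(p.value());
            }
            result.put(e.getKey(), values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input bag whose values match the second example row.
        List<Pair> bag = List.of(
                new Pair("B", 3), new Pair("B", 5),
                new Pair("A", 1), new Pair("A", 2),
                new Pair("C", 4), new Pair("C", 6));
        Map<String, List<Pair>> grouped = groupKeepingKey(bag);
        System.out.println(grouped);                 // every tuple still carries its key
        System.out.println(projectOutKey(grouped));  // {B=[3, 5], A=[1, 2], C=[4, 6]}
    }
}
```

Keeping the projection as a separate pass leaves the grouping step consistent with Pig's GROUP semantics, while a caller who wants the compact form can still drop the key afterwards.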

> Simplify BagGroup output
> ------------------------
>
>                 Key: DATAFU-42
>                 URL: https://issues.apache.org/jira/browse/DATAFU-42
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Sam Steingold
>
> {{BagGroup}} keeps the redundant {{group}} information in its output.
> E.g., see [DATAFU-38]:
> {code}
> (1,{(b,1),(a,2)},{(B,{(B,3)}),(A,{(A,1),(A,2)})})
> (2,{(c,1),(b,2)},{(B,{(B,3),(B,5)}),(A,{(A,1),(A,2)}),(C,{(C,4),(C,6)})})
> {code}
> can be
> {code}
> (1,{(b,1),(a,2)},{(B,{3}),(A,{1,2})})
> (2,{(c,1),(b,2)},{(B,{3,5}),(A,{1,2}),(C,{4,6})})
> {code}
> without loss of information.
> Given that bug [DATAFU-38] rendered this function nearly useless and was
> fixed only last week, I think {{BagGroup}} has not been used yet, so this
> backward-incompatible change will not break any existing code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)