I've been ruminating on this for a while, and I don't have an answer yet.  So 
I'll just give the feedback I've thought of.

This further blurs the already blurry line between a relation (which can go on
the left side of an assignment) and a bag (which cannot).  I don't know if
that's a good thing or not.  We have already significantly blurred that with
casting of relations to scalars and with the way bags convert to relations and
bags to bags with nested foreach.

More comments inlined.


On Feb 21, 2012, at 4:59 PM, Jonathan Coveney wrote:

> 
> 
> ie
> 
> b = group a.(x,y) by x;
> 
> or anything. The case of group is somewhat problematic, however, because if 
> you describe that, you'll get...
> 
> b: {group: int,1-2: {(x: int,y: int)}}

The problem shouldn't just show up in describe, but in trying to use the 
resulting bag, since Pig names the bag based on the relation that was grouped.  
E.g.

b = group a.(x,y) by x;
c = foreach b generate SUM(???.y);

I have no name to give the bag in the SUM operation.  Users can get around this 
by using positional parameters.
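
A rough sketch of that workaround, assuming the proposed a.(x,y) syntax is
accepted and that a has int fields x and y.  In the output of group, $0 is the
group key and $1 is the bag, so the bag can be reached by position:

b = group a.(x,y) by x;
c = foreach b generate group, SUM($1.y);

With the syntax that exists today the projection gets its own alias, and that
alias (a1 here, a name I made up for illustration) becomes the name of the bag:

a1 = foreach a generate x, y;
b = group a1 by x;
c = foreach b generate group, SUM(a1.y);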

> 
> 
> More broadly...
> - Is it worth thinking about how to make this go deeper? Currently you can do 
> b = distinct a.x, but not b = distinct a.x.$0 (if it were appropriate). There 
> are issues with this (and in fact there is an outstanding bug w.r.t. b = 
> foreach (group a by $0) generate $1.$0.$0.$0.$0; <== this works!).

The problem with this is that distinct a.x and distinct a.x.$0 are really 
different things.

b = distinct a.x;

would be shorthand for

b1 = foreach a generate x;
b = distinct b1;

whereas

b = distinct a.x.$0;

would be shorthand for

b = foreach a {
        b1 = x.$0;
        b2 = distinct b1;
        generate b2;
}

Maybe that's ok, I'm not sure.

> - Is the strategy of the syntactic sugar ok? I think in this case it should 
> be (the relation name issue notwithstanding), but could see arguments either 
> way.
> 
> Find a super small patch with no tests attached... I wanted to get some 
> thoughts before making yet another JIRA?

The Apache lists don't allow attachments.  You'll have to send it in the mail 
itself or post it somewhere and link to it in your mail.

Alan.
