[
https://issues.apache.org/jira/browse/PIG-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460821#comment-13460821
]
Julien Le Dem commented on PIG-2536:
------------------------------------
Basically this is making the bag projection syntax work on relations.
If we want to do this we should look at all the places where a relation can be
used and make sure we can have a consistent syntax.
This type of shortcut is useful only if it is consistent; otherwise it is
confusing.
For some it is straightforward:
B = distinct A.x; => B = distinct (foreach A generate x);
B = limit A.x 10; => B = limit (foreach A generate x) 10;
B = sample A.x 0.1; => B = sample (foreach A generate x) 0.1;
For other operators it could be tricky, because the operator can reference a
field that the projection drops:
B = ORDER A.x BY x; => B = ORDER (FOREACH A GENERATE x) BY x;
B = ORDER A.(y,z) BY x; => B = FOREACH (ORDER A BY x) GENERATE y,z;
Same for group by:
B = GROUP A.(y,z) BY x; => B = FOREACH (GROUP A BY x) GENERATE group, A.(y,z);
And filter:
B = FILTER A.(y,z) BY x == 0; => B = FOREACH (FILTER A BY x == 0) GENERATE y,z;
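To make the ordering concern concrete, here is a sketch of the two ORDER cases
above written out with explicit intermediate aliases. The schema is borrowed
from the issue description; the alias names (A_x, A_sorted, B1, B2) are made up
for illustration and are not part of the patch.

{code}
-- Schema taken from the issue description; alias names are illustrative only.
A = LOAD 'thing' AS (x:int, y:int, z:int);

-- Case 1: the sort key x is part of the projection, so project first.
A_x = FOREACH A GENERATE x;
B1 = ORDER A_x BY x;

-- Case 2: the sort key x is not in (y,z), so the sort has to run
-- before the projection drops x.
A_sorted = ORDER A BY x;
B2 = FOREACH A_sorted GENERATE y, z;
{code}

The same pattern (apply the operator first, project afterwards) is what the
GROUP and FILTER rewrites rely on when the key or predicate field is not in the
projected columns.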
For Split, Join, cogroup it becomes trickier.
> Extend pig to support DISTINCT x.(project)
> ------------------------------------------
>
> Key: PIG-2536
> URL: https://issues.apache.org/jira/browse/PIG-2536
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Jonathan Coveney
> Priority: Minor
> Fix For: 0.11
>
> Attachments: PIG-2436-0.patch
>
>
> Currently, pig does not allow this syntax:
> {code}
> A = load 'thing' as (x:int, y:int, z:int);
> B = distinct A.x;
> C = distinct A.(y,z);
> D = distinct C.$0;
> {code}
> and so on. With this patch, it does. I should probably add more tests, though
> it's a simple patch... it just turns distinct rel.proj into syntactic sugar
> for distinct (foreach rel generate proj)