[ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-3010:
----------------------------------

    Attachment: PIG-3010-0.patch

Here is a patch that does this. The changes reach further than strictly 
necessary, because this is a good time to future-proof flatten by switching 
to an enum approach.

A nice side effect is that you can implement FLATTEN as a UDF (though this 
isn't necessarily desirable as it is going to add some overhead...still, the 
fact that it _can be done_ is quite powerful). That UDF is 
src/org/apache/pig/builtin/UdfFlatten.java

This lets you do a lot of really neat stuff, such as:

{code}
a = load 'data2' as (x:int,y:int);
b = foreach a generate UdfFlatten(x,y);
describe b;
{code}

which results in:
{code}
b: {x: int,y: int}
{code}

Woah! Previously, this was impossible. What happens if you dump? The result is:
{code}
(1,10)
(4,11)
(5,10)
{code}

Woah!

You can even do the following:

{code}
a = load 'data2' as (x:int,y:int);
b = foreach a generate UdfFlatten(TOTUPLE(x,y));
dump b;
{code}

And it works for bags as well. The uses are obvious IMHO.
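
For anyone less familiar with the tuple/bag distinction, here is a rough Java sketch of the semantics FLATTEN has in Pig: flattening a tuple splices its fields into the output row, while flattening a bag emits one output row per tuple in the bag (so an empty bag yields no rows). This is only an illustration with plain java.util lists. It does not use Pig's classes and it is not the code in the patch; the names FlattenSketch, row, flattenTuple and flattenBag are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlattenSketch {

    // Hypothetical helper: build a "tuple" as a plain list of fields.
    static List<Object> row(Object... fields) {
        return new ArrayList<>(Arrays.asList(fields));
    }

    // Tuple flatten: the tuple's fields are appended to the other
    // fields of the row, producing exactly one output row.
    static List<Object> flattenTuple(List<Object> prefix, List<Object> tuple) {
        List<Object> out = new ArrayList<>(prefix);
        out.addAll(tuple);
        return out;
    }

    // Bag flatten: one output row per tuple in the bag, each combined
    // with the other fields of the row. An empty bag yields no rows.
    static List<List<Object>> flattenBag(List<Object> prefix, List<List<Object>> bag) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : bag) {
            out.add(flattenTuple(prefix, tuple));
        }
        return out;
    }

    public static void main(String[] args) {
        // A row with one ordinary field "k" plus a tuple (1,10).
        System.out.println(flattenTuple(row("k"), row(1, 10)));

        // The same row, but with a bag {(1,10),(4,11)} instead.
        List<List<Object>> bag = Arrays.asList(row(1, 10), row(4, 11));
        System.out.println(flattenBag(row("k"), bag));
    }
}
```

The first call gives a single row (k,1,10); the second gives two rows, (k,1,10) and (k,4,11). A UDF that flattens itself would have its Tuple or DataBag output expanded in this way without the script wrapping it in FLATTEN.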
                
> Allow UDF's to flatten themselves
> ---------------------------------
>
>                 Key: PIG-3010
>                 URL: https://issues.apache.org/jira/browse/PIG-3010
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3010-0.patch
>
>
> This is something I thought would be cool for a while, so I sat down and did 
> it because I think there are some useful debugging tools it'd help with.
> The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
> you output will be flattened. This is quite powerful. A very common pattern 
> is:
> a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
> This would let you just do:
> a = foreach data generate MyUdf(thing);
> With the exact same result!

