[ 
https://issues.apache.org/jira/browse/DATAFU-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225796#comment-14225796
 ] 

Matthew Hayes commented on DATAFU-79:
-------------------------------------

Yeah, that was an issue.  I recommend using something like a pipe as the
field separator, so that the commas inside the bags are not treated as
delimiters.  Then you can use the following:

{code}
data = LOAD 'input' USING PigStorage('|') AS (
  B1: bag {T: tuple(a:INT, b:INT)},
  B2: bag {U: tuple(c:INT, d:INT)});
{code}

There are some other issues with the test data.  {{zipBagsTest}}, for example,
should assert on this instead:

{code}
assertOutput(test, "zipped", "({(1,2,7,8),(3,4,9,10),(5,6,11,12)})");
{code}
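
To make the expected value above concrete, here is a minimal standalone sketch of a positional zip in plain Java.  It is not the UDF from DATAFU-79.patch: the class name {{ZipBagsSketch}}, its methods, and the input bags {(1,2),(3,4),(5,6)} and {(7,8),(9,10),(11,12)} are assumptions, inferred from the schema and the expected output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the UDF from DATAFU-79.patch.
class ZipBagsSketch {

    // Concatenates the i-th tuple of every bag into one wider tuple.
    // Assumes all bags have the same length, as the issue description does.
    static List<List<Integer>> zip(List<List<List<Integer>>> bags) {
        List<List<Integer>> zipped = new ArrayList<>();
        if (bags.isEmpty()) {
            return zipped;
        }
        int length = bags.get(0).size();
        for (List<List<Integer>> bag : bags) {
            if (bag.size() != length) {
                throw new IllegalArgumentException("bags must have the same length");
            }
        }
        for (int i = 0; i < length; i++) {
            List<Integer> tuple = new ArrayList<>();
            for (List<List<Integer>> bag : bags) {
                tuple.addAll(bag.get(i));
            }
            zipped.add(tuple);
        }
        return zipped;
    }

    // Renders a bag in Pig's text form, e.g. {(1,2),(3,4)}.
    static String format(List<List<Integer>> bag) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < bag.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append("(");
            List<Integer> tuple = bag.get(i);
            for (int j = 0; j < tuple.size(); j++) {
                if (j > 0) {
                    sb.append(",");
                }
                sb.append(tuple.get(j));
            }
            sb.append(")");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        // Assumed input bags; the real test data lives in the patch.
        List<List<Integer>> b1 = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));
        List<List<Integer>> b2 = Arrays.asList(
            Arrays.asList(7, 8), Arrays.asList(9, 10), Arrays.asList(11, 12));
        // Prints ({(1,2,7,8),(3,4,9,10),(5,6,11,12)})
        System.out.println("(" + format(zip(Arrays.asList(b1, b2))) + ")");
    }
}
```

With those assumed inputs, the sketch yields one bag of three four-field tuples, which is the shape the corrected assertion expects.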

I recommend breaking out the zip tests into a separate {{ZipBagTests}} class so
they are quicker to run, which will make it faster to fix the issues.  We've
allowed {{BagTests}} to grow far too large and should start splitting it up.

> Add ZipBags UDF
> ---------------
>
>                 Key: DATAFU-79
>                 URL: https://issues.apache.org/jira/browse/DATAFU-79
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Aaron
>            Priority: Minor
>         Attachments: DATAFU-79.patch
>
>
> In Pig specifically, it can be much easier to work with one bag of tuples
> than with a bunch of separate bags of the same length. I think a UDF like
> ZipBags is extremely useful when you are left with separate bags of the
> same length, as it can be difficult or impossible in Pig to express
> certain operations on multiple bags.



