[
https://issues.apache.org/jira/browse/DATAFU-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225796#comment-14225796
]
Matthew Hayes commented on DATAFU-79:
-------------------------------------
Yeah, that was an issue. I recommend using something like a pipe as the
separator; then you can use the following:
{code}
data = LOAD 'input' USING PigStorage('|') AS (B1: bag {T: tuple(a:int, b:int)},
                                              B2: bag {U: tuple(c:int, d:int)});
{code}
There are some other issues with the test data. For example, {{zipBagsTest}}
should assert on this instead:
{code}
assertOutput(test, "zipped", "({(1,2,7,8),(3,4,9,10),(5,6,11,12)})");
{code}
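To make the expected output above easier to check, here is a minimal plain-Java
sketch of the zip semantics it implies: the i-th tuples of each bag are
concatenated into one tuple. This is not the code from the attached patch. The
class {{ZipSketch}}, its {{zip}} method, and the input values are hypothetical,
with the inputs inferred from the expected output rather than taken from the
actual test data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the DataFu ZipBags UDF.
public class ZipSketch {

    // Concatenates the i-th tuple of every bag into a single output tuple.
    // Each bag is modeled as an ordered list of tuples, each tuple as a list of ints.
    static List<List<Integer>> zip(List<List<List<Integer>>> bags) {
        List<List<Integer>> zipped = new ArrayList<>();
        if (bags.isEmpty()) {
            return zipped;
        }
        int size = bags.get(0).size();
        // Assumption: all bags must have the same length, as the issue description states.
        for (List<List<Integer>> bag : bags) {
            if (bag.size() != size) {
                throw new IllegalArgumentException("bags must have the same length");
            }
        }
        for (int i = 0; i < size; i++) {
            List<Integer> row = new ArrayList<>();
            for (List<List<Integer>> bag : bags) {
                row.addAll(bag.get(i));
            }
            zipped.add(row);
        }
        return zipped;
    }

    public static void main(String[] args) {
        // Assumed inputs, chosen to match the expected output in the assertion above.
        List<List<Integer>> b1 = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));
        List<List<Integer>> b2 = Arrays.asList(
            Arrays.asList(7, 8), Arrays.asList(9, 10), Arrays.asList(11, 12));
        // Prints [[1, 2, 7, 8], [3, 4, 9, 10], [5, 6, 11, 12]]
        System.out.println(zip(Arrays.asList(b1, b2)));
    }
}
```

With the pipe separator, an input line matching these assumed values would
read {{{(1,2),(3,4),(5,6)}|{(7,8),(9,10),(11,12)}}}.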
I recommend breaking out the zip tests into a separate {{ZipBagTests}} class so
that they are quicker to run, which will make it faster to fix the issues. We've
allowed {{BagTests}} to grow far too large and should start splitting it up.
> Add ZipBags UDF
> ---------------
>
> Key: DATAFU-79
> URL: https://issues.apache.org/jira/browse/DATAFU-79
> Project: DataFu
> Issue Type: Improvement
> Reporter: Aaron
> Priority: Minor
> Attachments: DATAFU-79.patch
>
>
> In Pig specifically, it can be much easier to work with one bag of tuples
> than with several separate bags of the same length. I think a UDF like ZipBags
> is extremely useful when you are left with separate bags of the same length,
> as it can be difficult or impossible in Pig to express certain operations on
> multiple bags.