[ https://issues.apache.org/jira/browse/DATAFU-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225796#comment-14225796 ]
Matthew Hayes commented on DATAFU-79:
-------------------------------------

Yeah, that was an issue. I recommend using something like a pipe as the separator, and then you can use the following:

{code}
data = LOAD 'input' using PigStorage('|') AS (B1: bag {T:tuple(a:INT,b:INT)}, B2: bag {U:tuple(c:INT,d:INT)});
{code}

There are some other issues with the test data. {{zipBagsTest}}, for example, should assert on this instead:

{code}
assertOutput(test, "zipped", "({(1,2,7,8),(3,4,9,10),(5,6,11,12)})");
{code}

I recommend breaking out the zip tests into a separate {{ZipBagTests}} so it is quicker to run them. This will make it faster to fix the issues. We've allowed BagTests to grow far too large and should start splitting it up.

> Add ZipBags UDF
> ---------------
>
>                 Key: DATAFU-79
>                 URL: https://issues.apache.org/jira/browse/DATAFU-79
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Aaron
>            Priority: Minor
>         Attachments: DATAFU-79.patch
>
>
> In Pig specifically, it can be much easier to work with one bag of tuples rather
> than a bunch of separate same-length bags. I think a UDF like ZipBags is
> extremely useful in situations in which you are left with separate bags of
> the same length, as it can be difficult or impossible in Pig to express
> certain operations on multiple bags.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
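
For illustration only, the zip semantics implied by the expected {{zipBagsTest}} output can be sketched in plain Java. This is not the code in DATAFU-79.patch and does not use the Pig API: the class name {{ZipBagsSketch}} and the {{zip}} helper are hypothetical, a bag is modelled as an ordered list of tuples, and the input values are inferred from the expected output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipBagsSketch {
    // Hypothetical helper: a "bag" is a List of tuples, a "tuple" is a List<Integer>.
    // Concatenates the i-th tuple of every bag into the i-th output tuple.
    @SafeVarargs
    static List<List<Integer>> zip(List<List<Integer>>... bags) {
        List<List<Integer>> zipped = new ArrayList<>();
        if (bags.length == 0) {
            return zipped;
        }
        int size = bags[0].size();
        // Assumption: all bags must have the same length, as the issue describes.
        for (List<List<Integer>> bag : bags) {
            if (bag.size() != size) {
                throw new IllegalArgumentException("bags must have the same length");
            }
        }
        for (int i = 0; i < size; i++) {
            List<Integer> row = new ArrayList<>();
            for (List<List<Integer>> bag : bags) {
                row.addAll(bag.get(i));
            }
            zipped.add(row);
        }
        return zipped;
    }

    public static void main(String[] args) {
        // Inputs inferred from the expected test output; not taken from the patch.
        List<List<Integer>> b1 = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6));
        List<List<Integer>> b2 = Arrays.asList(
            Arrays.asList(7, 8), Arrays.asList(9, 10), Arrays.asList(11, 12));
        System.out.println(zip(b1, b2)); // prints [[1, 2, 7, 8], [3, 4, 9, 10], [5, 6, 11, 12]]
    }
}
```

The sketch relies on list order to pair tuples. A real UDF would have to decide how to treat Pig bags, which do not guarantee ordering in general.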