[ 
https://issues.apache.org/jira/browse/DATAFU-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225647#comment-14225647
 ] 

Matthew Hayes commented on DATAFU-79:
-------------------------------------

We haven't got the multiline annotation working with IntelliJ yet.  It works 
with Eclipse, though.

You can run the bag tests with the following command:

{code}
./gradlew :datafu-pig:test -Dtest.single=BagTests
{code}

zipBagTests and zipUnevenBagsTest both fail with:

{code}
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. <line 313, column 48>  Syntax error, unexpected symbol at or near 'INT'
{code}

Changing the LOAD statement to the following should get rid of the error:

{code}
data = LOAD 'input' AS (B1: bag {T:tuple(a:INT,b:INT)}, B2: bag {U:tuple(c:INT,d:INT)});
{code}

Also note there is another issue.  The FOREACH needs to reference {{data}}, 
the alias assigned by the LOAD, instead of {{input}}:

{code}
zipped = FOREACH data GENERATE ZipBags(B1,B2);
{code}
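
For reference, here is a minimal plain-Java sketch of the zip semantics described in the issue: two same-length bags become one bag whose i-th tuple is the concatenation of the i-th tuples of the inputs. This is not the code from the patch. The class name {{ZipSketch}} is hypothetical, {{List<Object>}} stands in for a Pig tuple, and rejecting uneven bags is an assumption made only to keep the sketch short; the UDF under review may handle that case differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the zip behavior; it does not use any Pig classes.
public class ZipSketch {

    // Concatenates the i-th tuple of b1 with the i-th tuple of b2.
    public static List<List<Object>> zip(List<List<Object>> b1, List<List<Object>> b2) {
        if (b1.size() != b2.size()) {
            // Assumption: uneven bags are rejected here; the real UDF may differ.
            throw new IllegalArgumentException("Expected bags of equal length");
        }
        List<List<Object>> out = new ArrayList<>();
        for (int i = 0; i < b1.size(); i++) {
            List<Object> tuple = new ArrayList<>(b1.get(i));
            tuple.addAll(b2.get(i));
            out.add(tuple);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> b1 = Arrays.asList(
            Arrays.<Object>asList(1, 2), Arrays.<Object>asList(3, 4));
        List<List<Object>> b2 = Arrays.asList(
            Arrays.<Object>asList(5, 6), Arrays.<Object>asList(7, 8));
        // Prints [[1, 2, 5, 6], [3, 4, 7, 8]]
        System.out.println(zip(b1, b2));
    }
}
```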

Can you add the standard Apache license header?  See other files for an example.

After making these changes, though, I see the error below.  I haven't figured 
out the cause yet.

{code}
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(datafu.pig.bags.ZipBags)[bag] - scope-5 Operator Key: scope-5) children: null at []]: java.lang.IllegalArgumentException: Expected all fields to be bags
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.IllegalArgumentException: Expected all fields to be bags
        at datafu.pig.bags.ZipBags.exec(ZipBags.java:47)
        at datafu.pig.bags.ZipBags.exec(ZipBags.java:31)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDataBag(POUserFunc.java:374)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:309)
        ... 9 more
{code}
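
One way to narrow this down might be to have the check report which field failed and what its runtime type was. The sketch below only illustrates that idea in plain Java and makes no claim about the actual cause. The class {{BagCheckSketch}} and the method {{requireAllBags}} are hypothetical names, an {{Object[]}} stands in for the input tuple, and {{List}} stands in for a bag; the real check in ZipBags would test against Pig's own types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an input check; it does not use any Pig classes.
public class BagCheckSketch {

    // Throws if any field is not a "bag" (a List here), naming the offending field.
    public static void requireAllBags(Object[] fields) {
        for (int i = 0; i < fields.length; i++) {
            Object field = fields[i];
            if (!(field instanceof List)) {
                String type = (field == null) ? "null" : field.getClass().getName();
                throw new IllegalArgumentException(
                    "Expected all fields to be bags, but field " + i + " is " + type);
            }
        }
    }

    public static void main(String[] args) {
        Object[] fields = { Arrays.asList(1, 2), "not a bag" };
        try {
            requireAllBags(fields);
        } catch (IllegalArgumentException e) {
            // Prints: Expected all fields to be bags, but field 1 is java.lang.String
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, the test output would show directly whether the UDF is receiving something other than the two bags declared in the schema.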

Note that the logs are being suppressed, which makes debugging harder.  To turn 
them on, open PigTests and comment out this line:

{code}
Logger.getRootLogger().removeAllAppenders();
{code}

I'll open another issue to remove this line.  It isn't needed anymore because 
the logs are no longer dumped on the console and can easily be viewed in the 
HTML test output.  Before, the console output was just too much.

Also, please open a review board at https://reviews.apache.org/dashboard/ and 
add the link here, as it makes it easier to provide feedback.

> Add ZipBags UDF
> ---------------
>
>                 Key: DATAFU-79
>                 URL: https://issues.apache.org/jira/browse/DATAFU-79
>             Project: DataFu
>          Issue Type: Improvement
>            Reporter: Aaron
>            Priority: Minor
>         Attachments: DATAFU-79.patch
>
>
> In pig specifically it can be much easier to work with 1 bag of tuples rather 
> than a bunch of separate same length bags. I think a UDF like ZipBags is 
> extremely useful in situations in which you are left with separate bags of 
> the same length as it can be difficult or impossible in pig to express 
> certain operations on multiple bags.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
