[ 
https://issues.apache.org/jira/browse/PIG-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235810#comment-13235810
 ] 

Scott Carey commented on PIG-2511:
----------------------------------

This one is annoying me again today.

I have a relation inbound with ~20 fields.  One of them is a bag of about 100 
tuples.  All I want to do is flatten it and project out two fields.

B = FOREACH A GENERATE *, FLATTEN(x.(foo, bar)) as flatx;

Ok, now I have a problem:

The bag of 100 is still in the relation, copied 100 times.  To get rid of it I 
need to list every field one by one instead of using *.  No, PIG-1693 is not 
useful.  The field order is subject to change.  This chunk needs to be 
resilient to changes in the inbound aliases that do not change the semantic 
meaning of fields.

Then the next step is to project out the foo and bar from flatx, which will 
require listing the 20 fields AGAIN.

This issue is generally worse when you are using FLATTEN than with simple 
projection, since it is much more important to drop the fields for performance 
reasons.  Some sane syntax here could easily cut the size of most of my scripts 
by more than half!
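
To make the workaround concrete, here is a minimal sketch of the 
explicit-listing version.  The LOAD path, the three scalar fields (id, name, 
ts) and the extra bag field baz are hypothetical stand-ins for the ~20 real 
fields; naming the flattened columns with AS (foo, bar) is also an assumption, 
not what the statement above does.

```pig
-- Hypothetical input: three scalar fields stand in for the ~20 real ones.
A = LOAD 'inbound' AS (id:int, name:chararray, ts:long,
                       x:bag{t:tuple(foo:int, bar:int, baz:int)});

-- Every field except x is listed by name so the bag itself is dropped;
-- only the flattened foo and bar survive from x.
B = FOREACH A GENERATE id, name, ts, FLATTEN(x.(foo, bar)) AS (foo, bar);
```

With 20 fields the GENERATE list grows accordingly, and it has to be kept in 
sync by hand whenever the inbound schema changes.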
                
> Enable '*' to skip any fields that have already been generated and cast in 
> other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE 
> manipulate(foo1) as foo1, *;
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2511
>                 URL: https://issues.apache.org/jira/browse/PIG-2511
>             Project: Pig
>          Issue Type: New Feature
>          Components: grunt, parser
>    Affects Versions: 0.9.1
>            Reporter: Russell Jurney
>              Labels: grunt, latin, newbie, pig
>
> This should work:
> grunt> good_dates = foreach filtered generate CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS date, *;
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: 
> <line 8, column 30> Duplicate schema alias: date
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: 
> <line 8, column 30> Duplicate schema alias: date
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:74)
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>       at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>       at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>       at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>       at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
>       at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>       at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>       at org.apache.pig.PigServer$Graph.compile(PigServer.java:1661)
>       at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>       at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>       at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>       at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>       at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>       at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>       at org.apache.pig.Main.run(Main.java:495)
>       at org.apache.pig.Main.main(Main.java:111)
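
A sketch of one way to sidestep ERROR 1108 in the quoted statement, assuming 
it is acceptable for the converted value to carry a different name: give the 
generated column an alias that does not collide with anything * expands to.  
iso_date is a hypothetical alias, not something from the issue.

```pig
-- iso_date is a hypothetical alias chosen so it does not clash with the
-- 'date' field that * already brings in.
good_dates = foreach filtered generate
    CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS iso_date, *;
```

This still carries the unconverted date field along via *, which is the 
redundancy the requested feature is meant to remove.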


        
