[ https://issues.apache.org/jira/browse/PIG-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205569#comment-13205569 ]
Russell Jurney commented on PIG-2511:
-------------------------------------
The semantics I laid out are simple: don't overload a field if you want it
auto-included.
Taking a step back... let me state the problem: my Pig scripts look more
complicated than they are. When I join or otherwise manipulate my data, if I
don't explicitly do:
foreach my foo generate foo as foo, bar as bar, etc. then the DESCRIBE of that
relation is unreadable. I don't want to rename fields in each and every foreach
if the field's function/identity hasn't changed. So I can't use *. Listing out
fields to generate AS themselves feels wrong because it is so verbose. And yet
I need my clean DESCRIBEs and consistent column names. So my code balloons. I
don't know a better way, but maybe there is one.
There are a couple of issues in there, but that is what I'd like to address.
What about a UDF? rest(), or others(), as in: generate LOWER(foo) as foo,
others(); Can the UDF get the input schema and... emit those fields? I don't
know if a UDF can do that.
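To make the semantics I'm asking for concrete, here is a rough sketch in plain Python (not Pig, and not how Pig implements anything). The function name expand_star and the sample field names are made up for illustration. The idea is only that '*' or others() would expand to every input field whose name hasn't already been used as an alias elsewhere in the GENERATE.

```python
def expand_star(input_fields, explicit_aliases):
    """Sketch of the proposed rule: explicit aliases come first, then
    every input field that has not been overloaded by one of them."""
    taken = set(explicit_aliases)
    # Keep the input order for the fields that '*' would still contribute.
    rest = [field for field in input_fields if field not in taken]
    return list(explicit_aliases) + rest


# Hypothetical input schema; 'date' is regenerated explicitly, so '*' skips it.
schema = ["date", "from", "subject"]
print(expand_star(schema, ["date"]))
```

With the rule above there is never a duplicate alias in the output, which is the case that currently fails with ERROR 1108.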
> Enable '*' to skip any fields that have already been generated and cast in
> other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE
> manipulate(foo1) as foo1, *;
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-2511
> URL: https://issues.apache.org/jira/browse/PIG-2511
> Project: Pig
> Issue Type: New Feature
> Components: grunt, parser
> Affects Versions: 0.9.1
> Reporter: Russell Jurney
> Labels: grunt, latin, newbie, pig
>
> This should work:
> grunt> good_dates = foreach filtered generate CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS date, *;
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:74)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>     at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>     at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.PigServer$Graph.compile(PigServer.java:1661)
>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>     at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>     at org.apache.pig.Main.run(Main.java:495)
>     at org.apache.pig.Main.main(Main.java:111)