[ https://issues.apache.org/jira/browse/PIG-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205084#comment-13205084 ]
Dmitriy V. Ryaboy commented on PIG-2511:
----------------------------------------
Couple of things:
First, you are using the tag "newbie" incorrectly :). I know you are trying to
be self-effacing and say "I am a newbie" but actually that tag is intended to
be "If I am a newbie and want to contribute to Pig, what JIRAs should I tackle
first while I get the lay of the land?". This is clearly not one of them.
Second, I think this feature would be oh-my-god confusing to users. The example
Thejas used above illustrates the point nicely, actually -- we apply a UDF to a,
b, c, and some constant to get a field called c, then project "the rest" --
with "the rest" being defined as "anything that doesn't have a name conflict".
But "the rest" could just as easily mean "the rest of the columns I didn't use"
(so, just d). It also changes what the script produces if you rename one alias.
Say you realize you didn't want to call the result of the UDF c, but instead
want to call it processed_c: all of a sudden the number of columns produced
changes, and their respective ordinals shift. It'll be a nightmare.
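To make the ambiguity concrete, here is a rough Python sketch (not Pig) that models a schema as a list of column names. It assumes the relation in Thejas's example has columns a, b, c, d. Both helper functions are hypothetical readings of the proposed '*', not anything Pig implements.

```python
# Hypothetical model of the proposed "GENERATE udf(...) AS alias, *".
# Assumption: the input relation has columns a, b, c, d.

def star_skips_name_conflicts(schema, alias):
    """Reading 1: '*' expands to every input column whose name differs from alias."""
    return [alias] + [col for col in schema if col != alias]

def star_skips_used_columns(schema, alias, udf_inputs):
    """Reading 2: '*' expands to every input column the UDF did not consume."""
    return [alias] + [col for col in schema if col not in udf_inputs]

schema = ["a", "b", "c", "d"]
udf_inputs = ["a", "b", "c"]

conflict_c = star_skips_name_conflicts(schema, "c")
unused_c = star_skips_used_columns(schema, "c", udf_inputs)
conflict_renamed = star_skips_name_conflicts(schema, "processed_c")

print(conflict_c)        # ['c', 'a', 'b', 'd']
print(unused_c)          # ['c', 'd']
print(conflict_renamed)  # ['processed_c', 'a', 'b', 'c', 'd']
```

Under the first reading, renaming the alias from c to processed_c takes the output from four columns to five and moves d from position 3 to position 4 (counting from 0).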
Just use a new name when generating your derived column. It's derived, after
all.
I'd be ok with some syntax that would indicate columns to *not* generate
("generate *^a^b"?), but the proposed syntax is fraught with peril.
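For contrast, an explicit exclusion list could behave roughly as sketched below in Python. The function name, the reading of "*^a^b" as "all columns except a and b", and the error on unknown names are all assumptions; the syntax is only floated here, not implemented anywhere.

```python
# Hypothetical model of "generate *^a^b": all input columns except those listed.

def star_except(schema, excluded):
    missing = [col for col in excluded if col not in schema]
    if missing:
        # Assumption: excluding a column that does not exist is an error.
        raise ValueError(f"unknown columns: {missing}")
    return [col for col in schema if col not in excluded]

schema = ["a", "b", "c", "d"]
remaining = star_except(schema, ["a", "b"])
print(remaining)  # ['c', 'd']
```

The result depends only on the names written after each ^, so renaming a derived column elsewhere in the GENERATE would not change which columns come out.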
> Enable '*' to skip any fields that have already been generated and cast in
> other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE
> manipulate(foo1) as foo1, *;
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-2511
> URL: https://issues.apache.org/jira/browse/PIG-2511
> Project: Pig
> Issue Type: New Feature
> Components: grunt, parser
> Affects Versions: 0.9.1
> Reporter: Russell Jurney
> Labels: grunt, latin, newbie, pig
>
> This should work:
> grunt> good_dates = foreach filtered generate CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS date, *;
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
> at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:74)
> at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
> at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
> at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
> at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
> at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
> at org.apache.pig.PigServer$Graph.compile(PigServer.java:1661)
> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
> at org.apache.pig.Main.run(Main.java:495)
> at org.apache.pig.Main.main(Main.java:111)