[jira] [Commented] (PIG-2511) Enable '*' to skip any fields that have already been generated and cast in other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE manipulate(foo1) as foo1, *;

Scott Carey (Commented) (JIRA) Fri, 10 Feb 2012 11:03:22 -0800

    [ https://issues.apache.org/jira/browse/PIG-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205642#comment-13205642 ]


Scott Carey commented on PIG-2511:
----------------------------------

I have the same pain point. 65% of the LOC in my Pig (1300+ lines in one
script ...) are projections that exist ONLY to rename aliases.

Here is a real chunk with only slightly obscured/shortened names:
{noformat}
P_SOP1 = FOREACH P_SOP GENERATE 
    F_P2::s as s, o as o, day as day, hour as hour,
    datetime as datetime, c as c, ex_s as ex_s,
    u as u, ex_u as ex_u, mvtt as mvtt, tid as tid,
    vgid as vgid, vcid as vcid, ex_pid as ex_pid,
    pid as pid, pgid as pgid, p_q as q, pc as pc,
    p_tc as tc;
{noformat}

In order for a script to be maintainable, certain aliases need to be 'stable'
and must NOT contain any XYZPDQ:: prefixes. Otherwise, downstream consumers of
the alias will BREAK if the upstream data flow related to XYZPDQ changes at all.
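To make the brittleness concrete, here is a small Python model (not Pig code) that treats a schema as an ordered list of field names, with the `::` disambiguation prefix as plain text. The helper `stable_aliases` and the renamed relation `F_P3` are hypothetical. The helper strips everything up to the last `::` and refuses to continue if two fields would collapse to the same bare name, since that is exactly the case where a prefix is needed to tell them apart.

```python
def stable_aliases(fields):
    """Strip 'RELATION::' prefixes so downstream code can use bare names.

    Toy model of a Pig schema: `fields` is an ordered list of names.
    Raises ValueError if stripping would make two names collide.
    """
    stripped = [name.rsplit("::", 1)[-1] for name in fields]
    seen = set()
    for original, bare in zip(fields, stripped):
        if bare in seen:
            raise ValueError(f"ambiguous alias after stripping: {original!r} -> {bare!r}")
        seen.add(bare)
    return stripped


# Two versions of the same upstream flow; only the joined relation was renamed
# (F_P3 is a made-up name for illustration).
before = ["F_P2::s", "o", "day", "pid"]
after = ["F_P3::s", "o", "day", "pid"]

# The prefixed names differ, so a consumer referring to 'F_P2::s' would break...
print(before == after)  # False
# ...but the stripped names stay the same across the upstream change.
print(stable_aliases(before) == stable_aliases(after))  # True
```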

I am not sure I like the exact semantics of the proposed @ operator, but I
would use it. The 'continuation'-style projections serve a different purpose
entirely. One problem is "I need to project 20 of these 40 fields and create a
couple of derived ones". Another is "alias name cleanup and normalization":
you do not want to remove or add fields at all, but need to rename a couple
and keep the rest. The same feature should not try to do both; it will end up
being confusing. Do one thing and do it well: one feature for easy relabeling,
one for column pruning/projection. These might combine to do both in one step,
but it wouldn't be so bad if they were two steps, as long as both were very
easy to use.
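The split argued for above can be sketched as two independent operations over a toy schema (an ordered list of field names). This is Python, not Pig. `relabel` and `project` are hypothetical helpers, not Pig built-ins, and the field names are taken from the chunk quoted earlier. Relabeling never adds or drops fields, projection never renames, and chaining the two covers the combined case in two easy steps.

```python
def relabel(fields, renames):
    """Rename some fields and keep every field, in order (no pruning)."""
    unknown = set(renames) - set(fields)
    if unknown:
        raise KeyError(f"cannot rename missing fields: {sorted(unknown)}")
    out = [renames.get(name, name) for name in fields]
    if len(set(out)) != len(out):
        raise ValueError(f"duplicate alias after relabel: {out}")
    return out


def project(fields, keep):
    """Keep only the listed fields, in the listed order (no renaming)."""
    missing = [name for name in keep if name not in fields]
    if missing:
        raise KeyError(f"cannot project missing fields: {missing}")
    return list(keep)


schema = ["F_P2::s", "o", "day", "p_q", "p_tc"]
cleaned = relabel(schema, {"F_P2::s": "s", "p_q": "q", "p_tc": "tc"})
print(cleaned)                       # ['s', 'o', 'day', 'q', 'tc']
print(project(cleaned, ["s", "q"]))  # ['s', 'q']
```

Keeping the two helpers separate means a rename can never silently drop a column, and a projection can never silently rename one.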

My instinct also says to be careful about introducing another operator that
starts with a new character. Perhaps "**" is better than "@"; all alias
manipulation built-ins could then start with "*" as a classification hint.

                
> Enable '*' to skip any fields that have already been generated and cast in 
> other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE 
> manipulate(foo1) as foo1, *;
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2511
>                 URL: https://issues.apache.org/jira/browse/PIG-2511
>             Project: Pig
>          Issue Type: New Feature
>          Components: grunt, parser
>    Affects Versions: 0.9.1
>            Reporter: Russell Jurney
>              Labels: grunt, latin, newbie, pig
>
> This should work:
> grunt> good_dates = foreach filtered generate CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS date, *;
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1108:
> <line 8, column 30> Duplicate schema alias: date
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:74)
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>       at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>       at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>       at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>       at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
>       at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>       at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>       at org.apache.pig.PigServer$Graph.compile(PigServer.java:1661)
>       at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>       at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>       at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>       at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>       at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>       at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>       at org.apache.pig.Main.run(Main.java:495)
>       at org.apache.pig.Main.main(Main.java:111)
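The difference between the current behaviour and the behaviour requested in this issue can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Pig's implementation. `expand_generate` is a hypothetical helper. It assumes the explicit expressions come before `*` in the GENERATE list, and that "already generated" means an input field whose name matches an explicit alias. The input schema for `filtered` is also made up.

```python
def expand_generate(input_fields, explicit, skip_generated):
    """Model the output schema of `GENERATE <exprs> AS <aliases>, *`.

    input_fields: names in the input relation, in order.
    explicit: aliases produced by the explicit expressions, in order.
    skip_generated: False models current behaviour, where '*' expands to
        every input field; True models the PIG-2511 proposal, where '*'
        skips fields whose names were already generated (name-based match).
    """
    if skip_generated:
        star = [name for name in input_fields if name not in explicit]
    else:
        star = list(input_fields)
    out = list(explicit) + star
    duplicates = sorted({name for name in out if out.count(name) > 1})
    if duplicates:
        raise ValueError(f"Duplicate schema alias: {', '.join(duplicates)}")
    return out


fields = ["date", "user", "text"]  # hypothetical schema of `filtered`
try:
    expand_generate(fields, ["date"], skip_generated=False)
except ValueError as err:
    print(err)  # Duplicate schema alias: date
print(expand_generate(fields, ["date"], skip_generated=True))  # ['date', 'user', 'text']
```

Under the proposed semantics, the rewritten `date` takes the place of the original column, and `*` supplies only the remaining fields.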

--
This message is automatically generated by JIRA.
