[ https://issues.apache.org/jira/browse/PIG-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205927#comment-13205927 ]

Scott Carey commented on PIG-2511:
----------------------------------

{quote}
Pig should already "just work" when you use field aliases without a prefix, 
except for cases where there is a conflict without the deduplication. File bugs 
when you encounter cases where it doesn't.
{quote}

Part of it is that in the past, when the bulk of our large scripts were 
written, it didn't just work.  Rather than wrestle with figuring out where the 
issues were coming from, it became easier to just rename after every group or 
join.  
Now, even if all of those bugs are gone, there is still an issue: for an alias 
with no ambiguities at all (or after any inner join), you must still 
proactively rename columns so that the script stays maintainable and alias 
names do not leak past their usefulness.

{quote}
We should be able to drop the prefixes up front when they are not needed. 
That's a good suggestion, let's do that.
{quote}
That would be awesome, although there are some backward-compatibility details 
to work out for existing scripts. Combine that with a * exclusion syntax and, 
when only dropping fields, you never have to type more than N/2 of N column 
names (list either the columns to keep or the columns to drop, whichever is 
shorter).

Might I suggest a syntax more like:
>  generate foo::x as x, *^(foo::x, bar::x, baz::x);

instead of
>  generate foo::x as x, *^foo::x^bar::x^baz::x;

This is because column names often appear as comma-separated lists in Pig, 
sometimes within parentheses. Delimiting by ^ is less consistent and less 
Pig-like. The comma form also lets you copy and paste lists of columns, or 
break them across lines if there are many.

Another bit of sugar might be wildcards inside that:
>  generate foo::x as x, *^::x;

or even
>  generate foo::x as x, *^x;

(In English: foo::x as x, plus every other field whose short name is not x.)
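To pin down the proposed semantics, here is a small Python model of the `*^(...)` exclusion list. The syntax is only a proposal in this thread, not something Pig 0.9 supports, and the rules encoded here are one interpretation of the examples above: a qualified entry such as `foo::x` drops exactly that field, while `::x` or a bare `x` drops every field whose short name is `x`. The helpers `short_name` and `star_except` and the sample schema are hypothetical.

```python
from typing import Iterable, List


def short_name(field: str) -> str:
    """Return the part after the last '::' disambiguation prefix, if any."""
    return field.rsplit("::", 1)[-1]


def star_except(schema: List[str], excluded: Iterable[str]) -> List[str]:
    """Model of the proposed '*^(...)' projection (hypothetical syntax).

    'foo::x'      -> drop exactly that qualified field
    '::x' or 'x'  -> drop every field whose short name is 'x'
    Remaining fields keep their upstream order.
    """
    exact, short = set(), set()
    for name in excluded:
        prefix, sep, rest = name.rpartition("::")
        if sep and prefix:
            exact.add(name)   # fully qualified: match this field only
        else:
            short.add(rest)   # '::x' or 'x': match by short name
    return [f for f in schema if f not in exact and short_name(f) not in short]


# Hypothetical schema after joining relations foo, bar and baz.
joined = ["foo::x", "foo::y", "bar::x", "bar::z", "baz::x"]

# generate foo::x as x, *^(foo::x, bar::x, baz::x);
print(["x"] + star_except(joined, ["foo::x", "bar::x", "baz::x"]))
# generate foo::x as x, *^::x;    and    generate foo::x as x, *^x;
print(["x"] + star_except(joined, ["::x"]))
print(["x"] + star_except(joined, ["x"]))
# each prints ['x', 'foo::y', 'bar::z']
```

The short-name forms are more compact but also broader: in this model `*^x` drops a field named `x` coming from any relation.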

{quote}
I agree 'renaming just a couple of fields out of many fields' is a major 
pain. PIG-1693 should help a lot.
{quote}

PIG-1693 looks nice for users who address fields by position, but for those 
who use names it isn't nearly as powerful. Names and positional order don't 
mix. Naming is great because in larger scripts you can address fields by name 
and not have to worry about fields being added, removed, or reordered.

We have aliases that take more than 100 lines of Pig to create and are shared 
by many downstream users, each with 10- to 100-line scripts. The contract of 
the alias is its fields and their names. As long as downstream users use names 
and not positions, upstream changes are safe: only downstream users that need 
to know about added or removed fields are affected by changes to the upstream 
script.
In order to use PIG-1693, order would have to become part of the contract, 
which is unacceptable for maintenance purposes in very large script collections.

What if you had 30 columns and needed to remove 10 of them, randomly 
distributed? PIG-1693 doesn't work so well for that. A 'project all except' 
operator would be far more straightforward and clear about what is happening: 
*^(drop these columns).
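The maintenance argument can be illustrated with a toy comparison in Python. It models only the projection logic, not Pig, and the column names and helpers `drop_by_name` / `drop_by_position` are hypothetical. Dropping 10 scattered columns out of 30 by name leaves the same surviving columns even after the upstream alias emits its columns in a different order, whereas dropping by position silently removes the wrong ones.

```python
from typing import List, Sequence


def drop_by_name(schema: Sequence[str], names: Sequence[str]) -> List[str]:
    """Keep every column not listed in `names`; fail loudly on unknown names."""
    excluded = set(names)
    missing = excluded.difference(schema)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    return [c for c in schema if c not in excluded]


def drop_by_position(schema: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Keep every column whose index is not listed in `positions`."""
    excluded = set(positions)
    return [c for i, c in enumerate(schema) if i not in excluded]


# Hypothetical upstream alias with 30 named columns; drop 10 scattered ones.
upstream = [f"c{i}" for i in range(30)]
to_drop = [f"c{i}" for i in range(0, 30, 3)]
positions = [upstream.index(c) for c in to_drop]

# Later the upstream script emits the same columns in another order.
reordered = list(reversed(upstream))

kept_by_name = set(drop_by_name(reordered, to_drop))
kept_by_position = set(drop_by_position(reordered, positions))

print(kept_by_name == set(drop_by_name(upstream, to_drop)))  # True
print("c0" in kept_by_position)  # True, although c0 was meant to be dropped
```

Raising `KeyError` on unknown names is an assumption of this sketch, chosen so that a downstream script breaks loudly when an upstream column it references disappears.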


                
> Enable '*' to skip any fields that have already been generated and cast in 
> other parts of the GENERATE, as in: foo = FOREACH my_relation GENERATE 
> manipulate(foo1) as foo1, *;
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2511
>                 URL: https://issues.apache.org/jira/browse/PIG-2511
>             Project: Pig
>          Issue Type: New Feature
>          Components: grunt, parser
>    Affects Versions: 0.9.1
>            Reporter: Russell Jurney
>              Labels: grunt, latin, newbie, pig
>
> This should work:
> grunt> good_dates = foreach filtered generate CustomFormatToISO(date, 'EEE, dd MMM yyyy HH:mm:ss Z') AS date, *;
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108: 
> <line 8, column 30> Duplicate schema alias: date
> 2012-02-06 14:56:23,286 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: 
> <line 8, column 30> Duplicate schema alias: date
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.validate(SchemaAliasVisitor.java:74)
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:104)
>       at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:240)
>       at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>       at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>       at org.apache.pig.newplan.logical.visitor.SchemaAliasVisitor.visit(SchemaAliasVisitor.java:99)
>       at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:74)
>       at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>       at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>       at org.apache.pig.PigServer$Graph.compile(PigServer.java:1661)
>       at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1610)
>       at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1582)
>       at org.apache.pig.PigServer.registerQuery(PigServer.java:584)
>       at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:942)
>       at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
>       at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>       at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>       at org.apache.pig.Main.run(Main.java:495)
>       at org.apache.pig.Main.main(Main.java:111)
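The behavior requested in the issue can be modeled in the same spirit. Today `*` expands to every input field, so reusing the alias `date` for the converted value collides with the original `date` column (ERROR 1108); under the requested rule, `*` would skip any input field whose alias the GENERATE has already produced. This is a hypothetical Python model of schema naming only: the schema of `filtered` and the `DuplicateAliasError` class are made up for illustration and are not Pig APIs.

```python
from collections import Counter
from typing import List, Sequence


class DuplicateAliasError(ValueError):
    """Stand-in for Pig's ERROR 1108 'Duplicate schema alias' (not a Pig class)."""


def check_unique(schema: List[str]) -> List[str]:
    """Reject an output schema in which any alias appears more than once."""
    dupes = sorted(a for a, n in Counter(schema).items() if n > 1)
    if dupes:
        raise DuplicateAliasError("Duplicate schema alias: " + ", ".join(dupes))
    return schema


def generate_current(inputs: Sequence[str], explicit: Sequence[str]) -> List[str]:
    """GENERATE <explicit...>, * where '*' expands to every input field."""
    return check_unique(list(explicit) + list(inputs))


def generate_requested(inputs: Sequence[str], explicit: Sequence[str]) -> List[str]:
    """Same, but '*' skips input fields whose alias was already generated."""
    taken = set(explicit)
    return check_unique(list(explicit) + [f for f in inputs if f not in taken])


# Hypothetical schema of `filtered`.
filtered = ["message_id", "from", "date", "subject"]

# good_dates = foreach filtered generate CustomFormatToISO(date, ...) AS date, *;
print(generate_requested(filtered, ["date"]))
# -> ['date', 'message_id', 'from', 'subject']
try:
    generate_current(filtered, ["date"])
except DuplicateAliasError as err:
    print(err)  # Duplicate schema alias: date
```

In this model the explicit field comes first simply because it is listed first in the GENERATE.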

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
