[ 
https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327602#comment-16327602
 ] 

Will Lauer edited comment on PIG-4608 at 1/16/18 7:08 PM:
----------------------------------------------------------

While up in the middle of the night dealing with a sick child, I realized there 
was a way to make the parsing sane if updates, adds, and deletes were all to be 
included in a single statement. How does this syntax look?

{code}
a = load 'input' using mock.Storage() as (x:chararray, y:chararray, z:long);
b = foreach a generate x+y as q, y, z:long;
c = foreach a update "prefix"+x as x, (chararray)(z+1) as z:chararray;
d = foreach a delete x, z;
e = foreach a {
           nextInt = z+1;
           update nextInt as z:int;
    };
f = foreach a
    add {
           1+oldCol as new:long,
           somethingElse as new2
     } delete {
           colToRemove,
           otherColToRemove
     } update {
           1+oldCol2 as updatedCol,
           "1"+oldCol2 as updatedTypeCol:chararray
     };
g = foreach a {
       nextInt = z+1;
       add {
              1+oldCol as new:long,
              somethingElse as new2
       } delete {
              colToRemove,
              otherColToRemove
       } update {
              1+oldCol2 as updatedCol,
              "1"+oldCol2 as updatedTypeCol:chararray
       };
    };
{code}

In this case, the curly braces around each clause's list would be required only 
when combining multiple clauses in a single FOREACH. An ADD, DELETE, or UPDATE 
used alone could be written without the extra curly braces.
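
To make the intended behaviour of the combined clauses concrete, here is a rough Python model of a single row passing through ADD, DELETE, and UPDATE. It is only a sketch, not Pig code: the function name `apply_clauses` is hypothetical, and it assumes two things the proposal above does not settle, namely that every expression is evaluated against the original row, and that DELETE and UPDATE must name existing columns while ADD must not.

```python
def apply_clauses(row, add=None, delete=None, update=None):
    """Model one FOREACH with optional ADD, DELETE and UPDATE clauses.

    row    -- dict of column name -> value (insertion order is column order)
    add    -- dict of new column name -> function(row) computing its value
    delete -- iterable of column names to drop
    update -- dict of existing column name -> function(row) for its new value
    """
    add, delete, update = dict(add or {}), set(delete or ()), dict(update or {})

    # Assumption: DELETE and UPDATE must name existing columns, ADD must not.
    unknown = sorted(name for name in delete | set(update) if name not in row)
    clash = sorted(name for name in add if name in row)
    if unknown or clash:
        raise ValueError(f"unknown columns {unknown}, existing columns {clash}")

    # Assumption: every expression is evaluated against the original row.
    out = {}
    for name, value in row.items():
        if name in delete:
            continue  # dropped columns never reach the output
        out[name] = update[name](row) if name in update else value
    for name, expr in add.items():
        out[name] = expr(row)  # new columns are appended at the end
    return out


row = {"x": "a", "y": "b", "z": 1}
# Mirrors statement c: update x and z in place.
c = apply_clauses(row, update={"x": lambda r: "prefix" + r["x"],
                               "z": lambda r: str(r["z"] + 1)})
# Mirrors statement d: drop x and z.
d = apply_clauses(row, delete=["x", "z"])
# A combined form in the spirit of statement f.
f = apply_clauses(row,
                  add={"q": lambda r: r["x"] + r["y"]},
                  delete=["y"],
                  update={"z": lambda r: r["z"] + 1})
```

With these assumptions, `c` keeps all three columns in their original positions, `d` is left with only `y`, and `f` ends up with `x`, the updated `z`, and the new column `q` appended last.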


> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>            Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH … GENERATE.
> Example:
>
> Input data:
>
> (1,2,3)
> (2,3,4)
> (3,4,5)
>
> -- Load the data
> three_numbers = LOAD 'input_data'
>     USING PigStorage()
>     AS (f1:int, f2:int, f3:int);
>
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
>     5 as f1,
>     f1+f2 as new_sum
> ;
> Dump updated;
>
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
>
> Fields to update must be specified by alias. Any fields in the UPDATE that do 
> not match an existing field will be appended to the end of the tuple.
>
> This command is particularly desirable in scripts that deal with a large 
> number of fields (in the 20-200 range). Often, we only need to modify a few 
> fields. The FOREACH ... UPDATE statement allows the developer to focus on the 
> actual logical changes instead of having to list all of the fields that are 
> also being passed through.
>
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe 
> this can be done with changes to the parser and the creation of a new 
> LOUpdate. No physical plan changes should be needed because we will leverage 
> what LOGenerate does.
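
The description above says UPDATE can be built on what LOGenerate already does. One way to picture that is as a rewrite of the UPDATE list into a full GENERATE list. The Python sketch below is only an illustration of that idea, not the prototype mentioned in the issue: `expand_update` is a hypothetical helper that works on plain strings, and it follows the stated rule that matched aliases are replaced in place and unmatched ones are appended.

```python
def expand_update(relation, schema, updates):
    """Return the FOREACH ... GENERATE statement equivalent to an UPDATE.

    relation -- alias of the input relation
    schema   -- list of existing field aliases, in order
    updates  -- list of (expression, alias) pairs, as written after UPDATE
    """
    by_alias = {alias: expr for expr, alias in updates}
    items = []
    # Existing fields keep their position; matched ones get the new expression.
    for field in schema:
        if field in by_alias:
            items.append(f"{by_alias[field]} AS {field}")
        else:
            items.append(field)
    # Aliases that match no existing field are appended in the order given.
    for expr, alias in updates:
        if alias not in schema:
            items.append(f"{expr} AS {alias}")
    return f"FOREACH {relation} GENERATE {', '.join(items)};"


# The example from the issue description.
statement = expand_update(
    "three_numbers",
    ["f1", "f2", "f3"],
    [("5", "f1"), ("f1+f2", "new_sum")],
)
print(statement)
# prints: FOREACH three_numbers GENERATE 5 AS f1, f2, f3, f1+f2 AS new_sum;
```

For a relation with 20 to 200 fields, the generated list grows with the schema while the UPDATE list stays as short as the actual change, which is the convenience the issue is asking for.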



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
