[
https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327602#comment-16327602
]
Will Lauer commented on PIG-4608:
---------------------------------
While up in the middle of the night dealing with a sick child, I realized there
was a way to make the parsing sane if updates, adds, and deletes were all to be
included in a single statement. How does this syntax look?
{code}
a = load 'input' using mock.Storage() as (x:chararray, y:chararray, z:long);
b = foreach a generate x+y as q, y, z:long;
c = foreach a update "prefix"+x as x, (chararray)(z+1) as z:chararray;
d = foreach a delete x, z;
e = foreach a {
    nextInt = z+1;
    update nextInt as z:int
}
f = foreach a
    add {
        1+oldCol as new:long,
        somethingElse as new2
    } delete {
        colToRemove,
        otherColToRemove
    } update {
        1+oldCol2 as updatedCol,
        "1"+oldCol2 as updatedTypeCol:chararray
    };
g = foreach a {
    nextInt = z+1;
    add {
        1+oldCol as new:long,
        somethingElse as new2
    } delete {
        colToRemove,
        otherColToRemove
    } update {
        1+oldCol2 as updatedCol,
        "1"+oldCol2 as updatedTypeCol:chararray
    };
}
{code}
In this case, the extra curly braces would be required only when combining
multiple clauses in a single FOREACH. An add, delete, or update on its own
could be written without them, as in c and d above.
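For comparison, here is a rough sketch of what the same kind of change costs
with today's FOREACH ... GENERATE, assuming the semantics from the issue
description quoted below (a field matched by alias is replaced in place, an
unmatched field is appended). The relation names updated_today and
deleted_today, and the idea of deleting f1 and f3, are made up for
illustration and are not part of the proposal.
{code}
-- Same input as the example in the issue description.
three_numbers = LOAD 'input_data' USING PigStorage() AS (f1:int, f2:int, f3:int);

-- Equivalent of "UPDATE 5 as f1, f1+f2 as new_sum":
-- f1 is replaced in place, f2 and f3 must be listed by hand to pass through,
-- and new_sum is appended. f1+f2 still reads the input value of f1.
updated_today = FOREACH three_numbers GENERATE 5 AS f1, f2, f3, f1+f2 AS new_sum;

-- Equivalent of a hypothetical "delete f1, f3": list every surviving field.
deleted_today = FOREACH three_numbers GENERATE f2;
{code}
With three fields this is no burden, but every pass-through field has to be
spelled out, which is the cost the issue description is aiming to remove for
wide relations.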
> FOREACH ... UPDATE
> ------------------
>
> Key: PIG-4608
> URL: https://issues.apache.org/jira/browse/PIG-4608
> Project: Pig
> Issue Type: New Feature
> Reporter: Haley Thrapp
> Priority: Major
>
> I would like to propose a new command in Pig, FOREACH ... UPDATE.
> Syntactically, it would look much like FOREACH ... GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
>     USING PigStorage()
>     AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
>     5 as f1,
>     f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large
> number of fields (in the 20-200 range). Often, we only need to make
> modifications to a few fields. The FOREACH ... UPDATE statement allows the
> developer to focus on the actual logical changes instead of having to list
> all of the fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe
> this can be done with changes to the parser and the creation of a new
> LOUpdate. No physical plan changes should be needed because we will leverage
> what LOGenerate does.