Add UDF function chaining syntax
--------------------------------
Key: PIG-2490
URL: https://issues.apache.org/jira/browse/PIG-2490
Project: Pig
Issue Type: Improvement
Reporter: David Ciemiewicz
Nested function/UDF calls make for very convoluted data transformations.
Consider normalizing the hours field of a record like this:
{code}
business1 9:00 AM - 4:00 PM
{code}
{code}
B = foreach A generate
  REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(hours,' AM','a'), ' PM', 'p'), ' *- *', '-') as hours_normalized;
{code}
Yes, you could recast this as a nested foreach block with intermediate aliases, but it's still rather convoluted:
{code}
B = foreach A {
  hours1 = REGEXREPLACE(hours,' AM\\b','a');
  hours2 = REGEXREPLACE(hours1,' PM\\b','p');
  hours3 = REGEXREPLACE(hours2,' *- *','-');
  generate
    hours3 as hours_normalized;
};
{code}
I suggest an "object-style" function chaining enhancement to the grammar a la
Java, JavaScript, etc.
{code}
B = foreach A generate
  REGEXREPLACE(hours,' AM\\b','a').REGEXREPLACE(' PM\\b','p').REGEXREPLACE(' *- *','-') as hours_normalized;
{code}
This chaining notation makes the sequence of actions much clearer, without the
convoluted nesting.
In the case of the "object-method" style dot (.) notation, the result of the
prior expression is just used as the first value in the tuple passed to the
function call.
In other words, the following two expressions would be equivalent:
{code}
f(a,b)
a.f(b)
{code}
As such, I don't think there are any requirements to modify existing UDFs.
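To illustrate how small the rewrite is, here is a minimal sketch of the desugaring step, written in Python rather than against Pig's actual grammar or parser. The Call and MethodCall node types and the desugar and render helpers are hypothetical names invented for this example; they are not part of Pig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Ordinary function call: name(args...). Hypothetical AST node."""
    name: str
    args: tuple


@dataclass(frozen=True)
class MethodCall:
    """Chained call: receiver.name(args...). Hypothetical AST node."""
    receiver: object
    name: str
    args: tuple


def desugar(expr):
    """Rewrite every a.f(b) into f(a, b), recursing into sub-expressions."""
    if isinstance(expr, MethodCall):
        receiver = desugar(expr.receiver)
        args = tuple(desugar(a) for a in expr.args)
        # The receiver simply becomes the first argument of the call.
        return Call(expr.name, (receiver,) + args)
    if isinstance(expr, Call):
        return Call(expr.name, tuple(desugar(a) for a in expr.args))
    return expr  # field names and literals are left untouched


def render(expr):
    """Print an expression back out in function-call syntax."""
    if isinstance(expr, Call):
        return f"{expr.name}({','.join(render(a) for a in expr.args)})"
    return str(expr)


# a.f(b) is rewritten to f(a,b)
print(render(desugar(MethodCall("a", "f", ("b",)))))
```

Applied to the three-step chain above, the same rewrite yields the fully nested REGEXREPLACE expression, so the UDFs themselves would only ever see ordinary calls.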
I think this is just a syntactic "sugar" enhancement that should be fairly
trivial to implement, yet would make coding complex data transformations with
Pig UDFs "cleaner".