[ https://issues.apache.org/jira/browse/PIG-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheolsoo Park updated PIG-3268:
-------------------------------
    Attachment: PIG-3268-5.patch

The problem was that one LogicalExpression object (the case expression) was shared by multiple BinCondExpression objects (the when expressions). To fix it, I clone the first expression following the CASE token and insert it before every when expression in QueryParser. Then, I construct a new LogicalExpression object per BinCondExpression in LogicalPlanGenerator.

In short,
{code}
CASE e1 WHEN e2 THEN e3 WHEN e4 THEN e5 ELSE e6 END
{code}
=>
{code}
^(CASE e1, e2, e3, e1, e4, e5, e6) // Note that there are two e1's.
{code}
=>
{code}
e1 == e4 ? e5 : (e1 == e2 ? e3 : e6)
{code}

I updated the unit tests. I also verified that the explain output of a case statement is identical to that of hand-written nested bincond expressions.

Thanks!

> Case statement support
> ----------------------
>
>                 Key: PIG-3268
>                 URL: https://issues.apache.org/jira/browse/PIG-3268
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs, parser
>    Affects Versions: 0.11
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12
>
>         Attachments: PIG-3268-2.patch, PIG-3268-3.patch, PIG-3268-4.patch, PIG-3268-5.patch, PIG-3268.patch
>
>
> Currently, Pig has no support for a case statement. To mimic it, users often use nested bincond operators. However, that easily becomes unreadable when there are multiple levels of nesting.
> For example,
> {code}
> a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
> b = FOREACH a GENERATE (
>     i % 3 == 0 ? '3n' : (i % 3 == 1 ? '3n + 1' : '3n + 2')
> );
> {code}
> This can be rewritten much more nicely using a case statement as follows:
> {code}
> a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
> b = FOREACH a GENERATE (
>     CASE i % 3
>         WHEN 0 THEN '3n'
>         WHEN 1 THEN '3n + 1'
>         ELSE '3n + 2'
>     END
> );
> {code}
> I propose that we implement the case statement in the following manner:
> * Add built-in UDFs that take expressions as args. Taking the aforementioned case statement as an example, we can define a UDF such as {{builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2')}}.
> * Add syntactic sugar for these built-in UDFs.
> In fact, I borrowed this idea from HIVE-164.
> One downside of this approach is that all the possible args schemas of these UDFs must be pre-computed. Specifically, we need to populate the full list of possible args schemas in {{EvalFunc.getArgToFuncMapping}}.
> In particular, since we obviously cannot support infinitely long args, it is necessary to impose a limit on the number of when branches. For now, I arbitrarily chose 50, but it can be easily changed.
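
For illustration, the two-step rewrite described in the update comment at the top of this message (flattening in the parser, then building one bincond per when branch) can be sketched in Python. This is only a rough model that operates on strings, not the actual QueryParser or LogicalPlanGenerator code. The function names flatten_case and to_bincond are hypothetical, and the nesting order simply mirrors the example output shown in the comment.

```python
def flatten_case(case_expr, when_then_pairs, else_expr):
    """Model of the parser step: repeat the case expression before
    every WHEN/THEN pair, then append the ELSE expression.
    (Hypothetical helper; expressions are plain strings here.)"""
    flat = []
    for when_expr, then_expr in when_then_pairs:
        # Each branch gets its own copy of the case expression.
        flat.extend([case_expr, when_expr, then_expr])
    flat.append(else_expr)
    return flat


def to_bincond(flat):
    """Model of the plan-generation step: fold the flattened list into
    nested bincond text, one 'lhs == rhs ? then : rest' per triple.
    (Hypothetical helper; assumes an ELSE expression is present.)"""
    *triples, else_expr = flat
    result = else_expr
    nested = False
    for i in range(0, len(triples), 3):
        lhs, rhs, then_expr = triples[i:i + 3]
        # Parenthesize the previously built bincond when nesting it.
        rest = f"({result})" if nested else result
        result = f"{lhs} == {rhs} ? {then_expr} : {rest}"
        nested = True
    return result


flat = flatten_case("e1", [("e2", "e3"), ("e4", "e5")], "e6")
print(flat)
print(to_bincond(flat))
```

In this model every when branch carries its own e1, which corresponds to the fix of no longer sharing a single expression object across several bincond nodes.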