[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258233#comment-15258233 ]

Chon Ju Kim commented on PIG-3000:
----------------------------------

I encountered this issue with slightly different code in our project. Here is a code snippet:

{code}
B = FOREACH A {
    a = foo();
    b = SUM(a.x);
    GENERATE a, b, (t is null ? c : d);
}
{code}

foo is called twice. Note that t is defined outside of the foreach.

> Optimize nested foreach
> -----------------------
>
>                 Key: PIG-3000
>                 URL: https://issues.apache.org/jira/browse/PIG-3000
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.10.0
>            Reporter: Richard Ding
>            Assignee: Mona Chitnis
>         Attachments: PIG-3000-6.patch, unit_tests.patch
>
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); }
> {code}
> The Eval function UPPER is called twice for each record.
> This should be optimized so that UPPER is called only once for each record.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
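
A possible workaround for the script in the issue description, sketched here as an assumption rather than something proposed or tested in this thread: compute UPPER(a) in its own foreach so that the result becomes a field, then compare that field in a second foreach. The aliases A1 and c_upper are hypothetical names. Whether UPPER really runs only once per record depends on the optimizer not merging the two statements back together, so it is worth checking the plan with EXPLAIN on the Pig version in use.

{code}
A = load 'data' as (a:chararray);
-- Store the UDF result as a field (c_upper is a hypothetical name).
A1 = foreach A generate UPPER(a) as c_upper;
-- Both comparisons read the stored field instead of calling UPPER again.
B = foreach A1 generate ((c_upper eq 'TEST') ? 1 : 0), ((c_upper eq 'DEV') ? 1 : 0);
{code}

The same idea may apply to the foo() snippet above, but that depends on what foo returns and on where t, c and d come from, which the comment does not show.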
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205160#comment-15205160 ]

Daniel Dai commented on PIG-3000:
---------------------------------

I don't think [~chitnis] is working on that. We will need to find a new owner for the issue.
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204921#comment-15204921 ]

Kevin J. Price commented on PIG-3000:
-------------------------------------

Did this patch just get dropped? This is still a serious problem.
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036599#comment-14036599 ]

Mona Chitnis commented on PIG-3000:
-----------------------------------

Thanks for taking a peek, Daniel. I will rebase my patch to trunk.
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957211#comment-13957211 ]

Daniel Dai commented on PIG-3000:
---------------------------------

Hi Mona, your last patch does not include any changes other than NestedForEachUserFunc.java. Is it based on the earlier patches?
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951028#comment-13951028 ]

Mona Chitnis commented on PIG-3000:
-----------------------------------

Updated rev-5 on RB: working code for the simple case, where the nested foreach userfunc loads a single argument and generate operates on the same argument. The commented-out code is meant to make it work for the complex cases:

1. multiple arguments
2. userfunc having a subset of the generate arguments (how to pass through from initial load to initial foreach)
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947434#comment-13947434 ]

Mona Chitnis commented on PIG-3000:
-----------------------------------

[~daijy] Daniel Dai, can you please review the patch on reviewboard (revision 4)?
https://reviews.apache.org/r/17376/diff/4/

I have described the stage I've reached with it. Thanks.
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902256#comment-13902256 ]

Mona Chitnis commented on PIG-3000:
-----------------------------------

Patch updated to RB.

The patch now handles the "Projection with nothing to reference" issue, which was coming from the innerLoad of the altered ForEach. Doing an explain on the new plan gives the correct optimized plan.

The commented-out part of the patch is there because I observed that this was being done automatically by the SchemaPatcher and ProjectionPatcher listeners in the LogicalPlanOptimizer. However, this gives variable results for the uids and the following error:

Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: Couldn't find matching uid -1 for project org.apache.pig.builtin.upper_17:(Name: Project Type: chararray Uid: 38 Input: 0 Column: 0)
    at org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:215)

(where upper_17 is an example unique alias generated for the UserFuncExpression operator in the new plan)

Any help is appreciated. This patch excludes unit tests; I will upload everything in the next patch after fixing this issue.
[jira] [Commented] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889911#comment-13889911 ]

Mona Chitnis commented on PIG-3000:
-----------------------------------

Can someone assign this JIRA to me?