[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024946#comment-16024946 ]

Koji Noguchi commented on PIG-5224:
-----------------------------------

bq. That's only if the user writes the "foreach" statement carefully. If they project a column but never use it in the script, the column pruner will still consider it a column that should be removed.

Ah, you're right (as always) :)
Committing pig-5224-v2.patch.

> Extra foreach from ColumnPrune preventing Accumulator usage
> -----------------------------------------------------------
>
>                 Key: PIG-5224
>                 URL: https://issues.apache.org/jira/browse/PIG-5224
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>         Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch, pig-5224-v2.patch
>
>
> {code}
> A = load 'input' as (id:int, fruit);
> B = foreach A generate id; -- to enable columnprune
> C = group B by id;
> D = foreach C {
>     o = order B by id;
>     generate org.apache.pig.test.utils.AccumulatorBagCount(o);
> }
> STORE D into ...
> {code}
> Pig fails to use the Accumulator interface for this UDF.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024919#comment-16024919 ]

Daniel Dai commented on PIG-5224:
---------------------------------

bq. Well, if the next LOForEach is not removing all the columns that are not used, then essentially those columns are being used, and therefore ColumnPruner would not have tried to prune them in the first place?

That's only if the user writes the "foreach" statement carefully. If they project a column but never use it in the script, the column pruner will still consider it a column that should be removed.

+1 for pig-5224-v2.patch.
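To make the trade-off concrete, here is a hypothetical toy model (not Pig's ColumnPruner; the class and method names are made up). It treats a user-written foreach after the group as a set of output columns and asks whether that foreach alone already drops everything unused downstream. If it does not, skipping the inserted pruning foreach would leave a dead column in the pipeline. The column names come from the script in the issue description.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical toy model, not Pig's ColumnPruner: asks whether a user-written
 * foreach that already follows the group would do the pruning job by itself.
 */
public class PruneCheck {

    /** Columns the next foreach passes on that nothing later in the script reads. */
    static Set<String> unusedButKept(Set<String> nextForEachOutput, Set<String> usedLater) {
        Set<String> unused = new LinkedHashSet<>(nextForEachOutput);
        unused.removeAll(usedLater);
        return unused;
    }

    /** Skipping the pruning foreach loses nothing only if no unused column survives. */
    static boolean safeToSkipInsertion(Set<String> nextForEachOutput, Set<String> usedLater) {
        return unusedButKept(nextForEachOutput, usedLater).isEmpty();
    }

    public static void main(String[] args) {
        // Careful script: the user's foreach projects only what is used later.
        System.out.println(safeToSkipInsertion(Set.of("id"), Set.of("id")));          // true
        // Careless script: "fruit" is projected but never read afterwards.
        System.out.println(safeToSkipInsertion(Set.of("id", "fruit"), Set.of("id"))); // false
        System.out.println(unusedButKept(Set.of("id", "fruit"), Set.of("id")));       // [fruit]
    }
}
```

The model ignores nested plans and schemas entirely. It only captures the set difference that determines whether a "next is already a foreach" shortcut is safe.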
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024320#comment-16024320 ]

Daniel Dai commented on PIG-5224:
---------------------------------

The inserted LOForEach removes all the columns that are not used later in the script. The next LOForEach does not necessarily do that. I believe this is not for performance reasons (the performance gain from removing a few columns might be debatable); it is to keep ColumnPruner simpler.
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974882#comment-15974882 ]

Koji Noguchi commented on PIG-5224:
-----------------------------------

[~daijy], probably a silly question: in what situation do we still want to insert an LOForEach for column pruning when the next operation is also an LOForEach?
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971309#comment-15971309 ]

Koji Noguchi commented on PIG-5224:
-----------------------------------

Stepping back a bit, here are a few more details. For the script in the description, the PhysicalPlan looks like this:

{noformat}
D: Store(/tmp/delteme:org.apache.pig.builtin.PigStorage) - scope-18
|
|---D: New For Each(false)[bag] - scope-17
    |   |
    |   POUserFunc(org.apache.pig.test.utils.AccumulatorBagCount)[int] - scope-13
    |   |
    |   |---RelationToExpressionProject[bag][*] - scope-12
    |       |
    |       |---o: POSort[bag]() - scope-16
    |           |   |
    |           |   Project[int][0] - scope-15
    |           |
    |           |---Project[bag][0] - scope-14
    |
    |---C: New For Each(false)[bag] - scope-11
        |   |
        |   Project[bag][1] - scope-9
        |
        |---C: Package(Packager)[tuple]{int} - scope-6
{noformat}

Here "C: New For Each" (scope-11) is the extra foreach from column pruning. It conflicts with {{AccumulatorOptimizerUtil.addAccumulator}}, which looks only at the immediate successor of POPackage. That successor is "C: New For Each" instead of the "D: New For Each" (scope-17) that we want.
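The conflict comes down to a check that only inspects the operator immediately after the Package. Below is a hypothetical toy model of that behavior. It is not Pig's AccumulatorOptimizerUtil, and `Op`, `Kind` and `accumulatorApplies` are invented for illustration. The reduce-side plan is flattened into a chain in data-flow order, so one inserted projection foreach is enough to hide the accumulator-capable foreach D.

```java
import java.util.List;

/**
 * Hypothetical toy model, not Pig's AccumulatorOptimizerUtil: a reduce-side
 * plan is a chain of operators in data-flow order, and accumulator mode is
 * considered only for the operator immediately after the Package.
 */
public class AccumulatorCheck {

    enum Kind { PACKAGE, FOREACH, STORE }

    /** Minimal stand-in for a physical operator (records need Java 16+). */
    record Op(String name, Kind kind, boolean callsAccumulatorUdf) {}

    /** Inspects only the immediate successor of the first Package in the chain. */
    static boolean accumulatorApplies(List<Op> chain) {
        for (int i = 0; i + 1 < chain.size(); i++) {
            if (chain.get(i).kind() == Kind.PACKAGE) {
                Op next = chain.get(i + 1);
                return next.kind() == Kind.FOREACH && next.callsAccumulatorUdf();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Op pkg = new Op("C: Package", Kind.PACKAGE, false);
        Op pruneC = new Op("C: New For Each (column prune)", Kind.FOREACH, false);
        Op userD = new Op("D: New For Each", Kind.FOREACH, true);
        Op store = new Op("D: Store", Kind.STORE, false);

        // With the extra foreach, the successor of Package is C, so D is never examined.
        System.out.println(accumulatorApplies(List.of(pkg, pruneC, userD, store))); // false
        // Without it, D directly follows Package and qualifies.
        System.out.println(accumulatorApplies(List.of(pkg, userD, store)));         // true
    }
}
```

The real optimizer also inspects the inner plans of the foreach (the POSort and the UDF inputs). The toy collapses all of that into the single `callsAccumulatorUdf` flag.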
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971298#comment-15971298 ]

Koji Noguchi commented on PIG-5224:
-----------------------------------

{quote}
If the problem is the extra foreach after LOCogroup, why not add the same check in ColumnPruneVisitor.visit(LOCogroup cg) instead of in addForEachIfNecessary?
{quote}

That would work. I followed the logic in LOLoad, where we had

{code}
// if there is already a LOForEach after load, we don't need to
// add another LOForEach
if (next instanceof LOForEach) {
    return;
}
{code}

and assumed it would apply to all other foreach insertion cases.
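To show what the quoted LOLoad guard does to plan shape when it is applied after a cogroup, here is a hypothetical toy model. It is not Pig's ColumnPruneVisitor, and `insertPruneForEach` is an invented name that stands apart from the real addForEachIfNecessary. Operators are plain strings in a linear chain, and the guard is a flag so the two outcomes can be compared side by side.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model, not Pig's ColumnPruneVisitor: inserts a pruning
 * foreach right after position pos of a linear operator chain, optionally
 * applying the "next is already a foreach" guard quoted from LOLoad.
 */
public class PruneForEachInsertion {

    static boolean isForEach(String op) {
        return op.startsWith("ForEach");
    }

    static List<String> insertPruneForEach(List<String> chain, int pos, boolean skipIfNextIsForEach) {
        List<String> result = new ArrayList<>(chain);
        boolean nextIsForEach = pos + 1 < result.size() && isForEach(result.get(pos + 1));
        if (skipIfNextIsForEach && nextIsForEach) {
            // Mirrors the quoted guard: if (next instanceof LOForEach) { return; }
            return result;
        }
        result.add(pos + 1, "ForEach(prune)");
        return result;
    }

    public static void main(String[] args) {
        List<String> plan = List.of("Cogroup C", "ForEach D", "Store D");
        // With the guard, D stays the direct successor of the cogroup.
        System.out.println(insertPruneForEach(plan, 0, true));  // [Cogroup C, ForEach D, Store D]
        // Without it, a second foreach lands between the cogroup and D.
        System.out.println(insertPruneForEach(plan, 0, false)); // [Cogroup C, ForEach(prune), ForEach D, Store D]
    }
}
```

Whether the guard belongs in a cogroup-specific visit or in the shared insertion helper is the open question in this thread. The toy does not model which columns the user's foreach keeps, so it says nothing about whether skipping the insertion loses pruning.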
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969841#comment-15969841 ]

Daniel Dai commented on PIG-5224:
---------------------------------

If the problem is the extra foreach after LOCogroup, why not add the same check in ColumnPruneVisitor.visit(LOCogroup cg) instead of in addForEachIfNecessary?