[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-05-25 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024946#comment-16024946
 ] 

Koji Noguchi commented on PIG-5224:
---

bq. That's only if the user writes the "foreach" statement carefully. If they 
project a column but never use it in the script, the column pruner will still 
treat it as a column that should be removed.

Ah, you're right (as always) :)
Committing pig-5224-v2.patch.

> Extra foreach from ColumnPrune preventing Accumulator usage
> ---
>
> Key: PIG-5224
> URL: https://issues.apache.org/jira/browse/PIG-5224
> Project: Pig
> Issue Type: Improvement
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch, 
> pig-5224-v2.patch
>
>
> {code}
> A = load 'input' as (id:int, fruit);
> B = foreach A generate id; -- to enable columnprune
> C = group B by id;
> D = foreach C {
>     o = order B by id;
>     generate org.apache.pig.test.utils.AccumulatorBagCount(o);
> }
> STORE D into ...
> {code}
> Pig fails to use the Accumulator interface for this UDF.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-05-25 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024919#comment-16024919
 ] 

Daniel Dai commented on PIG-5224:
-

bq. Well, if the next LOForEach is not removing all the columns which are not 
used, then essentially those columns are being used and therefore ColumnPruner 
would not have tried to prune them in the first place?
That's only if the user writes the "foreach" statement carefully. If they 
project a column but never use it in the script, the column pruner will still 
treat it as a column that should be removed.
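
To make the case above concrete, here is a minimal Java sketch. It is a toy 
model only: the class and method names are hypothetical and nothing here is 
Pig's actual ColumnPruner code. It just shows that the columns kept by the 
next foreach can be a superset of the columns the script really uses.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical names, not Pig classes). */
class UnusedProjectionSketch {

    /**
     * Returns the columns that the next foreach still projects
     * even though nothing downstream reads them.
     */
    static Set<String> leftoverColumns(List<String> projectedByNextForEach,
                                       List<String> usedDownstream) {
        Set<String> leftover = new LinkedHashSet<>(projectedByNextForEach);
        leftover.removeAll(usedDownstream);
        return leftover;
    }

    public static void main(String[] args) {
        // The user wrote a foreach that projects both columns,
        // but the rest of the script only ever reads "id".
        List<String> projected = Arrays.asList("id", "fruit");
        List<String> used = Arrays.asList("id");

        // "fruit" survives the user's foreach, so relying on that foreach
        // alone would not prune everything the pruner considers unused.
        System.out.println(leftoverColumns(projected, used)); // prints [fruit]
    }
}
```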

+1 for pig-5224-v2.patch.



[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-05-25 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024320#comment-16024320
 ] 

Daniel Dai commented on PIG-5224:
-

The inserted LOForEach removes all the columns that are not used later in the 
script. The next LOForEach does not necessarily do that. I believe this is not 
for performance reasons (the performance gain from removing a few columns is 
debatable); it is to keep ColumnPruner simpler.



[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-04-19 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974882#comment-15974882
 ] 

Koji Noguchi commented on PIG-5224:
---

[~daijy], probably a silly question.  
In what situation do we still want to insert an LOForEach for column pruning 
when the next operation is also an LOForEach? 



[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-04-17 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971309#comment-15971309
 ] 

Koji Noguchi commented on PIG-5224:
---

Stepping back a bit and giving a little more detail.

For the script in the description, the PhysicalPlan looks like 
{noformat}
D: Store(/tmp/delteme:org.apache.pig.builtin.PigStorage) - scope-18
|
|---D: New For Each(false)[bag] - scope-17
    |   |
    |   POUserFunc(org.apache.pig.test.utils.AccumulatorBagCount)[int] - scope-13
    |   |
    |   |---RelationToExpressionProject[bag][*] - scope-12
    |       |
    |       |---o: POSort[bag]() - scope-16
    |           |   |
    |           |   Project[int][0] - scope-15
    |           |
    |           |---Project[bag][0] - scope-14
    |
    |---C: New For Each(false)[bag] - scope-11
        |   |
        |   Project[bag][1] - scope-9
        |
        |---C: Package(Packager)[tuple]{int} - scope-6
where "C: New For Each" (scope-11) is the extra foreach from column pruning. 
This foreach conflicts with {{AccumulatorOptimizerUtil.addAccumulator}}, which 
looks at the immediate successor of POPackage. That successor is now 
"C: New For Each" instead of the "D: New For Each" that we want.
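
The effect can be illustrated with a small Java sketch. This is a toy model 
under stated assumptions, not the real {{AccumulatorOptimizerUtil}} code: the 
{{Op}} class and {{canUseAccumulator}} method are hypothetical, and the only 
behaviour modelled is "inspect the operator immediately after the package".

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical names): the reduce-side plan as a flat operator chain. */
class AccumulatorCheckSketch {

    /** Hypothetical stand-in for a physical operator. */
    static final class Op {
        final String name;
        final boolean isForEach;
        final boolean usesAccumulatorUdf;

        Op(String name, boolean isForEach, boolean usesAccumulatorUdf) {
            this.name = name;
            this.isForEach = isForEach;
            this.usesAccumulatorUdf = usesAccumulatorUdf;
        }
    }

    /** Only the first operator after the package is inspected. */
    static boolean canUseAccumulator(List<Op> chainAfterPackage) {
        if (chainAfterPackage.isEmpty()) {
            return false;
        }
        Op successor = chainAfterPackage.get(0);
        return successor.isForEach && successor.usesAccumulatorUdf;
    }

    public static void main(String[] args) {
        Op pruneForEach = new Op("C: New For Each (scope-11)", true, false);
        Op userForEach = new Op("D: New For Each (scope-17)", true, true);

        // With the extra pruning foreach in between, the check sees "C" and gives up.
        System.out.println(canUseAccumulator(Arrays.asList(pruneForEach, userForEach))); // false

        // Without it, the check sees "D", which calls the accumulator UDF.
        System.out.println(canUseAccumulator(Arrays.asList(userForEach))); // true
    }
}
```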



[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-04-17 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971298#comment-15971298
 ] 

Koji Noguchi commented on PIG-5224:
---

{quote}
If this is a problem of an extra foreach after LOCogroup, why not add the same 
check in ColumnPruneVisitor.visit(LOCogroup cg) instead of 
addForEachIfNecessary?
{quote}
That would work.
I followed the logic in LOLoad where we had 
{code}
// if there is already a LOForEach after load, we don't need to
// add another LOForEach
if (next instanceof LOForEach) {
    return;
}
{code}
and assumed this would apply to all other foreach insertion cases. 
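
As a rough sketch of what applying that early return to every insertion point 
would look like, here is a self-contained Java toy. The operator classes and 
the {{addForEachUnlessPresent}} helper are hypothetical stand-ins, not the 
logical-plan classes or the patch itself, and the sketch says nothing about 
whether skipping the insertion is always safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical names): a plan as a linear list of operators. */
class ForEachInsertionSketch {

    interface Op {}
    static final class CoGroup implements Op {}
    static final class ForEach implements Op {}
    static final class Store implements Op {}

    /**
     * Returns a copy of the plan with a pruning ForEach inserted after
     * position i, unless the operator at i + 1 is already a ForEach.
     */
    static List<Op> addForEachUnlessPresent(List<Op> plan, int i) {
        List<Op> result = new ArrayList<>(plan);
        Op next = (i + 1 < plan.size()) ? plan.get(i + 1) : null;
        if (next instanceof ForEach) {
            return result; // same early return as the check quoted above
        }
        result.add(i + 1, new ForEach());
        return result;
    }

    public static void main(String[] args) {
        List<Op> alreadyHasForEach = Arrays.asList(new CoGroup(), new ForEach(), new Store());
        List<Op> noForEach = Arrays.asList(new CoGroup(), new Store());

        System.out.println(addForEachUnlessPresent(alreadyHasForEach, 0).size()); // 3 (nothing added)
        System.out.println(addForEachUnlessPresent(noForEach, 0).size());         // 3 (one ForEach added)
    }
}
```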




[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage

2017-04-15 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969841#comment-15969841
 ] 

Daniel Dai commented on PIG-5224:
-

If this is a problem of an extra foreach after LOCogroup, why not add the same 
check in ColumnPruneVisitor.visit(LOCogroup cg) instead of 
addForEachIfNecessary?
