[ https://issues.apache.org/jira/browse/PIG-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896368#comment-16896368 ]
Jeffrey Brownlow commented on PIG-4449:
---------------------------------------
{code:java}
grouped_data_set = group data_set by id;
capped_data_set = foreach grouped_data_set
{
ordered = order joined_data_set by timestamp desc;
capped = limit ordered $num;
generate ordered, flatten(capped);
};{code}
Including the sorted alias in the generate statement triggers this error:
{code:java}
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias itaConversionsFinal
at org.apache.pig.PigServer.storeEx(PigServer.java:1127)
at org.apache.pig.PigServer.store(PigServer.java:1086)
at org.apache.pig.PigServer.openIterator(PigServer.java:999)
... 26 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule NestedLimitOptimizer. Try -t NestedLimitOptimizer
at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:125)
at org.apache.pig.newplan.logical.relational.LogicalPlan.optimize(LogicalPlan.java:281)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1462)
at org.apache.pig.PigServer.storeEx(PigServer.java:1123)
... 28 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
at org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:430)
at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:281)
at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:215)
at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:263)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.SchemaPatcher.transformed(SchemaPatcher.java:43)
at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:116)
... 31 more{code}
> Optimize the case of Order by + Limit in nested foreach
> -------------------------------------------------------
>
> Key: PIG-4449
> URL: https://issues.apache.org/jira/browse/PIG-4449
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Priority: Major
> Labels: Performance
> Fix For: 0.18.0
>
>
> This is a very frequently used pattern:
> {code}
> grouped_data_set = group data_set by id;
> capped_data_set = foreach grouped_data_set
> {
> ordered = order joined_data_set by timestamp desc;
> capped = limit ordered $num;
> generate flatten(capped);
> };
> {code}
> But this performs very poorly when a key in the group-by has millions of
> rows, because sorting them causes a lot of spills. It can be easily optimized
> by pushing the limit into the InternalSortedBag so that it maintains only
> $num records at any time, which avoids the memory pressure.
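
The idea in the description, keeping at most $num records per key while the rows stream in, can be sketched with a bounded heap. This is only an illustration under stated assumptions: `BoundedTopN` is a hypothetical class, not Pig's InternalSortedBag and not the code of the actual patch, and plain `Long` values stand in for tuples ordered by timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not part of Pig): retains only the first `limit` items under `order`. */
final class BoundedTopN<T> {
    private final int limit;
    private final Comparator<T> order;   // desired output order, e.g. timestamp desc
    private final PriorityQueue<T> heap; // head = worst item currently retained

    BoundedTopN(int limit, Comparator<T> order) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.limit = limit;
        this.order = order;
        // Reversed comparator puts the item that sorts LAST under `order` at the head.
        this.heap = new PriorityQueue<>(Math.max(1, limit), order.reversed());
    }

    /** Offers one item; memory never grows beyond `limit` items. */
    void add(T item) {
        if (limit == 0) {
            return;
        }
        if (heap.size() < limit) {
            heap.offer(item);
        } else if (order.compare(item, heap.peek()) < 0) {
            // The new item sorts before the worst retained one, so it replaces it.
            heap.poll();
            heap.offer(item);
        }
    }

    /** Returns the retained items in the requested order. */
    List<T> sorted() {
        List<T> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }
}
```

With `new BoundedTopN<>(2, Comparator.<Long>reverseOrder())`, adding the timestamps 5, 1, 9, 3, 7 leaves only 9 and 7 in memory. Each key then costs O($num) memory and O(log $num) work per row, instead of sorting (and possibly spilling) the whole bag before applying the limit.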