zabetak commented on code in PR #6202:
URL: https://github.com/apache/hive/pull/6202#discussion_r2655505625
##########
ql/src/test/results/clientpositive/llap/lateral_view.q.out:
##########
@@ -100,22 +90,17 @@ STAGE PLANS:
Limit
Number of rows: 1
Statistics: Num rows: 1 Data size: 178 Basic stats: COMPLETE
Column stats: COMPLETE
- Top N Key Operator
- sort order: ++
- keys: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1
(type: int)
- null sort order: zz
+ Select Operator
+ expressions: KEY.reducesinkkey0 (type: string), VALUE._col0
(type: string), KEY.reducesinkkey1 (type: int)
+ outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 178 Basic stats: COMPLETE
Column stats: COMPLETE
- top n: 1
- Select Operator
- expressions: KEY.reducesinkkey0 (type: string),
VALUE._col0 (type: string), KEY.reducesinkkey1 (type: int)
- outputColumnNames: _col0, _col1, _col2
+ Reduce Output Operator
+ key expressions: _col0 (type: string), _col2 (type: int)
+ null sort order: zz
+ sort order: ++
Statistics: Num rows: 1 Data size: 178 Basic stats:
COMPLETE Column stats: COMPLETE
- Reduce Output Operator
- key expressions: _col0 (type: string), _col2 (type: int)
- null sort order: zz
- sort order: ++
- Statistics: Num rows: 1 Data size: 178 Basic stats:
COMPLETE Column stats: COMPLETE
- value expressions: _col1 (type: string)
+ TopN Hash Memory Usage: 0.1
Review Comment:
When the plan contains a `Limit` operator all subsequent TopN optimizations
in the same mapper/reducer seem redundant. It's out of scope of the current PR
but it may be worth logging a separate JIRA ticket as a potential improvement.
##########
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java:
##########
@@ -141,6 +141,9 @@ private ReducerTraits(int trait) {
// used to decide whether global order is needed
private transient boolean hasOrderBy = false;
+ // used to decide whether topn key optimisation can be applied
+ private transient boolean hasOnlyOrderByLimit = false;
Review Comment:
Do we really need this new indicator here? IT seems to be a hint about the
structure of the plan rather than a property/quality of the `ReduceSink`
operator.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyProcessor.java:
##########
@@ -71,6 +71,13 @@ public Object process(Node nd, Stack<Node> stack,
NodeProcessorCtx procCtx,
return null;
}
+ // Skip the current optimization when a simple global ORDER BY...LIMIT is
present
+ // (topN > -1 and hasOnlyOrderByLimit()).
+ // This plan structure is handled more efficiently by the specialized
'TopN In Reducer' optimization.
+ if (reduceSinkDesc.getTopN() > -1 && reduceSinkDesc.hasOnlyOrderByLimit())
{
Review Comment:
It seems that we don't want to introduce a `TopNKeyOperator` if the path
between TS and RS does not contain a JOIN or GBY operator. We can add such
checks at some point here or maybe better modify the respective `RuleRegExp` to
only trigger when there is a GBY or JOIN in the path.
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/LimitPushdownOptimizer.java:
##########
@@ -117,18 +118,29 @@ private static class TopNReducer implements
SemanticNodeProcessor {
@Override
public Object process(Node nd, Stack<Node> stack,
NodeProcessorCtx procCtx, Object... nodeOutputs)
throws SemanticException {
+ boolean hasOnlyOrderByLimit = true;
ReduceSinkOperator rs = null;
for (int i = stack.size() - 2 ; i >= 0; i--) {
Operator<?> operator = (Operator<?>) stack.get(i);
- if (operator.getNumChild() != 1) {
- return false; // multi-GBY single-RS (TODO)
- }
- if (operator instanceof ReduceSinkOperator) {
- rs = (ReduceSinkOperator) operator;
- break;
+
+ if (operator instanceof GroupByOperator || operator instanceof
JoinOperator) {
+ hasOnlyOrderByLimit = false;
Review Comment:
The decision to introduce or not a `TopNKeyOperator` is a responsibility of
the the `TopNKeyProcessor` so if we need to make decisions based on the
structure of the plan it makes more sense to do put the GBY/JOIN checks there.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]