Alex Behm has posted comments on this change.

Change subject: IMPALA-4731/IMPALA-397/IMPALA-4728: Materialize sort exprs
......................................................................


Patch Set 1:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 212:   // True if this Expr is known to have a deterministic value, false 
if it is
True if this exprs always returns the same value given the same input. False if 
...


Line 213:   // non-determinstic or unknown (eg. UDFs). Set at the end of 
analyze() and valid if
e.g.


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 292:     // We currently don't have a way to indicate if a UDF is 
deterministic, so just always
Ideally we should not conflate the UDF and deterministic concepts. For example, 
one might reasonably think that FunctionCallExpr.isNondeterministicBuiltinFn() 
should return the opposite of as Expr.isDeterministic() - but they don't and 
that could lead to confusion.

I suggest having two separate members.

We can revisit the treatment of UDFs with respect to determinism sometime later.


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

Line 253:       if (!(smap.getLhs().get(i) instanceof SlotRef)
use {} for multi-line if


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/main/java/org/apache/impala/analysis/SortInfo.java
File fe/src/main/java/org/apache/impala/analysis/SortInfo.java:

Line 39:   // TODO: run performance tests to determine this value.
Remove TODO, instead comment where this value came from, maybe with some 
examples of exprs that cost 5, also mention what the effect of this tuning 
parameter is (i.e. all ordering exprs with cost > this will be materialized)


Line 46:   private List<Expr> materializedOrderingExprs_;
needs a brief comment, in particular why we need to store them


Line 172:     TreeNode.collect(orderingExprs_, 
Predicates.instanceOf(SlotRef.class), sourceSlots);
We should collect the SlotRefs only for non-materialized ordering exprs.


Line 180:     // substOrderBy is the mapping from slot refs in the sort node's 
input to slot refs in
update comment to reflect the new contents of the substOrderBy smap


Line 196:     // The ordering exprs still point to the old slot refs and need 
to be replaced with
update comment, we are not only substituting slot refs


Line 208:    * SlotRefs to 'sortTupleExprs'.
Say something about the args of this function. Also mention side-effects 
(populates materializedOrderingExprs_)


Line 210:    * Materialize ordering exprs that are non-deterministic, are more 
expensive than a cost
make this the first sentence in this comment


Line 211:    * threshold, or don't have a cost set. Also obey the 
'materialize_sort' query option,
no more hint, right?


Line 214:    * Returns an ExprSubstitutionMap from the new SlotRefs to the 
original exprs.
from the materialized sort exprs to the new SlotRefs


Line 216:   public ExprSubstitutionMap createMaterializedOrderExprs(
Instead of returning a new smap and combining with the other one, why not pass 
in the existing smap and directly add entries to it? The 'sortTupleExprs' param 
is just the lhs of the existing smap, so I think this interface can be made a 
little simpler / more direct.

Then you could use the return value to return the exprs that you chose to 
materialize. Fewer side-effects are easier to reason about.


Line 217:       TupleDescriptor sortTupleDesc, List<Expr> sortTupleExprs, 
Analyzer analyzer) {
analyzer needed?


Line 219:     //List<Expr> materializedOrderingExprs = Lists.newArrayList();
remove


Line 228:         // LiteralExprs don't affect the sort order. TODO: consider 
removing the sort node
Why TODO? What happens when there are no sort exprs at all left?


http://gerrit.cloudera.org:8080/#/c/6322/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 374:   public void testSortMaterialization() {
testSortExprMaterialization


http://gerrit.cloudera.org:8080/#/c/6322/1/testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/sort-materialization.test:

Line 19: # literal order expr and cheap order expr don't get materialized
I think these tests need more examples cheap and expensive exprs to make sure 
we mostly do the right thing. For example, there is no test with a naked 
SlotRef. I suggest cramming more exprs into each individual test case.

We also need to test more than 2 sort exprs.

Also: Needs to test UDFs and ideally also UDAs (inside the ORDER BY of an 
analytic function). Let's talk if this is tricky to do with the current test 
infra.


Line 37: select last_value(id) over (order by if(int_col = 0, id, int_col), 
bool_col) from functional.alltypes
Can you ask Greg about some specific expensive ordering exprs that have come up 
recently? We should make sure that we do the right thing for them.


Line 106: |  materialized: log2(id)
It doesn't seem right to print this here because the merging exchange does not 
actually materialize the expr. It relies on the preceding Top-N/Sort whatever 
to do that.


http://gerrit.cloudera.org:8080/#/c/6322/1/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

Line 165:     # "order by random()" should return results in a different order 
from unordered.
Confusing sentence, what is 'unordered'?
I think you can remove this and the next two lines


Line 168:     results = self.execute_query("select id from 
functional.alltypestiny")
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/6322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifefdaff8557a30ac44ea82ed428e6d1ffbca2e9e
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to