Alex Behm has posted comments on this change.

Change subject: IMPALA-5547: Rework FK/PK join detection.
......................................................................


Patch Set 1:

(20 comments)

I still need to add the targeted planner tests and adjust new tests after the 
rebase. In the meantime, you can continue looking at the code changes.

http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) {
             :           output.append("assumed fk/pk");
> I haven't read the entire patch yet, so I am curious what this case represe
In this case we have no equi-join conjuncts that we can reason about, so we are 
optimistic and assume fk/pk with a join selectivity of 1.

We cannot reason about an equi-join conjunct if the underlying table/columns 
have no stats or if the conjunct involves non-trivial expressions (anything 
that is not a SlotRef).


PS1, Line 178: for (int j = 0; j < slots.size(); ++j) {
             :               output.append(slots.get(j).toString());
             :               if (j + 1 != slots.size()) output.append(", ");
             :             }
> Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots)
Better. Done.


http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

PS1, Line 37: import 
org.apache.kudu.client.shaded.com.google.common.collect.Maps;
> ?
Sigh. Fixed.


Line 91:   protected List<List<EqJoinConjunctScanSlots>> fkPkEqJoinConjuncts_;
> Class member is only used in one function in the base class.  Maybe documen
Added comment. Not sure it's worth protecting against accidental modifications 
in inheriting classes. I think it's fair to assume that subclasses should treat 
protected members carefully. Subclasses can already muck with pretty much 
anything here, and adding getters that wrap members with immutables seems 
cumbersome, and generates objects "unnecessarily".


Line 253:       return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, 
rhsCard);
> This computes estimates based on both PK conjuncts and non-PK conjuncts usi
1. Changed the code to only pass those FK/PK conjuncts because that makes the 
behavior and code clearer. I'm not sure I follow your described scenario. If 
the RHS is very selective, then it's correct to apply that to the join 
cardinality - if we believe that FK/PK is indeed the case. Are you concerned 
that our FK/PK assumption could be wrong and we might underestimate the join 
cardinality

2. Good idea. A flattened, ordered list makes things simpler in various places. 
Done.


PS1, Line 271: scanSlotsByTids
> nit: Each pair represents two joined tables, no? So, maybe rename this to '
Done


Line 275:     for (List<EqJoinConjunctScanSlots> fkPkCandi: 
scanSlotsByTids.values()) {
> fkPkCandi - this could use a better name, it isn't clear to me what this is
I renamed it to fkPkCandidate, but I doubt that's what you had in mind. Can you 
help me come up with a better name? This thing is a list of equi-join conjuncts 
between the same two tables. These conjuncts could represent a single- or 
multi-column FK/PK join condition.


PS1, Line 281: numRows
> rhsTableCardinality?
Used rhsNumRows because we typically use 'numRows' to indicate values that come 
directly from stats, although technically table cardinality is also correct of 
course.


Line 290: 
> nit: extra line
Done


PS1, Line 294: conjuncts
> conjunct slots
Done


PS1, Line 300:     Preconditions.checkState(joinOp_.isInnerJoin() || 
joinOp_.isOuterJoin());
> This private function is only called in L253? I think you can remove this c
Done


PS1, Line 312: lhsNdv
> Can this ever be 0?
Probably. Better be defensive. Fixed here and elsewhere.


PS1, Line 313: rhsNumRows
> Same here, can this be 0?
Probably. Better be defensive. Fixed here and elsewhere.


PS1, Line 318: result = Math.min(result, joinCard);
> That part I think is very important and I am not sure it is explained anywh
This part applies to both methods (generic and FK/PK). I expanded the comment 
on getJoinCardinality() to explain why we take the min().

We should certainly try (in a subsequent patch) whether estimating the joint 
selectivity of multiple predicates with exponential backoff or similar works 
well.


PS1, Line 329: conjuncts
> conjunct slots
Done


PS1, Line 341: slots.lhs().getStats().getNumDistinctValues();
> nit: these are used in couple of places. Since you already have the EqJoinC
Done


PS1, Line 343: slots.lhs().getParent().getTable().getNumRows();
> same comment as above.
Done


PS1, Line 373: Preconditions
> I don't think these checks are useful. This private constructor is only cal
Done


PS1, Line 382: .
> nit: remove
Done


PS1, Line 399: tupleDesc
> nit: inline
Removed tupleDesc instead, otherwise the if condition below becomes multi-line 
and harder to read imo.


-- 
To view, visit http://gerrit.cloudera.org:8080/7257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to