Alex Behm has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. ......................................................................
Patch Set 1: (20 comments) I still need to add the targeted planner tests and adjust new tests after the rebase. In the meantime, you can continue looking at the code changes. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) { : output.append("assumed fk/pk"); > I haven't read the entire patch yet, so I am curious what this case represe In this case we have no equi-join conjuncts that we can reason about, so we are optimistic and assume fk/pk with a join selectivity of 1. We cannot reason about an equi-join conjunct if the underlying table/columns have no stats or if the conjunct involves non-trivial expressions (anything that is not a SlotRef). PS1, Line 178: for (int j = 0; j < slots.size(); ++j) { : output.append(slots.get(j).toString()); : if (j + 1 != slots.size()) output.append(", "); : } > Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots) Better. Done. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS1, Line 37: import org.apache.kudu.client.shaded.com.google.common.collect.Maps; > ? Sigh. Fixed. Line 91: protected List<List<EqJoinConjunctScanSlots>> fkPkEqJoinConjuncts_; > Class member is only used in one function in the base class. Maybe documen Added comment. Not sure it's worth protecting against accidental modifications in inheriting classes. I think it's fair to assume that subclasses should treat protected members carefully. Subclasses can already muck with pretty much anything here, and adding getters that wrap members with immutables seems cumbersome, and generates objects "unnecessarily". Line 253: return getFkPkJoinCardinality(eqJoinConjunctSlots, lhsCard, rhsCard); > This computes estimates based on both PK conjuncts and non-PK conjuncts usi 1. Changed the code to only pass those FK/PK conjuncts because that makes the behavior and code clearer. I'm not sure I follow your described scenario. If the RHS is very selective, then it's correct to apply that to the join cardinality - if we believe that FK/PK is indeed the case. Are you concerned that our FK/PK assumption could be wrong and we might underestimate the join cardinality 2. Good idea. A flattened, ordered list makes things simpler in various places. Done. PS1, Line 271: scanSlotsByTids > nit: Each pair represents two joined tables, no? So, maybe rename this to ' Done Line 275: for (List<EqJoinConjunctScanSlots> fkPkCandi: scanSlotsByTids.values()) { > fkPkCandi - this could use a better name, it isn't clear to me what this is I renamed it to fkPkCandidate, but I doubt that's what you had in mind. Can you help me come up with a better name? This thing is a list of equi-join conjuncts between the same two tables. These conjuncts could represent a single- or multi-column FK/PK join condition. PS1, Line 281: numRows > rhsTableCardinality? Used rhsNumRows because we typically use 'numRows' to indicate values that come directly from stats, although technically table cardinality is also correct of course. Line 290: > nit: extra line Done PS1, Line 294: conjuncts > conjunct slots Done PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || joinOp_.isOuterJoin()); > This private function is only called in L253? I think you can remove this c Done PS1, Line 312: lhsNdv > Can this ever be 0? Probably. Better be defensive. Fixed here and elsewhere. PS1, Line 313: rhsNumRows > Same here, can this be 0? Probably. Better be defensive. Fixed here and elsewhere. PS1, Line 318: result = Math.min(result, joinCard); > That part I think is very important and I am not sure it is explained anywh This part applies to both methods (generic and FK/PK). I expanded the comment on getJoinCardinality() to explain why we take the min(). We should certainly try (in a subsequent patch) whether estimating the joint selectivity of multiple predicates with exponential backoff or similar works well. PS1, Line 329: conjuncts > conjunct slots Done PS1, Line 341: slots.lhs().getStats().getNumDistinctValues(); > nit: these are used in couple of places. Since you already have the EqJoinC Done PS1, Line 343: slots.lhs().getParent().getTable().getNumRows(); > same comment as above. Done PS1, Line 373: Preconditions > I don't think these checks are useful. This private constructor is only cal Done PS1, Line 382: . > nit: remove Done PS1, Line 399: tupleDesc > nit: inline Removed tupleDesc instead, otherwise the if condition below becomes multi-line and harder to read imo. -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com> Gerrit-HasComments: Yes