Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5547: Rework FK/PK join detection. ......................................................................
Patch Set 1: (17 comments) I have one comment about testing. We have lots of test changes but it's not easy to verify if the end result makes sense or not. I think it may worth having a planner test with explain_level > 2 that covers a few interesting join cases so that we can see what kind of pk/fk conjuncts are detected and what the impact is on estimated join cardinalities. http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java: PS1, Line 170: else if (fkPkEqJoinConjuncts_.isEmpty()) { : output.append("assumed fk/pk"); I haven't read the entire patch yet, so I am curious what this case represents. PS1, Line 178: for (int j = 0; j < slots.size(); ++j) { : output.append(slots.get(j).toString()); : if (j + 1 != slots.size()) output.append(", "); : } Can't you use the Guava's Joiner class here? Joiner.on(",").join(slots) http://gerrit.cloudera.org:8080/#/c/7257/1/fe/src/main/java/org/apache/impala/planner/JoinNode.java File fe/src/main/java/org/apache/impala/planner/JoinNode.java: PS1, Line 37: import org.apache.kudu.client.shaded.com.google.common.collect.Maps; ? PS1, Line 271: scanSlotsByTids nit: Each pair represents two joined tables, no? So, maybe rename this to 'scanSlotsByJoinedTids'? A bit more explicit on what this thing represents. PS1, Line 281: numRows rhsTableCardinality? Line 290: nit: extra line PS1, Line 294: conjuncts conjunct slots PS1, Line 300: Preconditions.checkState(joinOp_.isInnerJoin() || joinOp_.isOuterJoin()); This private function is only called in L253? I think you can remove this check. PS1, Line 312: lhsNdv Can this ever be 0? PS1, Line 313: rhsNumRows Same here, can this be 0? PS1, Line 318: result = Math.min(result, joinCard); That part I think is very important and I am not sure it is explained anywhere. Maybe expand the comment in L209? PS1, Line 329: conjuncts conjunct slots PS1, Line 341: slots.lhs().getStats().getNumDistinctValues(); nit: these are used in couple of places. Since you already have the EqJoinConjunctScanSlots class, why don't your wrap them in some nice util functions there (e.g. getLhsNdvs() or something like that). PS1, Line 343: slots.lhs().getParent().getTable().getNumRows(); same comment as above. PS1, Line 373: Preconditions I don't think these checks are useful. This private constructor is only called through the static function that has tight control on what gets passed in this ctor. PS1, Line 382: . nit: remove PS1, Line 399: tupleDesc nit: inline -- To view, visit http://gerrit.cloudera.org:8080/7257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I49074fe743a28573cff541ef7dbd0edd88892067 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Zach Amsden <zams...@cloudera.com> Gerrit-HasComments: Yes