Taewoo Kim has posted comments on this change.

Change subject: Applied the multiway fuzzyjoin based on the prefix-based join and the selectFuzzyJoin testCases.
......................................................................

Patch Set 39:

(15 comments)

A few more comments.

https://asterix-gerrit.ics.uci.edu/#/c/1076/39/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java
File asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/FuzzyJoinRule.java:

Line 138: @Override public boolean rewritePost(Mutable<ILogicalOperator> opRef, IOptimizationContext context)
"public boolean ..." should start on a new line.

Line 141: // current operator is join
--> current operator should be a join.

Line 147: // Find GET_ITEM function.
--> Find GET_ITEM function in the join condition.

Line 149: Mutable<ILogicalExpression> expRef = joinOp.getCondition();
expRef -> exprRef

Line 190:
Are we sure that we always see two variables here, and never other expressions?

Line 202: // leftInputPKs in currrentPKs extract all the PKs derived from the left branch in the newest fuzzyjoin.
Could you explain the meaning of "newest fuzzy join"?

Line 212: IAType leftType = (IAType) context.getOutputTypeEnvironment(leftInputOp).getVarType(leftInputVar);
You calculate this again here. leftType was already calculated in regardAsPrefixFuzzyJoin().

Line 258: // 2. Otherwise, we can apply this rule to its branches to trigger a prefix-based fuzzyjoin.
For the above comments, I would like to suggest the following. Please check whether my understanding is correct.

    /**
     * To handle multiple fuzzy-join conditions on the same pair of datasets, this rule checks the PKs in a bottom-up way.
     * The previousPKs list incrementally maintains the PKs from a previous fuzzy-join operator's input branches.
     * In addition, the given fuzzy-join operator has been successfully translated into a prefix-based fuzzy-join
     * sub-plan of the current fuzzy-join operator. There are two cases:
     * 1. If the previousPKs list contains the currentPKs list (the PKs from the input branches of the current
     *    fuzzy-join operator), this means that the current fuzzy-join condition has no new input branch. This case
     *    SHOULD BE regarded as a SELECT over one of the previous fuzzy-joins.
     * 2. Otherwise, we can apply this rule to the current fuzzy-join operator to create a new prefix-based
     *    fuzzy-join plan.
     */

Line 259: private boolean regardAsPrefixFuzzyJoin(IOptimizationContext context, ILogicalOperator leftInputOp,
regardAsPrefixFuzzyJoin -> isPrefixFuzzyJoin might be better?

Line 265: // If PKs derived from the both branches are SAME as a previous fuzzyjoin, we treat this ~= as a select.
we treat this ~= as a select ==> we treat this as a select over a fuzzy-join.

Line 272: //Suppose we want to query on the same table on the different fields, i.e. A.a1 ~= B.b1 AND A.a2 ~= B.b2
table -> dataset. Please apply this everywhere.

Line 275: // Avoid the duplicated PK generation in findPrimaryKeysInSubplan, especially for multiway fuzzy join.
Avoid --> Avoids

Line 278: // Fail if primary keys could not be inferred.
Fail --> Fails

Line 285: // left-hand side and right-hand side of fuzzyjoin has the same type
has the same type --> should be the same type.

Line 464: private Mutable<ILogicalExpression> getSimilarityExpression(Mutable<ILogicalExpression> expRef) {
expRef -> exprRef

--
To view, visit https://asterix-gerrit.ics.uci.edu/1076
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8736f104905eeda763d39709e002c2b9629278cc
Gerrit-PatchSet: 39
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Wenhai Li <lwhaym...@yahoo.com>
Gerrit-Reviewer: Chen Li <che...@gmail.com>
Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Taewoo Kim <wangs...@yahoo.com>
Gerrit-Reviewer: Wenhai Li <lwhaym...@yahoo.com>
Gerrit-HasComments: Yes