Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8322 )
Change subject: IMPALA-1422: support a constant on LHS of IN predicates. ...................................................................... Patch Set 6: (12 comments) main change is to simplify the current version to not handle correlated subqueries + "not in". adding this support so that the rewrite here is simpler requires additional support for inline view references to outer blocks that are more than one nesting level away. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94 PS2, Line 94: // TODO: derived slot refs (e.g., star-expanded) will not have rawPath set. > Why can't this be addressed in this patch? Not needed any longer... I think its useful to keep the comment around in case another rewrite hits the issue. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287 PS2, Line 287: * rewritten, null is returned. If 'inPred' is rewritten, the resulting expression > ... and the RHS is a subquery. Done http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299 PS2, Line 299: * 2) Predicate is NOT IN and RHS returns a single row. > What if the RHS subquery is correlated? correlation in the RHS is currently checked for and banned for limit queries. for an IN query with an aggregate, we're currently banning this, but for NOT IN and aggregation (interestingly) we're grouping by the join condition and pushing the NOT IN check to the having clause. I don't see a correctness issue with this. We may want to look into the IN handling to see if we're overly restrictive there. the comment about correlation on L303 is invalid with regard to the change in this implementation, so I've removed it. http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319 PS2, Line 319: * NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE e END) = C) > Is the transformation even correct if RHS is correlated? I think its correct, but I've removed this case since it would be more cleanly done using the inline view approach of 3b. I'm not using that approach here since we don't handle references from inline view to outer blocks that are more than one level away. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312 PS6, Line 312: * a) No group by or analytic function in subquery. > Ideally, we should not have to distinguish between cases 3a and 3b, and we Simplified all this by removing the case for correlated not in. We can bring it back pending additional support for multi-block references. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316 PS6, Line 316: * REWRITE: > Use IFNULL() instead of CASE to simplify. done, thanks for the suggestion. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319 PS6, Line 319: * NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL THEN C ELSE e END) = C) > I think there's another case where this rewrite is not correct: when the su good catch-- I forgot to handle order/limit in subqueries. that said, its easy to change the condition to handle limit in 3b (correlation is not supported for those cases anyways). however, we're duplicating a bit of push-down logic which is cleaner *not* to do here, so these issues do not come up anymore. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321 PS6, Line 321: * Example: > The following query gives me an IllegalStateException: this change-- thanks for finding it. its handled now. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328 PS6, Line 328: * Note: it useful to think of C NOT IN (RHS) as the boolean expansion: > it is useful Done http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340 PS6, Line 340: * b) Group by or analytic function in subquery. > extra space after "by" Done http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345 PS6, Line 345: * NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS NOT NULL) > What does C=t mean here? goofed on the use of "t". re'orgd. http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354 PS6, Line 354: * - Assumes that subquery is uncorrelated, which is currently a requirement. > What does this note refer to? Above you state that "All cases apply to both reorgd all this. -- To view, visit http://gerrit.cloudera.org:8080/8322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb Gerrit-Change-Number: 8322 Gerrit-PatchSet: 6 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Tue, 07 Nov 2017 08:42:34 +0000 Gerrit-HasComments: Yes