Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8322 )

Change subject: IMPALA-1422: support a constant on LHS of IN predicates.
......................................................................


Patch Set 6:

(12 comments)

main change is to simplify the current version to not handle correlated 
subqueries + "not in". adding this support so that the rewrite here is simpler 
requires additional support for inline view references to outer blocks that are 
more than one nesting level away.

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@94
PS2, Line 94:     // TODO: derived slot refs (e.g., star-expanded) will not 
have rawPath set.
> Why can't this be addressed in this patch?
Not needed any longer... I think its useful to keep the comment around in case 
another rewrite hits the issue.


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@287
PS2, Line 287:    * rewritten, null is returned. If 'inPred' is rewritten, the 
resulting expression
> ... and the RHS is a subquery.
Done


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@299
PS2, Line 299:    * 2) Predicate is NOT IN and RHS returns a single row.
> What if the RHS subquery is correlated?
correlation in the RHS is currently checked for and banned for limit queries. 
for an IN query with an aggregate, we're currently banning this, but for NOT IN 
and aggregation (interestingly) we're grouping by the join condition and 
pushing the NOT IN check to the having clause. I don't see a correctness issue 
with this. We may want to look into the IN handling to see if we're overly 
restrictive there.

the comment about correlation on L303 is invalid with regard to the change in 
this implementation, so I've removed it.


http://gerrit.cloudera.org:8080/#/c/8322/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS2, Line 319:    *     NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
> Is the transformation even correct if RHS is correlated?
I think its correct, but I've removed this case since it would be more cleanly 
done using the inline view approach of 3b. I'm not using that approach here 
since we don't handle references from inline view to outer blocks that are more 
than one level away.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@312
PS6, Line 312:    *    a) No group by or analytic function in subquery.
> Ideally, we should not have to distinguish between cases 3a and 3b, and we
Simplified all this by removing the case for correlated not in. We can bring it 
back pending additional support for multi-block references.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@316
PS6, Line 316:    *    REWRITE:
> Use IFNULL() instead of CASE to simplify.
done, thanks for the suggestion.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@319
PS6, Line 319:    *     NOT EXISTS (RHS AND (C IS NULL OR (CASE WHEN e IS NULL 
THEN C ELSE e END) = C)
> I think there's another case where this rewrite is not correct: when the su
good catch-- I forgot to handle order/limit in subqueries. that said, its easy 
to change the condition to handle limit in 3b (correlation is not supported for 
those cases anyways). however, we're duplicating a bit of push-down logic which 
is cleaner *not* to do here, so these issues do not come up anymore.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@321
PS6, Line 321:    *    Example:
> The following query gives me an IllegalStateException:
this change-- thanks for finding it. its handled now.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@328
PS6, Line 328:    *    Note: it useful to think of C NOT IN (RHS) as the 
boolean expansion:
> it is useful
Done


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@340
PS6, Line 340:    *    b) Group by  or analytic function in subquery.
> extra space after "by"
Done


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@345
PS6, Line 345:    *      NOT EXISTS (SELECT x FROM (RHS) t WHERE C = t or t IS 
NOT NULL)
> What does C=t mean here?
goofed on the use of "t". re'orgd.


http://gerrit.cloudera.org:8080/#/c/8322/6/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@354
PS6, Line 354:    *     - Assumes that subquery is uncorrelated, which is 
currently a requirement.
> What does this note refer to? Above you state that "All cases apply to both
reorgd all this.



--
To view, visit http://gerrit.cloudera.org:8080/8322
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d69889a3c72e90be9d4ccf47d2816819ae32acb
Gerrit-Change-Number: 8322
Gerrit-PatchSet: 6
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 08:42:34 +0000
Gerrit-HasComments: Yes

Reply via email to