GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/20283

    [SPARK-23095][SQL] Decorrelation of scalar subquery fails with java.util.NoSuchElementException

    ## What changes were proposed in this pull request?
    The following SQL, which involves a correlated scalar subquery, fails with a map key-not-found exception.
    ``` SQL
    SELECT t1a
    FROM   t1
    WHERE  t1a = (SELECT   count(*)
                  FROM     t2
                  WHERE    t2c = t1c
                  HAVING   count(*) >= 1)
    ```
    ```
    key not found: ExprId(278,786682bb-41f9-4bd5-a397-928272cc8e4e)
    java.util.NoSuchElementException: key not found: ExprId(278,786682bb-41f9-4bd5-a397-928272cc8e4e)
            at scala.collection.MapLike$class.default(MapLike.scala:228)
            at scala.collection.AbstractMap.default(Map.scala:59)
            at scala.collection.MapLike$class.apply(MapLike.scala:141)
            at scala.collection.AbstractMap.apply(Map.scala:59)
            at org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$.org$apache$spark$sql$catalyst$optimizer$RewriteCorrelatedScalarSubquery$$evalSubqueryOnZeroTups(subquery.scala:378)
            at org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$$anonfun$org$apache$spark$sql$catalyst$optimizer$RewriteCorrelatedScalarSubquery$$constructLeftJoins$1.apply(subquery.scala:430)
            at org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery$$anonfun$org$apache$spark$sql$catalyst$optimizer$RewriteCorrelatedScalarSubquery$$constructLeftJoins$1.apply(subquery.scala:426)
    ```
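    The failure can be reproduced with a minimal schema along these lines (the temporary views, column types, and data below are illustrative assumptions, not the exact tables used by the test suite):
    ``` SQL
    -- t1 provides the outer columns t1a and t1c referenced by the query above.
    CREATE OR REPLACE TEMPORARY VIEW t1 AS
    SELECT * FROM VALUES (1, 1), (2, 2) AS t1(t1a, t1c);
    -- t2 provides the correlated column t2c used inside the subquery.
    CREATE OR REPLACE TEMPORARY VIEW t2 AS
    SELECT * FROM VALUES (1), (1) AS t2(t2c);
    ```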
    
    In this case, after statically evaluating the HAVING clause "count(*) >= 1"
    against the bindings of the aggregation results on empty input, we determine
    that this query does not hit the count bug. evalSubqueryOnZeroTups should
    therefore simply return an empty value.
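    To see why no count-bug compensation is needed here, consider what the subquery evaluates to on an empty correlated partition (the WHERE 1 = 0 predicate below just stands in for empty input):
    ``` SQL
    -- On empty input the aggregate yields count(*) = 0, so the HAVING predicate
    -- becomes 0 >= 1, which is false, and the subquery returns no row (i.e. a
    -- NULL scalar value). No substitute value for zero tuples is required.
    SELECT count(*) FROM t2 WHERE 1 = 0 HAVING count(*) >= 1;
    -- Without the HAVING clause the same subquery would return a row with value 0
    -- rather than no row, which is the classic count-bug case the rewrite must
    -- compensate for.
    SELECT count(*) FROM t2 WHERE 1 = 0;
    ```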
    
    ## How was this patch tested?
    A new test was added in the Subquery bucket.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark scalar-count-defect

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20283
    
----
commit 5fa80dec8e2aee07b5c04f7ad01abaccae3b6aff
Author: Dilip Biswal <dbiswal@...>
Date:   2017-09-21T23:12:22Z

    [SPARK-23095] Decorrelation of scalar subquery fails with java.util.NoSuchElementException.

----

