[GitHub] spark pull request #22696: [SPARK-25708][SQL] HAVING without GROUP BY means ...

hvanhovell Thu, 11 Oct 2018 13:19:07 -0700

Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22696#discussion_r224590474
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1894,6 +1894,8 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
     
       - In PySpark, when creating a `SparkSession` with 
`SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, 
the builder was trying to update the `SparkConf` of the existing `SparkContext` 
with configurations specified to the builder, but the `SparkContext` is shared 
by all `SparkSession`s, so we should not update them. Since 3.0, the builder 
come to not update the configurations. This is the same behavior as Java/Scala 
API in 2.3 and above. If you want to update them, you need to update them prior 
to creating a `SparkSession`.
     
    +  - In Spark version 2.4 and earlier, HAVING without GROUP BY is treated 
as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as 
`SELECT 1 FROM range(10) WHERE true`  and returns 10 rows. This violates SQL 
standard, and has been fixed in Spark 3.0. Since Spark 3.0, HAVING without 
GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) 
HAVING true` will return only one row.
    --- End diff --
    
    You will need to put this change behind a feature flag if you port it to 2.4.
    People might rely on the current behavior.
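
To make the difference concrete, here is a toy model in plain Python of the two semantics described in the diff, with a switch standing in for the suggested feature flag. This is only a sketch: it does not use Spark, and the function name `select_one_having` and the parameter `legacy_having_as_where` are hypothetical names invented for illustration, not Spark APIs or configuration keys.

```python
from typing import Iterable, List


def select_one_having(rows: Iterable[int], having: bool,
                      legacy_having_as_where: bool) -> List[int]:
    """Toy model of `SELECT 1 FROM <rows> HAVING <having>` with a constant predicate.

    `legacy_having_as_where` is a hypothetical flag, not a real Spark setting.
    """
    rows = list(rows)
    if legacy_having_as_where:
        # 2.4-and-earlier behavior per the diff: HAVING is treated as WHERE,
        # so the predicate filters individual input rows.
        return [1 for _ in rows if having]
    # 3.0 behavior per the diff: HAVING without GROUP BY is a global aggregate,
    # so all input rows form a single group and HAVING filters that one group.
    return [1] if having else []


legacy = select_one_having(range(10), True, legacy_having_as_where=True)
fixed = select_one_having(range(10), True, legacy_having_as_where=False)
print(len(legacy), len(fixed))  # 10 1
```

With the flag on, the query keeps returning 10 rows as it does today; with it off, it returns the single row expected from a global aggregate.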



