[jira] [Created] (SPARK-37387) Allow nondeterministic expression in aggregate function

Leona Yoda (Jira) Thu, 18 Nov 2021 23:07:52 -0800

Leona Yoda created SPARK-37387:
----------------------------------

             Summary: Allow nondeterministic expression in aggregate function
                 Key: SPARK-37387
                 URL: https://issues.apache.org/jira/browse/SPARK-37387
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Leona Yoda



Nondeterministic expressions are not allowed in aggregate functions in Spark, so we 
cannot execute a query like

 
{code:java}
SELECT COUNT(RANDOM());
{code}
 

Instead, Spark raises the error message {{nondeterministic expression ... should not 
appear in the arguments of an aggregate function.}}
[related code 
section|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L298]
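To make the restriction concrete, here is a toy Scala model of the rule enforced by the linked check: every argument of an aggregate function must be deterministic. The `Expr`, `Column`, `Rand`, `InputFileName` and `Count` types and the `checkAggregate` function are hypothetical simplifications written for illustration only; they are not Spark's actual Catalyst classes or API.

```scala
object AggregateCheckSketch {
  // Hypothetical, simplified expression tree (NOT Spark's Catalyst classes).
  sealed trait Expr {
    def children: Seq[Expr]
    // An expression is deterministic only if all of its children are;
    // nondeterministic leaves override this.
    def deterministic: Boolean = children.forall(_.deterministic)
  }

  // A plain column reference, e.g. a column produced by a subquery.
  final case class Column(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }

  // Stand-ins for RANDOM() and INPUT_FILE_NAME(): nondeterministic leaves.
  case object Rand extends Expr {
    def children: Seq[Expr] = Nil
    override def deterministic: Boolean = false
  }
  case object InputFileName extends Expr {
    def children: Seq[Expr] = Nil
    override def deterministic: Boolean = false
  }

  // COUNT(child), or COUNT(DISTINCT child) when distinct = true.
  final case class Count(child: Expr, distinct: Boolean = false) extends Expr {
    def children: Seq[Expr] = Seq(child)
  }

  // Toy version of the check: reject an aggregate if any argument is nondeterministic.
  def checkAggregate(agg: Count): Either[String, Count] =
    agg.children.find(c => !c.deterministic) match {
      case Some(bad) =>
        Left(s"nondeterministic expression $bad should not appear in the arguments of an aggregate function.")
      case None => Right(agg)
    }

  def main(args: Array[String]): Unit = {
    println(checkAggregate(Count(Rand)))                           // rejected (Left)
    println(checkAggregate(Count(InputFileName, distinct = true))) // rejected (Left)
    println(checkAggregate(Count(Column("r"))))                    // accepted (Right)
  }
}
```

In this toy model, {{COUNT(RANDOM())}} and {{COUNT(DISTINCT(INPUT_FILE_NAME()))}} are both rejected, while counting an ordinary column passes, because only the arguments' determinism is inspected.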

In contrast, other databases such as PostgreSQL can execute this query:

 
{code:java}
postgres=# SELECT COUNT(RANDOM());
 count
-------
     1
(1 row) {code}
 

When I removed the code that raises this error, Spark was able to execute 
the query:

 
{code:java}
scala> spark.sql("SELECT COUNT(RANDOM())").show()
+-------------+
|count(rand())|
+-------------+
|            1|
+-------------+ {code}
 

 

Being able to run such queries would be useful for Spark users. For example, 
they could find the target files by simply calling
{code:java}
spark.sql("SELECT COUNT(DISTINCT(INPUT_FILE_NAME())) FROM table WHERE ...") 
{code}

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
