Leona Yoda created SPARK-37387:
----------------------------------
Summary: Allow nondeterministic expression in aggregate function
Key: SPARK-37387
URL: https://issues.apache.org/jira/browse/SPARK-37387
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: Leona Yoda
Spark does not allow nondeterministic expressions in the arguments of an
aggregate function, so we cannot execute a query like
{code:java}
SELECT COUNT(RANDOM());
{code}
It fails with the error message {{nondeterministic expression ... should not appear in the arguments
of an aggregate function.}}
[related code section|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L298]
Other databases such as PostgreSQL accept the same query:
{code:java}
postgres=# SELECT COUNT(RANDOM());
 count
-------
     1
(1 row) {code}
When I removed the check that raises this error, Spark was able to execute
the query:
{code:java}
scala> spark.sql("SELECT COUNT(RANDOM())").show()
+-------------+
|count(rand())|
+-------------+
|            1|
+-------------+ {code}
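For comparison, a possible workaround on an unpatched build is to evaluate the nondeterministic expression in a subquery first, so that the aggregate only sees a plain column. This is a minimal sketch under assumptions: the local session settings and the alias {{r}} are hypothetical names chosen for illustration, and it has not been checked against every Spark version.
{code:scala}
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-37387-sketch")
  .getOrCreate()

// RANDOM() is evaluated in the inner projection, so COUNT receives an
// ordinary column reference (r) instead of a nondeterministic expression.
val n = spark.sql("SELECT COUNT(r) FROM (SELECT RANDOM() AS r)").first().getLong(0)
println(n)
{code}
The extra subquery is what this improvement would make unnecessary.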
Supporting this kind of query would be useful for Spark users, because they
could simply call
{code:java}
spark.sql("SELECT COUNT(DISTINCT(INPUT_FILE_NAME())) FROM table WHERE ...")
{code}
for example, to count the distinct input files that contain the matching rows.
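Until then, roughly the same result can probably be obtained with the DataFrame API by projecting {{input_file_name()}} into a column before aggregating. The sketch below is illustrative only: the temporary Parquet directory, the {{id < 50}} filter (standing in for the elided WHERE clause) and the alias {{file}} are all assumptions, not part of the original report.
{code:scala}
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, countDistinct, input_file_name}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("SPARK-37387-input-file-sketch")
  .getOrCreate()

// Hypothetical sample table: 100 rows written as 3 Parquet files.
val dir = Files.createTempDirectory("spark-37387").resolve("t").toString
spark.range(100).repartition(3).write.parquet(dir)

val fileCount = spark.read.parquet(dir)
  .where(col("id") < 50)                 // stands in for the WHERE ... clause
  .select(input_file_name().as("file"))  // evaluate the nondeterministic function first
  .agg(countDistinct(col("file")))       // then aggregate over the plain column
  .first().getLong(0)
println(fileCount)
{code}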
--
This message was sent by Atlassian Jira
(v8.20.1#820001)