andygrove opened a new issue, #4122:
URL: https://github.com/apache/datafusion-comet/issues/4122

   ## Describe the bug
   
   On Spark 4.1.1 with Comet enabled, two `SQLQueryTestSuite` queries return incorrect results. The same `.sql` and golden `.out` files pass on Spark 4.0.2.
   
   ### `except-all.sql` query #22
   
   ```sql
   SELECT v FROM tab3 GROUP BY v
   EXCEPT ALL
   SELECT k FROM tab4 GROUP BY k
   ```
   
   Expected output: `3`. Actual output: `2\n3` (one extra row).
   
   ### `intersect-all.sql` query #15
   
   ```sql
   SELECT v FROM tab1 GROUP BY v
   INTERSECT ALL
   SELECT k FROM tab2 GROUP BY k
   ```
   
   Expected output: `2\n3\nNULL`. Actual output: empty result.
   
   ## Steps to reproduce
   
   Run Spark 4.1.1's SQL test suite with Comet enabled (the `Spark SQL Tests` matrix entry for 4.1.1). Both files fail in `SQLQueryTestSuite`.
   
   ## Expected behavior
   
   Comet should produce the same EXCEPT ALL / INTERSECT ALL results as Spark.
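
For reference, the multiset semantics both queries rely on can be sketched in a few lines of Python. This is only an illustration of what EXCEPT ALL and INTERSECT ALL are expected to compute, not Spark's or Comet's implementation. The helper names and the input lists are hypothetical and do not reproduce the actual contents of `tab1` through `tab4`. NULL is modelled as `None` and treated as equal to itself, as SQL set operations do.

```python
from collections import Counter


def _nulls_last(value):
    # Sort key that places None (SQL NULL) after all non-null values.
    return (value is None, 0 if value is None else value)


def except_all(left, right):
    """Multiset difference: each right row cancels at most one equal left row."""
    remaining = Counter(left) - Counter(right)  # keeps only positive counts
    return sorted(remaining.elements(), key=_nulls_last)


def intersect_all(left, right):
    """Multiset intersection: each row is kept min(left count, right count) times."""
    common = Counter(left) & Counter(right)
    return sorted(common.elements(), key=_nulls_last)


# Hypothetical inputs standing in for the already-grouped (distinct) columns.
print(except_all([2, 3], [1, 2]))                    # [3]
print(intersect_all([1, 2, 3, None], [2, 3, 5, None]))  # [2, 3, None]
```

With these illustrative inputs, a result of `2` and `3` for the first call would mean the right-hand side removed nothing, and an empty result for the second call would mean no rows were matched at all, which is the shape of the two failures reported above.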
   
   ## Workaround
   
   Comet is currently disabled for both files: `dev/diffs/4.1.1.diff` adds `--SET spark.comet.enabled = false` at the top of each file.
   
   ## Additional context
   
   The input `.sql` files and golden `.out` files are byte-identical between Spark 4.0.2 and 4.1.1, so the regression comes from either a change in Spark's planner/optimizer behavior or Comet's interaction with it on 4.1. PR #4093 enables Spark 4.1.1 in the `Spark SQL Tests` workflow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

