NGA-TRAN commented on issue #9586:
URL: 
https://github.com/apache/arrow-datafusion/issues/9586#issuecomment-1997711716

   Many thanks to @erratic-pattern, who has identified the PR that introduced this issue:
   
   Issue is related to [this 
PR](https://github.com/apache/arrow-datafusion/pull/9234) for array aggregate 
order and distinct.
   
   The commit immediately before it works correctly with the above queries. 
   
   In arrow-datafusion:
   ```
   cd datafusion-cli
   git checkout 0e728fce0a1a87567979bc74ebb64951b0fd9ac8
   cargo build
   ./target/debug/datafusion-cli -f ../bug.sql
   DataFusion CLI v36.0.0
   +---------------+------------+------------+
   | servers_count | pool_count | datacenter |
   +---------------+------------+------------+
   | 1             | 1          | mn         |
   | 4             | 3          | va         |
   +---------------+------------+------------+
   ```  
   If you then run the same query after the above PR, you get the 
incorrect result (the `va` row shows 3 and 2 instead of 4 and 3):
   
   ```
   git checkout fc84a639fca7716e529384c0e919fb90b75139da
   cargo build
   ./target/debug/datafusion-cli -f ../bug.sql
   DataFusion CLI v36.0.0
   +---------------+------------+------------+
   | servers_count | pool_count | datacenter |
   +---------------+------------+------------+
   | 1             | 1          | mn         |
   | 3             | 2          | va         |
   +---------------+------------+------------+
   2 rows in set. Query took 0.534 seconds.
   ```
   bug.sql:
   ```sql
   SELECT COUNT(DISTINCT host) AS servers_count,
          COUNT(DISTINCT pool) AS pool_count,
          datacenter
   FROM '/tmp/file.parquet'
   WHERE time >= '2024-02-25T00:00:00Z'
     AND time < '2024-02-25T00:00:01Z'
     AND server_role = 'mesg'
   GROUP BY datacenter;
   ```
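   
   One way to cross-check which of the two results is correct, without relying on `COUNT(DISTINCT ...)`, is to deduplicate with `GROUP BY` in a subquery and then count the remaining rows. This is only a sketch: it reuses the file path and filters from bug.sql, the alias `t` is made up for illustration, and it has not been run against the file.
   
   ```sql
   -- Cross-check for servers_count: deduplicate (datacenter, host) pairs first,
   -- then count them per datacenter. COUNT(host) skips NULL hosts, which
   -- matches the NULL handling of COUNT(DISTINCT host).
   SELECT COUNT(host) AS servers_count, datacenter
   FROM (
       SELECT datacenter, host
       FROM '/tmp/file.parquet'
       WHERE time >= '2024-02-25T00:00:00Z'
         AND time < '2024-02-25T00:00:01Z'
         AND server_role = 'mesg'
       GROUP BY datacenter, host
   ) AS t
   GROUP BY datacenter;
   ```
   
   If this returns 4 for `va`, it agrees with the output from the earlier commit. Replacing `host` with `pool` should give the same check for `pool_count`.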
   
   We are working to share the file.
   

