mustafasrepo opened a new issue, #6189:
URL: https://github.com/apache/arrow-datafusion/issues/6189

   ### Is your feature request related to a problem or challenge?
   
   For some aggregation functions input ordering changes the result emitted. In 
these cases, it is helpful to require desired ordering in the input. Consider 
query below
   ```sql
   SELECT (ARRAY_AGG(s.amount ORDER BY s.sn DESC)) AS amounts
               FROM sales_global AS s
               GROUP BY s.ts
   ```
   This query successfully runs in Postgre.
   However, in datafusion it returns following message.
   `NotImplemented("ORDER BY not supported in ARRAY_AGG: s.sn DESC")`
   
   ### Describe the solution you'd like
   
   I would like to have this support. With this feature in place, we can add 
new aggregate functions that makes sense with this feature, such as `FIRST`/ 
`FIRST_VALUE`, `LAST`/ `LAST_VALUE`  etc.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   To reproduce problem, you can use test below.
   ```sql
   #[tokio::test]
   async fn test_ordered_aggregation() -> Result<()> {
       let config = SessionConfig::new()
           .with_target_partitions(1);
       let ctx = SessionContext::with_config(config);
       ctx.sql("CREATE TABLE sales_us (
         ts TIMESTAMP,
         currency VARCHAR(3),
         amount INT
       ) as VALUES
             ('2022-01-01 10:00:00'::timestamp, 'USD', 100.00),
             ('2022-01-01 11:00:00'::timestamp, 'USD', 200.00),
             ('2022-01-02 09:00:00'::timestamp, 'USD', 300.00),
             ('2022-01-02 10:00:00'::timestamp, 'USD', 150.00)").await?;
       ctx.sql("CREATE TABLE sales_global (
         ts TIMESTAMP,
         currency VARCHAR(3),
         amount INT
       ) as VALUES
         ('2022-01-01 08:00:00'::timestamp, 'EUR', 50.00),
         ('2022-01-01 11:30:00'::timestamp, 'EUR', 75.00),
         ('2022-01-02 12:00:00'::timestamp, 'EUR', 200.00),
         ('2022-01-03 10:00:00'::timestamp, 'EUR', 100.00)").await?;
       let sql = "SELECT (ARRAY_AGG(s.amount ORDER BY s.sn DESC))[1] AS amounts
           FROM sales_global AS s
           GROUP BY s.ts";
   
       let msg = format!("Creating logical plan for '{sql}'");
       let dataframe: DataFrame = ctx.sql(sql).await.expect(&msg);
       let physical_plan = dataframe.create_physical_plan().await?;
       let batches = collect(physical_plan, ctx.task_ctx()).await?;
       print_batches(&batches)?;
       Ok(())
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to