timsaucer commented on issue #6747:
URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2124676884

   Thank you. I pulled your branch and many of the tests are failing for me, 
even though **the functions are returning correct values** when I add debug 
statements. I think what's happening is that, because we use `partition_by`, 
there is no guarantee about the order in which the results come back. On my 
machine, the unit tests return the partitions on column C in the order 10, 
then 1. I'm guessing yours returned them in the opposite order.
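
   Here is a std-only Rust sketch of the ordering problem, with no DataFusion 
involved. `Row`, `run_partitions` and `sorted_by_partition` are hypothetical 
names, and the values only mirror the expected table further down. Each 
partition is sent back from its own thread, so whether C = 1 or C = 10 
arrives first depends on scheduling. Sorting on the partition key before 
comparing is what makes the assertion deterministic.

   ```rust
   use std::sync::mpsc;
   use std::thread;

   /// Hypothetical output row: (partition key `c`, computed value).
   type Row = (i32, i32);

   /// Stand-in for a partitioned operator: each partition is handed to its own
   /// thread and sent back whole, so the order partitions arrive in depends on
   /// thread scheduling rather than on the data.
   fn run_partitions(partitions: Vec<Vec<Row>>) -> Vec<Row> {
       let (tx, rx) = mpsc::channel();
       let handles: Vec<_> = partitions
           .into_iter()
           .map(|rows| {
               let tx = tx.clone();
               thread::spawn(move || tx.send(rows).unwrap())
           })
           .collect();
       // Drop the original sender so the receiver stops once every thread is done.
       drop(tx);
       for handle in handles {
           handle.join().unwrap();
       }
       rx.into_iter().flatten().collect()
   }

   /// Sort by the partition key before comparing. `sort_by_key` is stable, so
   /// rows keep their within-partition order.
   fn sorted_by_partition(mut rows: Vec<Row>) -> Vec<Row> {
       rows.sort_by_key(|&(c, _)| c);
       rows
   }

   fn main() {
       let partitions = vec![vec![(1, 10), (1, 10), (1, -1)], vec![(10, -1)]];
       // Unsorted, the C = 10 row may come first or last.
       let rows = run_partitions(partitions);
       println!("arrival order: {:?}", rows);
       // Sorted, the result is the same on every run and every machine.
       assert_eq!(sorted_by_partition(rows), vec![(1, 10), (1, 10), (1, -1), (10, -1)]);
   }
   ```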
   
   I think there are a couple of ways we could resolve this. One would be to 
add a new macro for testing these partitioned functions that sorts the output 
before comparing it, something like:
   
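   ```rust
   // Sorts the output by $SORTBY before comparing, so the result does not
   // depend on the order in which partitions come back.
   // Uses `?` and `.await`, so it must be invoked inside an async fn returning `Result<()>`.
   macro_rules! assert_sorted_fn_batches {
       ($EXPR:expr, $EXPECTED: expr, $SORTBY: expr) => {
           let df = create_test_table().await?;
           let df = df.select($EXPR)?.sort($SORTBY)?.limit(0, Some(10))?;
           let batches = df.collect().await?;
   
           assert_batches_eq!($EXPECTED, &batches);
       };
   }
   ```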
   
   The lead function test would then become:
   
   ```rust
   #[tokio::test]
   async fn test_fn_lead() -> Result<()> {
       let expr = lead(col("b"), Some(1), Some(ScalarValue::Int32(Some(-1))))
           .with_partition_by(vec![col("c")])
           .with_order_by(vec![col("b").sort(true, true)])
           .build()
           .alias("lead_b");
   
       let expected = [
           "+----+--------+",
           "| c  | lead_b |",
           "+----+--------+",
           "| 1  | 10     |",
           "| 1  | 10     |",
           "| 1  | -1     |",
           "| 10 | -1     |",
           "+----+--------+",
       ];
   
       let select_expr = vec![col("c"), expr];
       // Sort on the partition column so the C = 1 rows always come first.
       let sort_by = vec![col("c").sort(true, true)];
   
       assert_sorted_fn_batches!(select_expr, expected, sort_by);
   
       Ok(())
   }
   ```
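
   For checking the expected table by hand, here is a std-only Rust sketch of 
what `lead(b, 1, -1)` computes per partition. It is not DataFusion's 
implementation. `lead_by_partition` is a hypothetical helper, and the (b, c) 
inputs are assumed values chosen only so the output lines up with the 
expected rows above, not values read from `create_test_table`.

   ```rust
   use std::collections::BTreeMap;

   /// Compute LEAD(b, offset, default) OVER (PARTITION BY c ORDER BY b ASC)
   /// for rows given as (b, c) pairs. Returns (c, lead_b) with partitions in
   /// ascending c order.
   fn lead_by_partition(rows: &[(i32, i32)], offset: usize, default: i32) -> Vec<(i32, i32)> {
       // Group b values by partition key c; BTreeMap iterates keys in ascending order.
       let mut partitions: BTreeMap<i32, Vec<i32>> = BTreeMap::new();
       for &(b, c) in rows {
           partitions.entry(c).or_default().push(b);
       }

       let mut out = Vec::with_capacity(rows.len());
       for (c, mut bs) in partitions {
           bs.sort(); // ORDER BY b ASC within the partition
           for i in 0..bs.len() {
               // The value `offset` rows ahead in the same partition, or the default.
               let lead = bs.get(i + offset).copied().unwrap_or(default);
               out.push((c, lead));
           }
       }
       out
   }

   fn main() {
       // Hypothetical (b, c) values, chosen only to match the expected table.
       let rows = [(1, 1), (10, 1), (10, 1), (100, 10)];
       let result = lead_by_partition(&rows, 1, -1);
       println!("{:?}", result);
       assert_eq!(result, vec![(1, 10), (1, 10), (1, -1), (10, -1)]);
   }
   ```

   The last row of every partition has nothing one position ahead of it, which 
is why both C = 1 and C = 10 end with the default of -1.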
   
   I've added an `alias` just because I think it makes the test more readable. 
If we wanted to be *really* explicit, we could also output column A and sort 
by columns A and C. *Then* the order would be fully guaranteed, because each 
row would be unique.
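
   A small std-only sketch of that last point, assuming column A is unique per 
row (`r1`..`r4` are placeholder values, not the real test data). Sorting by C 
alone leaves the C = 1 rows in whatever order they arrived. Because A is 
unique, any sort key that includes it gives a total order. The sketch uses 
(C, A) so the rows stay grouped by partition.

   ```rust
   /// Hypothetical output row: (a, c, lead_b), where `a` is assumed unique.
   type Row = (&'static str, i32, i32);

   /// Sort on column C only: rows with equal C keep whatever order they came in.
   fn sort_by_c(mut rows: Vec<Row>) -> Vec<Row> {
       rows.sort_by_key(|&(_, c, _)| c);
       rows
   }

   /// Sort on (C, A): since A is unique, no two rows compare equal, so the
   /// result is fully determined by the data.
   fn sort_by_c_then_a(mut rows: Vec<Row>) -> Vec<Row> {
       rows.sort_by_key(|&(a, c, _)| (c, a));
       rows
   }

   fn main() {
       // The same four rows arriving in two different orders.
       let run1 = vec![("r1", 1, 10), ("r2", 1, 10), ("r3", 1, -1), ("r4", 10, -1)];
       let run2 = vec![("r4", 10, -1), ("r3", 1, -1), ("r1", 1, 10), ("r2", 1, 10)];

       // Sorting by C alone still differs between runs inside the C = 1 group.
       assert_ne!(sort_by_c(run1.clone()), sort_by_c(run2.clone()));
       // Sorting by (C, A) gives identical output for both runs.
       assert_eq!(sort_by_c_then_a(run1), sort_by_c_then_a(run2));
       println!("sorting by (C, A) is order-independent");
   }
   ```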
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

