timsaucer commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2124676884
Thank you. I pulled your branch and many of the tests are failing for me even though **the functions are returning correct values** when I add additional debug statements. I think what's happening here is that because we have the `partition_by`, there is no guarantee about the order in which the results come back. On my machine the unit tests are returning the partitions on column C in order 10 then 1. I'm guessing on yours it was the opposite.

There are a couple of things I think we can do to resolve this. One way would be to make a new macro for testing these partitioned functions. I could do something like

```
macro_rules! assert_sorted_fn_batches {
    ($EXPR:expr, $EXPECTED: expr, $SORTBY: expr) => {
        let df = create_test_table().await?;
        let df = df.select($EXPR)?.sort($SORTBY)?.limit(0, Some(10))?;
        let batches = df.collect().await?;
        assert_batches_eq!($EXPECTED, &batches);
    };
}
```

And then the lead function test would become

```
async fn test_fn_lead() -> Result<()> {
    let expr = lead(col("b"), Some(1), Some(ScalarValue::Int32(Some(-1))))
        .with_partition_by(vec![col("c")])
        .with_order_by(vec![col("b").sort(true, true)])
        .build()
        .alias("lead_b");

    let expected = [
        "+----+--------+",
        "| c  | lead_b |",
        "+----+--------+",
        "| 1  | 10     |",
        "| 1  | 10     |",
        "| 1  | -1     |",
        "| 10 | -1     |",
        "+----+--------+",
    ];

    let select_expr = vec![col("c"), expr];
    let sort_by = vec![col("c").sort(true, true)];

    assert_sorted_fn_batches!(select_expr, expected, sort_by);

    Ok(())
}
```

I've added an `alias` just because I think it makes the test more readable. If we wanted to get *really* explicit we could also output column A and sort by columns A and C. *Then* we would have guaranteed correctness, because each row would be unique.
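To make the ordering issue concrete without depending on DataFusion, here is a minimal std-only Rust sketch. Everything in it is hypothetical and for illustration only: the helper names `lead_by_partition` and `sorted_by_partition`, the `(b, c)` input rows, and the `partition_order` argument, which stands in for whatever order an engine happens to emit partitions in. It is not how DataFusion evaluates window functions.

```rust
use std::collections::HashMap;

/// One output row: (partition key `c`, computed `lead_b`).
type Row = (i32, i32);

/// Hypothetical stand-in for a partitioned `lead(b, 1, default)` ordered by `b`.
/// `rows` holds `(b, c)` pairs; `partition_order` simulates the arbitrary
/// order in which partitions are emitted.
fn lead_by_partition(rows: &[(i32, i32)], partition_order: &[i32], default: i32) -> Vec<Row> {
    // Group the `b` values by their partition key `c`.
    let mut groups: HashMap<i32, Vec<i32>> = HashMap::new();
    for &(b, c) in rows {
        groups.entry(c).or_default().push(b);
    }

    let mut out = Vec::new();
    for c in partition_order {
        if let Some(bs) = groups.get_mut(c) {
            // Order by `b` within the partition, then look one row ahead.
            bs.sort();
            for i in 0..bs.len() {
                let lead = bs.get(i + 1).copied().unwrap_or(default);
                out.push((*c, lead));
            }
        }
    }
    out
}

/// Sorts the output by partition key only. `sort_by_key` on a `Vec` is
/// stable, so the within-partition order is preserved in this sketch.
fn sorted_by_partition(mut rows: Vec<Row>) -> Vec<Row> {
    rows.sort_by_key(|r| r.0);
    rows
}
```

Calling `lead_by_partition` with the same rows but with partition orders `[1, 10]` and `[10, 1]` gives two different row sequences, while passing both through `sorted_by_partition` gives the same one. Note that this relies on the stable sort in Rust's standard library; a query engine sorting only on column C need not preserve the order of tied rows, which is why also sorting on a column that makes each row unique is the more explicit option.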