alamb commented on code in PR #10526: URL: https://github.com/apache/datafusion/pull/10526#discussion_r1602085179
########## datafusion/sqllogictest/test_files/aggregate.slt: ########## @@ -198,6 +198,73 @@ statement error This feature is not implemented: LIMIT not supported in ARRAY_AG SELECT array_agg(c13 LIMIT 1) FROM aggregate_test_100 +# Test distinct aggregate function with merge batch Review Comment: I ran the test without the changes in this PR and it panic'd as expected: ```shell cargo test --test sqllogictests ... thread 'tokio-runtime-worker' panicked at datafusion/physical-expr/src/aggregate/array_agg_distinct.rs:158:9: assertion `left == right` failed: state array should only include 1 row! left: 5 right: 1 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` ########## datafusion/sqllogictest/test_files/aggregate.slt: ########## @@ -198,6 +198,73 @@ statement error This feature is not implemented: LIMIT not supported in ARRAY_AG SELECT array_agg(c13 LIMIT 1) FROM aggregate_test_100 +# Test distinct aggregate function with merge batch +query II +with A as ( + select 1 as id, 2 as foo + UNION ALL + select 1, null + UNION ALL + select 1, null + UNION ALL + select 1, 3 + UNION ALL + select 1, 2 + ---- The order is non-deterministic, verify with length +) select array_length(array_agg(distinct a.foo)), sum(distinct 1) from A a group by a.id; +---- +3 1 + +# It has only AggregateExec with FinalPartitioned mode, so `merge_batch` is used +# If the plan is changed, whether the `merge_batch` is used should be verified to ensure the test coverage Review Comment: 💯 for the explanatory comments about why this explain is here ########## datafusion/physical-expr/src/aggregate/array_agg_distinct.rs: ########## @@ -153,12 +153,11 @@ impl Accumulator for DistinctArrayAggAccumulator { return Ok(()); } - let array = &states[0]; - - assert_eq!(array.len(), 1, "state array should only include 1 row!"); - // Unwrap outer ListArray then do update batch - let inner_array = array.as_list::<i32>().value(0); - self.update_batch(&[inner_array]) + states[0] + .as_list::<i32>() + .iter() + .flatten() Review Comment: what does the flatten do? Discard `Option`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org