fmonjalet commented on code in PR #16816: URL: https://github.com/apache/datafusion/pull/16816#discussion_r2216291759
########## datafusion/functions-aggregate/src/array_agg.rs: ########## @@ -1008,8 +1002,7 @@ mod tests { acc2.update_batch(&[data(["b", "c", "a"])])?; acc1 = merge(acc1, acc2)?; - // without compaction, the size is 2652. - assert_eq!(acc1.size(), 732); + assert_eq!(acc1.size(), 266); Review Comment: Nice! To convince myself this approach was still accurate enough, I was thinking we should compare this size to the raw data size, which I got from: ```rust let data1 = data(["a", "c", "b"]); let data2 = data(["b", "c", "a"]); println!("Size of data: {} {}", data1.get_array_memory_size(), data2.get_array_memory_size()); ``` I see that the data size is 1208 for each. _But_ it's related to excessive capacity of the Array, which I would hope does not happen too much for larger arrays (the ones that actually matter for memory accounting). I would also hope unused memory pages are not effectively allocated by the kernel anyway, so not taking memory in practice (unsure about that in this case). If we update the `data` helper to use `shrink_to_fit`: ```rust fn data<T, const N: usize>(list: [T; N]) -> ArrayRef where ScalarValue: From<T>, { let values: Vec<_> = list.into_iter().map(ScalarValue::from).collect(); let mut array = ScalarValue::iter_to_array(values).expect("Cannot convert to array"); array.shrink_to_fit(); array } ``` Then we get 139 byte per array, which is 278 total, so we are under-accounting by only 12 bytes in practice. Which sounds good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org