adriangb commented on code in PR #17606:
URL: https://github.com/apache/datafusion/pull/17606#discussion_r2354280460

##########
datafusion/functions-aggregate/src/count.rs:
##########
@@ -746,12 +746,25 @@ fn null_count_for_multiple_cols(values: &[ArrayRef]) -> usize {
 /// more efficient such as [`PrimitiveDistinctCountAccumulator`] and
 /// [`BytesDistinctCountAccumulator`]
 #[derive(Debug)]
-struct DistinctCountAccumulator {
+pub struct DistinctCountAccumulator {
     values: HashSet<ScalarValue, RandomState>,

Review Comment:
   Or could we use the information we already have? E.g. every time we add a value to our _existing_ hash tables we check if it was already there or not.
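
   To illustrate the idea, here is a minimal sketch using only the standard library. `TrackedDistinctSet`, `seen`, and `duplicates_seen` are hypothetical names for illustration, not DataFusion's `DistinctCountAccumulator` or anything in this PR. It relies only on the fact that `HashSet::insert` already returns whether the value was new, so no second data structure is needed:

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical wrapper (not DataFusion's `DistinctCountAccumulator`):
/// records how many offered values were new versus already present,
/// using the result of the insert we perform anyway.
#[derive(Debug)]
pub struct TrackedDistinctSet<T: Hash + Eq> {
    values: HashSet<T>,
    /// Total number of values offered to `insert`, duplicates included.
    seen: usize,
}

impl<T: Hash + Eq> TrackedDistinctSet<T> {
    pub fn new() -> Self {
        Self {
            values: HashSet::new(),
            seen: 0,
        }
    }

    /// Inserts `value` and returns `true` if it was not in the set before.
    /// `HashSet::insert` reports this directly, so no extra lookup is made.
    pub fn insert(&mut self, value: T) -> bool {
        self.seen += 1;
        self.values.insert(value)
    }

    /// Number of distinct values observed so far.
    pub fn distinct_count(&self) -> usize {
        self.values.len()
    }

    /// Number of offered values that were already present in the set.
    pub fn duplicates_seen(&self) -> usize {
        self.seen - self.values.len()
    }
}
```

   Offering `1, 2, 1` to such a set would report two distinct values and one duplicate. Whether that signal is enough for what this PR needs is a separate question; the sketch only shows that the "was it already there" bit comes for free with the insert.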