alamb commented on PR #18488: URL: https://github.com/apache/datafusion/pull/18488#issuecomment-3492073723
> In this PR, the extra overhead is that, for each batch, we count the distinct values in a sorted vector like [0,1,1,2,2,2...] of up to batch-size length; it seems this shouldn't be the bottleneck. (@alamb Could you help trigger the benchmark please?)

Kicked it off
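
For reference, the per-batch work described in the quoted comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR: the function name `count_distinct_sorted` and the `u64` element type are hypothetical, and the input is assumed to be already sorted.

```rust
/// Counts the distinct values in a slice that is already sorted
/// (non-decreasing). Hypothetical helper for illustration only;
/// it is not taken from the PR.
fn count_distinct_sorted(values: &[u64]) -> usize {
    if values.is_empty() {
        return 0;
    }
    // In sorted input, equal values are adjacent, so every position where
    // a value differs from its predecessor starts a new distinct value.
    // This is a single linear pass with no allocation.
    1 + values.windows(2).filter(|pair| pair[0] != pair[1]).count()
}
```

Under these assumptions, `count_distinct_sorted(&[0, 1, 1, 2, 2, 2])` returns 3, and the cost is one comparison per adjacent pair, so it grows linearly with the batch size.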
