alamb commented on PR #8849: URL: https://github.com/apache/arrow-datafusion/pull/8849#issuecomment-1902611868
My plan for this PR is to work through the remaining items in https://github.com/apache/arrow-datafusion/pull/8849#pullrequestreview-1834680790 over the next few days.

@jayzhan211, if you are interested in helping, that would be awesome. Specifically, if you have time, adding fuzz tests or memory accounting tests would be super helpful.

For fuzz testing, I am thinking of adding a small unit test that generates some random data (both small and large strings, with and without nulls), runs the equivalent of `SELECT COUNT(DISTINCT random_data)`, and compares the result to the same value computed with `std::collections::HashSet<String>`. Perhaps in https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs
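A rough sketch of the shape such a fuzz test could take, using only the standard library. Several parts are assumptions, not DataFusion code: a tiny xorshift PRNG stands in for the `rand` crate, and `count_distinct_under_test` is a hypothetical stand-in (sort + dedup) for running `SELECT COUNT(DISTINCT ...)` through DataFusion. The reference side follows the plan above: count distinct non-null values with a `HashSet<String>`, since `COUNT(DISTINCT ...)` ignores nulls.

```rust
use std::collections::HashSet;

/// Minimal deterministic PRNG (xorshift64), used here only to avoid external
/// crates. Assumption: a real test would likely use the `rand` crate.
/// The seed must be non-zero, otherwise the generator stays at 0 forever.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..n (modulo bias is acceptable for test data).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Generate `len` optional strings: about 1 in 10 is null, and the rest are a
/// mix of short strings and "large" strings (a 40-byte prefix plus a suffix).
fn random_strings(rng: &mut XorShift64, len: usize) -> Vec<Option<String>> {
    (0..len)
        .map(|_| {
            if rng.below(10) == 0 {
                return None; // null value
            }
            // Short suffix (0..=4 chars) over a 3-letter alphabet so duplicates are common.
            let short_len = rng.below(5);
            let suffix: String = (0..short_len)
                .map(|_| (b'a' + rng.below(3) as u8) as char)
                .collect();
            // Half of the values get a long prefix, making them large strings.
            if rng.below(2) == 0 {
                Some(suffix)
            } else {
                Some(format!("{}{}", "x".repeat(40), suffix))
            }
        })
        .collect()
}

/// Reference answer: COUNT(DISTINCT col) ignores nulls, so count the distinct
/// non-null values using a HashSet<String>.
fn expected_count_distinct(values: &[Option<String>]) -> usize {
    values.iter().flatten().cloned().collect::<HashSet<String>>().len()
}

/// Hypothetical stand-in for the implementation under test. In the real fuzz
/// test this would be replaced by running the query through DataFusion.
fn count_distinct_under_test(values: &[Option<String>]) -> usize {
    let mut v: Vec<&str> = values.iter().filter_map(|s| s.as_deref()).collect();
    v.sort_unstable();
    v.dedup();
    v.len()
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    // Many rounds with varying sizes, including empty inputs.
    for round in 0..100 {
        let len = rng.below(1000) as usize;
        let data = random_strings(&mut rng, len);
        let expected = expected_count_distinct(&data);
        let actual = count_distinct_under_test(&data);
        assert_eq!(actual, expected, "mismatch in round {round}");
    }
    println!("all rounds passed");
}
```

Seeding the generator with a fixed value keeps failures reproducible; a real test might also log the seed so a failing round can be replayed.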
