alamb commented on PR #8849:
URL: 
https://github.com/apache/arrow-datafusion/pull/8849#issuecomment-1902611868

   My plan for this PR is to work through the remaining items on  
https://github.com/apache/arrow-datafusion/pull/8849#pullrequestreview-1834680790
 over the next few days. If you are interested in helping, @jayzhan211, that 
would be awesome.
   
   Specifically, if you have time, adding fuzz or memory accounting tests would 
be super helpful.
   
   For fuzz testing, I am thinking of adding a small unit test that generates 
random data (both small and large strings, with and without nulls), runs the 
equivalent of `SELECT COUNT(DISTINCT random_data)`, and compares the result to 
the same value computed with `std::collections::HashSet<String>`.
   
   Perhaps in 
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs
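
   A rough sketch of the shape such a test could take is below. It is only an 
illustration, not the test from this PR: `XorShift`, `random_strings`, 
`expected_distinct`, `count_distinct_under_test` and `run_fuzz_cases` are all 
hypothetical names, the hand-rolled generator is there only to avoid external 
crates, and `count_distinct_under_test` is a sort-and-dedup stand-in for the 
place where the real test would run the `COUNT(DISTINCT ...)` query through 
DataFusion.

```rust
use std::collections::HashSet;

/// Tiny deterministic xorshift64 generator so the sketch needs no external
/// crates. (Assumption: a real test would more likely use the `rand` crate.)
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: `rows` values, each NULL with probability `null_pct`%,
/// otherwise drawn from a pool of random strings of up to `max_len` characters.
fn random_strings(
    rng: &mut XorShift,
    rows: usize,
    max_len: usize,
    null_pct: u64,
) -> Vec<Option<String>> {
    // Draw from a small pool so that duplicates occur even for long strings.
    let pool: Vec<String> = (0..50)
        .map(|_| {
            let n = (rng.next_u64() % (max_len as u64 + 1)) as usize;
            (0..n)
                .map(|_| (b'a' + (rng.next_u64() % 26) as u8) as char)
                .collect()
        })
        .collect();
    (0..rows)
        .map(|_| {
            if rng.next_u64() % 100 < null_pct {
                None
            } else {
                Some(pool[(rng.next_u64() % pool.len() as u64) as usize].clone())
            }
        })
        .collect()
}

/// Reference answer: COUNT(DISTINCT x) ignores NULLs, so only non-null
/// values go into the HashSet.
fn expected_distinct(values: &[Option<String>]) -> usize {
    values.iter().flatten().cloned().collect::<HashSet<String>>().len()
}

/// Stand-in for the system under test (assumption). The real test would run
/// the equivalent of `SELECT COUNT(DISTINCT random_data)` here instead.
fn count_distinct_under_test(values: &[Option<String>]) -> usize {
    let mut non_null: Vec<&str> = values.iter().flatten().map(|s| s.as_str()).collect();
    non_null.sort_unstable();
    non_null.dedup();
    non_null.len()
}

/// Runs every combination of small/large strings and with/without nulls,
/// returning the number of cases checked.
fn run_fuzz_cases() -> usize {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15); // seed must be non-zero
    let mut cases = 0;
    for &max_len in &[4usize, 200] {
        for &null_pct in &[0u64, 30] {
            let data = random_strings(&mut rng, 1000, max_len, null_pct);
            assert_eq!(
                count_distinct_under_test(&data),
                expected_distinct(&data),
                "max_len={}, null_pct={}",
                max_len,
                null_pct
            );
            cases += 1;
        }
    }
    cases
}
```

   Calling `run_fuzz_cases()` from a `#[test]` function would exercise all 
four combinations. Sampling from a pool keeps duplicates likely even for the 
large strings, so the distinct count is not trivially equal to the row count.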
   


