wesm opened a new pull request #7358: URL: https://github.com/apache/arrow/pull/7358
The idea of this patch is to provide a more comprehensive baseline for the optimization work I'm undertaking.

Summary:

* Benchmark take when indices are monotonic and contain no nulls. Monotonic takes perform much faster because they access memory consecutively rather than at random
* Test null percentages down to 0.01% (even 1% is a lot of nulls, and obscures behavior between 1% and 0%)
* Benchmark indices/filter-mask with and without nulls, because there may be faster code paths for the no-nulls case
* Benchmark when the values being taken/filtered are all non-null
* Benchmark filtering/taking smaller strings. The benchmarks were using strings of size 0 to 128; realistic workloads will generally work with smaller strings, so I set the range to 0 to 32 instead, with 16 as the average

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
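The benchmark inputs described in the summary above can be sketched in plain Python. This is only an illustration of the input shapes (random versus monotonic indices, a configurable null fraction, short strings), not the Arrow C++ benchmark code. The helper names `make_take_indices`, `take`, and `make_strings` are hypothetical, and the uniform length distribution is an assumption chosen because it gives an average of 16 over the range 0 to 32.

```python
import random


def make_take_indices(num_values, num_indices, null_probability=0.0,
                      monotonic=False, seed=0):
    """Build an index list for a take benchmark (hypothetical helper).

    None stands in for a null index. With monotonic=True the indices are
    sorted, so a take walks the values in memory order instead of jumping
    around at random.
    """
    rng = random.Random(seed)
    indices = [rng.randrange(num_values) for _ in range(num_indices)]
    if monotonic:
        indices.sort()
    # Replace each index with a null with the requested probability.
    return [None if rng.random() < null_probability else i for i in indices]


def take(values, indices):
    """Reference take: a null index produces a null output slot."""
    return [None if i is None else values[i] for i in indices]


def make_strings(count, min_length=0, max_length=32, seed=0):
    """Strings with lengths drawn uniformly from [min_length, max_length].

    Uniform lengths are an assumption; over 0 to 32 the mean length is 16.
    """
    rng = random.Random(seed)
    return ["x" * rng.randint(min_length, max_length) for _ in range(count)]


if __name__ == "__main__":
    values = make_strings(1000)
    # 0.01% nulls, the smallest null percentage mentioned in the summary.
    indices = make_take_indices(len(values), 500, null_probability=0.0001,
                                monotonic=True)
    result = take(values, indices)
    print(len(result), "values taken")
```

Sorting the indices is the only difference between the random and monotonic cases here, which keeps the two workloads comparable: the same set of rows is selected, only the access order changes.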
