ClSlaid commented on PR #9755:
URL: https://github.com/apache/arrow-rs/pull/9755#issuecomment-4564841279

   Benchmark update for the latest squashed patch (`c736cc511`):
   
   - I kept the dense-filter fallback in place. I also tried removing it and splitting the selection between direct copy and `take`, but that made the dense mixed inline-view cases regress heavily, so I reverted that experiment.
   - Command shape: `cargo bench -p arrow --features test_utils --bench 
coalesce_kernels -- ... --sample-size 20 --warm-up-time 1 --measurement-time 2`.
   - The baseline is `apache/main` at `e470187b9`, run with the same benchmark definitions as this patch.
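   - Times are ns/iter (lower is better). Speedup is main time divided by patch time, and selectivity is the fraction of rows the filter keeps. All rows use `null_density=0` and `max_string_len=8`.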
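   For illustration, the dense-filter fallback amounts to a dispatch on filter selectivity: sparse filters take the new path and dense ones stay on `filter_record_batch`. This std-only sketch is hypothetical. `FilterPath`, `choose_path`, and the 0.5 cut-over are assumptions, not the patch's actual names or threshold.

```rust
/// Which filtering strategy to use for a batch (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
pub enum FilterPath {
    /// Sparse filter: copy the selected views directly (new path).
    SparseInlineCopy,
    /// Dense filter: fall back to the existing `filter_record_batch` path.
    DenseFallback,
}

/// Hypothetical cut-over point; the patch's real threshold is not stated here.
pub const DENSE_SELECTIVITY_THRESHOLD: f64 = 0.5;

/// Fraction of rows kept by a boolean filter mask (0.0 for an empty mask).
pub fn selectivity(mask: &[bool]) -> f64 {
    if mask.is_empty() {
        return 0.0;
    }
    let kept = mask.iter().filter(|&&keep| keep).count();
    kept as f64 / mask.len() as f64
}

/// Pick a path from the measured selectivity of the mask.
pub fn choose_path(mask: &[bool]) -> FilterPath {
    if selectivity(mask) >= DENSE_SELECTIVITY_THRESHOLD {
        FilterPath::DenseFallback
    } else {
        FilterPath::SparseInlineCopy
    }
}
```

   Keeping the decision in one place means a tuned threshold can be changed without touching either code path.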
   
   | case | selectivity | main (ns/iter) | patch (ns/iter) | speedup (main/patch) |
   | --- | ---: | ---: | ---: | ---: |
   | mixed_binaryview | 0.001 | 19,256,493 | 6,034,896 | 3.19x |
   | mixed_binaryview | 0.01 | 1,755,147 | 882,959 | 1.99x |
   | mixed_binaryview | 0.1 | 587,201 | 444,630 | 1.32x |
   | mixed_binaryview | 0.8 | 577,103 | 545,535 | 1.06x |
   | single_binaryview | 0.001 | 27,373,114 | 10,000,552 | 2.74x |
   | single_binaryview | 0.01 | 2,176,479 | 1,118,766 | 1.95x |
   | single_binaryview | 0.1 | 596,890 | 418,494 | 1.43x |
   | single_binaryview | 0.8 | 723,025 | 715,278 | 1.01x |
   | mixed_utf8view | 0.001 | 19,190,003 | 6,027,227 | 3.18x |
   | mixed_utf8view | 0.01 | 1,787,177 | 898,127 | 1.99x |
   | mixed_utf8view | 0.1 | 590,240 | 446,294 | 1.32x |
   | mixed_utf8view | 0.8 | 657,013 | 609,744 | 1.08x |
   | single_utf8view | 0.001 | 26,815,635 | 10,005,646 | 2.68x |
   | single_utf8view | 0.01 | 2,278,597 | 1,204,443 | 1.89x |
   | single_utf8view | 0.1 | 585,180 | 432,044 | 1.35x |
   | single_utf8view | 0.8 | 616,433 | 728,400 | 0.85x |
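   The speedup column is main time divided by patch time, rounded to two decimals, so values below 1.0 are regressions. Below is a small std-only check that reproduces two rows. The `Row` type and its helpers are illustrative only.

```rust
/// One benchmark row: ns/iter on `main` and on the patch.
pub struct Row {
    pub case: &'static str,
    pub selectivity: f64,
    pub main_ns: u64,
    pub patch_ns: u64,
}

/// Speedup as reported in the table: main time / patch time.
/// Above 1.0 the patch is faster; below 1.0 it is a regression.
pub fn speedup(row: &Row) -> f64 {
    row.main_ns as f64 / row.patch_ns as f64
}

/// Round to two decimals, matching the table's "x.xx" formatting.
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}

/// Human-readable summary of a row, e.g. "case @ selectivity: 1.23x".
pub fn describe(row: &Row) -> String {
    format!("{} @ {}: {:.2}x", row.case, row.selectivity, speedup(row))
}
```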
   
   Summary: the patch mainly targets sparse filters over inline Utf8View/BinaryView data. The gains are strongest at 0.1%-10% selectivity (1.32x to 3.19x). Dense filters stay on the existing `filter_record_batch` path, which is still the safer/faster choice in that region. At 0.8 selectivity the results are roughly flat, apart from a 0.85x regression for `single_utf8view`.
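   Inline views are the main reason the sparse path is cheap. In the Arrow view layout, each value is a 16-byte view whose low 4 bytes hold the length. Values of at most 12 bytes are stored inline in the remaining 12 bytes. Assuming `max_string_len=8` caps values at 8 bytes, every value in these rows fits inline, so a sparse filter only copies the selected views and never touches a data buffer. The std-only sketch below assumes views are little-endian `u128`s, as arrow-rs stores them. The function names and the `None` fallback signal are hypothetical, not the patch's code.

```rust
/// Longest value stored entirely inside its 16-byte view.
pub const MAX_INLINE_LEN: u32 = 12;

/// Build an inline view: length in the low 4 bytes, data in the next 12
/// (little-endian, zero-padded), following the Arrow view layout.
pub fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= MAX_INLINE_LEN as usize, "value too long to inline");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Length field of a view: the low 32 bits.
pub fn view_len(view: u128) -> u32 {
    view as u32
}

/// Read back the bytes of an inline view.
pub fn inline_bytes(view: u128) -> Vec<u8> {
    let len = view_len(view) as usize;
    view.to_le_bytes()[4..4 + len].to_vec()
}

/// Keep the views selected by `mask`, copying each u128 as-is.
/// Returns None if a selected view is not inline, i.e. it would reference
/// a data buffer (hypothetical signal to fall back to a general path).
pub fn filter_inline_views(views: &[u128], mask: &[bool]) -> Option<Vec<u128>> {
    assert_eq!(views.len(), mask.len(), "mask must match the number of views");
    let mut out = Vec::with_capacity(mask.iter().filter(|&&k| k).count());
    for (&view, &keep) in views.iter().zip(mask) {
        if !keep {
            continue;
        }
        if view_len(view) > MAX_INLINE_LEN {
            return None;
        }
        out.push(view);
    }
    Some(out)
}
```

   Because no buffer offsets need rewriting, the cost scales with the number of selected rows. This is consistent with the largest gains appearing at the lowest selectivities.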
   

