ClSlaid commented on PR #9755:
URL: https://github.com/apache/arrow-rs/pull/9755#issuecomment-4564841279
Benchmark update for the latest squashed patch (`c736cc511`):

- I kept the dense-filter fallback in place. I also tried removing that fallback and splitting the selection between direct copy and `take`, but that made mixed inline-view dense cases regress heavily, so that experiment was reverted.
- Command shape: `cargo bench -p arrow --features test_utils --bench coalesce_kernels -- ... --sample-size 20 --warm-up-time 1 --measurement-time 2`.
- Main comparison uses `apache/main` at `e470187b9` with the same benchmark definitions as this patch.
- Times are ns/iter; lower is better. These rows use `null_density=0` and `max_string_len=8`.

| case | selectivity | main | patch | speedup |
| --- | ---: | ---: | ---: | ---: |
| mixed_binaryview | 0.001 | 19,256,493 | 6,034,896 | 3.19x |
| mixed_binaryview | 0.01 | 1,755,147 | 882,959 | 1.99x |
| mixed_binaryview | 0.1 | 587,201 | 444,630 | 1.32x |
| mixed_binaryview | 0.8 | 577,103 | 545,535 | 1.06x |
| single_binaryview | 0.001 | 27,373,114 | 10,000,552 | 2.74x |
| single_binaryview | 0.01 | 2,176,479 | 1,118,766 | 1.95x |
| single_binaryview | 0.1 | 596,890 | 418,494 | 1.43x |
| single_binaryview | 0.8 | 723,025 | 715,278 | 1.01x |
| mixed_utf8view | 0.001 | 19,190,003 | 6,027,227 | 3.18x |
| mixed_utf8view | 0.01 | 1,787,177 | 898,127 | 1.99x |
| mixed_utf8view | 0.1 | 590,240 | 446,294 | 1.32x |
| mixed_utf8view | 0.8 | 657,013 | 609,744 | 1.08x |
| single_utf8view | 0.001 | 26,815,635 | 10,005,646 | 2.68x |
| single_utf8view | 0.01 | 2,278,597 | 1,204,443 | 1.89x |
| single_utf8view | 0.1 | 585,180 | 432,044 | 1.35x |
| single_utf8view | 0.8 | 616,433 | 728,400 | 0.85x |

Summary: the patch is mainly targeting sparse inline Utf8View/BinaryView filters. The gains are strongest at 0.1%-10% selectivity. Dense filters remain on the existing `filter_record_batch` path because it is still the safer/faster choice for that region.
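For anyone re-checking the speedup column: it reads as main time divided by patch time, so values below 1.0x mean the patch is slower. A minimal sketch of that arithmetic follows. The `speedup` helper and the choice of the two sample rows are illustrative assumptions, not part of the benchmark harness.

```rust
/// Speedup of the patch relative to main: main time divided by patch time.
/// Values above 1.0 mean the patch is faster; values below 1.0 mean it is slower.
/// (Hypothetical helper for illustration, not an arrow-rs API.)
fn speedup(main_ns: u64, patch_ns: u64) -> f64 {
    main_ns as f64 / patch_ns as f64
}

fn main() {
    // (case, selectivity, main ns/iter, patch ns/iter), copied from two table rows.
    let rows = [
        ("mixed_binaryview", 0.001, 19_256_493u64, 6_034_896u64),
        ("single_utf8view", 0.8, 616_433, 728_400),
    ];

    for (case, selectivity, main_ns, patch_ns) in rows {
        let s = speedup(main_ns, patch_ns);
        let verdict = if s >= 1.0 { "patch faster" } else { "patch slower" };
        println!("{case} @ {selectivity}: {s:.2}x ({verdict})");
    }
}
```

The second row is the only one in the table where the ratio drops below 1.0x.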
