https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109901

--- Comment #8 from Richard Yao <richard.yao at alumni dot stonybrook.edu> ---
Created attachment 55177
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55177&action=edit
Source code for micro-benchmark.

Here is an example of how not having this optimization slows us down:

https://gcc.godbolt.org/z/nxdrb17G7

custom_binary_search_slow() and custom_binary_search_fast() are the same
function. The only difference is that I manually applied the ((((a) > (b)) -
((a) < (b))) <= 0) -> ((a) <= (b)) transformation to see what GCC would
generate if it were able to do this transformation.
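To make the transformation concrete, here is the identity in isolation (a minimal sketch, not the attached benchmark's code; the function names are made up for illustration):

```c
#include <assert.h>

/* Three-way comparison in the usual (a > b) - (a < b) idiom:
 * yields -1, 0, or 1. */
#define THREE_WAY_CMP(a, b) (((a) > (b)) - ((a) < (b)))

/* "Slow" form: test the three-way result against zero. GCC currently
 * emits both comparisons plus the subtraction for this. */
static int le_slow(int a, int b)
{
    return (THREE_WAY_CMP(a, b) <= 0);
}

/* "Fast" form: the hand-applied simplification
 * ((a > b) - (a < b)) <= 0  ==>  a <= b. */
static int le_fast(int a, int b)
{
    return (a <= b);
}
```

Both forms compute the same predicate for all inputs, so the rewrite is always valid; the fast form just lets the compiler emit a single comparison.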

This optimization alone makes binary search ~78% faster on my Ryzen 7 5800X:

Benchmark: array size: 1024, runs: 1000, repetitions: 10000, seed: 1685158101,
density: 10

Even distribution with 1024 32 bit integers, random access

| Name                      | Items | Hits | Misses | Time     |
| ------------------------- | ----- | ---- | ------ | -------- |
| custom_binary_search_slow |  1024 |  983 |   9017 | 0.000313 |
| custom_binary_search_fast |  1024 |  983 |   9017 | 0.000176 |

I modified the microbenchmark from scandum/binary_search to better suit a
workload that I am micro-optimizing:

https://github.com/scandum/binary_search

Specifically, I wanted to test on small arrays while avoiding cache effects
contaminating the results. One could easily add the search functions from my
modified version into the original to get numbers for bigger array sizes.

I have attached the source code for the modified micro-benchmark. The above run
was compiled with -O2 -fno-inline. Compiling with just -O2 makes little
difference, because I deleted the code where -fno-inline mattered from that
file, as it was not relevant to this issue.
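For readers without the attachment, the shape of the benchmarked search is roughly the following (a hypothetical sketch; the attached micro-benchmark's exact code may differ):

```c
#include <stddef.h>

/* Three-way comparator, as tree/search macros in real code bases
 * often define it: returns -1, 0, or 1. */
static int three_way(int a, int b)
{
    return ((a > b) - (a < b));
}

/* Returns the index of key in sorted arr[0..n), or (size_t)-1 if
 * absent. The comparisons against the three-way result are the ones
 * GCC could simplify but currently does not. */
static size_t binary_search_three_way(const int *arr, size_t n, int key)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = three_way(arr[mid], key);

        if (c < 0)
            lo = mid + 1;   /* arr[mid] < key: search upper half */
        else if (c > 0)
            hi = mid;       /* arr[mid] > key: search lower half */
        else
            return mid;     /* found */
    }
    return (size_t)-1;
}
```

With the missing optimization, each `c < 0` / `c > 0` test against the comparator result would collapse into a single direct comparison of `arr[mid]` and `key`.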
