https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110001
Bug ID: 110001 Summary: [13 regression] Suboptimal code generation for branchless binary search Product: gcc Version: 13.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: richard.yao at alumni dot stonybrook.edu Target Milestone: --- GCC 12.3 generated beautiful code for this, with all but the last of the unrolled loop iterations using only 3 instructions: https://gcc.godbolt.org/z/eGbEj9YKd Currently, GCC generates 4 instructions: https://gcc.godbolt.org/z/Ebczq8jjx This probably does not make a huge difference given the data hazard, but there is something awe-inspiring from seeing GCC generate only 3 instructions per unrolled loop iteration for binary search. It would be nice if future versions went back to generating three instructions. This function was inspired by this D code: https://godbolt.org/z/5En7xajzc The bsearch1000() function is entirely branchless and has no more than 2 instructions for every cmov, excluding ret. I wrote a more general version in C that can handle variable array sizes, and to my pleasant surprise, GCC 12.3 generated a similar 3 instruction sequence for all but the last of the unrolled loop iterations. I was saddened when I saw the output from GCC 13.1 and trunk. Anyway, all recent versions of GCC that I cared to check generate a branch for the last unrolled iteration on line 58. That branch is unpredictable, so GCC would generate better code here if it used predication to avoid the branch. I had been able to give GCC a hint to avoid a similar branch at the end using __builtin_expect_with_probability(), but that trick did not work for line 58. Also, if anyone is interested in where that D code originated, here is the source: https://muscar.eu/shar-binary-search-meta.html