Re: clang performance bug is worse on openbsd than freebsd

Stuart Henderson Mon, 08 Nov 2021 08:30:37 -0800

On 2021-11-08, Theo de Raadt <dera...@openbsd.org> wrote:
> Stuart Henderson <stu.li...@spacehopper.org> wrote:
>
>> On 2021-11-08, Otto Moerbeek <o...@drijf.net> wrote:
>> > On Sun, Nov 07, 2021 at 08:13:36PM -0600, Luke Small wrote:
>> >
>> >> https://bugs.llvm.org/show_bug.cgi?id=50026
>> >> 
>> >> I reported it to the llvm people. it is two slightly different quicksort
>> >> algorithms which perform radically differently. The one which you could
>> >> assume would take more time, performs MUCH better.
>> >> 
>> >> I made a custom quicksort algorithm which outperforms qsort by A LOT for
>> >> sorting an array of around 300 randomly created unsigned characters, which
>> >> is what I use it for.
>> >> 
>> >> if you read the report a guy said there's a 10% difference for sorting 3
>> >> million characters on freebsd, but there's about 40% performance 
>> >> difference
>> >> on OpenBSD. maybe it's also how the OpenBSD team modified clang to prevent
>> >> rop chain stuff or something? I'm using a westmere based intell server.
>> >> 
>> >> it's the same for clang 11.
>> >> 
>> >> What's the deal?
>> >
>> > 1. Your test is too small. I increased the test size to get less error
>> > in measurement. I also changed the code to either run quicksort or
>> > quicksort depending on an argument.
>> >
>> > 2. I confirmed your observation using time on amd64, arm64 however
>> > shows a difference more in line with expectations:
>> >
>> > [otto@mc:8]$ time ./a.out A
>> > quicksort
>> > time = 0.335777021
>> >     0m00.35s real     0m00.35s user     0m00.01s system
>> > [otto@mc:9]$ time ./a.out B 
>> > quicksort1
>> > time = 0.345703033
>> >     0m00.36s real     0m00.36s user     0m00.01s system
>> >
>> >
>> > I suspect some non-optimal decision of the optimizer on amd64 for the
>> > first quicksort.
>> >
>> > A next step could be somebody looking at the code produced by the
>> > compiler.  That is not going to be me.
>> 
>> This is another example of code quite badly affected by fixup-gadgets.
>> With an increased test size and building separate objects for each of
>> the two tests and with/without CFLAGS=-fno-fixup-gadgets:
>> 
>> $ for i in quicksort*; do echo $i; time ./$i; done                           
>>     
>> quicksort
>>     0m02.33s real     0m02.32s user     0m00.00s system
>> quicksort-no-fixup-gadgets
>>     0m01.29s real     0m01.27s user     0m00.00s system
>> quicksort1
>>     0m01.06s real     0m00.94s user     0m00.02s system
>> quicksort1-no-fixup-gadgets
>>     0m01.08s real     0m00.87s user     0m00.01s system
>> 
>> 
>
> The word everyone is looking for is "tradeoff", meaning this is intentional.



Yes. But we have seen fixup-gadgets making a suboptimal choice before
and it was improved. Maybe there's nothing more that can be done,
maybe someone proficient in X86 asm will spot something - much easier
to compare the two now we know exactly what's affecting this particular
code and that it can be toggled with a compiler flag.

Re: clang performance bug is worse on openbsd than freebsd

Reply via email to