Den ons 26 feb. 2025 kl 14:45 skrev Daniel Sahlberg <
daniel.l.sahlb...@gmail.com>:

> Den ons 26 feb. 2025 kl 14:30 skrev Stefan Sperling <s...@apache.org>:
>
>> On Wed, Feb 26, 2025 at 01:24:22PM +0100, Daniel Sahlberg wrote:
>> > Looking at the code, I'm assuming that if SVN_UNALIGNED_ACCESS_IS_OK was
>> > set to 0 (so the char by char loop after #endif is used instead), the
>> code
>> > would run just fine. Can you confirm that?
>> >
>> > SVN_UNALIGNED_ACCESS_IS_OK is defined as follows - we'd probably have to
>> > change that for GCC15.
>>
>> Unaligned access is undefined behaviour (UB). Compilers are now taking
>> more liberties with undefined behaviour than they used to. So we need
>> to be very careful in our assumptions about UB.
>>
>> Our code which is guarded by SVN_UNALIGNED_ACCESS_IS_OK might become
>> unsafe when built with newer compilers on any platform. This code's
>> behaviour is no longer under our control, but the compiler's s control.
>> svn_eol__find_eol_start() might crash, become a no-op, or have unknown
>> side-effects resulting in annoyances or even CVEs.
>>
>> At the very least, we should have this macro off by default and require
>> it to be enabled manually. For best safety, the code should be removed.
>> I understand that removing this code will incur a performance cost.
>> The code was backed up by elaborate performance testing when it was
>> written years ago.
>>
>> But what modern compilers make of UB will likely keep changing for the
>> worse (or better, depending on which side of the UB-fence you are on),
>> violating assumptions about platform behaviour in the macro definition:
>>
>> > #ifndef SVN_UNALIGNED_ACCESS_IS_OK
>> > # if defined(_M_IX86) || defined(i386) \
>> >      || defined(_M_X64) || defined(__x86_64) \
>> >      || defined(__powerpc__) || defined(__ppc__)
>> > #  define SVN_UNALIGNED_ACCESS_IS_OK 1
>> > # else
>> > #  define SVN_UNALIGNED_ACCESS_IS_OK 0
>> > # endif
>> > #endif
>>
>> I have seen contrived programs being translated into nothing on x86 when
>> built with a relatively recent version of clang (16) at optimization
>> levels
>> of -O2 and up. For example, the following program will literally have its
>> entire main() function elided:
>>
>> [[[
>> #include <float.h>
>> int main(void) {
>>   double v = DBL_MAX;
>>   int v2 = v;
>>   if (v2 != v) return 0;
>>   return 1;
>> }
>> ]]]
>> (courtesy of https://bsd.network/@kristapsdz/113870782376321887)
>>
>> On OpenBSD/amd64, with clang -O2 this program compiles to 3 instructions,
>> an empty function. The resulting executable just crashes when run.
>> [[[
>> (gdb) disassemble main
>>    0x0000000000001950 <+0>:     endbr64
>>    0x0000000000001954 <+4>:     push   %rbp
>>    0x0000000000001955 <+5>:     mov    %rsp,%rbp
>> ]]]
>>
>> At -01 there are 4 lines (still crashes). Without optimization (-O0)
>> there are 38 lines of assembly instead of 3, and the program exits
>> with status 0.
>>
>> Granted, what I have seen are contrived cases. But they resulted
>> from someone looking into problems while doing development in C.
>> As a C developer, this scares me enough to avoid UB at all costs
>> because I cannot trust the compiler to warn about my mistakes
>> which introduce UB. Rather, the compiler might decide to elide
>> the code and change the intended meaning of the program.
>>
>
> Thanks for the detailed explanation. It seems really ugly.
>
> Would add a separate for-loop before "one machine word at a time",
> iterating until buf is aligned (maximum sizeof(void*)-1 times) help here?
>

Should have researched a bit more before replying. There was already some
code to do exactly this, but it was removed in r1722879:

[[[
stefan2  2016-01-04 15:17:04
Stop using pointer arithmetics to check for proper alignment because
that is not portable.

As a result, platforms that don't allow unaligned data access will
suffer a small additional performance hit.

* subversion/libsvn_subr/eol.c
  (svn_eol__find_eol_start): No longer attempt aligned chunky processing
                             when unaligned access is not supported.

* subversion/libsvn_subr/utf_validate.c
  (first_non_fsm_start_char): Same.
]]]

Don't quite understand why but there is probably a reasonable explanation.


>
> Any idea of the performance gain on SVN_UNALIGNED_ACCESS_IS_OK on modern
> hardware/modern compilers?
>
> Cheers,
> Daniel
>
>

Reply via email to