Re: [PATCH] AArch64: Cleanup alignment macros

Wilco Dijkstra Fri, 06 Dec 2024 11:46:00 -0800

Hi Richard,

>> A common case is a constant string which is compared against some
>> argument. Most string functions work on 8 or 16-byte quantities. If we
>> ensure the whole array fits in one aligned load, we save time in the
>> string function.
>>
>> Runtime data collected for strlen calls shows 97+% has 8-byte alignment
>> or higher - this kind of overalignment helps achieve that.
>
> Ah, ok.  But aren't we then losing that advantage for 4-byte arrays?
> Or are you assuming a 4-byte path too?  Or is strlen just very unlikely
> for such small data?


The advantage comes from being aligned enough. E.g., a strlen implementation
may start like this:

        bic     src, srcin, 15
        ld1     {vdata.16b}, [src]              // 16-byte aligned load
        cmeq    vhas_nul.16b, vdata.16b, 0      // check for NUL byte

It always does a 16-byte aligned load and tests for the end of the string. So we
want to ensure that small strings fit entirely inside the first 16-byte load (if
not, it takes almost twice the number of instructions, even if the string is
only 4 bytes). For a 4-byte array, 4-byte alignment is enough to ensure this.

Another approach is to always load the first 16 bytes from the start of the
string (if not close to the end of a page). That is often an unaligned load, and
then the difference between 4- and 8-byte alignment is negligible.

Cheers,
Wilco
