On Mon, Mar 9, 2026 at 11:47, Bryan Green <[email protected]> wrote:

> I created an example that is a little bit closer to the actual code and
> changed the compiler from C++ to C.
>
> It is interesting the optimization that the compiler has chosen for
> version 1 versus version 2.  One calls memcpy and one doesn't.  There is
> a good chance the inlining of memcpy as SSE+scalar per iteration will be
> faster for syscache scans, which I believe are usually small (1-4 keys?).
>
I doubt the inline version is better.
Clang is supported too, and the code it generates is much better with a
single memcpy call outside of the loop.
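
For reference, a minimal sketch of the two shapes being compared. DemoKey
is a hypothetical stand-in for the real scan key struct, and the function
names are illustrative only; none of this is taken from the patch. Which
form a given compiler turns into faster code for small N is what a
benchmark would have to settle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scan key; not the real PostgreSQL struct. */
typedef struct DemoKey
{
    int         attno;
    int         strategy;
    long        argument;
} DemoKey;

/* Shape 1: copy one element per loop iteration. */
static void
copy_keys_loop(DemoKey *dst, const DemoKey *src, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        dst[i] = src[i];
}

/* Shape 2: a single memcpy call, no loop in the source. */
static void
copy_keys_memcpy(DemoKey *dst, const DemoKey *src, int nkeys)
{
    if (nkeys > 0)
        memcpy(dst, src, (size_t) nkeys * sizeof(DemoKey));
}

/*
 * Illustrative check: returns 1 if both shapes produce the same result
 * for nkeys elements (0..8), 0 otherwise or if nkeys is out of range.
 */
static int
copies_agree(int nkeys)
{
    DemoKey     src[8], a[8], b[8];

    if (nkeys < 0 || nkeys > 8)
        return 0;

    for (int i = 0; i < nkeys; i++)
    {
        src[i].attno = i + 1;
        src[i].strategy = 3;
        src[i].argument = 100L * i;
    }

    copy_keys_loop(a, src, nkeys);
    copy_keys_memcpy(b, src, nkeys);

    for (int i = 0; i < nkeys; i++)
    {
        if (a[i].attno != b[i].attno ||
            a[i].strategy != b[i].strategy ||
            a[i].argument != b[i].argument)
            return 0;
    }
    return 1;
}
```

Both shapes are equivalent in behavior; the only open question is the
generated code, so comparing compiler output for each one is the useful
exercise here.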


>
> Probably the only reason to do this patch would be if N is normally large
> or if this is considered an improvement in code clarity without a
> detrimental impact on small N syscache scans.
> I realize you only said "possible small optimization".  It might be
> worthwhile to benchmark the code for different values of n to determine
> if there is a tipping point either way?
>
In your opinion, shouldn't this be considered an optimization, even if a
small one?

best regards,
Ranier Vilela
