On Mon, Mar 9, 2026 at 11:47, Bryan Green <[email protected]> wrote:
> I created an example that is a little bit closer to the actual code and
> changed the compiler from C++ to C.
>
> It is interesting the optimization that the compiler has chosen for
> version 1 versus version 2. One calls memcpy and one doesn't. There is a
> good chance the inlining of memcpy as SSE+scalar per iteration will be
> faster for syscache scans-- which I believe are usually small (1-4 keys?).
>
I doubt the inlined version is better.
Clang is supported too, and the code it generates is much better with a
single memcpy call outside the loop.

> Probably the only reason to do this patch would be if N is normally large
> or if this is considered an improvement in code clarity without a
> detrimental impact on small N syscache scans.
> I realize you only said "possible small optimization". It might be
> worthwhile to benchmark the code for different values of n to determine
> if there is a tipping point either way?
>
In your opinion, shouldn't this be considered an optimization, even a
small one?

best regards,
Ranier Vilela
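
For readers following the thread, the two shapes being compared can be sketched roughly as below. This is only an illustration under stated assumptions: DemoKey, copy_keys_loop, copy_keys_memcpy and copies_agree are hypothetical names, and the struct is not the real scan key layout from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scan key; NOT the actual PostgreSQL struct. */
typedef struct DemoKey
{
    int     attno;
    int     strategy;
    long    argument;
} DemoKey;

/* Shape 1: copy one element per iteration with struct assignment. */
void
copy_keys_loop(DemoKey *dst, const DemoKey *src, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++)
        dst[i] = src[i];
}

/* Shape 2: a single memcpy call, with no loop in the source. */
void
copy_keys_memcpy(DemoKey *dst, const DemoKey *src, size_t nkeys)
{
    memcpy(dst, src, nkeys * sizeof(DemoKey));
}

/*
 * Returns 1 when both shapes produce the same keys for nkeys <= 4
 * (the small sizes mentioned in the thread), 0 otherwise.
 * Fields are compared one by one so padding bytes never matter.
 */
int
copies_agree(size_t nkeys)
{
    DemoKey src[4], a[4], b[4];

    if (nkeys > 4)
        return 0;

    for (size_t i = 0; i < nkeys; i++)
    {
        src[i].attno = (int) i + 1;
        src[i].strategy = 3;
        src[i].argument = 100L * (long) i;
    }

    copy_keys_loop(a, src, nkeys);
    copy_keys_memcpy(b, src, nkeys);

    for (size_t i = 0; i < nkeys; i++)
    {
        if (a[i].attno != b[i].attno ||
            a[i].strategy != b[i].strategy ||
            a[i].argument != b[i].argument)
            return 0;
    }
    return 1;
}
```

Both functions are functionally equivalent, so the open question in the thread is only about generated code and speed. Compiling a file like this with optimization enabled under both gcc and clang and reading the assembly, then timing it for n = 1 to 4 and for larger n, would be one way to look for the tipping point.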
