I created an example that is a little bit closer to the actual code and changed the compiler from C++ to C.
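
For readers without the compiler output at hand, here is a minimal sketch of the two shapes under discussion. It is an illustration only: DemoKey, demo_index_column, demo_copy_per_key, demo_copy_bulk, and demo_variants_agree are hypothetical names, and DemoKey is a simplified stand-in rather than PostgreSQL's ScanKeyData. It shows that the two variants build the same array; it says nothing about which one is faster.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_KEYS 8

/* Hypothetical stand-in for a scan key; NOT PostgreSQL's ScanKeyData. */
typedef struct DemoKey
{
    int   flags;
    short attno;     /* attribute number, remapped to an index column */
    int   strategy;
    long  argument;
} DemoKey;

/* Return the 1-based position of attno in index_attnos, or 0 if absent. */
short
demo_index_column(short attno, const short *index_attnos, int nattrs)
{
    for (int j = 0; j < nattrs; j++)
    {
        if (index_attnos[j] == attno)
            return (short) (j + 1);
    }
    return 0;
}

/* Variant 1: copy each key inside the loop, then remap its attno. */
void
demo_copy_per_key(DemoKey *dst, const DemoKey *src, int nkeys,
                  const short *index_attnos, int nattrs)
{
    for (int i = 0; i < nkeys; i++)
    {
        dst[i] = src[i];
        dst[i].attno = demo_index_column(src[i].attno, index_attnos, nattrs);
    }
}

/* Variant 2: one memcpy for the whole array, then remap each attno. */
void
demo_copy_bulk(DemoKey *dst, const DemoKey *src, int nkeys,
               const short *index_attnos, int nattrs)
{
    memcpy(dst, src, (size_t) nkeys * sizeof(DemoKey));
    for (int i = 0; i < nkeys; i++)
        dst[i].attno = demo_index_column(src[i].attno, index_attnos, nattrs);
}

/* Return 1 when both variants build the same array for nkeys keys. */
int
demo_variants_agree(int nkeys)
{
    /* Made-up index layout: position j holds attribute index_attnos[j]. */
    const short index_attnos[DEMO_MAX_KEYS] = {7, 3, 5, 1, 8, 2, 6, 4};
    DemoKey     src[DEMO_MAX_KEYS];
    DemoKey     a[DEMO_MAX_KEYS];
    DemoKey     b[DEMO_MAX_KEYS];

    assert(nkeys >= 0 && nkeys <= DEMO_MAX_KEYS);

    for (int i = 0; i < DEMO_MAX_KEYS; i++)
    {
        src[i].flags = i;
        src[i].attno = index_attnos[DEMO_MAX_KEYS - 1 - i];
        src[i].strategy = i + 1;
        src[i].argument = 100L * i;
    }

    demo_copy_per_key(a, src, nkeys, index_attnos, DEMO_MAX_KEYS);
    demo_copy_bulk(b, src, nkeys, index_attnos, DEMO_MAX_KEYS);

    /* Compare field by field so struct padding cannot affect the result. */
    for (int i = 0; i < nkeys; i++)
    {
        if (a[i].flags != b[i].flags || a[i].attno != b[i].attno ||
            a[i].strategy != b[i].strategy || a[i].argument != b[i].argument)
            return 0;
    }
    return 1;
}
```

Calling demo_variants_agree(n) for several n is only a correctness check. Any timing comparison would need a separate harness that runs each variant many times per value of n and keeps the compiler from optimizing the copies away.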
It is interesting to see which optimization the compiler chose for version 1 versus version 2: one calls memcpy and the other doesn't. There is a good chance that the inlined memcpy (SSE plus scalar moves per iteration) will be faster for syscache scans, which I believe usually have few keys (1-4?). Probably the only reasons to do this patch would be if N is normally large, or if it is considered an improvement in code clarity without a detrimental impact on small-N syscache scans.

I realize you only said "possible small optimization". It might be worthwhile to benchmark the code for different values of N to determine whether there is a tipping point either way.

https://godbolt.org/z/dM18cGfE6

-- bg

On Mon, Mar 9, 2026 at 8:05 AM Ranier Vilela <[email protected]> wrote:
>
> On Mon, Mar 9, 2026 at 10:16 AM, Ranier Vilela <[email protected]> wrote:
>
>> Hi.
>>
>> In the functions *systable_beginscan* and *systable_beginscan_ordered*,
>> a small optimization is possible.
>> The array *idxkey* can be constructed in one go with a single call to
>> memcpy.
>> The excess might not make much of a difference, but I think it's worth
>> the effort.
>>
>> patch attached.
>>
> Someone asked me if O2 does not do the work.
> Apparently not.
>
> https://godbolt.org/z/h5dndz33x
>
> best regards,
> Ranier Vilela
>
