It uses AVX2, so... thanks for the extra seconds. :D

Cheers.
Elias.

2018-06-09 0:43 GMT-03:00 Philip Guenther <[email protected]>:
> On Fri, Jun 8, 2018 at 10:13 AM Elias M. Mariani <[email protected]>
> wrote:
>>
>> I usually run long computations on OpenBSD-current, in the last few
>> days I see an upgrade in the performance of the process (in this case
>> I have 6 threads running a very optimized assembler code).
>> Each iteration of the code was about 14 sec. and now is around 13 sec.
>> Don't mind the computation, the question is more about if something
>> was changed (and if so what) to hit the performance so much, the
>> compilation is the same as before, so the code did not change and the
>> software does not use anything from ports, only the standard C
>> library.
>>
>> A more accurate description would be that the threads are more
>> homogeneous in relation with the iterations time, maybe one thread
>> suddenly give a 11 sec. result, and another 15, probably related with
>> thread reallocations, now I have a steady time in all the threads and
>> in average the numbers are better.
>> Anyways... Better is better. Just curious to know why is working better.
>> :D
>
>
> If the "optimized assembler" that the threads are running uses AVX or
> similar "extended CPU state" extensions then the improvement is almost
> certainly from my switch amd64 from "lazy FPU switching" to what I'll call
> "semi-eager switching", where the current thread's registers are always
> ensured to be loaded before returning to userspace, eliminating the need for
> extra userspace->kernel->userspace transitions and IPIs to load the
> registers in the current CPU.
>
> Glad to hear it's such a large improvement for your processing. Enjoy, and
> remember to tip your OS vendor! <wink>
>
>
> Philip Guenther
>
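
For anyone following along, the difference described in the quoted message between "lazy FPU switching" and "semi-eager switching" can be sketched with a toy model. This is not OpenBSD kernel code: the names `simulate`, `struct result`, `LAZY` and `SEMI_EAGER` are made up for illustration, the model assumes a single CPU with round-robin scheduling, every thread is assumed to touch the FPU/AVX registers in each time slice, and IPIs between CPUs are left out entirely. It only counts how many extra userspace->kernel->userspace transitions each policy would cause.

```c
#include <assert.h>

/* Toy model only; all names here are hypothetical, not kernel APIs. */
enum policy { LAZY, SEMI_EAGER };

struct result {
    long traps; /* extra userspace->kernel->userspace transitions */
    long loads; /* times a thread's register state was loaded */
};

/*
 * Run `slices` time slices on one CPU, scheduling `nthreads` threads
 * round-robin. Each thread is assumed to use the FPU once per slice.
 */
struct result simulate(enum policy p, int nthreads, int slices)
{
    struct result r = { 0, 0 };
    int owner = -1; /* thread whose registers are loaded, -1 = none */

    assert(nthreads > 0);

    for (int i = 0; i < slices; i++) {
        int tid = i % nthreads;

        /* Semi-eager: load the registers before returning to userspace. */
        if (p == SEMI_EAGER && owner != tid) {
            owner = tid;
            r.loads++;
        }

        /*
         * The thread now executes an FPU/AVX instruction. If its state
         * is not loaded (lazy case), it traps into the kernel first.
         */
        if (owner != tid) {
            r.traps++;
            owner = tid;
            r.loads++;
        }
    }
    return r;
}
```

In this model, `simulate(LAZY, 6, 12)` takes one trap per slice, while `simulate(SEMI_EAGER, 6, 12)` does the same number of register loads with no traps at all, which is the saving the quoted explanation points to for code that uses AVX in every thread.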

