On Fri, Jul 5, 2024, at 18:42, Joel Jacobson wrote:
> Very nice, v7-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch
> is now the winner on all my CPUs:

I thought it would be interesting to also measure the isolated effect on
just numeric_mul() without the query overhead.

I included var1ndigits=5, var2ndigits=5, which should be unaffected,
just to get a sense of the noise level.

SELECT timeit.h('numeric_mul',array['9999','9999'],2,min_time:='1 s'::interval);
SELECT timeit.h('numeric_mul',array['9999_9999','9999_9999'],2,min_time:='1 s'::interval);
SELECT timeit.h('numeric_mul',array['9999_9999_9999','9999_9999_9999'],2,min_time:='1 s'::interval);
SELECT timeit.h('numeric_mul',array['9999_9999_9999_9999','9999_9999_9999_9999'],2,min_time:='1 s'::interval);
SELECT timeit.h('numeric_mul',array['9999_9999_9999_9999_9999','9999_9999_9999_9999_9999'],2,min_time:='1 s'::interval);

         CPU          | var1ndigits | var2ndigits | HEAD  |  v7   | HEAD/v7
----------------------+-------------+-------------+-------+-------+---------
 Apple M3 Max         |           1 |           1 | 28 ns | 18 ns |    1.56
 Apple M3 Max         |           2 |           2 | 32 ns | 18 ns |    1.78
 Apple M3 Max         |           3 |           3 | 38 ns | 21 ns |    1.81
 Apple M3 Max         |           4 |           4 | 42 ns | 24 ns |    1.75
 Intel Core i9-14900K |           1 |           1 | 25 ns | 20 ns |    1.25
 Intel Core i9-14900K |           2 |           2 | 28 ns | 20 ns |    1.40
 Intel Core i9-14900K |           3 |           3 | 33 ns | 24 ns |    1.38
 Intel Core i9-14900K |           4 |           4 | 37 ns | 25 ns |    1.48
 AMD Ryzen 9 7950X3D  |           1 |           1 | 37 ns | 29 ns |    1.28
 AMD Ryzen 9 7950X3D  |           2 |           2 | 43 ns | 31 ns |    1.39
 AMD Ryzen 9 7950X3D  |           3 |           3 | 50 ns | 37 ns |    1.35
 AMD Ryzen 9 7950X3D  |           4 |           4 | 55 ns | 39 ns |    1.41

Impressive speed-up, between 25% and 81%.

Regards,
Joel