Albin Ahlbäck <albin.ahlb...@gmail.com> writes: > I see, I definitely need to read up on the CPU pipelines. I also > tested one of your automated scripts for measuring cycles per limbs > for a variety of functions, and it checks out.
As I understand it, there are three main things that limit the performance of the kinds of assembly loops we have in gmp (most operating on small well cached data): 1. Recurrency latency, like the carry dependency Torbjörn explained. Basically, the sum of latencies for the chain of instructions representing how carry out depends on carry in. 2. Throughput of particular execution units, e.g, how many multiply instructions that can be started per cycle (to be completed some cycles later, depending on the latency of the multiplication pipeline). 3. Instruction issue, the processor's limit on how many instructions can be decoded and started each cycle. E.g., if processor can start 3 instructions per cycle in parallel, and your loop is 15 instructions, it's not going to be faster than 5 cycles. For an assembly loop, one can find out from properties of the processor what cycle counts are implied by these three limits. It's often possible (but tedious) to tweak scheduling to get an actual speed pretty close to the limit. And it aids optimization to understand which one is the performance bottleneck. > Anyway, in regards to the performance of multiplication: I did manage > to write some half-hardcoded that outperforms the mpn_mul_basecase > quite a bit on Apple M1 (only tested on the Mac Mini on cfarm). They > are basically on the form > > mpn_mul_N(mp_ptr, mp_srcptr, mp_size_t, mp_srcptr) > > for N in 1, 2, ..., 15. I would expect the speed of such a hard-coded function to be limited by multiplier throughput (O(N^2)); it should be possible to arrange the order you add up the N^2 terms so that your carry chain corresponds to the size of the product (O(N)). Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel