ni...@lysator.liu.se (Niels Möller) writes:

  What about the test in

    #define TMP_ALLOC(n) \
      (LIKELY ((n) < 65536) ? TMP_SALLOC(n) : TMP_BALLOC(n))

  That test will cost a cycle or two for each TMP_ALLOC call (with
  non-constant n), regardless of size, won't it?

I think my previous statement "1 cycle" should be amended to "2 cycles".
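For reference, here is a rough self-contained sketch of a macro with the same shape as the one quoted above. Everything prefixed sketch_ or SKETCH_ is a hypothetical stand-in, not GMP's actual code: the small path bumps a pointer in a fixed arena (standing in for stack allocation), the large path calls malloc, and the LIKELY fallback assumes GCC or Clang for __builtin_expect. The point is only that the size test sits in front of both paths, so it runs on every call with non-constant n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Branch hint; assumes GCC/Clang, otherwise a no-op. */
#if defined(__GNUC__)
#define LIKELY(cond) __builtin_expect ((cond) != 0, 1)
#else
#define LIKELY(cond) (cond)
#endif

/* Hypothetical sizes, chosen only for this sketch. */
#define SKETCH_ARENA_BYTES (1u << 17)
#define SKETCH_MAX_BIG 8

struct sketch_tmp
{
  unsigned char arena[SKETCH_ARENA_BYTES]; /* stand-in for stack space */
  size_t used;                             /* bytes handed out so far */
  void *big[SKETCH_MAX_BIG];               /* heap blocks to free later */
  int nbig;
};

/* Small path: bump a pointer.  Sizes are not rounded for alignment,
   so treat the result as raw bytes. */
static void *
sketch_salloc (struct sketch_tmp *t, size_t n)
{
  void *p;
  assert (n <= SKETCH_ARENA_BYTES - t->used);
  p = t->arena + t->used;
  t->used += n;
  return p;
}

/* Large path: heap allocation, remembered so it can be freed. */
static void *
sketch_balloc (struct sketch_tmp *t, size_t n)
{
  void *p = malloc (n);
  assert (p != NULL && t->nbig < SKETCH_MAX_BIG);
  t->big[t->nbig++] = p;
  return p;
}

/* Same shape as the quoted macro: one compare-and-branch per call,
   whichever path is taken. */
#define SKETCH_ALLOC(t, n) \
  (LIKELY ((n) < 65536) ? sketch_salloc (t, n) : sketch_balloc (t, n))

/* Release heap blocks and reset the arena. */
static void
sketch_free (struct sketch_tmp *t)
{
  while (t->nbig > 0)
    free (t->big[--t->nbig]);
  t->used = 0;
}
```

A caller would zero-initialize a struct sketch_tmp, call SKETCH_ALLOC (t, n) as often as needed, and finish with sketch_free (t). Requests below 65536 bytes come out of the arena; anything larger goes to malloc.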
A correctly predicted compare-and-branch costs 1-2 cycles, with a throughput of 1 per cycle (on any modern machine). The allocation code will run in parallel with the branch (again assuming correct prediction).

I cannot see how TMP_ALLOC_LIMBS_2 could save *anything* for small allocations, since it basically performs the same operations. I.e., the net cost of splitting TMP_ALLOC_LIMBS_2 into two TMP_ALLOC_LIMBS is 0. But it might be +-1 depending on alignment and all sorts of magic.

-- 
Torbjörn