Hi 

Some timings for bobcat with the code currently in svn, size=1000 limbs:

./benchmpn 
cpu                     bobcat
                   add_n        2538
                   sub_n        2528
                   mul_1        5064    this is the 1-way unrolled core2 code
                addmul_1        5308    again this is core2 code
                submul_1        5292
                   mul_2
                addmul_2
                submul_2
                addadd_n        3715
                addsub_n        3696
                subadd_n        3719
                  lshift        3784
                  rshift        3546
                 lshift2        2536
                 rshift2        2776
                 lshift1        1912
                 rshift1        3546
               addlsh1_n        3293
               sublsh1_n        3541
                addlsh_n        4913
                sublsh_n        4918
                inclsh_n        4926
                declsh_n        4915
               rsh1add_n        3027
               rsh1sub_n        3033
               sumdiff_n        4715
                   store        1157
                   copyi        1786
                   copyd        1659
               rsblsh1_n
               addlsh2_n
               rsblsh2_n
                popcount        5876            nehalem code
                 hamdist        6784            k10 code
                     com        1789
                     not        1118
                   and_n        2531
                   xor_n        2530
                   ior_n        2542
                  nand_n        2537
                  nior_n        2544
                  xnor_n        2546
                  andn_n        2533
                  iorn_n        2534
                 lshiftc        4543
           divexact_byff        2039
        divexact_byfobm1        6261
            divexact_by3        6246
              divexact_1        15062
         modexact_1c_odd        15084
              add_err1_n        4810
              sub_err1_n        4802
   divrem_euclidean_qr_1        19548
   divrem_euclidean_qr_2        40166
    divrem_euclidean_r_1        8261
                divrem_1        21478
                divrem_2        28162
      divrem_hensel_qr_1        12568
    divrem_hensel_qr_1_1        14104
    divrem_hensel_qr_1_2        12606
       divrem_hensel_r_1        14064
  divrem_hensel_rsh_qr_1        16603
  rsh_divrem_hensel_qr_1        13170
rsh_divrem_hensel_qr_1_1        14112
rsh_divrem_hensel_qr_1_2        13176
                 mod_1_1        12003
                 mod_1_2        7556
                 mod_1_3        8062            C code was faster than asm, but slower than the asm mod_1_2 !!!
                 mod_1_4
             mod_34lsub1        3045

mul_basecase runs at about 6c/l, so from the above we can clearly do better.
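
To compare the table with the 6c/l figure, the totals can be divided by the operand size. This is only a minimal sketch, assuming each benchmpn figure is the total cycle count for one call at size=1000 limbs. Only a few rows are copied from the table, and the names `SIZE`, `timings` and `cycles_per_limb` are made up for illustration:

```python
# Assumption: each benchmpn figure is total cycles for one call at size=1000 limbs.
SIZE = 1000

# A few rows copied from the table above.
timings = {
    "add_n": 2538,
    "mul_1": 5064,
    "addmul_1": 5308,
    "submul_1": 5292,
    "mod_1_2": 7556,
}


def cycles_per_limb(cycles, size=SIZE):
    """Convert a total cycle count into cycles per limb."""
    return cycles / size


for name, cycles in timings.items():
    print(f"{name:>10}  {cycles_per_limb(cycles):.3f} c/l")
```

Under that assumption addmul_1 comes out at about 5.3 c/l, already below the 6c/l quoted for mul_basecase.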


A quick test gives a latency of 6c for rax and 7c for rdx, and a throughput of
5c for mul. These are maximum values, so the real figures could be better than
this. I've got some more on this...
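
As a rough sanity check, the measured loops can be compared against the mul throughput. This is a sketch under two assumptions: that mul_1, addmul_1 and submul_1 each need one mul per limb, and that the 5c throughput from the quick test is taken at face value (it may be pessimistic). The helper name `overhead_per_limb` is hypothetical:

```python
# Assumption: one mul per limb, and mul throughput of 5 cycles (from the quick test).
MUL_THROUGHPUT = 5.0
SIZE = 1000  # limbs, as in the table

# Rows copied from the table above (assumed to be total cycles at size=1000).
measured = {"mul_1": 5064, "addmul_1": 5308, "submul_1": 5292}


def overhead_per_limb(cycles, size=SIZE, bound=MUL_THROUGHPUT):
    """Cycles per limb spent above the assumed mul-throughput bound."""
    return cycles / size - bound


for name, cycles in measured.items():
    print(f"{name:>10}  {overhead_per_limb(cycles):+.3f} c/l over the bound")
```

If those assumptions hold, mul_1 is already very close to the bound, so the room for improvement is mainly in getting mul_basecase down from 6c/l towards it.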



Jason
