I don't think it matters much whether we leave sqr_basecase.as in or not. There's no clear winner there. My inclination is to leave it in, because in the figures below it is faster for very small basecase squarings (sizes 1-3) and for the larger ones (size 11 and up).
Bill.

On 24 March 2014 21:57, Frithjof Schulze <sfrith...@gmail.com> wrote:
>
>
> On Monday, March 24, 2014 11:12:34 AM UTC, leif wrote:
>>
>> Bill Hart wrote:
>> > Leif said he was going to tune on a bobcat, but hasn't yet.
>>
>> Well, this of course^TM turned out to be a can of worms... ;-)
>>
>> So far it looks like the bobcat version of mpn_sqr_basecase was actually
>> faster, but I don't really trust the figures. (I played a little with
>> the "precision" option, but this seems to be logically limited to 2^31.
>> With the default precision, I occasionally get non-monotonic numbers;
>> not that one value was exceptionally bad -- as one would expect, but
>> timings about twice as fast, despite both cores being on "performance",
>> and the machine mostly idle.)
>>
>
> If you want to compare, my values for ./speed -s 1-40 mpn_sqr_basecase are
>
>  n  with sqr...as  without
>  1  0.000000006    0.000000009
>  2  0.000000012    0.000000017
>  3  0.000000031    0.000000036
>  4  0.000000056    0.000000054
>  5  0.000000079    0.000000069
>  6  0.000000100    0.000000094
>  7  0.000000128    0.000000128
>  8  0.000000159    0.000000151
>  9  0.000000191    0.000000185
> 10  0.000000228    0.000000223
> 11  0.000000270    0.000000281
> 12  0.000000318    0.000000320
> 13  0.000000368    0.000000376
> 14  0.000000418    0.000000419
> 15  0.000000465    0.000000496
> 16  0.000000527    0.000000539
> 17  0.000000580    0.000000616
> 18  0.000000646    0.000000668
> 19  0.000000710    0.000000763
> 20  0.000000788    0.000000822
> 21  0.000000868    0.000000915
> 22  0.000000935    0.000001000
> 23  0.000001021    0.000001098
> 24  0.000001107    0.000001170
> 25  0.000001191    0.000001274
> 26  0.000001286    0.000001372
> 27  0.000001379    0.000001492
> 28  0.000001496    0.000001577
> 29  0.000001597    0.000001700
> 30  0.000001704    0.000001802
> 31  0.000001810    0.000001942
> 32  0.000001927    0.000002038
> 33  0.000002048    0.000002187
> 34  0.000002167    0.000002299
> 35  0.000002283    0.000002461
> 36  0.000002409    0.000002577
> 37  0.000002537    0.000002724
> 38  0.000002672    0.000002864
> 39  0.000002805    0.000003065
> 40  0.000002959    0.000003180
>
> This is with
>
> gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> Linux 3.8.0-36-generic #52~precise1-Ubuntu SMP Mon Feb 3 21:54:46 UTC 2014
> x86_64
>
> ~/src/mpir/tune > cat /proc/cpuinfo
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 20
> model           : 1
> model name      : AMD E-350 Processor
> stepping        : 0
> microcode       : 0x5000029
> cpu MHz         : 1600.000
> cache size      : 512 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 6
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid
> aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic
> cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat
> hw_pstate npt lbrv svm_lock nrip_save pausefilter
> bogomips        : 3193.07
> TLB size        : 1024 4K pages
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
>
> processor       : 1
> vendor_id       : AuthenticAMD
> cpu family      : 20
> [...]
>
> -- Frithjof
>
> The raw output was:
>
> First try with mpn/x86_64/bobcat/sqr_basecase.as
>
> overhead 0.000000004 secs, precision 1000000 units of 6.25e-10 secs, CPU
> freq 1600.00 MHz
> mpn_sqr_basecase
>  1  0.000000006
>  2  0.000000012
>  3  0.000000031
>  4  0.000000056
>  5  0.000000079
>  6  0.000000100
>  7  0.000000128
>  8  0.000000159
>  9  0.000000191
> 10  0.000000228
> 11  0.000000270
> 12  0.000000318
> 13  0.000000368
> 14  0.000000418
> 15  0.000000465
> 16  0.000000527
> 17  0.000000580
> 18  0.000000646
> 19  0.000000710
> 20  0.000000788
> 21  0.000000868
> 22  0.000000935
> 23  0.000001021
> 24  0.000001107
> 25  0.000001191
> 26  0.000001286
> 27  0.000001379
> 28  0.000001496
> 29  0.000001597
> 30  0.000001704
> 31  0.000001810
> 32  0.000001927
> 33  0.000002048
> 34  0.000002167
> 35  0.000002283
> 36  0.000002409
> 37  0.000002537
> 38  0.000002672
> 39  0.000002805
> 40  0.000002959
>
> Second try with mpn/x86_64/bobcat/sqr_basecase.as removed
>
> overhead 0.000000004 secs, precision 1000000 units of 6.25e-10 secs, CPU
> freq 1600.00 MHz
> mpn_sqr_basecase
>  1  0.000000009
>  2  0.000000017
>  3  0.000000036
>  4  0.000000054
>  5  0.000000069
>  6  0.000000094
>  7  0.000000128
>  8  0.000000151
>  9  0.000000185
> 10  0.000000223
> 11  0.000000281
> 12  0.000000320
> 13  0.000000376
> 14  0.000000419
> 15  0.000000496
> 16  0.000000539
> 17  0.000000616
> 18  0.000000668
> 19  0.000000763
> 20  0.000000822
> 21  0.000000915
> 22  0.000001000
> 23  0.000001098
> 24  0.000001170
> 25  0.000001274
> 26  0.000001372
> 27  0.000001492
> 28  0.000001577
> 29  0.000001700
> 30  0.000001802
> 31  0.000001942
> 32  0.000002038
> 33  0.000002187
> 34  0.000002299
> 35  0.000002461
> 36  0.000002577
> 37  0.000002724
> 38  0.000002864
> 39  0.000003065
> 40  0.000003180
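For anyone who wants to check the "no clear winner" reading against the two runs, here is a minimal Python sketch. The timings are copied from the ./speed output in this thread and converted to nanoseconds (0.000000006 secs becomes 6). The names WITH_ASM, WITHOUT_ASM and compare are made up for this illustration and are not part of MPIR. It compares a single run per size, so it says nothing about the timing noise leif mentions.

```python
# Timings from the two ./speed -s 1-40 mpn_sqr_basecase runs in this thread,
# converted to nanoseconds. Index 0 is size 1, index 39 is size 40.
# WITH_ASM: mpn/x86_64/bobcat/sqr_basecase.as present.
WITH_ASM = [
    6, 12, 31, 56, 79, 100, 128, 159, 191, 228,
    270, 318, 368, 418, 465, 527, 580, 646, 710, 788,
    868, 935, 1021, 1107, 1191, 1286, 1379, 1496, 1597, 1704,
    1810, 1927, 2048, 2167, 2283, 2409, 2537, 2672, 2805, 2959,
]
# WITHOUT_ASM: the same file removed.
WITHOUT_ASM = [
    9, 17, 36, 54, 69, 94, 128, 151, 185, 223,
    281, 320, 376, 419, 496, 539, 616, 668, 763, 822,
    915, 1000, 1098, 1170, 1274, 1372, 1492, 1577, 1700, 1802,
    1942, 2038, 2187, 2299, 2461, 2577, 2724, 2864, 3065, 3180,
]


def compare(with_asm, without_asm):
    """Split sizes (1-based) by which build had the lower time."""
    faster, slower, ties = [], [], []
    for n, (a, b) in enumerate(zip(with_asm, without_asm), start=1):
        if a < b:
            faster.append(n)   # assembly version wins
        elif a > b:
            slower.append(n)   # assembly version loses
        else:
            ties.append(n)
    return faster, slower, ties


faster, slower, ties = compare(WITH_ASM, WITHOUT_ASM)
print("asm faster at sizes:", faster)
print("asm slower at sizes:", slower)
print("tied at sizes:      ", ties)
```

On these figures the assembly version wins at sizes 1-3 and 11-40, loses at 4-6 and 8-10, and ties at 7. Repeating each run a few times and taking the minimum per size before comparing would make the result less sensitive to noise.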