Re: HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
On Wed, Apr 14, 2010 at 9:21 PM, Ian Smith smi...@nimnet.asn.au wrote:
On Wed, 14 Apr 2010, Garrett Cooper wrote:
On Wed, Apr 14, 2010 at 7:49 PM, Garrett Cooper yanef...@gmail.com wrote:
On Wed, Apr 14, 2010 at 5:46 PM, Maho NAKATA cha...@mac.com wrote:

Hi Andriy and Adam

My test again. No desktop, etc. I just run dgemm. Contrary to Adam's result, Hyper Threading makes the performance worse. All tests were done on a Core i7 920 @ 2.67GHz (TurboBoost @ 2.8GHz).

Turbo Boost off, Hyper threading off: 82% (35GFlops) [1]
Turbo Boost off, Hyper threading off: 72% (30.5GFlops) [2]

Er, shouldn't one of those say HTT on? and/or Turbo boost on? Else they're both the same test as [4] but with different results?

There's a problem with 8.x+ cores reported by the kernel. For some odd reason more recent Intel processors aren't reporting themselves as HT-enabled when they have HT-cores (see: kern/145385). I didn't look into the issue too hard, but since it does seem to be a major performance loss perhaps I should; besides, it would be good experience to put under my belt :].

Turbo Boost on, Hyper threading on: 71% (32GFlops) [3]
Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]

Clarification of all four possible test configs - 8 if you add pinning CPUs or not - might make this a bit clearer?

Doesn't this make sense? Hyperthreaded cores in Intel procs still provide an incomplete set of registers as they're logical processors, so I would expect things to be slower if they're automatically run on the SMT cores instead of the physical ones.

Since we're talking FP, do HTT 'cores' share an FPU, or have their own? If contended, you'd have to expect worse (at least FP) performance, no?

Ah, that's another excellent point. What instructions is dgemm using -- pure integer based arithmetic, floating point arithmetic, specialized operations that would benefit from using SIMD, etc?

Is there a weighting scheme to SCHED_ULE where logical processors (like the SMT variety) get a lower score than real processors do, and thus get scheduled for less intensive interrupting tasks, or maybe just don't get scheduled in high use scenarios the way a physical processor would?

Err... wait. Didn't see that the turbo boost results didn't scale linearly or align with one another until just a sec ago. Never mind my previous comment.

Waiting for the fog to lift ..

As am I. I don't know enough in this area, but I'm definitely open to learning.

Thanks,
-Garrett
___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Linux static linked ver doesn't work on FBSD (Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
On Wed, Apr 14, 2010 at 10:26 PM, Maho NAKATA cha...@mac.com wrote:
From: Pieter de Goeje pie...@degoeje.nl
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 16:05:18 +0200

I think the best test would be to run a statically compiled linux binary on FreeBSD. That way the compiler settings are exactly the same.

It is not possible for the Linux amd64 binary to run on FreeBSD amd64, and the i386 version does not work either. GotoBLAS uses special system calls.

% ./dgemm
linux_sys_futex: unknown op 265
linux: pid 1264 (dgemm): syscall mbind not implemented
n: 3000
^C

It just halts.

Yes, and while this isn't directly tied into numa, mbind(2), mempolicy(2), and a few others use the same facilities that are available via plain numa. I know because of messes I've tried to clean up in these areas. I'm really not sure why this is using numa though, to be honest...

Thanks,
-Garrett
Re: HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
May I make a suggestion?

Would you mind creating a shared google spreadsheet with your testing results and a shared google document with the test setup? I think having the data in an easily represented, easily shared medium would be beneficial to everyone.

Adrian

On 15 April 2010 08:46, Maho NAKATA cha...@mac.com wrote:

Hi Andriy and Adam

My test again. No desktop, etc. I just run dgemm. Contrary to Adam's result, Hyper Threading makes the performance worse. All tests were done on a Core i7 920 @ 2.67GHz (TurboBoost @ 2.8GHz).

Turbo Boost off, Hyper threading off: 82% (35GFlops) [1]
Turbo Boost off, Hyper threading off: 72% (30.5GFlops) [2]
Turbo Boost on, Hyper threading on: 71% (32GFlops) [3]
Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]

---my system---
CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (2683.44-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x106a5  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 12884901888 (12288 MB)
avail memory = 12387717120 (11813 MB)
ACPI APIC Table: <110909 APIC1026>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
---my system---

---DETAILS---
[1]
% ./dgemm
n: 3000 time : 57.666717 or 16.339074 Mflops : 33060.624827
n: 3100 time : 61.502677 or 16.597376 Mflops : 35910.025544
n: 3200 time : 69.075401 or 19.199833 Mflops : 34144.297133
n: 3300 time : 73.699540 or 19.633594 Mflops : 36618.756539
n: 3400 time : 82.256194 or 22.373651 Mflops : 35144.518837
n: 3500 time : 88.975662 or 24.118761 Mflops : 35563.394249
n: 3600 time : 96.436652 or 26.027588 Mflops : 35861.148385
n: 3700

[2]
% ./dgemm
n: 3000 time : 139.622739 or 17.693806 Mflops : 30529.327312
n: 3100 time : 154.344971 or 19.566886 Mflops : 30460.247702
n: 3200 time : 169.507739 or 21.467100 Mflops : 30538.116602
n: 3300 time : 186.363773 or 23.615281 Mflops : 30444.600545
n: 3400 time : 203.798979 or 25.817667 Mflops : 30456.322788
n: 3500 ...

[3]
% ./dgemm
n: 3000 time : 134.673079 or 16.958682 Mflops : 31852.711082
n: 3100 time : 148.410085 or 18.663248 Mflops : 31935.073574
n: 3200 time : 162.835473 or 20.468825 Mflops : 32027.475770
n: 3300 time : 179.025370 or 22.479189 Mflops : 31983.262501
n: 3400 time : 195.859710 or 24.663009 Mflops : 31882.208788
n: 3500

[4]
% ./dgemm
n: 3000 time : 54.259647 or 14.684309 Mflops : 36786.204907
n: 3100 time : 60.899147 or 17.124599 Mflops : 34804.447141
n: 3200 time : 64.295342 or 17.490787 Mflops : 37480.577569
n: 3300 time : 69.781247 or 18.288840 Mflops : 39311.284796
n: 3400 time : 79.234397 or 21.829736 Mflops : 36020.187858
n: 3500 time : 83.905419 or 22.381237 Mflops : 38324.289174
n: 3600 time : 92.195022 or 25.105942 Mflops : 37177.621122
n: 3700 time : 97.718841 or 25.434243 Mflops : 39841.319494
n: 3800 time : 105.740463 or 27.414029 Mflops : 40042.592613
n: 3900 time : 113.980157 or 29.678505 Mflops : 39984.635420
n: 4000 time : 122.941569 or 31.946174 Mflops : 40077.412531
n: 4100
---DETAILS---

From: Adam Vande More amvandem...@gmail.com
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 11:34:45 -0500

time : 162.45 or 20.430651 Mflops : 32087.318295
n: 3300 time : 178.497079 or 22.446093 Mflops : 32030.420499
n: 3400 time : 195.550715 or 24.586152 Mflops : 31981.873273
n: 3500 time : 213.403379 or 26.825058 Mflops : 31975.513363
n: 3600 ...
above output is on Core i7 920 (2.66GHz; TurboBoost on)

My results:
$ ./dgemm
n: 3000 time : 54.151302 or 28.189781 Mflops : 19162.263125
n: 3100 time : 60.157449 or 32.214141 Mflops : 18501.570537
n: 3200 time : 65.753191 or 34.114872 Mflops : 19216.393378

CPU:
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2653.35-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
⋮
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)

FreeBSD: FreeBSD 8.0-STABLE r205070 amd64

Please note that the system was not dedicated to the test, I had Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons running. That probably explains irregularities in the results.

I am not sure how exactly theoretical maximum should
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 14/04/2010 20:47 Adam Vande More said the following:
I'm no expert Andriy, but it seems like if gotoblas implemented some of the FreeBSD optimizations then we'd be in the same ballpark.

This is a good point. But on the other hand, it means that our scheduler doesn't do a perfect job here. BTW, I use ULE.

My observation is that when the number of CPU-intensive long-running processes is less than or equal to the number of cores, the processes tend to stay on the same cores for a long time. But if the number of processes is greater, then they seem to jump from core to core a lot.

I am not sure what would be an optimal strategy for that case. If we try to keep some lucky processes on the same core, then CPU time might be shared unfairly. Shuffling cores provides more fairness, but can hurt total performance.

-- Andriy Gapon
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Thu, Apr 15, 2010 at 3:54 AM, Andriy Gapon a...@freebsd.org wrote:
This is a good point. But on the other hand, it means that our scheduler doesn't do a perfect job here. BTW, I use ULE.

My observation is that when the number of CPU-intensive long-running processes is less than or equal to the number of cores, the processes tend to stay on the same cores for a long time. But if the number of processes is greater, then they seem to jump from core to core a lot.

I am not sure what would be an optimal strategy for that case. If we try to keep some lucky processes on the same core, then CPU time might be shared unfairly. Shuffling cores provides more fairness, but can hurt total performance.

Is it possible to add a tunable to the scheduler for its aggressiveness in switching cores?

-- Adam Vande More
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 15/04/2010 16:23 Adam Vande More said the following:
Is it possible to add a tunable to the scheduler for its aggressiveness in switching cores?

No idea; not a scheduler person.

-- Andriy Gapon
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 14/04/2010 02:21 Maho NAKATA said the following:
2. install ports/math/gotoblas (manual download required)
make install

Do you know how gotoblas on Linux was obtained? Was it built from source? Has it come pre-packaged? If so, can you find out details of its build configuration?

Thanks!
-- Andriy Gapon
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Wednesday 14 April 2010 15:19:13 Andriy Gapon wrote:
on 14/04/2010 02:21 Maho NAKATA said the following:
2. install ports/math/gotoblas (manual download required)
make install

Do you know how gotoblas on Linux was obtained? Was it built from source? Has it come pre-packaged? If so, can you find out details of its build configuration?

Thanks!

I think the best test would be to run a statically compiled linux binary on FreeBSD. That way the compiler settings are exactly the same.

- Pieter
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 14/04/2010 02:21 Maho NAKATA said the following:
4. run dgemm.
% ./dgemm
n: 3000 time : 134.648208 or 16.910525 Mflops : 31943.419695
n: 3100 time : 148.122279 or 18.615284 Mflops : 32017.357408
n: 3200 time : 162.45 or 20.430651 Mflops : 32087.318295
n: 3300 time : 178.497079 or 22.446093 Mflops : 32030.420499
n: 3400 time : 195.550715 or 24.586152 Mflops : 31981.873273
n: 3500 time : 213.403379 or 26.825058 Mflops : 31975.513363
n: 3600 ...
above output is on Core i7 920 (2.66GHz; TurboBoost on)

My results:
$ ./dgemm
n: 3000 time : 54.151302 or 28.189781 Mflops : 19162.263125
n: 3100 time : 60.157449 or 32.214141 Mflops : 18501.570537
n: 3200 time : 65.753191 or 34.114872 Mflops : 19216.393378

CPU:
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2653.35-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
⋮
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)

FreeBSD: FreeBSD 8.0-STABLE r205070 amd64

Please note that the system was not dedicated to the test, I had Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons running. That probably explains irregularities in the results.

I am not sure exactly how the theoretical maximum should be calculated; I used 2 * 2.66G * 4 ≈ 21.3G. And so 19.2G / 21.3G ≈ 90%. Not as bad as what you get, although not as good as what you report for Linux. But given the impurity and imprecision of my test…

P.S. the machine is two-core obviously :-) Don't have anything with more cpus/cores handy.

P.P.S. Having _only glimpsed_ at the source, I think that there are some things that GotoBLAS doesn't try to do on FreeBSD that it tries to do on Linux, like setting CPU affinity for the threads, or avoiding HTT pseudo-cores. Those things are possible on FreeBSD. Perhaps there are more things like that.

-- Andriy Gapon
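[Editor's sketch, not part of the original message.] The peak estimate above can be written out as a small calculation. This assumes the factor of 4 in "2 * 2.66G * 4" is the number of double-precision floating-point operations per core per clock cycle, which the message itself does not state; the function names are hypothetical and chosen only for illustration.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=4):
    """Theoretical peak in GFlops: cores * clock (GHz) * flops per cycle.

    flops_per_cycle=4 is an assumption matching the factor used in the thread.
    """
    return cores * clock_ghz * flops_per_cycle


def efficiency(measured_gflops, cores, clock_ghz, flops_per_cycle=4):
    """Measured throughput as a fraction of the theoretical peak."""
    return measured_gflops / peak_gflops(cores, clock_ghz, flops_per_cycle)


if __name__ == "__main__":
    # Core 2 Duo E7300 figures quoted in the message above.
    print("peak: %.2f GFlops" % peak_gflops(2, 2.66))
    print("efficiency: %.0f%%" % (100 * efficiency(19.2, 2, 2.66)))
```

The same two functions can be applied to the four-core Core i7 920 numbers elsewhere in the thread by passing cores=4 and the relevant clock.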
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Wed, Apr 14, 2010 at 10:26 AM, Andriy Gapon a...@freebsd.org wrote:
on 14/04/2010 02:21 Maho NAKATA said the following:
4. run dgemm.
% ./dgemm
n: 3000 time : 134.648208 or 16.910525 Mflops : 31943.419695
n: 3100 time : 148.122279 or 18.615284 Mflops : 32017.357408
n: 3200 time : 162.45 or 20.430651 Mflops : 32087.318295
n: 3300 time : 178.497079 or 22.446093 Mflops : 32030.420499
n: 3400 time : 195.550715 or 24.586152 Mflops : 31981.873273
n: 3500 time : 213.403379 or 26.825058 Mflops : 31975.513363
n: 3600 ...
above output is on Core i7 920 (2.66GHz; TurboBoost on)

My results:
$ ./dgemm
n: 3000 time : 54.151302 or 28.189781 Mflops : 19162.263125
n: 3100 time : 60.157449 or 32.214141 Mflops : 18501.570537
n: 3200 time : 65.753191 or 34.114872 Mflops : 19216.393378

CPU:
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2653.35-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
⋮
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)

FreeBSD: FreeBSD 8.0-STABLE r205070 amd64

Please note that the system was not dedicated to the test, I had Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons running. That probably explains irregularities in the results.

I am not sure exactly how the theoretical maximum should be calculated; I used 2 * 2.66G * 4 ≈ 21.3G. And so 19.2G / 21.3G ≈ 90%. Not as bad as what you get, although not as good as what you report for Linux. But given the impurity and imprecision of my test…

P.S. the machine is two-core obviously :-) Don't have anything with more cpus/cores handy.

P.P.S. Having _only glimpsed_ at the source, I think that there are some things that GotoBLAS doesn't try to do on FreeBSD that it tries to do on Linux, like setting CPU affinity for the threads, or avoiding HTT pseudo-cores. Those things are possible on FreeBSD. Perhaps there are more things like that.

Mine is also a live desktop enviro, kde4+

n: 3000 time : 116.377609 or 16.696066 Mflops : 32353.729042
n: 3100 time : 127.230336 or 17.274867 Mflops : 34501.695325
n: 3200 time : 139.018175 or 18.342056 Mflops : 35741.074976
n: 3300 time : 152.519365 or 20.154714 Mflops : 35671.942364
n: 3400 time : 166.248145 or 21.952426 Mflops : 35818.874941
n: 3500 time : 182.565385 or 24.492597 Mflops : 35020.581786
n: 3600 time : 198.551018 or 26.906992 Mflops : 34689.094992
n: 3700 time : 215.428919 or 28.574964 Mflops : 35462.294838
n: 3800 ^C

CPU: Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (3313.71-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x106e5  Family = 6  Model = 1e  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant

That's about 67% utilization; turning off HTT drops it more. HTT on the newer cores is good, not bad.

-- Adam Vande More
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Wed, Apr 14, 2010 at 11:34 AM, Adam Vande More amvandem...@gmail.com wrote:
That's about 67% utilization; turning off HTT drops it more. HTT on the newer cores is good, not bad.

Well, that was completely contrary to some tests I'd run when I first got the CPU. With HTT off:

n: 3000 time : 44.705516 or 11.760183 Mflops : 45932.959253
n: 3100 time : 50.598581 or 14.270123 Mflops : 41766.437458
n: 3200 time : 55.748192 or 15.780977 Mflops : 41541.458400
n: 3300 time : 62.072217 or 17.441431 Mflops : 41221.262070
n: 3400

So that's about 79% right there.

Also, if I run cpuset on the dgemm then the utilization is basically at the theoretical max for one core, so at least that part is working.

-- Adam Vande More
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 14/04/2010 19:45 Adam Vande More said the following:
Also, if I run cpuset on the dgemm then the utilization is basically at the theoretical max for one core, so at least that part is working.

You can also try procstat -t pid to find out thread IDs and cpuset -t to pin the threads to the cores.

-- Andriy Gapon
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
On Wed, Apr 14, 2010 at 11:51 AM, Andriy Gapon a...@freebsd.org wrote:
on 14/04/2010 19:45 Adam Vande More said the following:
Also, if I run cpuset on the dgemm then the utilization is basically at the theoretical max for one core, so at least that part is working.

You can also try procstat -t pid to find out thread IDs and cpuset -t to pin the threads to the cores.

It gets to around 90% doing that.

time : 103.617271 or 27.140992 Mflops : 47172.925449
n: 4100 time : 113.910669 or 30.520677 Mflops : 45174.496186
n: 4200 time : 121.880695 or 32.068070 Mflops : 46217.711013
n: 4300

Tried a couple of different thread orders but it didn't seem to make a difference.

galacticdominator% procstat -t 1922
  PID    TID COMM   TDNAME          CPU  PRI STATE WCHAN
 1922 100092 dgemm  initial thread    0  190 run   -
 1922 100268 dgemm  -                 1  190 run   -
 1922 100270 dgemm  -                 1  191 run   -
 1922 100272 dgemm  -                 3  190 run   -
 1922 100273 dgemm  -                 2  191 run   -
 1922 100274 dgemm  -                 2  191 run   -
 1922 100282 dgemm  -                 0  190 run   -
 1922 100283 dgemm  -                 3  190 run   -

galacticdominator% cpuset -t 100092 -l 0
galacticdominator% cpuset -t 100268 -l 1
galacticdominator% cpuset -t 100270 -l 2
galacticdominator% cpuset -t 100272 -l 3
galacticdominator% cpuset -t 100273 -l 0
galacticdominator% cpuset -t 100274 -l 1
galacticdominator% cpuset -t 100282 -l 2
galacticdominator% cpuset -t 100283 -l 3

galacticdominator% cpuset -t 100092 -l 0
galacticdominator% cpuset -t 100268 -l 0
galacticdominator% cpuset -t 100270 -l 1
galacticdominator% cpuset -t 100272 -l 1
galacticdominator% cpuset -t 100273 -l 2
galacticdominator% cpuset -t 100274 -l 2
galacticdominator% cpuset -t 100282 -l 3
galacticdominator% cpuset -t 100283 -l 3

This is from the second set:

time : 150.348850 or 40.488350 Mflops : 45022.951141
n: 4600 time : 161.968982 or 43.589618 Mflops : 44669.884500
n: 4700

Since this is a full-fledged desktop environment, 90% utilization seems pretty good.

I'm no expert, Andriy, but it seems like if gotoblas implemented some of the FreeBSD optimizations then we'd be in the same ballpark.

-- Adam Vande More
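[Editor's sketch, not part of the original message.] The manual procstat/cpuset steps above can be scripted. The sketch below only parses procstat -t style text and prints the cpuset command lines in the same form used in the message; it does not run anything. The helper names (thread_ids, pin_commands) and the block flag are hypothetical, and the sample text is the thread listing quoted above with its column spacing restored by assumption.

```python
SAMPLE = """\
  PID    TID COMM   TDNAME          CPU  PRI STATE WCHAN
 1922 100092 dgemm  initial thread    0  190 run   -
 1922 100268 dgemm  -                 1  190 run   -
 1922 100270 dgemm  -                 1  191 run   -
 1922 100272 dgemm  -                 3  190 run   -
 1922 100273 dgemm  -                 2  191 run   -
 1922 100274 dgemm  -                 2  191 run   -
 1922 100282 dgemm  -                 0  190 run   -
 1922 100283 dgemm  -                 3  190 run   -
"""


def thread_ids(procstat_output):
    """Return the TID column from `procstat -t <pid>` text, skipping the header."""
    tids = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Data rows start with two numeric fields: PID and TID.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            tids.append(int(fields[1]))
    return tids


def pin_commands(tids, ncpus, block=False):
    """Build `cpuset -t <tid> -l <cpu>` lines.

    block=False assigns CPUs round-robin (0,1,2,3,0,1,...);
    block=True fills each CPU in turn (0,0,1,1,...).
    """
    per_cpu = -(-len(tids) // ncpus)  # ceiling division
    commands = []
    for i, tid in enumerate(tids):
        cpu = i // per_cpu if block else i % ncpus
        commands.append("cpuset -t %d -l %d" % (tid, cpu))
    return commands


if __name__ == "__main__":
    for command in pin_commands(thread_ids(SAMPLE), ncpus=4):
        print(command)
```

The two orderings correspond to the two sets of cpuset invocations tried in the message; which one is preferable on a given machine is not something this sketch can decide.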
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
From: Andriy Gapon a...@freebsd.org
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 16:19:13 +0300

on 14/04/2010 02:21 Maho NAKATA said the following:
2. install ports/math/gotoblas (manual download required)
make install

Do you know how gotoblas on Linux was obtained?

Yes. I just downloaded the archive.

Was it built from source?

Yes.

Has it come pre-packaged?

No.

If so, can you find out details of its build configuration?

I'm not sure about the details; I built it as follows on Ubuntu 9.10 amd64.

$ tar xvfz GotoBLAS2-1.13.tar.gz
$ cd GotoBLAS2
$ ./quickbuild.64bit
ln -fs libgoto2_nehalemp-r1.13.a libgoto2.a
for d in interface driver/level2 driver/level3 driver/others kernel lapack ; \
do if test -d $d; then \
make -j 8 -C $d libs || exit 1 ; \
fi; \
done
make[1]: Entering directory `/home/maho/a/GotoBLAS2/interface'
gcc -O2 -DEXPRECISION -m128bit-long-double -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DMAX_CPU_NUMBER=8 -DASMNAME=saxpy -DASMFNAME=saxpy_ -DNAME=saxpy_ -DCNAME=saxpy -DCHAR_NAME=\"saxpy_\" -DCHAR_CNAME=\"saxpy\" -I.. -I. -UDOUBLE -UCOMPLEX -c axpy.c -o saxpy.o
gcc -O2 -DEXPRECISION -m128bit-long-double -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DMAX_CPU_NUMBER=8 -DASMNAME=sswap -DASMFNAME=sswap_ -DNAME=sswap_ -DCNAME=sswap -DCHAR_NAME=\"sswap_\" -DCHAR_CNAME=\"sswap\" -I.. -I. -UDOUBLE -UCOMPLEX -c swap.c -o sswap.o
gcc -O2 -DEXPRECISION -m128bit-long-double -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DMAX_CPU_NUMBER=8 -DASMNAME=scopy -DASMFNAME=scopy_ -DNAME=scopy_ -DCNAME=scopy -DCHAR_NAME=\"scopy_\" -DCHAR_CNAME=\"scopy\" -I.. -I. -UDOUBLE -UCOMPLEX -c copy.c -o scopy.o
gcc -O2 -DEXPRECISION -m128bit-long-double -Wall -m64 -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DMAX_CPU_NUMBER=8 -DASMNAME=sscal -DASMFNAME=sscal_ -DNAME=sscal_ -DCNAME=sscal -DCHAR_NAME=\"sscal_\" -DCHAR_CNAME=\"sscal\" -I.. -I. -UDOUBLE -UCOMPLEX -c scal.c -o sscal.o

-- Nakata Maho http://accc.riken.jp/maho/ , http://ja.openoffice.org/
Nakata Maho's PGP public keys: http://accc.riken.jp/maho/maho.pgp.txt
HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
Hi Andriy and Adam

My test again. No desktop, etc. I just run dgemm. Contrary to Adam's result, Hyper Threading makes the performance worse. All tests were done on a Core i7 920 @ 2.67GHz (TurboBoost @ 2.8GHz).

Turbo Boost off, Hyper threading off: 82% (35GFlops) [1]
Turbo Boost off, Hyper threading off: 72% (30.5GFlops) [2]
Turbo Boost on, Hyper threading on: 71% (32GFlops) [3]
Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]

---my system---
CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (2683.44-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x106a5  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory = 12884901888 (12288 MB)
avail memory = 12387717120 (11813 MB)
ACPI APIC Table: <110909 APIC1026>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
---my system---

---DETAILS---
[1]
% ./dgemm
n: 3000 time : 57.666717 or 16.339074 Mflops : 33060.624827
n: 3100 time : 61.502677 or 16.597376 Mflops : 35910.025544
n: 3200 time : 69.075401 or 19.199833 Mflops : 34144.297133
n: 3300 time : 73.699540 or 19.633594 Mflops : 36618.756539
n: 3400 time : 82.256194 or 22.373651 Mflops : 35144.518837
n: 3500 time : 88.975662 or 24.118761 Mflops : 35563.394249
n: 3600 time : 96.436652 or 26.027588 Mflops : 35861.148385
n: 3700

[2]
% ./dgemm
n: 3000 time : 139.622739 or 17.693806 Mflops : 30529.327312
n: 3100 time : 154.344971 or 19.566886 Mflops : 30460.247702
n: 3200 time : 169.507739 or 21.467100 Mflops : 30538.116602
n: 3300 time : 186.363773 or 23.615281 Mflops : 30444.600545
n: 3400 time : 203.798979 or 25.817667 Mflops : 30456.322788
n: 3500 ...

[3]
% ./dgemm
n: 3000 time : 134.673079 or 16.958682 Mflops : 31852.711082
n: 3100 time : 148.410085 or 18.663248 Mflops : 31935.073574
n: 3200 time : 162.835473 or 20.468825 Mflops : 32027.475770
n: 3300 time : 179.025370 or 22.479189 Mflops : 31983.262501
n: 3400 time : 195.859710 or 24.663009 Mflops : 31882.208788
n: 3500

[4]
% ./dgemm
n: 3000 time : 54.259647 or 14.684309 Mflops : 36786.204907
n: 3100 time : 60.899147 or 17.124599 Mflops : 34804.447141
n: 3200 time : 64.295342 or 17.490787 Mflops : 37480.577569
n: 3300 time : 69.781247 or 18.288840 Mflops : 39311.284796
n: 3400 time : 79.234397 or 21.829736 Mflops : 36020.187858
n: 3500 time : 83.905419 or 22.381237 Mflops : 38324.289174
n: 3600 time : 92.195022 or 25.105942 Mflops : 37177.621122
n: 3700 time : 97.718841 or 25.434243 Mflops : 39841.319494
n: 3800 time : 105.740463 or 27.414029 Mflops : 40042.592613
n: 3900 time : 113.980157 or 29.678505 Mflops : 39984.635420
n: 4000 time : 122.941569 or 31.946174 Mflops : 40077.412531
n: 4100
---DETAILS---

From: Adam Vande More amvandem...@gmail.com
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 11:34:45 -0500

time : 162.45 or 20.430651 Mflops : 32087.318295
n: 3300 time : 178.497079 or 22.446093 Mflops : 32030.420499
n: 3400 time : 195.550715 or 24.586152 Mflops : 31981.873273
n: 3500 time : 213.403379 or 26.825058 Mflops : 31975.513363
n: 3600 ...
above output is on Core i7 920 (2.66GHz; TurboBoost on)

My results:
$ ./dgemm
n: 3000 time : 54.151302 or 28.189781 Mflops : 19162.263125
n: 3100 time : 60.157449 or 32.214141 Mflops : 18501.570537
n: 3200 time : 65.753191 or 34.114872 Mflops : 19216.393378

CPU:
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2653.35-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
⋮
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)

FreeBSD: FreeBSD 8.0-STABLE r205070 amd64

Please note that the system was not dedicated to the test, I had Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons running. That probably explains irregularities in the results.

I am not sure exactly how the theoretical maximum should be calculated; I used 2 * 2.66G * 4 ≈ 21.3G. And so 19.2G / 21.3G ≈ 90%. Not as bad as what you get, although not as good as what you report for Linux. But given the impurity and imprecision of my test…

P.S. the machine is two-core obviously :-) Don't have anything with more cpus/cores handy.
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Oops, I missed this e-mail...

From: Adam Vande More amvandem...@gmail.com
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 11:45:04 -0500

On Wed, Apr 14, 2010 at 11:34 AM, Adam Vande More amvandem...@gmail.com wrote:
That's about 67% utilization; turning off HTT drops it more. HTT on the newer cores is good, not bad.

Well, that was completely contrary to some tests I'd run when I first got the CPU. With HTT off:

n: 3000 time : 44.705516 or 11.760183 Mflops : 45932.959253
n: 3100 time : 50.598581 or 14.270123 Mflops : 41766.437458
n: 3200 time : 55.748192 or 15.780977 Mflops : 41541.458400
n: 3300 time : 62.072217 or 17.441431 Mflops : 41221.262070
n: 3400

So that's about 79% right there.

Also, if I run cpuset on the dgemm then the utilization is basically at the theoretical max for one core, so at least that part is working.

-- Adam Vande More
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Hi Andriy and Adam,

I also did the same thing as suggested.

My conclusion: on Core i7 920, 2.66GHz, TurboBoost on, HyperThreading off,
my GotoBLAS dgemm performance was as follows.

* Summary of results
  36-39GFlops  81-87% of peak performance  without pinning
  35-40GFlops  78-89% of peak performance  with pinning

My observations
* Performance is somewhat unstable: one calculation gives 35GFlops, the next gives 40GFlops, and it keeps flipping. Jitter is observed.
* Pinning makes performance somewhat more stable, but does not make it any faster.

Details.
First I ran

% ./dgemm
n: 3500 time : 84.431008 or 22.428125 Mflops : 38244.168629
n: 3600 time : 90.162220 or 23.440381 Mflops : 39819.284422
n: 3700 time : 101.427504 or 27.404345 Mflops : 36977.121646

Note: 36-39GFlops, 81-87% of peak performance

Then I pinned the threads to the cores as follows:

% procstat -t 1408
  PID    TID COMM   TDNAME          CPU PRI STATE WCHAN
 1408 100160 dgemm  -                 3 190 run   -
 1408 100161 dgemm  -                 2 190 run   -
 1408 100162 dgemm  -                 2 190 run   -
 1408 100163 dgemm  -                 1 189 run   -
 1408 100164 dgemm  -                 0 190 run   -
 1408 100165 dgemm  -                 3 189 run   -
 1408 100166 dgemm  -                 1 190 run   -
 1408 100167 dgemm  initial thread    0 190 run   -
% cpuset -t 100160 -l 0
% cpuset -t 100161 -l 0
% cpuset -t 100162 -l 1
% cpuset -t 100163 -l 1
% cpuset -t 100164 -l 2
% cpuset -t 100165 -l 2
% cpuset -t 100166 -l 3
% cpuset -t 100167 -l 3

then,

% procstat -t 1408
  PID    TID COMM   TDNAME          CPU PRI STATE WCHAN
 1408 100160 dgemm  -                 0 191 run   -
 1408 100161 dgemm  -                 0 191 run   -
 1408 100162 dgemm  -                 1 190 run   -
 1408 100163 dgemm  -                 1 190 run   -
 1408 100164 dgemm  -                 2 190 run   -
 1408 100165 dgemm  -                 2 190 run   -
 1408 100166 dgemm  -                 3 190 run   -
 1408 100167 dgemm  initial thread    3 190 run   -

n: 4000 time : 121.907696 or 31.475052 Mflops : 40677.295630
n: 4100 time : 139.842701 or 38.702532 Mflops : 35624.444587
n: 4200 time : 143.622179 or 36.725949 Mflops : 40356.011158
n: 4300 time : 153.742976 or 39.465752 Mflops : 40301.013511
n: 4400 time : 164.919566 or 42.380653 Mflops : 40208.611317
n: 4500 time : 175.930335 or 45.422572 Mflops : 40132.139469

Thanks

From: Adam Vande More amvandem...@gmail.com
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 12:47:31 -0500

> On Wed, Apr 14, 2010 at 11:51 AM, Andriy Gapon a...@freebsd.org wrote:
>> on 14/04/2010 19:45 Adam Vande More said the following:
>>> also if I run cpuset on the dgemm then the utilization is basically at the theoretical max for one core so at least that part is working.
>>
>> You can also try procstat -t pid to find out thread IDs and cpuset -t to pin the threads to the cores.
>
> it gets to around 90% doing that.
>
> time : 103.617271 or 27.140992 Mflops : 47172.925449
> n: 4100 time : 113.910669 or 30.520677 Mflops : 45174.496186
> n: 4200 time : 121.880695 or 32.068070 Mflops : 46217.711013
> n: 4300
>
> tried a couple of different thread orders but didn't seem to make a difference.
>
> galacticdominator% procstat -t 1922
>   PID    TID COMM   TDNAME          CPU PRI STATE WCHAN
>  1922 100092 dgemm  initial thread    0 190 run   -
>  1922 100268 dgemm  -                 1 190 run   -
>  1922 100270 dgemm  -                 1 191 run   -
>  1922 100272 dgemm  -                 3 190 run   -
>  1922 100273 dgemm  -                 2 191 run   -
>  1922 100274 dgemm  -                 2 191 run   -
>  1922 100282 dgemm  -                 0 190 run   -
>  1922 100283 dgemm  -                 3 190 run   -
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 1
> galacticdominator% cpuset -t 100270 -l 2
> galacticdominator% cpuset -t 100272 -l 3
> galacticdominator% cpuset -t 100273 -l 0
> galacticdominator% cpuset -t 100274 -l 1
> galacticdominator% cpuset -t 100282 -l 2
> galacticdominator% cpuset -t 100283 -l 3
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 0
> galacticdominator% cpuset -t 100270 -l 1
> galacticdominator% cpuset -t 100272 -l 1
> galacticdominator% cpuset -t 100273 -l 2
> galacticdominator% cpuset -t 100274 -l 2
> galacticdominator% cpuset -t 100282 -l 3
> galacticdominator% cpuset -t
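The manual pinning in the message above (read the thread IDs from `procstat -t`, then issue one `cpuset -t TID -l CPU` per thread) could be scripted. The following is only a sketch in Python: the helper names `parse_thread_ids` and `cpuset_commands` are hypothetical, the sample text is retyped from the listing above, and the script only prints the command lines rather than running them. The "two threads per core, in TID order" layout is one of the orders tried in the thread, not a recommendation.

```python
# Sketch: turn `procstat -t` output into `cpuset -t TID -l CPU` command lines.
# Helper names are hypothetical; only the cpuset form shown in the thread is used.

SAMPLE_PROCSTAT = """\
  PID    TID COMM   TDNAME          CPU PRI STATE WCHAN
 1408 100160 dgemm  -                 3 190 run   -
 1408 100161 dgemm  -                 2 190 run   -
 1408 100162 dgemm  -                 2 190 run   -
 1408 100163 dgemm  -                 1 189 run   -
 1408 100164 dgemm  -                 0 190 run   -
 1408 100165 dgemm  -                 3 189 run   -
 1408 100166 dgemm  -                 1 190 run   -
 1408 100167 dgemm  initial thread    0 190 run   -
"""


def parse_thread_ids(procstat_output):
    """Return the TID column of every data row, skipping the header."""
    tids = []
    for line in procstat_output.strip().splitlines():
        fields = line.split()
        # Data rows start with a numeric PID followed by a numeric TID.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            tids.append(int(fields[1]))
    return tids


def cpuset_commands(tids, ncores, threads_per_core):
    """Assign threads to cores in TID order, filling each core before the next."""
    commands = []
    for index, tid in enumerate(sorted(tids)):
        cpu = (index // threads_per_core) % ncores
        commands.append(f"cpuset -t {tid} -l {cpu}")
    return commands


for command in cpuset_commands(parse_thread_ids(SAMPLE_PROCSTAT), ncores=4, threads_per_core=2):
    print(command)
```

With `threads_per_core=1` the same function produces the round-robin order (0, 1, 2, 3, 0, 1, 2, 3) that appears first in the quoted message.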
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Hi Adam,

From: Adam Vande More amvandem...@gmail.com
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 12:47:31 -0500

> Since this is a full fledged desktop environment, 90% utilization seems pretty good.

No, I don't think so. Even on Ubuntu, where mine is also running a full desktop environment, GotoBLAS's dgemm performance is about 95%.
dgemm on Linux is a lot more stable than on FreeBSD, and clearly faster.

on Ubuntu
$ ./dgemm
n: 3000 time : 51.18 or 12.795519 Mflops : 42216.341930
n: 3100 time : 56.28 or 14.261719 Mflops : 41791.049205
n: 3200 time : 61.35 or 15.631380 Mflops : 41939.023080
n: 3300 time : 67.79 or 17.247202 Mflops : 41685.474166
n: 3400 time : 73.80 or 18.471321 Mflops : 42569.300032
n: 3500 time : 81.48 or 20.781936 Mflops : 41273.585044
n: 3600 time : 88.17 or 22.816965 Mflops : 40907.246233
n: 3700 time : 95.21 or 23.864101 Mflops : 42462.684969
n: 3800

thanks
--
Nakata Maho http://accc.riken.jp/maho/ , http://ja.openoffice.org/
Nakata Maho's PGP public keys: http://accc.riken.jp/maho/maho.pgp.txt
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
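For readers who want to relate the Mflops column above to the "about 95%" figure, here is a rough sketch of the arithmetic in Python. It rests on assumptions not stated in this message: a peak of clock x cores x flops per cycle, with 4 cores, 4 double-precision flops per core per cycle, and the 2.8GHz TurboBoost clock mentioned elsewhere in the thread (giving 44.8 GFlops). Whether TurboBoost was active for the Ubuntu run is not stated, so the percentages are indicative only.

```python
# Sketch: percentage of theoretical peak for the Ubuntu dgemm figures.
# Assumptions (not from this message): 4 cores, 4 DP flops/cycle/core, 2.8 GHz.

CORES = 4
FLOPS_PER_CYCLE = 4   # assumed double-precision flops per core per cycle
CLOCK_GHZ = 2.8       # TurboBoost clock quoted in the thread


def peak_gflops(clock_ghz, cores=CORES, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFlops under the assumptions above."""
    return clock_ghz * cores * flops_per_cycle


def efficiency(mflops, clock_ghz=CLOCK_GHZ):
    """Measured Mflops as a fraction of the assumed peak."""
    return (mflops / 1000.0) / peak_gflops(clock_ghz)


# Mflops column copied from the Ubuntu run (n = 3000 .. 3700).
UBUNTU_MFLOPS = [
    42216.341930, 41791.049205, 41939.023080, 41685.474166,
    42569.300032, 41273.585044, 40907.246233, 42462.684969,
]

for mflops in UBUNTU_MFLOPS:
    print(f"{mflops:12.3f} Mflops  {efficiency(mflops):6.1%} of assumed peak")
```

Under these assumptions every Ubuntu run lands between roughly 91% and 95% of peak, which is in line with the figure quoted in the message.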
Re: HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
On Wed, Apr 14, 2010 at 5:46 PM, Maho NAKATA cha...@mac.com wrote:
> Hi Andry and Adam
>
> My test again. No desktop, etc. I just run dgemm.
> Contrary to Adam's result, Hyper Threading makes the performance worse.
>
> all tests are done on Core i7 920 @ 2.67GHz. (TurboBoost @2.8GHz)
>
> Turbo Boost off, Hyper threading off: 82% (35GFlops) [1]
> Turbo Boost off, Hyper threading off: 72% (30.5GFlops) [2]
> Turbo Boost on, Hyper threading on: 71% (32GFlops) [3]
> Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]

Doesn't this make sense? Hyperthreaded cores in Intel procs still provide an incomplete set of registers as they're logical processors, so I would expect for things to be slower if they're automatically run on the SMT cores instead of the physical ones.

Is there a weighting scheme to SCHED_ULE where logical processors (like the SMT variety) get a lower score than real processors do, and thus get scheduled for less intensive interrupting tasks, or maybe just don't get scheduled in high use scenarios like it would if it was a physical processor?

Thanks,
-Garrett
Re: HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
On Wed, Apr 14, 2010 at 7:49 PM, Garrett Cooper yanef...@gmail.com wrote:
> On Wed, Apr 14, 2010 at 5:46 PM, Maho NAKATA cha...@mac.com wrote:
>> Hi Andry and Adam
>>
>> My test again. No desktop, etc. I just run dgemm.
>> Contrary to Adam's result, Hyper Threading makes the performance worse.
>>
>> all tests are done on Core i7 920 @ 2.67GHz. (TurboBoost @2.8GHz)
>>
>> Turbo Boost off, Hyper threading off: 82% (35GFlops) [1]
>> Turbo Boost off, Hyper threading off: 72% (30.5GFlops) [2]
>> Turbo Boost on, Hyper threading on: 71% (32GFlops) [3]
>> Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]
>
> Doesn't this make sense? Hyperthreaded cores in Intel procs still provide an incomplete set of registers as they're logical processors, so I would expect for things to be slower if they're automatically run on the SMT cores instead of the physical ones.
>
> Is there a weighting scheme to SCHED_ULE where logical processors (like the SMT variety) get a lower score than real processors do, and thus get scheduled for less intensive interrupting tasks, or maybe just don't get scheduled in high use scenarios like it would if it was a physical processor?

Err... wait. Didn't see that the turbo boost results didn't scale linearly or align with one another until just a sec ago. Nevermind my previous comment.

-Garrett
Re: HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
On Wed, 14 Apr 2010, Garrett Cooper wrote:
> On Wed, Apr 14, 2010 at 7:49 PM, Garrett Cooper yanef...@gmail.com wrote:
>> On Wed, Apr 14, 2010 at 5:46 PM, Maho NAKATA cha...@mac.com wrote:
>>> Hi Andry and Adam
>>>
>>> My test again. No desktop, etc. I just run dgemm.
>>> Contrary to Adam's result, Hyper Threading makes the performance worse.
>>>
>>> all tests are done on Core i7 920 @ 2.67GHz. (TurboBoost @2.8GHz)
>>>
>>> Turbo Boost off, Hyper threading off: 82% (35GFlops) [1]
>>> Turbo Boost off, Hyper threading off: 72% (30.5GFlops) [2]

Er, shouldn't one of those say HTT on? and/or Turbo boost on? Else they're both the same test as [4] but with different results?

>>> Turbo Boost on, Hyper threading on: 71% (32GFlops) [3]
>>> Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]

Clarification of all four possible test configs - 8 if you add pinning CPUs or not - might make this a bit clearer?

>> Doesn't this make sense? Hyperthreaded cores in Intel procs still provide an incomplete set of registers as they're logical processors, so I would expect for things to be slower if they're automatically run on the SMT cores instead of the physical ones.

Since we're talking FP, do HTT 'cores' share an FPU, or have their own? If contended, you'd have to expect worse (at least FP) performance, no?

>> Is there a weighting scheme to SCHED_ULE where logical processors (like the SMT variety) get a lower score than real processors do, and thus get scheduled for less intensive interrupting tasks, or maybe just don't get scheduled in high use scenarios like it would if it was a physical processor?
>
> Err... wait. Didn't see that the turbo boost results didn't scale linearly or align with one another until just a sec ago. Nevermind my previous comment.

Waiting for the fog to lift ..

cheers, Ian
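One way to sanity-check which clock each of the four quoted results was normalised against is to back out the peak implied by each percentage/GFlops pair. The Python sketch below does that. It assumes (this is not stated in the thread) that peak is clock x 4 cores x 4 double-precision flops per cycle, so the two candidate peaks are about 42.7 GFlops at 2.67GHz and 44.8 GFlops at 2.8GHz. The quoted percentages are rounded, so this can only suggest, not prove, which labels might be mistyped.

```python
# Sketch: implied theoretical peak for each quoted result.
# Assumption (not from the thread): peak = clock * 4 cores * 4 DP flops/cycle.

CORES = 4
FLOPS_PER_CYCLE = 4                  # assumed
CANDIDATE_CLOCKS_GHZ = (2.67, 2.8)   # base and TurboBoost clocks quoted above

# (label, percent of peak, GFlops) as quoted; [4] is a range, so both ends appear.
REPORTED = [
    ("[1]", 82, 35.0),
    ("[2]", 72, 30.5),
    ("[3]", 71, 32.0),
    ("[4] low", 84, 38.0),
    ("[4] high", 89, 40.0),
]


def implied_peak_gflops(percent, gflops):
    """Peak that would make `gflops` equal `percent` of peak."""
    return gflops / (percent / 100.0)


def nearest_clock_ghz(peak_gflops):
    """Candidate clock whose assumed peak is closest to the implied peak."""
    return min(
        CANDIDATE_CLOCKS_GHZ,
        key=lambda ghz: abs(ghz * CORES * FLOPS_PER_CYCLE - peak_gflops),
    )


for label, percent, gflops in REPORTED:
    peak = implied_peak_gflops(percent, gflops)
    print(f"{label:9s} implied peak {peak:5.1f} GFlops, nearest clock {nearest_clock_ghz(peak)} GHz")
```

If the assumption holds, [1] and [2] look like they were normalised against the 2.67GHz peak, while [3] and [4] look like the 2.8GHz peak, which would be consistent with [4] having been measured with Turbo Boost on despite its label. Only the original poster can confirm that.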
Linux static linked ver doesn't work on FBSD (Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)
From: Pieter de Goeje pie...@degoeje.nl
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 16:05:18 +0200

> I think the best test would be to run a statically compiled linux binary on FreeBSD. That way the compiler settings are exactly the same.

It is not possible to run a Linux amd64 binary on FreeBSD amd64, and the i386 version does not work either: GotoBLAS uses special system calls.

% ./dgemm
linux_sys_futex: unknown op 265
linux: pid 1264 (dgemm): syscall mbind not implemented
n: 3000
^C

It just halts.

--
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
Blog: http://blog.goo.ne.jp/nakatamaho/ , GPG key: http://accc.riken.jp/maho/maho.pgp.txt
Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
on 15/04/2010 04:20 Maho NAKATA said the following:
> Hi Andriy and Adam,
>
> I did also the same thing as suggested.
>
> my conclusion:
> on Core i7 920, 2.66GHz, TurboBoost on, HyperThreading off,

So HyperThreading is off.

> then, pinned to each core like following
> % procstat -t 1408
>   PID    TID COMM   TDNAME          CPU PRI STATE WCHAN
>  1408 100160 dgemm  -                 3 190 run   -
>  1408 100161 dgemm  -                 2 190 run   -
>  1408 100162 dgemm  -                 2 190 run   -
>  1408 100163 dgemm  -                 1 189 run   -
>  1408 100164 dgemm  -                 0 190 run   -
>  1408 100165 dgemm  -                 3 189 run   -
>  1408 100166 dgemm  -                 1 190 run   -
>  1408 100167 dgemm  initial thread    0 190 run   -

But there are still 8 threads.
Can you check how many threads you have on Linux with the same configuration?
Is it possible to tell GotoBLAS to use 4 threads?
If yes, can you also test that scenario?

Also, would it be possible for you to test recent 8-STABLE?
Just for the sake of experiment.

--
Andriy Gapon