Hi Andriy and Adam, I also tried the same thing, as suggested.
My conclusion: on a Core i7 920 (2.66GHz, TurboBoost on, HyperThreading off),
my dgemm results with GotoBLAS were as follows.

* Summary of results
  36-39GFlops  81-87% of peak performance  without pinning
  35-40GFlops  78-89% of peak performance  with pinning

* My observations
  * Performance is somewhat unstable: one calculation gives 35GFlops, the
    next gives 40GFlops, then it flips back, and so on. Jitter is observed.
  * Pinning makes performance somewhat more stable, but it does not gain us
    any extra performance.

Details.

First I ran

% ./dgemm
n: 3500
time : 84.431008 or 22.428125
Mflops : 38244.168629
n: 3600
time : 90.162220 or 23.440381
Mflops : 39819.284422
n: 3700
time : 101.427504 or 27.404345
Mflops : 36977.121646

Note: 36-39GFlops, 81-87% of peak performance.

Then I pinned the threads to the cores as follows:

% procstat -t 1408
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
 1408 100160 dgemm            -                  3  190 run     -
 1408 100161 dgemm            -                  2  190 run     -
 1408 100162 dgemm            -                  2  190 run     -
 1408 100163 dgemm            -                  1  189 run     -
 1408 100164 dgemm            -                  0  190 run     -
 1408 100165 dgemm            -                  3  189 run     -
 1408 100166 dgemm            -                  1  190 run     -
 1408 100167 dgemm            initial thread     0  190 run     -

% cpuset -t 100160 -l 0
% cpuset -t 100161 -l 0
% cpuset -t 100162 -l 1
% cpuset -t 100163 -l 1
% cpuset -t 100164 -l 2
% cpuset -t 100165 -l 2
% cpuset -t 100166 -l 3
% cpuset -t 100167 -l 3

After that:

% procstat -t 1408
  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
 1408 100160 dgemm            -                  0  191 run     -
 1408 100161 dgemm            -                  0  191 run     -
 1408 100162 dgemm            -                  1  190 run     -
 1408 100163 dgemm            -                  1  190 run     -
 1408 100164 dgemm            -                  2  190 run     -
 1408 100165 dgemm            -                  2  190 run     -
 1408 100166 dgemm            -                  3  190 run     -
 1408 100167 dgemm            initial thread     3  190 run     -

n: 4000
time : 121.907696 or 31.475052
Mflops : 40677.295630
n: 4100
time : 139.842701 or 38.702532
Mflops : 35624.444587
n: 4200
time : 143.622179 or 36.725949
Mflops : 40356.011158
n: 4300
time : 153.742976 or 39.465752
Mflops : 40301.013511
n: 4400
time : 164.919566 or 42.380653
Mflops : 40208.611317
n: 4500
time : 175.930335 or 45.422572
Mflops : 40132.139469

Thanks

From: Adam Vande More <amvandem...@gmail.com>
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 12:47:31 -0500

> On Wed, Apr 14, 2010 at 11:51 AM, Andriy Gapon <a...@freebsd.org> wrote:
>
>> on 14/04/2010 19:45 Adam Vande More said the following:
>> >
>> > also if I run cpuset on the dgemm then the utilization is basically at
>> > the theoretical max for one core so at least that part is working.
>>
>> You can also try procstat -t <pid> to find out thread IDs and cpuset -t to
>> pin the threads to the cores.
>>
>
> it gets to around 90% doing that.
>
> time : 103.617271 or 27.140992
> Mflops : 47172.925449
> n: 4100
> time : 113.910669 or 30.520677
> Mflops : 45174.496186
> n: 4200
> time : 121.880695 or 32.068070
> Mflops : 46217.711013
> n: 4300
>
> tried a couple of different thread orders but didn't seem to make a
> difference.
>
> galacticdominator% procstat -t 1922
>   PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
>  1922 100092 dgemm            initial thread     0  190 run     -
>  1922 100268 dgemm            -                  1  190 run     -
>  1922 100270 dgemm            -                  1  191 run     -
>  1922 100272 dgemm            -                  3  190 run     -
>  1922 100273 dgemm            -                  2  191 run     -
>  1922 100274 dgemm            -                  2  191 run     -
>  1922 100282 dgemm            -                  0  190 run     -
>  1922 100283 dgemm            -                  3  190 run     -
>
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 1
> galacticdominator% cpuset -t 100270 -l 2
> galacticdominator% cpuset -t 100272 -l 3
> galacticdominator% cpuset -t 100273 -l 0
> galacticdominator% cpuset -t 100274 -l 1
> galacticdominator% cpuset -t 100282 -l 2
> galacticdominator% cpuset -t 100283 -l 3
>
>
> galacticdominator% cpuset -t 100092 -l 0
> galacticdominator% cpuset -t 100268 -l 0
> galacticdominator% cpuset -t 100270 -l 1
> galacticdominator% cpuset -t 100272 -l 1
> galacticdominator% cpuset -t 100273 -l 2
> galacticdominator% cpuset -t 100274 -l 2
> galacticdominator% cpuset -t 100282 -l 3
> galacticdominator% cpuset -t 100283 -l 3
>
>
> This is from the second set:
>
> time : 150.348850 or 40.488350
> Mflops : 45022.951141
> n: 4600
> time : 161.968982 or 43.589618
> Mflops : 44669.884500
> n: 4700
>
> Since this is a full fledged desktop environment, 90% utilization seems
> pretty good. I'm no expert Andriy, but it seems like if gotoblas
> implemented some of the FreeBSD optimizations then we'd be in the same
> ballpark.
>
>
> --
> Adam Vande More
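
For reference, a minimal sketch of how the percent-of-peak figures can be checked from the reported Mflops values. The thread does not state the peak that was used, so the numbers here are assumptions: 4 cores, 4 double-precision flops per core per cycle (one SSE add plus one SSE multiply), and either the nominal 2.66GHz clock or a 2.80GHz all-core TurboBoost clock. The percentages quoted above look roughly consistent with the 2.80GHz case (about 44.8 GFlops peak), but that is my reading, not something stated in the mail. The helper names are hypothetical.

```python
# Assumed hardware parameters; none of these are stated in the thread.
CORES = 4
FLOPS_PER_CYCLE = 4  # double-precision flops per core per cycle (assumed)


def peak_gflops(clock_ghz, cores=CORES, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFlops for the given clock in GHz."""
    return clock_ghz * cores * flops_per_cycle


def percent_of_peak(mflops, clock_ghz):
    """Measured Mflops expressed as a percentage of the theoretical peak."""
    return 100.0 * (mflops / 1000.0) / peak_gflops(clock_ghz)


# Mflops values copied from the dgemm output in this mail.
unpinned = [38244.168629, 39819.284422, 36977.121646]
pinned = [40677.295630, 35624.444587, 40356.011158,
          40301.013511, 40208.611317, 40132.139469]

for label, runs in (("without pinning", unpinned), ("with pinning", pinned)):
    for clock in (2.66, 2.80):  # nominal clock, assumed all-core turbo clock
        low = percent_of_peak(min(runs), clock)
        high = percent_of_peak(max(runs), clock)
        print(f"{label:16s} @ {clock:.2f}GHz: {low:.1f}% to {high:.1f}% of peak")
```

With TurboBoost on, the effective clock is not fixed, so any single percentage depends on which clock is taken as the reference.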
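
The manual pinning steps above (read the thread IDs from procstat -t, then run cpuset -t once per thread) can be sketched as a small command generator. This is only an illustration: the function name pin_commands is hypothetical, the thread IDs are the ones from my run, and it only prints the cpuset lines rather than executing them. It places consecutive thread IDs on the same core, which is the layout I used.

```python
def pin_commands(thread_ids, n_cores):
    """Build 'cpuset -t <tid> -l <core>' lines, filling cores in order.

    Consecutive thread IDs share a core, e.g. 8 threads on 4 cores
    gives two threads per core.
    """
    tids = sorted(thread_ids)
    per_core = -(-len(tids) // n_cores)  # ceiling division
    return [f"cpuset -t {tid} -l {i // per_core}" for i, tid in enumerate(tids)]


# Thread IDs as reported by 'procstat -t 1408' in this mail.
tids = [100160, 100161, 100162, 100163, 100164, 100165, 100166, 100167]

for cmd in pin_commands(tids, 4):
    print(cmd)
```

The printed lines can be reviewed and then pasted into a shell. A round-robin layout (thread i on core i modulo n_cores) would correspond to the first of Adam's two orderings.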