On 13 Nov 2011, at 12:13, Michele Martone wrote:

> I'm not an expert on your machine, but I find this speedup reasonable:
> librsb's speedup is limited by memory speed.
> To have a rough estimate about it, could you please report the first
> lines of `./rsbench -M' output?
>
> e.g.: on an Atom N450, librsb's "parallel MEMCPY" speedup is 20% only:
> $ ./rsbench -M
> #1 cores MEMCPY on 17810773 bytes: 0.542651 GB/s (73 times in 2.39599 s)
> #2 cores MEMCPY on 17810773 bytes: 0.60361 GB/s (73 times in 2.15402 s)
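For anyone comparing numbers, a minimal sketch of how the "parallel MEMCPY" speedup could be read off such lines: divide the bandwidth reported for N cores by the one reported for 1 core. The line format is assumed from the two quoted sample lines only; the names `memcpy_bandwidths` and `speedups` are hypothetical helpers, not part of librsb. For the two quoted lines the ratio comes out at roughly 1.11.

```python
import re

# Sample lines copied from the quoted `rsbench -M` output.
SAMPLE = """\
#1 cores MEMCPY on 17810773 bytes: 0.542651 GB/s (73 times in 2.39599 s)
#2 cores MEMCPY on 17810773 bytes: 0.60361 GB/s (73 times in 2.15402 s)
"""

# Assumed line format, inferred from the two sample lines only.
LINE_RE = re.compile(r"#(\d+) cores MEMCPY on (\d+) bytes: ([\d.eE+-]+) GB/s")


def memcpy_bandwidths(text):
    """Map core count -> reported MEMCPY bandwidth in GB/s."""
    return {int(m.group(1)): float(m.group(3)) for m in LINE_RE.finditer(text)}


def speedups(bandwidths):
    """Bandwidth of each core count relative to the single-core run."""
    base = bandwidths[1]
    return {cores: bw / base for cores, bw in sorted(bandwidths.items())}


if __name__ == "__main__":
    for cores, ratio in speedups(memcpy_bandwidths(SAMPLE)).items():
        print(f"{cores} core(s): {ratio:.3f}x")
```

To use it on a real run, the text of `rsbench -M` output would be passed to `memcpy_bandwidths` in place of `SAMPLE`.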
The output of

  RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/3M,L1:8/64/32K" /opt/librsb/bin/rsbench -M

does not look like that on my machine, see below.

c.

#TLB benchmark.
#TLB timing benchmark : scanned 128 entries spaced 4096 bytes across 524288 bytes in 0.00115108 s (424.192 MBps)
#TLB timing benchmark : scanned 256 entries spaced 4096 bytes across 1048576 bytes in 0.00133705 s (730.385 MBps)
#TLB timing benchmark : scanned 512 entries spaced 4096 bytes across 2097152 bytes in 0.00514293 s (379.769 MBps)
#TLB timing benchmark : scanned 1024 entries spaced 4096 bytes across 4194304 bytes in 0.036422 s (107.25 MBps)
#TLB timing benchmark : scanned 2048 entries spaced 4096 bytes across 8388608 bytes in 0.0742929 s (105.158 MBps)
#TLB timing benchmark : scanned 4096 entries spaced 4096 bytes across 16777216 bytes in 0.148562 s (105.175 MBps)
#TLB timing benchmark : scanned 8192 entries spaced 4096 bytes across 33554432 bytes in 0.299253 s (104.427 MBps)
#TLB timing benchmark : scanned 16384 entries spaced 4096 bytes across 67108864 bytes in 0.596625 s (104.756 MBps)
#TLB timing benchmark : scanned 32768 entries spaced 4096 bytes across 134217728 bytes in 1.1966 s (104.462 MBps)
#TLB timing benchmark : scanned 65536 entries spaced 4096 bytes across 268435456 bytes in 2.38357 s (104.885 MBps)
#TLB timing benchmark : scanned 131072 entries spaced 4096 bytes across 536870912 bytes in 4.77643 s (104.681 MBps)
#*****************************************************************************
#begin experimental indirect array scan benchmark
#*****************************************************************************
#*****************************************************************************
#autotuning..
#*****************************************************************************
ignore this: 332873405
ignore this: 624934944
ignore this: 335691116
ignore this: 1687511808
ignore this: 1926946064
ignore this: -863713472
#*****************************************************************************
#autotuning done. will proceed with presumably 1.07323 s samples
#*****************************************************************************
for 8192 elements, 65536 bytes, random access time: 3.92229e-05, linear access time: 3.82903e-05, ratio 1.02436
for 397312 elements, 3178496 bytes, random access time: 0.00277281, linear access time: 0.00189008, ratio 1.46703
for 786432 elements, 6291456 bytes, random access time: 0.0134281, linear access time: 0.00394277, ratio 3.40577
for 3145728 elements, 25165824 bytes, random access time: 0.10119, linear access time: 0.0153889, ratio 6.57549
#please ignore this: 1370474044
end experimental indirect array scan benchmark
#*****************************************************************************
#TLB benchmark code is unfinished!
#*****************************************************************************
# This test will measure times in scanning arrays sized and aligned to fit in caches.
# 2 cache levels detected
#Level 1:
#size         size      level  bw(MBps)
READ          32768     1      2616.93
WRITE         32768     1      2410.72
RW            32768     1      932.109
BZERO         32768     1      30629.6
ZERO          32768     1      2640.96
MEMSET        32768     1      29606.1
MEMCPY        32768     1      29002.9
MEMCPY2       32768     1      7751.16
LINEAR_CHASE  32768     1      1076.56
MORTON_CHASE  32768     1      1077.62
#Level 2:
#size         size      level  bw(MBps)
READ          3145728   2      2307.82
WRITE         3145728   2      2210.49
RW            3145728   2      923.464
BZERO         3145728   2      4123.31
ZERO          3145728   2      2463.75
MEMSET        3145728   2      3977.54
MEMCPY        3145728   2      7400.41
MEMCPY2       3145728   2      3210.63
LINEAR_CHASE  3145728   2      1068.29
MORTON_CHASE  3145728   2      989.428
#READ ratio 0.88188
#WRITE ratio 0.916943
#RW ratio 0.990725
#BZERO ratio 0.134618
#ZERO ratio 0.932902
#MEMSET ratio 0.134349
#MEMCPY ratio 0.255161
#MEMCPY2 ratio 0.414213
#LINEAR_CHASE ratio 0.992319
#MORTON_CHASE ratio 0.918157
#Level 3 (RAM) (sample size 2^1 times the last cache size):
#size         size      level  bw(MBps)
READ          6291456   3      2087.29
WRITE         6291456   3      1755.45
RW            6291456   3      901.197
BZERO         6291456   3      4144.92
ZERO          6291456   3      1765.75
MEMSET        6291456   3      4093.23
MEMCPY        6291456   3      6858.23
MEMCPY2       6291456   3      1962.98
LINEAR_CHASE  6291456   3      1044.37
MORTON_CHASE  6291456   3      535.575
#READ ratio 0.904442
#WRITE ratio 0.794144
#RW ratio 0.975888
#BZERO ratio 1.00524
#ZERO ratio 0.716692
#MEMSET ratio 1.02908
#MEMCPY ratio 0.926736
#MEMCPY2 ratio 0.611401
#LINEAR_CHASE ratio 0.977609
#MORTON_CHASE ratio 0.541298
#Level 3 (RAM) (sample size 2^2 times the last cache size):
#size         size      level  bw(MBps)
READ          12582912  4      1874.39
WRITE         12582912  4      1778.48
RW            12582912  4      902.22
BZERO         12582912  4      4118.28
ZERO          12582912  4      1682.59
MEMSET        12582912  4      3933.76
MEMCPY        12582912  4      4610.96
MEMCPY2       12582912  4      1850.51
LINEAR_CHASE  12582912  4      1044.56
MORTON_CHASE  12582912  4      533.563
#READ ratio 0.898
#WRITE ratio 1.01312
#RW ratio 1.00114
#BZERO ratio 0.993572
#ZERO ratio 0.952902
#MEMSET ratio 0.961043
#MEMCPY ratio 0.672325
#MEMCPY2 ratio 0.942704
#LINEAR_CHASE ratio 1.00019
#MORTON_CHASE ratio 0.996242
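As a reading aid for the "ratio" lines: judging from the numbers alone, each ratio appears to be the bandwidth measured at one level divided by the bandwidth of the same operation at the previous level (e.g. READ at level 2 over READ at level 1). The sketch below recomputes a few of them from values copied out of the listing; this interpretation is an assumption drawn from the figures, not from rsbench documentation, and `level_ratios` is a hypothetical helper name.

```python
# Bandwidths in MBps copied from the listing, ordered:
# level 1, level 2, RAM (2x last cache size), RAM (4x last cache size).
BANDWIDTH = {
    "READ": [2616.93, 2307.82, 2087.29, 1874.39],
    "MEMCPY": [29002.9, 7400.41, 6858.23, 4610.96],
    "MORTON_CHASE": [1077.62, 989.428, 535.575, 533.563],
}


def level_ratios(values):
    """Ratio of each level's bandwidth to the previous level's bandwidth."""
    return [cur / prev for prev, cur in zip(values, values[1:])]


if __name__ == "__main__":
    for op, values in BANDWIDTH.items():
        formatted = ", ".join(f"{r:.6g}" for r in level_ratios(values))
        print(f"{op}: {formatted}")
```

Read this way, a ratio well below 1 marks a sharp drop when moving one level out: MEMCPY falls to about a quarter of its level 1 bandwidth at level 2, and MORTON_CHASE roughly halves between level 2 and RAM.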
_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev