Re: [OctDev] "librsb"+"sparsersb" packages proposal for octave-forge

c. Sun, 13 Nov 2011 03:37:10 -0800

On 13 Nov 2011, at 12:13, Michele Martone wrote:

> I'm no expert on your machine, but I find this speedup reasonable:
> librsb's speedup is limited by memory speed.
> To get a rough estimate of it, could you please report the first
> lines of the `./rsbench -M' output?
> 
> e.g.: on an Atom N450, librsb's "parallel MEMCPY" speedup is 20% only:
> $./rsbench -M
> #1 cores MEMCPY on 17810773 bytes: 0.542651 GB/s (73 times in 2.39599 s)
> #2 cores MEMCPY on 17810773 bytes: 0.60361 GB/s (73 times in 2.15402 s)
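
A quick way to turn such MEMCPY lines into a speedup figure is sketched below. It is only an illustration: the regular expression is inferred from the two sample lines quoted above and may not match what other rsbench versions print, and the helper names are made up for this example. For the two quoted lines the ratio comes out at roughly 1.11.

```python
import re

# Sample lines copied from the quoted Atom N450 run.
SAMPLE = """\
#1 cores MEMCPY on 17810773 bytes: 0.542651 GB/s (73 times in 2.39599 s)
#2 cores MEMCPY on 17810773 bytes: 0.60361 GB/s (73 times in 2.15402 s)
"""

# Pattern inferred from the two lines above (an assumption, not a documented format).
LINE_RE = re.compile(r"#(\d+) cores MEMCPY on (\d+) bytes: ([\d.]+) GB/s")


def memcpy_bandwidths(text):
    """Return {core_count: GB/s} for every MEMCPY line found in text."""
    return {int(m.group(1)): float(m.group(3)) for m in LINE_RE.finditer(text)}


def speedup(bandwidths, cores):
    """Bandwidth with `cores` threads relative to the single-core figure."""
    return bandwidths[cores] / bandwidths[1]


bw = memcpy_bandwidths(SAMPLE)
print(f"2-core MEMCPY speedup: {speedup(bw, 2):.3f}x")
```

Feeding the first lines of a local `rsbench -M` run through `memcpy_bandwidths` gives a comparable number for another machine, provided the lines have the same shape.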


The output of 

RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/3M,L1:8/64/32K" /opt/librsb/bin/rsbench -M

does not look like that on my machine, see below.
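
For reference, here is a small sketch of how I read the hierarchy string passed in the environment variable. The field order (associativity / line size / capacity) is my assumption, and the function names are hypothetical; the only part the output confirms is the capacities, since 32K and 3M match the 32768 and 3145728 byte sizes reported for levels 1 and 2 in the log.

```python
def parse_size(token):
    """Convert a size such as '32K' or '3M' to bytes (binary multiples assumed)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = token[-1].upper()
    if suffix in units:
        return int(token[:-1]) * units[suffix]
    return int(token)


def parse_hierarchy(spec):
    """Parse 'L2:4/64/3M,L1:8/64/32K' into {level: (assoc, line_bytes, size_bytes)}.

    Assumed field order: associativity / line size / capacity.
    """
    levels = {}
    for entry in spec.split(","):
        name, fields = entry.split(":")
        assoc, line, size = fields.split("/")
        levels[int(name.lstrip("Ll"))] = (int(assoc), int(line), parse_size(size))
    return levels


caches = parse_hierarchy("L2:4/64/3M,L1:8/64/32K")
for level in sorted(caches):
    assoc, line, size = caches[level]
    print(f"L{level}: {size} bytes, {assoc}-way, {line}-byte lines")
```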
c.


#TLB benchmark.
#TLB timing benchmark : scanned 128 entries spaced 4096 bytes across 524288 bytes in 0.00115108 s (424.192 MBps)
#TLB timing benchmark : scanned 256 entries spaced 4096 bytes across 1048576 bytes in 0.00133705 s (730.385 MBps)
#TLB timing benchmark : scanned 512 entries spaced 4096 bytes across 2097152 bytes in 0.00514293 s (379.769 MBps)
#TLB timing benchmark : scanned 1024 entries spaced 4096 bytes across 4194304 bytes in 0.036422 s (107.25 MBps)
#TLB timing benchmark : scanned 2048 entries spaced 4096 bytes across 8388608 bytes in 0.0742929 s (105.158 MBps)
#TLB timing benchmark : scanned 4096 entries spaced 4096 bytes across 16777216 bytes in 0.148562 s (105.175 MBps)
#TLB timing benchmark : scanned 8192 entries spaced 4096 bytes across 33554432 bytes in 0.299253 s (104.427 MBps)
#TLB timing benchmark : scanned 16384 entries spaced 4096 bytes across 67108864 bytes in 0.596625 s (104.756 MBps)
#TLB timing benchmark : scanned 32768 entries spaced 4096 bytes across 134217728 bytes in 1.1966 s (104.462 MBps)
#TLB timing benchmark : scanned 65536 entries spaced 4096 bytes across 268435456 bytes in 2.38357 s (104.885 MBps)
#TLB timing benchmark : scanned 131072 entries spaced 4096 bytes across 536870912 bytes in 4.77643 s (104.681 MBps)
#*****************************************************************************
#begin experimental indirect array scan benchmark
#*****************************************************************************
#*****************************************************************************
#autotuning..
#*****************************************************************************
ignore this: 332873405
ignore this: 624934944
ignore this: 335691116
ignore this: 1687511808
ignore this: 1926946064
ignore this: -863713472
#*****************************************************************************
#autotuning done. will proceed with presumably 1.07323 s samples
#*****************************************************************************
for 8192 elements, 65536 bytes, random access time: 3.92229e-05, linear access time: 3.82903e-05, ratio 1.02436
for 397312 elements, 3178496 bytes, random access time: 0.00277281, linear access time: 0.00189008, ratio 1.46703
for 786432 elements, 6291456 bytes, random access time: 0.0134281, linear access time: 0.00394277, ratio 3.40577
for 3145728 elements, 25165824 bytes, random access time: 0.10119, linear access time: 0.0153889, ratio 6.57549
#please ignore this: 1370474044
end experimental indirect array scan benchmark
#*****************************************************************************
#TLB benchmark code is unfinished!
#*****************************************************************************
# This test will measure times in scanning arrays sized and aligned to fit in caches.
# 2 cache levels detected
#Level 1:
#size                                   size    level   bw(MBps)
READ                                    32768   1       2616.93
WRITE                                   32768   1       2410.72
RW                                      32768   1       932.109
BZERO                                   32768   1       30629.6
ZERO                                    32768   1       2640.96
MEMSET                                  32768   1       29606.1
MEMCPY                                  32768   1       29002.9
MEMCPY2                                 32768   1       7751.16
LINEAR_CHASE                            32768   1       1076.56
MORTON_CHASE                            32768   1       1077.62
#Level 2:
#size                                   size    level   bw(MBps)
READ                                    3145728 2       2307.82
WRITE                                   3145728 2       2210.49
RW                                      3145728 2       923.464
BZERO                                   3145728 2       4123.31
ZERO                                    3145728 2       2463.75
MEMSET                                  3145728 2       3977.54
MEMCPY                                  3145728 2       7400.41
MEMCPY2                                 3145728 2       3210.63
LINEAR_CHASE                            3145728 2       1068.29
MORTON_CHASE                            3145728 2       989.428
#READ                             ratio  0.88188 
#WRITE                            ratio  0.916943 
#RW                               ratio  0.990725 
#BZERO                            ratio  0.134618 
#ZERO                             ratio  0.932902 
#MEMSET                           ratio  0.134349 
#MEMCPY                           ratio  0.255161 
#MEMCPY2                          ratio  0.414213 
#LINEAR_CHASE                     ratio  0.992319 
#MORTON_CHASE                     ratio  0.918157 
#Level 3 (RAM) (sample size 2^1 times the last cache size):
#size                                   size    level   bw(MBps)
READ                                    6291456 3       2087.29
WRITE                                   6291456 3       1755.45
RW                                      6291456 3       901.197
BZERO                                   6291456 3       4144.92
ZERO                                    6291456 3       1765.75
MEMSET                                  6291456 3       4093.23
MEMCPY                                  6291456 3       6858.23
MEMCPY2                                 6291456 3       1962.98
LINEAR_CHASE                            6291456 3       1044.37
MORTON_CHASE                            6291456 3       535.575
#READ                             ratio  0.904442 
#WRITE                            ratio  0.794144 
#RW                               ratio  0.975888 
#BZERO                            ratio  1.00524 
#ZERO                             ratio  0.716692 
#MEMSET                           ratio  1.02908 
#MEMCPY                           ratio  0.926736 
#MEMCPY2                          ratio  0.611401 
#LINEAR_CHASE                     ratio  0.977609 
#MORTON_CHASE                     ratio  0.541298 
#Level 3 (RAM) (sample size 2^2 times the last cache size):
#size                                   size    level   bw(MBps)
READ                                    12582912        4       1874.39
WRITE                                   12582912        4       1778.48
RW                                      12582912        4       902.22
BZERO                                   12582912        4       4118.28
ZERO                                    12582912        4       1682.59
MEMSET                                  12582912        4       3933.76
MEMCPY                                  12582912        4       4610.96
MEMCPY2                                 12582912        4       1850.51
LINEAR_CHASE                            12582912        4       1044.56
MORTON_CHASE                            12582912        4       533.563
#READ                             ratio  0.898 
#WRITE                            ratio  1.01312 
#RW                               ratio  1.00114 
#BZERO                            ratio  0.993572 
#ZERO                             ratio  0.952902 
#MEMSET                           ratio  0.961043 
#MEMCPY                           ratio  0.672325 
#MEMCPY2                          ratio  0.942704 
#LINEAR_CHASE                     ratio  1.00019 
#MORTON_CHASE                     ratio  0.996242 
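
The "ratio" lines in the output above appear to be each operation's bandwidth divided by its bandwidth at the previous level. That is my reading of the numbers, not something the output states. The sketch below recomputes the ratios for READ and MEMCPY from the bw(MBps) columns so they can be compared with the printed ones; the dictionary layout and function name are just for illustration.

```python
# READ and MEMCPY bandwidths (MBps) copied from the tables above,
# keyed by the "level" column of the output.
BANDWIDTH = {
    "READ":   {1: 2616.93, 2: 2307.82, 3: 2087.29, 4: 1874.39},
    "MEMCPY": {1: 29002.9, 2: 7400.41, 3: 6858.23, 4: 4610.96},
}


def level_ratios(per_level):
    """Return {level: bandwidth at this level / bandwidth at the previous level}."""
    levels = sorted(per_level)
    return {
        lvl: per_level[lvl] / per_level[prev]
        for prev, lvl in zip(levels, levels[1:])
    }


for op, per_level in BANDWIDTH.items():
    for lvl, ratio in level_ratios(per_level).items():
        print(f"{op:8s} level {lvl}: ratio {ratio:.6f}")
```

Read this way, MEMCPY keeps only about a quarter of its L1 bandwidth at the L2 size, while READ loses little from one level to the next.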


_______________________________________________
Octave-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/octave-dev
