On 13 Nov 2011, at 11:17, Michele Martone wrote:

> On 20111113@10:55, Carlo de Falco wrote:
>> $ RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
>> ...
>> How can I check whether the system is being actually handled in parallel?
>> ...
> 
> 
> You can influence the OpenMP environment:
> OMP_NUM_THREADS=1 RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
> OMP_NUM_THREADS=2 RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
> ...

I was actually asking for a way to check this a posteriori ...

> And also play with librsb's cache parameters:
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/2M,L1:8/64/32K" octave -q
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/4M,L1:8/64/32K" octave -q

What values should I use for this machine?

Processor Name: Intel Core 2 Duo
Processor Speed:        2.4 GHz
Number Of Processors:   1
Total Number Of Cores:  2
L2 Cache:       3 MB

> For now, only the L2 capacity is influential, but this may change (it will
> change for sure --- I expect a second wave of librsb tuning in the
> future, as soon as we adapt it to be stable in sparsersb).
> 
> Declaring a large L2 cache as in:
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/16M,L1:8/64/32K" octave -q
> may "help" by leaving the matrix not subdivided at all, and hence giving
> no parallelism during multiplication.
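
One indirect way to compare the settings discussed above is to run the same command under each thread count with a fixed declared L2 size and look at the timings. The sketch below assumes a POSIX shell; run_sweep is a hypothetical helper (not part of librsb or Octave), and the stand-in command only echoes the environment it receives, so it works without Octave installed.

```shell
# run_sweep is a hypothetical helper, not part of librsb or Octave.
# Usage: run_sweep L2SIZE COMMAND [ARGS...]
# It runs COMMAND once per thread count (1 and 2), exporting the two
# environment variables used in this thread only for that command.
run_sweep() {
  l2="$1"; shift
  for t in 1 2; do
    echo "== OMP_NUM_THREADS=$t L2=$l2 =="
    OMP_NUM_THREADS="$t" \
    RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/${l2},L1:8/64/32K" \
      "$@"
  done
}

# Stand-in command: prints the variables the child process actually sees.
run_sweep 3M sh -c 'echo "$OMP_NUM_THREADS $RSB_USER_SET_MEM_HIERARCHY_INFO"'
```

To time a real run, the stand-in could be replaced with something like `run_sweep 3M octave -q bench.m`, where bench.m is a placeholder for a script containing the pcg test.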

Interestingly enough, even with no parallelism, I still get a speed-up:

$ RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/16M,L1:8/64/32K" OMP_NUM_THREADS=1 octave -q
>> pkg load bim
>> N = 75; pp = linspace (0, 1, N); msh = bim3c_mesh_properties (msh3m_structured_mesh (pp, pp, pp, 1, 1:6));
>> dn = bim3c_unknowns_on_faces (msh, 1:6); in = setdiff (1:columns(msh.p), dn);
>> mat= bim3a_laplacian (msh, 1, 1); A = mat(in, in);
>> f  = bim3a_rhs (msh, 1, 1);  b = f(in);
>> As = sparsersb (A);
>> P  = diag (diag (A));
>> tic (); x = pcg (A, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 12.6074 seconds.
>> tic (); xs = pcg (As, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 10.0631 seconds.
>> 

Yet multithreading does seem to have an impact:

$ RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/3M,L1:8/64/32K" OMP_NUM_THREADS=2 octave -q
>> pkg load bim
>> N = 75; pp = linspace (0, 1, N); msh = bim3c_mesh_properties (msh3m_structured_mesh (pp, pp, pp, 1, 1:6));
>> dn = bim3c_unknowns_on_faces (msh, 1:6); in = setdiff (1:columns(msh.p), dn);
>> mat= bim3a_laplacian (msh, 1, 1); A = mat(in, in);
>> f  = bim3a_rhs (msh, 1, 1);  b = f(in);
>> As = sparsersb (A);
>> P  = diag (diag (A));
>> tic (); x = pcg (A, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 12.6412 seconds.
>> As = sparsersb (A);
>> tic (); xs = pcg (As, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 7.9651 seconds.
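
For reference, the speed-ups implied by the timings above can be worked out as the ratio of the plain sparse time to the sparsersb time. This is a small sketch using awk from a POSIX shell; speedup is a hypothetical helper name, and the numbers are copied from the two sessions.

```shell
# speedup is a hypothetical helper: prints baseline/rsb to two decimals.
# $1 = time with the built-in sparse matrix, $2 = time with sparsersb.
speedup() {
  awk -v base="$1" -v rsb="$2" 'BEGIN { printf "%.2f\n", base / rsb }'
}

speedup 12.6074 10.0631   # 1 thread, L2 declared as 16M  -> 1.25
speedup 12.6412 7.9651    # 2 threads, L2 declared as 3M  -> 1.59
```

So the single-threaded run is roughly 1.25 times faster than the built-in sparse type, and the two-thread run roughly 1.59 times faster.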

c.




_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev