On 13 Nov 2011, at 11:17, Michele Martone wrote:

> On 20111113@10:55, Carlo de Falco wrote:
>> $ RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
>> ...
>> How can I check whether the system is being actually handled in parallel?
>> ...
>
> You can influence the OpenMP environment:
> OMP_NUM_THREADS=1 RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
> OMP_NUM_THREADS=2 RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
> ...

I was actually rather asking for a way to check a-posteriori ...

> And also play with librsb's cache parameters:
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/512K,L1:8/64/32K" octave -q
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/2M,L1:8/64/32K" octave -q
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/4M,L1:8/64/32K" octave -q

What values should I use for this machine?

  Processor Name: Intel Core 2 Duo
  Processor Speed: 2.4 GHz
  Number Of Processors: 1
  Total Number Of Cores: 2
  L2 Cache: 3 MB

> For now, only the L2 capacity is influential, but this may change (it
> will change for sure: I expect a second wave of librsb tuning in the
> future, as soon as we adapt it to be stable in sparsersb).
>
> Declaring a large L2 cache as in:
> RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/16M,L1:8/64/32K" octave -q
> may "help" in getting the matrix not subdivided at all, and hence getting
> no parallelism during multiplication.

Interestingly enough, even with no parallelism, I still get a speed-up:

$ RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/16M,L1:8/64/32K" OMP_NUM_THREADS=1 octave -q
>> pkg load bim
>> N = 75; pp = linspace (0, 1, N);
>> msh = bim3c_mesh_properties (msh3m_structured_mesh (pp, pp, pp, 1, 1:6));
>> dn = bim3c_unknowns_on_faces (msh, 1:6); in = setdiff (1:columns(msh.p), dn);
>> mat= bim3a_laplacian (msh, 1, 1); A = mat(in, in);
>> f = bim3a_rhs (msh, 1, 1); b = f(in);
>> As = sparsersb (A);
>> P = diag (diag (A));
>> tic (); x = pcg (A, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 12.6074 seconds.
>> tic (); xs = pcg (As, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 10.0631 seconds.
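
The speed-up implied by the two timings above is just the ratio of the elapsed times. A minimal shell sketch of that arithmetic follows; the helper name `speedup` is made up for illustration (it is not part of librsb or sparsersb), and it only assumes a POSIX `awk` is available:

```shell
# speedup T_REF T_NEW: print T_REF / T_NEW rounded to two decimals.
# "speedup" is a hypothetical helper, not a librsb/sparsersb tool.
speedup() {
    awk -v t_ref="$1" -v t_new="$2" 'BEGIN { printf "%.2f\n", t_ref / t_new }'
}

# Elapsed times (seconds) from the single-threaded session above:
# pcg with the built-in sparse A vs. pcg with the sparsersb As.
speedup 12.6074 10.0631
```

This prints 1.25, i.e. the sparsersb run is roughly 1.25 times faster with a single thread. The same helper can be applied to any other pair of timings.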

Yet multithreading does seem to have an impact:

$ RSB_USER_SET_MEM_HIERARCHY_INFO="L2:4/64/3M,L1:8/64/32K" OMP_NUM_THREADS=2 octave -q
>> pkg load bim
>> N = 75; pp = linspace (0, 1, N);
>> msh = bim3c_mesh_properties (msh3m_structured_mesh (pp, pp, pp, 1, 1:6));
>> dn = bim3c_unknowns_on_faces (msh, 1:6); in = setdiff (1:columns(msh.p), dn);
>> mat= bim3a_laplacian (msh, 1, 1); A = mat(in, in);
>> f = bim3a_rhs (msh, 1, 1); b = f(in);
>> As = sparsersb (A);
>> P = diag (diag (A));
>> tic (); x = pcg (A, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 12.6412 seconds.
>> As = sparsersb (A);
>> tic (); xs = pcg (As, b, 1e-7, 1e3, P); toc ()
pcg: converged in 164 iterations. the initial residual norm was reduced 1.04289e+07 times.
Elapsed time is 7.9651 seconds.

c.

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev