Memory access is typically a significant bottleneck in sparse mat-vec, so unfortunately I'm skeptical that one could achieve good performance using Julia's current distributed memory approach on a multicore machine. This really calls for something like OpenMP.
On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>
> I'm developing an iterative optimization algorithm in Julia along the
> lines of other contributions to the Iterative Solvers project
> <https://github.com/JuliaLang/IterativeSolvers.jl> or the Krylov Subspace
> module
> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>,
> whose only computationally intensive step is computing A*b or A'*b. I
> would like to parallelize the method by using a parallel sparse
> matrix-vector multiply. Is there a standard backend matrix-vector
> multiply that's recommended in Julia if I'm targeting a shared memory
> computer with a large number of processors? Similarly, is there a
> recommended backend for targeting a cluster? My matrices can easily
> reach 10 million rows by 1 million columns, with sparsity anywhere from
> .01% to problems that are nearly diagonal.
>
> I've seen many posts <https://github.com/JuliaLang/julia/issues/2645>
> talking about integrating PETSc as a backend for this purpose, but it
> looks like the project
> <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl> has
> stalled - the last commits I see are a year ago. I'm also interested in
> other backends, e.g. Spark <http://spark.incubator.apache.org/>, SciDB
> <http://scidb.org/>, etc.
>
> I'm more interested in solving sparse problems, but as a side note, the
> built-in BLAS acceleration from changing the number of threads with
> `blas_set_num_threads` works OK for dense problems using a moderate
> number of processors. I wonder why the number of threads isn't set
> higher than one by default, for example, using as many as nprocs() cores?