Memory access is typically the dominant bottleneck in sparse mat-vec, so 
unfortunately I'm skeptical that one could achieve good performance using 
Julia's current distributed-memory approach on a multicore machine. This 
really calls for something like OpenMP.
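For what it's worth, here is a rough sketch of what a shared-memory A'*b could look like if Julia had native threads (the function name and the threading macro here are illustrative, not an existing API). Each column j of a SparseMatrixCSC is a row of A', so the writes to y[j] are independent and need no locking:

```julia
using SparseArrays, Base.Threads

# Hypothetical sketch: compute y = A' * x by splitting the columns of A
# across threads.  Column j of A contributes only to y[j], so there is
# no write contention between threads.
function threaded_At_mul_b(A::SparseMatrixCSC, x::AbstractVector)
    m, n = size(A)
    @assert length(x) == m
    y = zeros(eltype(A), n)
    rows = rowvals(A)   # stored row indices of A
    vals = nonzeros(A)  # stored nonzero values of A
    @threads for j in 1:n
        s = zero(eltype(A))
        for k in nzrange(A, j)   # range of stored entries in column j
            s += vals[k] * x[rows[k]]
        end
        y[j] = s
    end
    return y
end
```

Note that A*b itself is the hard direction for CSC storage: threading over columns would require atomic updates (or per-thread output buffers) for y, which is part of why memory traffic rather than arithmetic dominates.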

On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>
> I'm developing an iterative optimization algorithm in Julia along the 
> lines of other contributions to the IterativeSolvers project 
> <https://github.com/JuliaLang/IterativeSolvers.jl> or its Krylov subspace 
> module 
> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>, 
> whose only computationally intensive step is computing A*b or A'*b. I would like 
> to parallelize the method by using a parallel sparse matrix vector 
> multiply. Is there a standard backend matrix-vector multiply that's 
> recommended in Julia if I'm targeting a shared memory computer with a large 
> number of processors? Similarly, is there a recommended backend for 
> targeting a cluster? My matrices can easily reach 10 million rows by 1 
> million columns, with density anywhere from 0.01% up to problems that are 
> nearly diagonal.
>
> I've seen many posts <https://github.com/JuliaLang/julia/issues/2645> talking 
> about integrating PETSc as a backend for this purpose, but it looks like 
> the project <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl> 
> has stalled - the last commits I see are from a year ago. I'm also 
> interested in other backends, e.g. Spark <http://spark.incubator.apache.org/>, 
> SciDB <http://scidb.org/>, etc. 
>
> I'm more interested in solving sparse problems, but as a side note, the 
> built-in BLAS acceleration (changing the number of threads with 
> `blas_set_num_threads`) works OK for dense problems using a moderate number 
> of processors. I wonder why the number of threads isn't set higher than one 
> by default, for example, to as many as nprocs() cores?
>
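On the quoted dense-BLAS side note: a minimal sketch of setting the thread count and doing a dense mat-vec follows. (On current Julia the call lives in `LinearAlgebra` as `BLAS.set_num_threads`; `blas_set_num_threads` was the older spelling.)

```julia
using LinearAlgebra

# Let the BLAS use every hardware thread for dense kernels.
BLAS.set_num_threads(Sys.CPU_THREADS)

A = rand(2000, 2000)
b = rand(2000)
y = A * b  # dense gemv, now dispatched to the multithreaded BLAS
```

This only affects dense BLAS calls; Julia's built-in sparse mat-vec does not go through the BLAS at all, which is why the sparse question above is harder.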
