Miles, you're right that writing sparse matrix-vector products in native
Julia probably won't be the best idea given Julia's model of parallelism.
That's why I'm interested in calling an outside library like PETSc.

I see it's possible to link Julia with MKL. I haven't tried this yet, but
if I do, will A*b (where A is sparse) call MKL to perform the matrix-vector
product?
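
For reference, a quick way to check which method a given A*b dispatches to
is `@which`. A minimal sketch (the `using SparseArrays` line is needed on
Julia >= 0.7; in earlier releases sparse support was built into Base):

```julia
using SparseArrays             # provides sprand and SparseMatrixCSC (Julia >= 0.7)

A = sprand(1000, 1000, 0.01)   # random sparse matrix, ~1% nonzeros
b = rand(1000)

# As far as I know, sparse A*b dispatches to Julia's native SparseMatrixCSC
# kernel, written in Julia itself; linking MKL only changes the backend used
# for *dense* BLAS calls.
println(@which A * b)

y = A * b                      # native CSC mat-vec
```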


On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <miles.lu...@gmail.com> wrote:

> Memory access is typically a significant bottleneck in sparse mat-vec, so
> unfortunately I'm skeptical that one could achieve good performance using
> Julia's current distributed memory approach on a multicore machine. This
> really calls for something like OpenMP.
>
>
> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>
>> I'm developing an iterative optimization algorithm in Julia along the
>> lines of other contributions to the Iterative Solvers project
>> <https://github.com/JuliaLang/IterativeSolvers.jl> or its Krylov subspace
>> module
>> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>,
>> whose only computationally intensive step is computing A*b or A'*b. I would like
>> to parallelize the method by using a parallel sparse matrix vector
>> multiply. Is there a standard backend matrix-vector multiply that's
>> recommended in Julia if I'm targeting a shared memory computer with a large
>> number of processors? Similarly, is there a recommended backend for
>> targeting a cluster? My matrices can easily reach 10 million rows by 1
>> million columns, with density anywhere from 0.01% nonzeros down to
>> problems that are nearly diagonal.
>>
>> I've seen many posts <https://github.com/JuliaLang/julia/issues/2645>
>> talking about integrating PETSc as a backend for this purpose, but it
>> looks like the project
>> <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl> has
>> stalled: the last commits I see are a year old. I'm also interested in
>> other backends, e.g. Spark <http://spark.incubator.apache.org/>, SciDB
>> <http://scidb.org/>, etc.
>>
>> I'm more interested in solving sparse problems, but as a side note, the
>> built-in BLAS acceleration (enabled by raising the number of threads with
>> `blas_set_num_threads`) works reasonably well for dense problems on a
>> moderate number of processors. I wonder why the number of threads isn't
>> set higher than one by default, for example to as many as nprocs() cores?
>>
>


-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
