Thanks, Jake. It seems like I'm still not able to go above 6 threads, but that may be deep in my MKL install.
I've begun implementing a shared sparse matrix class in native Julia that seems to scale better. The code is attached to this email. How can I overload A'*b for my SharedBilinearOperator class, so that I can call At_mul_B on a stored copy of A', rather than computing A' each time I want to multiply by it? I.e., can I write something like ('*)(L::SharedBilinearOperator,x) = At_mul_B(L.A,x) ? On Tue, Feb 11, 2014 at 8:40 AM, Jake Bolewski <jakebolew...@gmail.com> wrote: > Hey Madeleine, > > First I would check that your global environment > variables <http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/GUID-0DE6A77B-00E0-4ED6-9CAE-52FCF49E5623.htm> are > set up correctly. If you want to set up the number of threads > programmatically: > > _ _ _(_)_ | A fresh approach to technical computing > (_) | (_) (_) | Documentation: http://docs.julialang.org > _ _ _| |_ __ _ | Type "help()" to list help topics > | | | | | | |/ _` | | > | | |_| | | | (_| | | Version 0.3.0-prerelease+1470 (2014-02-08 16:23 > UTC) > _/ |\__'_|_|_|\__'_| | Commit 596d5c4* (3 days old master) > |__/ | x86_64-unknown-linux-gnu > > julia> Base.blas_vendor() > :mkl > > julia> Base.blas_set_num_threads(12) > > > You can see the relevant code here: > https://github.com/JuliaLang/julia/blob/8ac5a7f7fff1c54a768c7bc9ae85cf053d310f42/base/util.jl#L293 > . > It's always worth a quick search through the code base to figure out what > is going on behind the scenes. > > Hope that helps! > Jake > > On Tuesday, February 11, 2014 12:34:56 AM UTC-5, Madeleine Udell wrote: >> >> Jake, thanks for the >> reference <http://software.intel.com/en-us/forums/topic/294954>; >> I have 32 hyperthreaded cores, so there's definitely something else going >> on to limit me to 6 in addition to not exploiting hyperthreading. >> Perhaps I need to call something like omp_set_num_threads()? But there >> doesn't seem to be a function by this name in the libmkl_rt library. 
>> >> julia> ccall((:omp_set_num_threads,Base.libblas_name),Ptr{Void},( >> Uint8,),32) >> ERROR: ccall: could not find function omp_set_num_threads in library >> libmkl_rt >> in anonymous at no file >> >> >> On Mon, Feb 10, 2014 at 4:05 PM, Jake Bolewski <jakebo...@gmail.com>wrote: >> >>> Are those hyper-threaded "cores"? If so, check Intel MKL's >>> documentation on hyper-threading. >>> >>> -Best >>> Jake >>> >>> On Monday, February 10, 2014 6:38:50 PM UTC-5, Madeleine Udell wrote: >>> >>>> It looks like only 6 threads are being used when I use mkl from julia. >>>> If I do blas_set_num_threads(n), then using top, I see julia is >>>> running at min(n,6)*100% cpu. Any idea why this would be, or how to >>>> get mkl to use more threads? I'm not sure if this is an issue in julia >>>> or with my installation of mkl. >>>> >>>> On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen >>>> <andreasno...@gmail.com> wrote: >>>> > You are welcome. Good to hear that it worked. >>>> > >>>> > >>>> > 2014-02-10 22:35 GMT+01:00 Madeleine Udell <madelei...@gmail.com>: >>>> > >>>> >> fantastic, thank you. I now see Base.libblas_name="libmkl_rt", and >>>> am able >>>> >> to run the following test successfully: >>>> >> >>>> >> transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A' >>>> >> matdescra = "GXXF" # G = general, X = ignored, F = one-based >>>> indexing >>>> >> m,n = 50,100 >>>> >> A = sprand(m,n,.01) >>>> >> y = zeros(m) >>>> >> x = rand(n) >>>> >> alpha = 1.0 >>>> >> beta = 1.0 >>>> >> >>>> >> Base.LinAlg.SparseBLAS.cscmv!(transa,alpha,matdescra,A,x,beta,y) >>>> >> y_builtin = A*x >>>> >> >>>> >> julia> y==y_builtin >>>> >> true >>>> >> >>>> >> >>>> >> On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen >>>> >> <andreasno...@gmail.com> wrote: >>>> >>> >>>> >>> Hi Madeleine >>>> >>> >>>> >>> When compiling julia with MKL you'll have to do make cleanall as >>>> well as >>>> >>> rebuild ARPACK and Suitesparse, i.e. 
make -C distclean-arpack and >>>> make -C >>>> >>> distclean-suitesparse. It is also easier to create a Make.user >>>> there you set >>>> >>> USE_MKL=1 and MKLROOT to the location of your MKL library files, >>>> e.g. >>>> >>> /opt/intel/mkl. The arguments are explained here >>>> >>> >>>> >>> >>>> >>> http://software.intel.com/sites/products/documentation/hpc/ >>>> mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm# >>>> GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2 >>>> >>> >>>> >>> but transa determines if the operation is Ax or A'x and matdescra >>>> has >>>> >>> information about the structure of the matrix, e.g. if it is >>>> triangular. >>>> >>> When you have succeeded in compiling julia with MKL, the libblas >>>> variable >>>> >>> should just be Base.libblas_name. >>>> >>> >>>> >>> >>>> >>> 2014-02-10 20:37 GMT+01:00 Madeleine Udell <madelei...@gmail.com>: >>>> >>> >>>> >>>> I'm having trouble using MKL in julia. I changed Make.inc so that >>>> >>>> USE_MKL = 1, >>>> >>>> but when I make and run julia, I find that Base.libblas_name = >>>> >>>> "libopenblas". Is this expected? I would have thought it would be >>>> eg >>>> >>>> "libmkl_core". >>>> >>>> >>>> >>>> Andreas, I found your wrappers for MKL in this PR. I've not used >>>> MKL >>>> >>>> before, so I don't understand the call signature of those >>>> functions in order >>>> >>>> to call MKL directly. Any chance you could explain what are >>>> transa::BlasChar >>>> >>>> and matdescra::ASCIIString in the following function, and which >>>> mkl library >>>> >>>> is expected in the libblas variable? I see many .so files in the >>>> lib/intel64 >>>> >>>> directory of my mkl installation, and I'm not sure which one to >>>> point julia >>>> >>>> to. 
>>>> >>>> >>>> >>>> function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString, >>>> >>>> A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T}, β::$T, >>>> >>>> y::StridedVector{$T}) >>>> >>>> length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) >>>> columns >>>> >>>> multiplied with vector of length $(length(x))")) >>>> >>>> length(y) == A.m || throw(DimensionMismatch("Vector of length >>>> $(A.m) >>>> >>>> added to vector of length $(length(y))")) # >>>> >>>> ccall(($(string(mv)), libblas), Void, >>>> >>>> (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T}, >>>> >>>> Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt}, >>>> >>>> Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}), >>>> >>>> &transa, &A.m, &A.n, &α, >>>> >>>> matdescra, A.nzval, A.rowval, pointer(A.colptr, 1), >>>> >>>> pointer(A.colptr, 2), x, &β, y) >>>> >>>> return y >>>> >>>> end >>>> >>>> >>>> >>>> Thanks for your help! >>>> >>>> Madeleine >>>> >>>> >>>> >>>> >>>> >>>> On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen >>>> >>>> <andreasno...@gmail.com> wrote: >>>> >>>>> >>>> >>>>> A*b will not call MKL when A is sparse. There has been some >>>> writing >>>> >>>>> about making a MKL package that overwrites A_mul_B(Matrix,Vector) >>>> with the >>>> >>>>> MKL versions and I actually wrote wrappers for the sparse MKL >>>> subroutines in >>>> >>>>> the fall for the same reason. >>>> >>>>> >>>> >>>>> >>>> >>>>> 2014-02-05 Madeleine Udell <madelei...@gmail.com>: >>>> >>>>> >>>> >>>>>> Miles, you're right that writing sparse matrix vector products >>>> in >>>> >>>>>> native Julia probably won't be the best idea given Julia's model >>>> of >>>> >>>>>> parallelism. That's why I'm interested in calling an outside >>>> library like >>>> >>>>>> PETSc. >>>> >>>>>> >>>> >>>>>> I see it's possible to link Julia with MKL. I haven't tried this >>>> yet, >>>> >>>>>> but if I do, will A*b (where A is sparse) call MKL to perform >>>> the matrix >>>> >>>>>> vector product? 
>>>> >>>>>> >>>> >>>>>> >>>> >>>>>> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <miles...@gmail.com> >>>> >>>> >>>>>> wrote: >>>> >>>>>>> >>>> >>>>>>> Memory access is typically a significant bottleneck in sparse >>>> >>>>>>> mat-vec, so unfortunately I'm skeptical that one could achieve >>>> good >>>> >>>>>>> performance using Julia's current distributed memory approach >>>> on a multicore >>>> >>>>>>> machine. This really calls for something like OpenMP. >>>> >>>>>>> >>>> >>>>>>> >>>> >>>>>>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine >>>> Udell >>>> >>>>>>> wrote: >>>> >>>>>>>> >>>> >>>>>>>> I'm developing an iterative optimization algorithm in Julia >>>> along >>>> >>>>>>>> the lines of other contributions to the Iterative Solvers >>>> project or Krylov >>>> >>>>>>>> Subspace module whose only computationally intensive step is >>>> computing A*b >>>> >>>>>>>> or A'*b. I would like to parallelize the method by using a >>>> parallel sparse >>>> >>>>>>>> matrix vector multiply. Is there a standard backend >>>> matrix-vector multiply >>>> >>>>>>>> that's recommended in Julia if I'm targeting a shared memory >>>> computer with a >>>> >>>>>>>> large number of processors? Similarly, is there a recommended >>>> backend for >>>> >>>>>>>> targeting a cluster? My matrices can easily reach 10 million >>>> rows by 1 >>>> >>>>>>>> million columns, with sparsity anywhere from .01% to problems >>>> that are >>>> >>>>>>>> nearly diagonal. >>>> >>>>>>>> >>>> >>>>>>>> I've seen many posts talking about integrating PETSc as a >>>> backend >>>> >>>>>>>> for this purpose, but it looks like the project has stalled - >>>> the last >>>> >>>>>>>> commits I see are a year ago. I'm also interested in other >>>> backends, eg >>>> >>>>>>>> Spark, SciDB, etc. 
>>>> >>>>>>>> >>>> >>>>>>>> I'm more interested in solving sparse problems, but as a side >>>> note, >>>> >>>>>>>> the built-in BLAS acceleration by changing the number of >>>> threads >>>> >>>>>>>> `blas_set_num_threads` works ok for dense problems using a >>>> moderate number >>>> >>>>>>>> of processors. I wonder why the number of threads isn't set >>>> higher than one >>>> >>>>>>>> by default, for example, using as many as nprocs() cores? >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> >>>> >>>>>> -- >>>> >>>>>> Madeleine Udell >>>> >>>>>> PhD Candidate in Computational and Mathematical Engineering >>>> >>>>>> Stanford University >>>> >>>>>> www.stanford.edu/~udell >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> >>>> >>>>> -- >>>> >>>>> Med venlig hilsen >>>> >>>>> >>>> >>>>> Andreas Noack Jensen >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> >>>> Madeleine Udell >>>> >>>> PhD Candidate in Computational and Mathematical Engineering >>>> >>>> Stanford University >>>> >>>> www.stanford.edu/~udell >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> -- >>>> >>> Med venlig hilsen >>>> >>> >>>> >>> Andreas Noack Jensen >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> -- >>>> >> Madeleine Udell >>>> >> PhD Candidate in Computational and Mathematical Engineering >>>> >> Stanford University >>>> >> www.stanford.edu/~udell >>>> > >>>> > >>>> > >>>> > >>>> > -- >>>> > Med venlig hilsen >>>> > >>>> > Andreas Noack Jensen >>>> >>>> >>>> >>>> -- >>>> Madeleine Udell >>>> PhD Candidate in Computational and Mathematical Engineering >>>> Stanford University >>>> www.stanford.edu/~udell >>>> >>> >> >> >> -- >> Madeleine Udell >> PhD Candidate in Computational and Mathematical Engineering >> Stanford University >> www.stanford.edu/~udell >> > -- Madeleine Udell PhD Candidate in Computational and Mathematical Engineering Stanford University www.stanford.edu/~udell
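[Editor's note on the open `A'*b` question at the top of the thread: in Julia 0.3, the parser lowers `A'*x` to `Ac_mul_B(A, x)` and `A.'*x` to `At_mul_B(A, x)`, so the methods to extend are `Base.Ac_mul_B` / `Base.At_mul_B` rather than a literal `('*)` function. A minimal sketch in 0.3-era syntax, assuming a hypothetical operator type that stores the transpose once at construction; the type and field names here are illustrative, not taken from the attached code:]

```julia
# Hypothetical wrapper that caches A' so multiplies by the transpose
# never recompute it (Julia 0.3-era syntax: `type`, Ac_mul_B/At_mul_B).
type SharedBilinearOperator{T}
    A::SparseMatrixCSC{T,Int}
    At::SparseMatrixCSC{T,Int}   # A' computed once and stored
end
SharedBilinearOperator(A::SparseMatrixCSC) = SharedBilinearOperator(A, A')

import Base: *, Ac_mul_B, At_mul_B

*(L::SharedBilinearOperator, x::AbstractVector) = L.A * x
# `L'*x` parses as Ac_mul_B(L, x); dispatch it to the stored transpose.
Ac_mul_B(L::SharedBilinearOperator, x::AbstractVector) = L.At * x
# `L.'*x` parses as At_mul_B(L, x); same for a real-valued A.
At_mul_B(L::SharedBilinearOperator, x::AbstractVector) = L.At * x
```

With these definitions, `L'*x` and `L.'*x` never materialize `A'` on the fly; for a real-valued sparse `A` the two are identical.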
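[Editor's note on the failed `omp_set_num_threads` ccall quoted above: the single-dynamic-library `libmkl_rt` does not export that OpenMP symbol, but it does export `mkl_set_num_threads`, which is what `Base.blas_set_num_threads` ends up calling when `Base.blas_vendor()` is `:mkl`. A direct-call sketch in 0.3-era `ccall` syntax; whether it actually raises the observed 6-thread cap depends on the MKL installation and its hyper-threading heuristics:]

```julia
# MKL's C API: void mkl_set_num_threads(int nt) -- count passed by value.
ccall((:mkl_set_num_threads, Base.libblas_name), Void, (Cint,), 32)

# MKL may still choose fewer threads than requested; query its ceiling:
nt = ccall((:mkl_get_max_threads, Base.libblas_name), Cint, ())
println("MKL will use at most $nt threads")
```

If `mkl_get_max_threads` still reports 6, the limit is coming from MKL itself (e.g. its policy of defaulting to physical cores or an `MKL_NUM_THREADS` / `MKL_DYNAMIC` environment setting) rather than from Julia.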
parallel_matmul.jl
test_parallel_matmul.jl