Hi Brian,

On Tue, 7 Sep 2010 19:56:43 +0000, Brian Menounos <menou...@unbc.ca> wrote:
> Hi Andreas - I realize you're pretty busy answering emails of late, so answer 
> when you can... 

Yeah, sorry. Pretty swamped ATM. I hope things will clear out a bit
during the fall semester, but so far I don't see when that would
be happening...

Btw, I cc'd the list on my reply. Hope you don't mind. Please keep them
in the loop (for archival purposes) unless you're discussing something
confidential.

> I've attached your SparseSolve.py examples tweaked to deal with two
> pickled numpy arrays (1D and 2D) in order to try out pycuda's
> conjugate gradient (cg) function.
> 
> I'm typically building sparse matrices and doing iterative cg calls as
> part of a numerical model for mountain glaciation. I was hoping to
> speed up the cg function within scipy by sending the task to my
> gpu. However, what is clear is that much time is spent assembling the
> packets (your PacketedSpMV() function) before execution of
> solve_pkt_with_cg().
> 
> I need to execute cg for each time step of my model (typically 1-1.yr
> steps for a 10,000 yr integration), and this is where most of the
> model's time is spent. Any speedup here would be ideal.
> 
> However, the performance is about 20 times slower than if run on a
> single cpu using scipy's cg function. I knew there would be some
> overhead for reading/writing to the GPU, but I wasn't expecting this
> much time in packet assembly. Am I wasting my time trying to do this
> on a GPU? I apologize in advance for my deficit in GPU/parallel
> coding!

Does the sparsity structure of the matrix change? If not, you could
simply scatter the new entries into the existing data structure, which
would be pretty fast (but would still require a little additional code
on top of what's there).

If your structure *does* change and can't be predicted/generalized over
somehow, then the present code is simply not for you. It spends a
significant amount of (CPU!) time building, partitioning and
transferring, under the assumption that this only happens during
preprocessing. The actual CG and matrix-vector products are tuned to be
fast. If you wanted to accommodate changing sparsity patterns, you'd
have to GPU-ify the assembly, and I don't think even cusp [1] does that.
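One way a changing pattern *can* be "generalized over": if every location that could ever become nonzero is known up front (e.g. a fixed finite-difference stencil on a fixed grid), you can assemble the union pattern once with explicit zeros and scatter per-step values into it, so the expensive build happens a single time. A hypothetical CPU-side sketch with scipy (`set_entry` is an illustrative helper, not part of any library):

```python
import numpy as np
import scipy.sparse as sp

n = 4
# Union of all patterns that can occur: a tridiagonal stencil here.
# Explicit zeros keep every slot allocated in the CSR arrays.
rows, cols = [], []
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            rows.append(i)
            cols.append(j)
A = sp.csr_matrix((np.zeros(len(rows)), (rows, cols)), shape=(n, n))
A.sort_indices()

def set_entry(A, i, j, v):
    # Scatter one value into the fixed CSR structure; (i, j) must be
    # a slot of the preassembled pattern (explicit zeros are fine).
    start, end = A.indptr[i], A.indptr[i + 1]
    k = start + np.searchsorted(A.indices[start:end], j)
    assert k < end and A.indices[k] == j, "entry outside the pattern"
    A.data[k] = v

set_entry(A, 1, 2, -1.0)
set_entry(A, 2, 2, 4.0)
```

Explicit zeros cost a few wasted FLOPs in the matrix-vector product, but that's usually far cheaper than rebuilding and re-transferring the structure every time step.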

Andreas

[1] http://code.google.com/p/cusp-library/


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
