Hi,

>> The use of aijcusp instead of a dense matrix type certainly adds to the issue.
> I know, but I couldn't find a dense GPU type in the PETSc manual; please
> correct me if there is one.

There is indeed no dense GPU matrix type in PETSc (yet).


>> Please send the output of -log_summary so that we can see where most
>> time is spent.
> I am unable to do that: somehow I get no output when I use that option.
> I also tried calling PetscLogView() explicitly, but still nothing is printed.
> If I try one of the SLEPc examples, I do get the output.
> Why is this happening? If I run my code with -info or -log_trace, I see their
> output; only -log_summary is shy!

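Maybe you forgot to call SlepcFinalize()? The -log_summary output is printed during finalization, unlike -info and -log_trace, which print while the program runs.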


>> If you have good (recent) CPUs in a dual-socket configuration, it is very
>> unlikely that you will gain anything beyond ~2x with an optimized GPU setup.
>> Even that ~2x may only be possible by heavily tweaking the current SVD
>> implementation in SLEPc, the details of which I don't know.
> I used Xeon processors from 2010, the same generation as the GPUs.

OK, this is actually a relatively GPU-friendly setup, because CPUs have since narrowed the gap in FLOPs quite a bit (see for example http://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/ )

> This is not good news, as my supervisor is really optimistic about using GPUs and
> getting high speed-ups!
> Anyway, at the moment my GPU version is several times slower than the CPU
> version, so even 2x would be a win now :D

I'd suggest convincing your supervisor to buy/use a cluster with current hardware: you would enjoy a higher speedup than you could get even in an ideal setting with a GPU from 2010 ;-)

(Having said that, my cautious estimate is that you can get some performance gains for the SVD if you dive deep into the existing SVD implementation, carefully redesign it to minimize CPU<->GPU communication, and use optimized library routines for the BLAS 3 operations. Currently there is not enough GPU infrastructure in PETSc to achieve this via command-line parameters alone.)

Best regards,
Karli
