There are a few things:

* GPUs have higher latencies, so you basically need a large enough problem to get a GPU speedup.
* I assume you are assembling the matrix on the CPU. Copying the data to the GPU takes time; you really should be creating the matrix on the GPU.
* I agree with Barry: roughly 1M unknowns per GPU is around where you start seeing a win, but this depends on a lot of things.
* There are startup costs, like the CPU-GPU copy. It is best to run one mat-vec (or whatever), push a new logging stage, and then run the benchmark. The timing for this new stage will be reported separately in the -log_view data; look at that. You can also fake this by running your benchmark many times to amortize any setup costs.
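For concreteness, here is a minimal sketch of the warm-up-then-push-a-stage pattern described above. It assumes the Mat and Vecs already exist; `BenchSpMV` and the loop count are just illustrative, and error checking uses the CHKERRQ idiom. Run with -mat_type aijcusparse -vec_type cuda -log_view and look at the "SpMV benchmark" stage in the output.

```c
#include <petscmat.h>

/* Hypothetical helper: one warm-up MatMult (which triggers the CPU->GPU
   copy and other one-time setup), then a separately logged benchmark stage. */
PetscErrorCode BenchSpMV(Mat A, Vec x, Vec y)
{
  PetscErrorCode ierr;
  PetscLogStage  bench;
  PetscInt       i;

  /* Warm-up: absorbs data transfer and setup costs outside the timed stage. */
  ierr = MatMult(A, x, y);CHKERRQ(ierr);

  /* Everything between Push and Pop is reported as its own stage in -log_view. */
  ierr = PetscLogStageRegister("SpMV benchmark", &bench);CHKERRQ(ierr);
  ierr = PetscLogStagePush(bench);CHKERRQ(ierr);
  for (i = 0; i < 100; i++) { /* many iterations amortize per-call overheads */
    ierr = MatMult(A, x, y);CHKERRQ(ierr);
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  return 0;
}
```

The same code runs unchanged on CPU or GPU; only the -mat_type/-vec_type options select the backend.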
On Fri, Jan 14, 2022 at 4:27 PM Rohan Yadav <roh...@alumni.cmu.edu> wrote:

> Hi,
>
> I'm looking to use PETSc with GPUs to do some linear algebra operations,
> like SpMV, SpMM, etc. Building PETSc with `--with-cuda=1` and running with
> `-mat_type aijcusparse -vec_type cuda` gives me a large slowdown from the
> same code running on the CPU. This is not entirely unexpected, as things
> like data transfer costs across the PCIe bus might erroneously be included
> in my timing. Are there some examples of benchmarking GPU computations
> with PETSc, or just the proper way to write code in PETSc that will work
> for CPUs and GPUs?
>
> Rohan