Justin Chang <jychan...@gmail.com> writes:

> Matt,
>
> So what's an example of "doing a bunch of iterations to make sending the
> initial data down worth it"?

CG/Jacobi for a high resolution problem.  You pretty much have to have
thrown in the towel on finding a good preconditioner, otherwise you'd be
at risk of solving the problem too quickly.  Some groups have shown
acceptable multigrid performance, though it's a tough sell if you're
paying for the coprocessor.

One problem with the 3x bandwidth difference is that GPU algorithms
often require temporaries or multiple passes over the data where a CPU
would be able to do a single pass with little or no temporaries.  In
finite element computations, and also some sparse matrix operations,
those intermediate quantities can more than squander the apparent
bandwidth advantage.

> Is there a correlation between that and arithmetic intensity, where an
> application is likely to be more compute-bound and memory-bandwidth
> bound?

Not really, because each iteration accesses the entire sparse matrix.
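
To put rough numbers on the trade-off above, here is a back-of-the-envelope sketch. It assumes a purely bandwidth-bound iteration that streams the whole matrix once on the CPU, a one-time copy of the same data over the host-device link, and a GPU that may need several passes per iteration. The function name, the `passes` parameter, and all the bandwidth figures are hypothetical, chosen only to match the 3x ratio mentioned above; they are not measurements.

```python
import math


def break_even_iterations(bw_host, bw_device, bw_link, passes=1):
    """Iterations needed before the device repays the initial copy.

    Simplified model (an assumption, not a measurement), for n bytes of data:
        t_host   = k * n / bw_host
        t_device = n / bw_link + k * passes * n / bw_device
    n cancels, so the answer depends only on the bandwidths and on how
    many passes over the data the device needs per iteration.
    """
    # If the extra passes eat the whole bandwidth advantage, the device
    # never catches up, no matter how many iterations are run.
    if bw_device <= passes * bw_host:
        return math.inf
    gain_per_iteration = 1.0 / bw_host - passes / bw_device
    return (1.0 / bw_link) / gain_per_iteration


# Hypothetical bandwidths in GB/s, picked only to give a 3x ratio.
BW_HOST = 50.0
BW_DEVICE = 3 * BW_HOST
BW_LINK = 10.0

for p in (1, 2, 3):
    k = break_even_iterations(BW_HOST, BW_DEVICE, BW_LINK, passes=p)
    print(f"passes={p}: break-even after about {k} iterations")
```

Under these made-up numbers a single-pass GPU kernel breaks even after about 7.5 iterations, two passes double that, and three passes erase the advantage entirely. Note that arithmetic intensity does not appear anywhere in the model: the per-iteration cost is just the bytes streamed.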