Justin Chang <jychan...@gmail.com> writes:

> Matt,
>
> So what's an example of "doing a bunch of iterations to make sending the
> initial data down worth it"?

CG/Jacobi for a high resolution problem.  You pretty much have to have
thrown in the towel on finding a good preconditioner, otherwise you'd be
at risk of solving the problem too quickly.  Some groups have shown
acceptable multigrid performance, though it's a tough sell if you're
paying for the coprocessor.
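
As a back-of-envelope sketch of "enough iterations", assume every
iteration streams the matrix once, everything is purely bandwidth-bound,
and the matrix is copied to the device once up front.  The function name
and the bandwidth figures (12, 100, and 300 GB/s) are illustrative
assumptions, not measurements; the only thing taken from this thread is
the roughly 3x ratio between device and host memory bandwidth.

```python
def break_even_iterations(pcie_gbs, cpu_gbs, gpu_gbs):
    """Iterations needed before the GPU repays the initial host-to-device copy.

    Assumes each iteration reads the whole matrix exactly once and that
    run time is set by memory bandwidth alone (all rates in GB/s).
    """
    if gpu_gbs <= cpu_gbs:
        return float("inf")  # no per-iteration saving, so it never pays off
    transfer_cost_per_byte = 1.0 / pcie_gbs
    saving_per_byte_per_iteration = 1.0 / cpu_gbs - 1.0 / gpu_gbs
    return transfer_cost_per_byte / saving_per_byte_per_iteration


# Hypothetical rates: 12 GB/s over the bus, 100 GB/s host, 300 GB/s device.
print(break_even_iterations(pcie_gbs=12.0, cpu_gbs=100.0, gpu_gbs=300.0))
```

With those assumed rates the break-even point is about 12.5 iterations,
and it grows quickly if the effective device advantage is less than 3x.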

One problem with the 3x bandwidth difference is that GPU algorithms
often require temporaries or multiple passes over the data where a CPU
would be able to do a single pass with little or no temporaries.  In
finite element computations, and also some sparse matrix operations,
those intermediate quantities can more than squander the apparent
bandwidth advantage.
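
To put a rough number on that, here is a minimal sketch assuming a
purely bandwidth-bound kernel whose run time scales with the number of
passes over the data.  The function name and the pass counts are
hypothetical, chosen only to illustrate the point.

```python
def effective_speedup(bandwidth_ratio, gpu_passes, cpu_passes=1):
    """Speedup of the GPU over the CPU for a bandwidth-bound kernel.

    bandwidth_ratio is device bandwidth divided by host bandwidth; each
    pass is assumed to move the same amount of data.
    """
    return bandwidth_ratio * cpu_passes / gpu_passes


# Three passes on the GPU against one on the CPU cancels a 3x advantage.
print(effective_speedup(3.0, gpu_passes=3))
# A fourth pass leaves the GPU slower than the CPU.
print(effective_speedup(3.0, gpu_passes=4))
```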

> Is there a correlation between that and arithmetic intensity, where an
> application is likely to be more compute-bound than memory-bandwidth
> bound?

Not really, because each iteration accesses the entire sparse matrix.
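
A minimal sketch of why the iteration count does not help, assuming a
CSR matrix with 8-byte values and 4-byte column indices, and ignoring
vector and row-pointer traffic.  The function names are hypothetical.

```python
def csr_spmv_intensity(value_bytes=8, index_bytes=4):
    """Flops per byte of matrix data for one CSR sparse matrix-vector product.

    Each nonzero contributes one multiply and one add, and costs one
    stored value plus one column index of memory traffic.
    """
    flops_per_nonzero = 2
    bytes_per_nonzero = value_bytes + index_bytes
    return flops_per_nonzero / bytes_per_nonzero


def solve_intensity(nnz, iterations):
    """Intensity over a whole solve: flops and bytes both scale with iterations."""
    flops = 2 * nnz * iterations
    bytes_moved = (8 + 4) * nnz * iterations
    return flops / bytes_moved


print(csr_spmv_intensity())
print(solve_intensity(nnz=10**6, iterations=1))
print(solve_intensity(nnz=10**6, iterations=1000))
```

All three values come out to about 1/6 flop per byte: running more
iterations amortizes the initial transfer, but it does not move the
kernel any closer to being compute-bound.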
