[petsc-dev] Performance of VecMDot_SeqCUSP

Jed Brown Tue, 24 Apr 2012 14:42:28 -0500

On Tue, Apr 24, 2012 at 14:29, Daniel Lowell <redratio1 at gmail.com> wrote:


> Launching smaller overlapping asynchronous kernels can give a speedup if
> your vectors are large and you are doing reductions. This way warp stalls
> can be compensated for, and latencies can be hidden. Not sure what you mean
> by "the way it currently is", though...


The reduction is only needed at the end. Any sequential launch adds
artificial synchronization. I'd be interested to see the performance
comparison, but I'd be surprised if independent kernel launches were faster
than a decent implementation with one kernel launch.
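
To make the comparison concrete, here is a rough CPU-side sketch in Python, not the actual VecMDot_SeqCUSP code. The names mdot_separate and mdot_fused and the chunk parameter are hypothetical; a chunk only stands in for a thread block. The fused version reads each x[i] once, keeps one partial sum per y vector, and combines the partials only at the end.

```python
def mdot_separate(x, ys):
    """One full pass over x per y vector, each with its own reduction.
    Loosely models one kernel launch per dot product."""
    return [sum(xi * yi for xi, yi in zip(x, y)) for y in ys]


def mdot_fused(x, ys, chunk=4):
    """Single pass over x. Each chunk (a stand-in for a thread block)
    accumulates one partial sum per y vector; the partials are combined
    only once, at the end. Loosely models a single kernel launch."""
    n, k = len(x), len(ys)
    partials = []  # one list of k partial sums per chunk
    for start in range(0, n, chunk):
        acc = [0.0] * k
        for i in range(start, min(start + chunk, n)):
            xi = x[i]  # x[i] is loaded once and reused for all k products
            for j in range(k):
                acc[j] += xi * ys[j][i]
        partials.append(acc)
    # Final reduction across chunks, done once for all k results.
    return [sum(p[j] for p in partials) for j in range(k)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [[1.0] * 5, [0.0, 1.0, 0.0, 1.0, 0.0], [2.0, 0.0, 0.0, 0.0, -1.0]]
print(mdot_separate(x, ys))
print(mdot_fused(x, ys))
```

Both functions return the same values; the difference is that the separate version traverses x once per y vector, while the fused version traverses it once in total. This says nothing about actual GPU timings, which would still need to be measured.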
