[petsc-dev] Performance of VecMDot_SeqCUSP

Aron Ahmadia Tue, 24 Apr 2012 22:44:20 +0300

I'm interested in seeing this too, especially if somebody can explain the
results after they've been demonstrated :)


A

On Tue, Apr 24, 2012 at 10:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Tue, Apr 24, 2012 at 14:29, Daniel Lowell <redratio1 at gmail.com> wrote:
>
>> Launching smaller overlapping asynchronous kernels can give a speedup if
>> your vectors are large and you are doing reductions. This way warp stalls
>> can be compensated for, and latencies can be hidden. Not sure what you mean
>> by "the way it currently is", though...
>
>
> The reduction is only needed at the end. Any sequential launch adds
> artificial synchronization. I'd be interested to see the performance
> comparison, but I'd be surprised if independent kernel launches were faster
> than a decent implementation with one kernel launch.
>
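Below is a minimal CPU-side sketch in C of the two strategies being compared. It assumes real scalars and that VecMDot computes several dot products of one vector x against vectors y[0..nv-1]. The function names `mdot_fused` and `mdot_separate`, the fixed limits, and the blocking scheme are all hypothetical. This is not the PETSc or CUSP implementation and says nothing about measured GPU performance. In the fused version each block stands in for a GPU thread block: it reads x[i] once, updates all nv partial sums, and the only synchronization point is the final reduction over blocks. The separate version mirrors one independent launch per dot product, each re-reading x and finishing its own reduction.

```c
#include <stddef.h>

#define MDOT_MAX_NV 8      /* hypothetical limit for this sketch */
#define MDOT_MAX_BLOCKS 64 /* hypothetical limit for this sketch */

/* Fused multi-dot (hypothetical): z[j] = sum_i x[i] * y[j][i] for j < nv.
   The index range is split into blocks (standing in for GPU thread
   blocks). Each block reads x[i] once and accumulates a partial sum for
   every y[j]; the partials are combined in a single reduction at the end.
   Returns 0 on success, -1 if the sketch's fixed limits are exceeded. */
static int mdot_fused(size_t n, const double *x, size_t nv,
                      const double *const *y, size_t block_size, double *z)
{
    double partial[MDOT_MAX_BLOCKS][MDOT_MAX_NV] = {{0.0}};
    if (block_size == 0 || nv > MDOT_MAX_NV) return -1;
    size_t nblocks = (n + block_size - 1) / block_size;
    if (nblocks > MDOT_MAX_BLOCKS) return -1;

    for (size_t b = 0; b < nblocks; ++b) {          /* all in "one launch" */
        size_t start = b * block_size;
        size_t end = start + block_size < n ? start + block_size : n;
        for (size_t i = start; i < end; ++i) {
            double xi = x[i];                       /* x[i] read once */
            for (size_t j = 0; j < nv; ++j)
                partial[b][j] += xi * y[j][i];
        }
    }
    for (size_t j = 0; j < nv; ++j) {               /* final reduction only */
        double s = 0.0;
        for (size_t b = 0; b < nblocks; ++b)
            s += partial[b][j];
        z[j] = s;
    }
    return 0;
}

/* Baseline (hypothetical): one independent pass ("launch") per y[j];
   x is re-read nv times and each result is completed before the next
   pass begins. */
static void mdot_separate(size_t n, const double *x, size_t nv,
                          const double *const *y, double *z)
{
    for (size_t j = 0; j < nv; ++j) {
        double s = 0.0;
        for (size_t i = 0; i < n; ++i)
            s += x[i] * y[j][i];
        z[j] = s;
    }
}
```

For example, with x = (1, ..., 10), y[0] all ones and y[1] all twos, both functions give z = {55, 110}; the fused version gets there in a single pass over x.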