I'm interested in seeing this too, especially if somebody can explain the results after they've been demonstrated :)
A

On Tue, Apr 24, 2012 at 10:42 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Tue, Apr 24, 2012 at 14:29, Daniel Lowell <redratio1 at gmail.com> wrote:
>
>> Launching smaller overlapping asynchronous kernels can have speed up if
>> your vectors are large and you are doing reductions. This way warps stalls
>> can be compensated for, and latencies can be hidden. Not sure what you mean
>> "the way it currently is" though...
>
> The reduction is only needed at the end. Any sequential launch adds
> artificial synchronization. I'd be interested to see the performance
> comparison, but I'd be surprised if independent kernel launches were faster
> than a decent implementation with one kernel launch.