On Wed, 2 Dec 2015, Jakub Jelinek wrote:

> On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> > On 12/02/15 05:40, Jakub Jelinek wrote:
> > > I don't know the HW well enough: is there any difference in power
> > > consumption, heat, etc. between the two approaches?  I mean, does the HW
> > > consume a different amount of power if only one thread in a warp executes
> > > code and the other threads in the same warp just jump around it, vs.
> > > having all threads busy?
> > 
> > Having all threads busy will increase power consumption.  It's also bad if
> > the other vectors are executing memory access instructions.  However, for
> 
> Then the uniform SIMT approach might not be that good an idea.

Why?  Remember that the alternative has its own cost: copying registers (and
in OpenACC, stacks too).  We don't know how the two costs balance.  My
intuition is that the copying is more expensive than what I'm doing.
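
As a rough illustration of the terms in that tradeoff, here is a toy C cost
model.  Everything in it is an assumption made for illustration: the names
(struct cost, single_lane, uniform_simt), the 32-lane warp, and the counting
itself.  It measures nothing real about power or latency; it only shows which
quantity each approach pays for.

```c
#include <assert.h>

#define WARP_SIZE 32  /* assumed number of lanes per warp */

/* Hypothetical per-warp counters; not measured on any hardware. */
struct cost {
    long lane_insns;  /* instructions executed, summed over active lanes */
    long reg_copies;  /* per-lane register values received from lane 0 */
};

/* One lane runs the scalar region while the others jump around it.
   On entry to a vector region, lane 0's live registers are copied
   to each of the other lanes. */
struct cost single_lane(long scalar_insns, long live_regs)
{
    struct cost c;
    assert(scalar_insns >= 0 && live_regs >= 0);
    c.lane_insns = scalar_insns;
    c.reg_copies = live_regs * (WARP_SIZE - 1);
    return c;
}

/* Uniform SIMT: every lane runs the scalar region redundantly, so
   each lane already holds the values and nothing is copied. */
struct cost uniform_simt(long scalar_insns, long live_regs)
{
    struct cost c;
    assert(scalar_insns >= 0 && live_regs >= 0);
    c.lane_insns = scalar_insns * WARP_SIZE;
    c.reg_copies = 0;
    return c;
}
```

For a scalar region of 100 instructions with 8 live registers, single_lane
reports 100 lane-instructions and 248 copies, while uniform_simt reports 3200
lane-instructions and no copies.  Which of the two is cheaper depends on the
relative cost of a busy lane versus a copy, which is exactly the unknown.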

Anyhow, for good performance the offloaded code needs to be running in vector
regions most of the time, where the concern doesn't apply.

Alexander
