On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> On 12/02/15 05:40, Jakub Jelinek wrote:
> > Don't know the HW good enough, is there any power consumption, heat etc.
> > difference between the two approaches? I mean does the HW consume different
> > amount of power if only one thread in a warp executes code and the other
> > threads in the same warp just jump around it, vs. having all threads busy?
>
> Having all threads busy will increase power consumption. It's also bad if
> the other vectors are executing memory access instructions. However, for
> small blocks, it is probably a win over the jump around approach. One of
> the optimizations for the future of the neutering algorithm is to add such
> predication for small blocks and keep branching for the larger blocks.

Then the uniform SIMT approach might not be such a good idea.

> > How exactly does OpenACC copy the stack? At least for OpenMP, one could
> > have automatic vars whose addresses are passed to simd regions in different
> > functions, say like:
>
> The stack frame of the current function is copied when entering a
> partitioned region. (There is no visibility of caller's frame and such.)
> Again, optimization would be trying to only copy the stack that's used in
> the partitioned region.

Always the whole stack, from the current stack pointer up to the top of the
stack, so sometimes a few bytes, sometimes a few kilobytes or more each time?

	Jakub
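The frame copy discussed in the thread can be pictured with a small C model. This is a minimal sketch, not the nvptx backend's actual code: it assumes a downward-growing stack where the current function's frame spans [sp, frame_top), and the names `broadcast_frame` and `receive_frame` are hypothetical. It only shows why the cost of entering a partitioned region scales with `frame_top - sp`, which is the size question raised above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the single active thread publishes the bytes of
   its current frame, [sp, frame_top) on a downward-growing stack, to a
   shared buffer.  Returns the number of bytes copied, or 0 if the frame
   does not fit in the buffer (a real implementation would need some
   fallback in that case).  Cost grows linearly with the frame size.  */
static size_t
broadcast_frame (const unsigned char *sp, const unsigned char *frame_top,
                 unsigned char *shared, size_t shared_size)
{
  size_t len = (size_t) (frame_top - sp);
  if (len > shared_size)
    return 0;
  memcpy (shared, sp, len);
  return len;
}

/* Hypothetical model: each worker copies the published frame into its
   own stack, so every automatic variable is visible at the same offset
   from that worker's stack pointer.  */
static void
receive_frame (unsigned char *worker_sp, const unsigned char *shared,
               size_t len)
{
  memcpy (worker_sp, shared, len);
}
```

Copying only the slots actually referenced in the partitioned region, as suggested above, would shrink `len` rather than change this basic shape.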