Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

Nathan Sidwell Wed, 02 Dec 2015 05:03:31 -0800

On 12/02/15 05:40, Jakub Jelinek wrote:

 Don't know the HW good enough, is there any power consumption, heat etc.
difference between the two approaches?  I mean does the HW consume different
amount of power if only one thread in a warp executes code and the other
threads in the same warp just jump around it, vs. having all threads busy?

Having all threads busy will increase power consumption. It's also bad if the
other vectors are executing memory access instructions. However, for small
blocks, it is probably a win over the jump-around approach. One of the
optimizations for the future of the neutering algorithm is to add such
predication for small blocks and keep branching for the larger blocks.
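
To make the two forms concrete, here is a rough host-side model in plain C. It
is only a sketch and not what the backend emits: the warp size of 32, lane 0 as
the one active lane, and the names block_body, run_branch, run_predicated and
check_warp are all assumptions made for illustration.

```c
#include <assert.h>

#define WARP_SIZE 32   /* assumed warp width for this model */

/* Hypothetical block body; stands in for a small single-lane block. */
int block_body(int x)
{
    return x * 2 + 1;
}

/* Jump-around form: every lane except lane 0 branches past the block. */
int run_branch(int lane, int x)
{
    if (lane != 0)
        return x;
    return block_body(x);
}

/* Predicated form: all lanes compute the body, only lane 0 keeps the result. */
int run_predicated(int lane, int x)
{
    int active = (lane == 0);
    int tmp = block_body(x);
    return active ? tmp : x;
}

/* Checks that both forms agree on every lane of one modelled warp;
   returns the number of lanes checked. */
int check_warp(int x)
{
    int lane;
    for (lane = 0; lane < WARP_SIZE; lane++)
        assert(run_branch(lane, x) == run_predicated(lane, x));
    return lane;
}
```

Both forms give the same per-lane result. In the predicated form every lane
does the work of the body, which is where the extra power goes, and it is only
safe as written because the body has no side effects such as memory accesses.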

How exactly does OpenACC copy the stack?  At least for OpenMP, one could
have automatic vars whose addresses are passed to simd regions in different
functions, say like:

The stack frame of the current function is copied when entering a partitioned
region. (There is no visibility of caller's frame and such.) Again,
optimization would be trying to only copy the stack that's used in the
partitioned region.
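
As a rough host-side picture of that copy, again in plain C and not the actual
mechanism: the frame layout, the lane count of 32, treating lanes[0] as the
frame being copied from, and the names enter_partitioned and check_frames are
all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define NUM_LANES 32   /* assumed number of lanes in this model */

/* Hypothetical frame layout, covering the current function only. */
struct frame {
    int counter;
    int buf[4];
};

/* Model of entering a partitioned region: the frame in lanes[0] is
   copied into every other lane's private frame. */
void enter_partitioned(struct frame lanes[NUM_LANES])
{
    int lane;
    for (lane = 1; lane < NUM_LANES; lane++)
        memcpy(&lanes[lane], &lanes[0], sizeof lanes[0]);
}

/* Sets up one frame, enters the region, and asserts that every lane
   sees the same values; returns the number of lanes checked. */
int check_frames(void)
{
    struct frame lanes[NUM_LANES];
    int lane;

    memset(lanes, 0, sizeof lanes);
    lanes[0].counter = 7;
    lanes[0].buf[2] = 42;

    enter_partitioned(lanes);

    for (lane = 0; lane < NUM_LANES; lane++) {
        assert(lanes[lane].counter == 7);
        assert(lanes[lane].buf[2] == 42);
    }
    return lane;
}
```

The whole struct is copied here, which corresponds to the unoptimized case.
Copying only the members that the partitioned region reads would be the
narrower copy.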


nathan
