On 12/02/15 10:12, Jakub Jelinek wrote:

> If we have a reasonable IPA pass to discover which addressable variables can
> be shared by multiple threads and which can't, then we could use soft-stack
> for those that can be shared by multiple PTX threads (different warps, or
> same warp, different threads in it), then we shouldn't need to copy any
> stack, just broadcast the scalar vars.

Note that the current scalar (.reg) broadcasting uses the whole live register set, not the subset of it that is actually read within the partitioned region. Restricting the broadcast to that subset would be a relatively straightforward optimization, I think.
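
To make the difference concrete, here is a toy sketch in Python rather than anything resembling the actual nvptx backend code. The register names, the (dest, sources) instruction model and both helper functions are hypothetical, and control flow inside the region is ignored.

```python
def regs_read_in_region(region):
    """Collect every register that appears as a source operand in the region.

    `region` is a list of (dest, sources) tuples. This is a hypothetical,
    straight-line model of the partitioned region with no control flow.
    """
    read = set()
    for _dest, sources in region:
        read.update(sources)
    return read


def narrowed_broadcast_set(live_at_entry, region):
    """Registers that are live at region entry AND read inside the region."""
    return live_at_entry & regs_read_in_region(region)


# Hypothetical live register set at the point where the region starts.
live_at_entry = {"r1", "r2", "r3", "r4"}

# Hypothetical region body: r5 and r6 are defined inside the region,
# and r4 is live across the region but never read in it.
region = [
    ("r5", ["r1", "r2"]),
    ("r6", ["r5", "r3"]),
]

current = live_at_entry                                    # broadcast everything live
narrowed = narrowed_broadcast_set(live_at_entry, region)   # broadcast only what is read

print(sorted(current))   # ['r1', 'r2', 'r3', 'r4']
print(sorted(narrowed))  # ['r1', 'r2', 'r3']
```

In this model r4 is the register that would no longer be broadcast, and r5 drops out of the result because it is not live at entry.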

nathan
