On 12/02/15 08:10, Jakub Jelinek wrote:
> On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
>
> Always the whole stack, from the current stack pointer up to top of the
> stack, so sometimes a few bytes, sometimes a few kilobytes or more each time?

The frame of the current function, not the whole stack. As I said, there's no visibility of the stack beyond the current function. (One could implement some kind of chaining, I guess.)

PTX does not expose the concept of a stack at all. No stack pointer, no link register, no argument pushing.

It does expose 'local' memory, which is private to a thread and live only for the duration of a function call (unlike a function-scope 'static'). From that we construct stack frames.
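
To make the frame construction and the chaining idea concrete, here is a minimal sketch in plain C rather than PTX. Each function owns one private block, standing in for a function-scope .local array, and passes a pointer to it down to its callee. The struct frame_link header, the function names and the sizes are all hypothetical; nothing like this is defined by PTX, and it is not necessarily how the compiler lays frames out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame header used only for this illustration. */
struct frame_link {
    const struct frame_link *caller; /* NULL in the outermost function */
    size_t size;                     /* bytes in this function's frame */
};

/* Walk the chain and total the bytes of every frame reachable from f. */
size_t live_frame_bytes(const struct frame_link *f)
{
    size_t total = 0;
    for (; f != NULL; f = f->caller) {
        assert(f->size >= sizeof *f); /* a frame holds at least its header */
        total += f->size;
    }
    return total;
}

/* The local struct stands in for this function's private .local block. */
size_t leaf(const struct frame_link *caller)
{
    struct { struct frame_link link; char spill[32]; } frame;
    frame.link.caller = caller;
    frame.link.size = sizeof frame;
    return live_frame_bytes(&frame.link);
}

size_t outer(void)
{
    struct { struct frame_link link; char spill[64]; } frame;
    frame.link.caller = NULL;
    frame.link.size = sizeof frame;
    return leaf(&frame.link);
}
```

Without the caller link, leaf could only see its own frame, which is the situation described above. With it, outer() returns the combined size of both frames.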

The rules of PTX are such that one can (almost) determine the call graph statically. I don't know whether the JIT implements .local as a stack or statically allocates it (and perhaps uses a liveness algorithm to determine which pieces may overlap). Perhaps it depends on the physical device capabilities.
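
As an illustration of why a static call graph would permit static allocation with overlap, here is a small C sketch. It is purely hypothetical and says nothing about what the JIT actually does. The four-function program, its frame sizes and the name worst_case_bytes are made up, and the sketch assumes the call graph has no recursion.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FUNCS 4

/* Hypothetical program: f0 calls f1 and f2; f1 calls f3. */
static const size_t frame_size[NUM_FUNCS] = { 16, 64, 128, 32 };
static const int calls[NUM_FUNCS][NUM_FUNCS] = {
    /* f0 */ { 0, 1, 1, 0 },
    /* f1 */ { 0, 0, 0, 1 },
    /* f2 */ { 0, 0, 0, 0 },
    /* f3 */ { 0, 0, 0, 0 },
};

/* Worst-case bytes live at once when fn is the outermost function.
   Assumes the call graph is acyclic; recursion would never terminate. */
size_t worst_case_bytes(int fn)
{
    assert(fn >= 0 && fn < NUM_FUNCS);
    size_t deepest_callee = 0;
    for (int callee = 0; callee < NUM_FUNCS; callee++) {
        if (calls[fn][callee]) {
            size_t need = worst_case_bytes(callee);
            if (need > deepest_callee)
                deepest_callee = need;
        }
    }
    /* Callees never run at the same time, so their storage may overlap. */
    return frame_size[fn] + deepest_callee;
}
```

Here worst_case_bytes(0) is 144 bytes (f0 plus f2), whereas giving every function its own non-overlapping block would need 240.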

The 'almost' fails with indirect calls, except that
1) at an indirect call, you may specify the static set of functions you know it'll resolve to;
2) if you don't know that, you have to specify the function prototype anyway, so the static set would be 'all functions of that type'.
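
The second case can be sketched in C as a conservative lookup: with only the prototype known, every function of that type is a possible target. As far as I know, the two cases correspond to PTX's .calltargets and .callprototype directives. The function table, the textual prototype strings and the name candidates_by_prototype below are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct func_info {
    const char *name;
    const char *prototype; /* hypothetical textual signature */
};

/* Hypothetical set of functions in one module. */
static const struct func_info funcs[] = {
    { "foo", "u32(u32)" },
    { "bar", "u32(u32)" },
    { "baz", "void(f32)" },
};
#define FUNC_COUNT (sizeof funcs / sizeof funcs[0])

/* Store in out[] the indices of all functions whose prototype matches,
   up to cap entries, and return how many were stored. */
size_t candidates_by_prototype(const char *prototype, size_t out[], size_t cap)
{
    assert(prototype != NULL);
    size_t n = 0;
    for (size_t i = 0; i < FUNC_COUNT && n < cap; i++) {
        if (strcmp(funcs[i].prototype, prototype) == 0)
            out[n++] = i;
    }
    return n;
}
```

An indirect call through a "u32(u32)" pointer would then get call-graph edges to both foo and bar, which is less precise than an explicit target list but still a finite static set.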

I don't know if the JIT makes use of that information.

nathan
