On 12/02/15 08:10, Jakub Jelinek wrote:
> On Wed, Dec 02, 2015 at 08:02:47AM -0500, Nathan Sidwell wrote:
> Always the whole stack, from the current stack pointer up to top of the
> stack, so sometimes a few bytes, sometimes a few kilobytes or more each time?
The frame of the current function, not the whole stack. As I said, there's no
visibility of the stack beyond the current function. (One could implement some
kind of chaining, I guess.)
PTX does not expose the concept of a stack at all. No stack pointer, no link
register, no argument pushing.
It does expose 'local' memory, which is private to a thread and live only for
the duration of a function call (unlike a function-scope 'static'). From that
we construct stack frames.
The rules of PTX are such that one can (almost) determine the call graph
statically. I don't know whether the JIT implements .local as a stack or
statically allocates it (and perhaps uses a liveness algorithm to determine
which pieces may overlap). Perhaps it depends on the physical device capabilities.
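To illustrate what static allocation could look like (this is only a sketch, not a claim about what the JIT does): if the call graph is known and there is no recursion, the worst-case amount of .local is the largest sum of frame sizes along any call chain, and frames that are never live at the same time can share storage. The function names, frame sizes and call graph below are made up for the example.

```python
from functools import lru_cache

# Hypothetical per-function .local frame sizes in bytes (made-up numbers).
FRAME_BYTES = {"main": 32, "foo": 64, "bar": 16, "leaf": 128}

# Hypothetical static call graph: caller -> callees.
CALLS = {"main": ["foo", "bar"], "foo": ["leaf"], "bar": ["leaf"], "leaf": []}


@lru_cache(maxsize=None)
def worst_case_local(fn):
    """Bytes of .local needed while fn and its deepest call chain are live.

    Assumes no recursion, so the call graph is acyclic.
    """
    deepest_callee = max((worst_case_local(c) for c in CALLS[fn]), default=0)
    return FRAME_BYTES[fn] + deepest_callee


if __name__ == "__main__":
    # foo and bar are never live together, so their frames may overlap.
    print(worst_case_local("main"), "bytes needed,",
          sum(FRAME_BYTES.values()), "bytes if nothing overlaps")
```

Here the worst case from main is smaller than the sum of all frames, because the frames of foo and bar can occupy the same storage.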
The 'almost' is there because of indirect calls, although even then:
1) at an indirect call, you may specify the static set of functions you know
it'll resolve to
2) if you don't know that, you have to specify the function prototype anyway.
So the static set would be 'all functions of that type'.
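As a rough sketch of the two cases above (in PTX they correspond to the .calltargets and .callprototype directives), a tool building the call graph could resolve an indirect call like this. The function table and the possible_targets helper are hypothetical, and I am not suggesting the JIT works this way.

```python
# Hypothetical function table: name -> prototype (return type, parameter types).
PROTOTYPES = {
    "add": ("u32", ("u32", "u32")),
    "mul": ("u32", ("u32", "u32")),
    "neg": ("u32", ("u32",)),
}


def possible_targets(prototype, known_targets=None):
    """Return the sorted list of functions an indirect call might reach."""
    if known_targets is not None:
        # Case 1: the call site lists its targets explicitly.
        return sorted(known_targets)
    # Case 2: only the prototype is known, so every match is a candidate.
    return sorted(name for name, proto in PROTOTYPES.items() if proto == prototype)


if __name__ == "__main__":
    binary_op = ("u32", ("u32", "u32"))
    print(possible_targets(binary_op, {"mul"}))  # explicit target set
    print(possible_targets(binary_op))           # all functions of that type
```

Either way the target set is finite and known before execution, so the call graph stays statically bounded.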
I don't know if the JIT makes use of that information.
nathan