On 21/06/17 00:22, Wilco Dijkstra wrote:
> Jeff Law wrote:
>> But the stack pointer might have already been advanced into the guard
>> page by the caller.   For the sake of argument assume the guard page is
>> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
>> the caller hasn't touched the 0xf1000 page.
>>
>> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
>> rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
>> to emit a probe prior to this stack allocation.
> 
> That's an incorrect ABI that allows adjusting the frame by 4080+32! A
> correct one might allow, say, 1024 bytes for outgoing arguments. That
> means when you call a function, there are still guard-page-size - 1024
> bytes left that you can use to allocate locals. With a 4K guard page that
> allows leaf functions up to 3KB and, for non-leaf functions, depending on
> the frame, locals of 2-3KB plus up to 1024 bytes of outgoing arguments,
> without inserting any probes beyond the normal frame stores.
> 
> This design means almost no functions need additional probes. Assuming we're
> also increasing the guard page size to 64KB, it's cheap even for large
> functions.
> 
> Wilco

A mere 256 bytes reserved for the caller's outgoing arguments would permit
32 x 8-byte arguments on the stack which, with at least 8 parameters passed
in registers, would allow for calls with 40 parameters.  There can't be many
calls like that.  Any function making calls with more than that might need
additional probes, but that's going to be exceedingly rare.

Put the cost on the least common sequences, even if they pay
disproportionately; it will be a win overall.

R.
