On Thu, Jul 21, 2016 at 02:07:05PM -0600, Jeff Law wrote:
> On 06/23/2016 03:57 AM, Dominik Vogt wrote:
> >>and use that rather than rounding size up to an alignment boundary.
> >
> >Not exactly.  Consider the unpatched code.  At the beginning we
> >have some amount of space to be allocated on the stack at runtime
> >("SSIZE"), some requested alignment for it ("SALIGN").
> >
> >get_dynamic_stack_size() first calculates the space needed for run
> >time alignment:
> >
> >  SIZE = SSIZE + SALIGN - 1
> >
> >Then it calls round_push() to add *another* chunk of memory to the
> >allocation size to be able to align it to the required stack slot
> >alignment ("SLOTALIGN") at run time.
> >
> >  SIZE = SIZE + SLOTALIGN - 1
> >       = SSIZE + (SALIGN - 1) + (SLOTALIGN - 1)
> >
> >Now it has added two chunks of memory but alignment is only done
> >once.  With the patch it just adds the maximum of (SALIGN - 1) and
> >(SLOTALIGN - 1), not both.  Thinking about it, the "round_push"
> >stuff is a very complicated way of saying "add max(A, B)".
> Now I see it.  Thanks, that helped a ton.
> 
> >
> >I'd volunteer to clean this up more, but preferrably when the two
> >pending patches are in.  The current code is a real brain-twister.
> I'd be all for such cleanups after we wrap up the pending patches.
> It's certainly a rats nest of code right now.
> 
> This patch is fine for the trunk.  Thanks for your patience.

Actually I was goind to abandon the patch in its current state.
:-)  We talked about it internally and concluded that the problem
is really this:

 * get_dynamic_stack_size is passed a SIZE of a data block (which
   is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the
   alignment of the underlying memory units (e.g. 32 bytes split
   into 4 times 8 bytes = 64 bit alignment) and the
   REQUIRED_ALIGN of the data portion of the allocated memory.
 * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8
   and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9.
   This is what is needed to have two bytes 8-byte-aligned at some
   memory location without any known alignment.
 * Finally round_push is called to round up SIZE to a multiple of
   the stack slot size.

The key to understanding this is that the function assumes that
STACK_DYNMAIC_OFFSET is completely unknown at the time its called
and therefore it does not make assumptions about the alignment of
STACKPOINTER + STACK_DYNMAIC_OFFSET.  The latest patch simply
hard-codes that SP + SDO is supposed to be aligned to at least
stack slot size (and does that in a very complicated way).  Since
there is no guarantee that this is the case on all targets, the
patch is broken.  It may miscalculate a SIZE that is too small in
some cases.

However, on many targets there is some guarantee about the
alignment of SP + SDO even if the actual value of SDO is unknown.
On s390x it's always 8-byte-aligned (stack slot size).  So the
right fix should be to add knowledge about the target's guaranteed
alignment of SP + SDO to the function.  I'm right now testing a
much simpler patch that uses
REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the
alignment.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

Reply via email to