On Thu, Jul 21, 2016 at 02:07:05PM -0600, Jeff Law wrote: > On 06/23/2016 03:57 AM, Dominik Vogt wrote: > >>and use that rather than rounding size up to an alignment boundary. > > > >Not exactly. Consider the unpatched code. At the beginning we > >have some amount of space to be allocated on the stack at runtime > >("SSIZE"), some requested alignment for it ("SALIGN"). > > > >get_dynamic_stack_size() first calculates the space needed for run > >time alignment: > > > > SIZE = SSIZE + SALIGN - 1 > > > >Then it calls round_push() to add *another* chunk of memory to the > >allocation size to be able to align it to the required stack slot > >alignment ("SLOTALIGN") at run time. > > > > SIZE = SIZE + SLOTALIGN - 1 > > = SSIZE + (SALIGN - 1) + (SLOTALIGN - 1) > > > >Now it has added two chunks of memory but alignment is only done > >once. With the patch it just adds the maximum of (SALIGN - 1) and > >(SLOTALIGN - 1), not both. Thinking about it, the "round_push" > >stuff is a very complicated way of saying "add max(A, B)". > Now I see it. Thanks, that helped a ton. > > > > >I'd volunteer to clean this up more, but preferrably when the two > >pending patches are in. The current code is a real brain-twister. > I'd be all for such cleanups after we wrap up the pending patches. > It's certainly a rats nest of code right now. > > This patch is fine for the trunk. Thanks for your patience.
Actually I was goind to abandon the patch in its current state. :-) We talked about it internally and concluded that the problem is really this: * get_dynamic_stack_size is passed a SIZE of a data block (which is allocated elsewhere), the SIZE_ALIGN of the SIZE (i.e. the alignment of the underlying memory units (e.g. 32 bytes split into 4 times 8 bytes = 64 bit alignment) and the REQUIRED_ALIGN of the data portion of the allocated memory. * Assuming the function is called with SIZE = 2, SIZE_ALIGN = 8 and REQUIRED_ALIGN = 64 it first adds 7 bytes to SIZE -> 9. This is what is needed to have two bytes 8-byte-aligned at some memory location without any known alignment. * Finally round_push is called to round up SIZE to a multiple of the stack slot size. The key to understanding this is that the function assumes that STACK_DYNMAIC_OFFSET is completely unknown at the time its called and therefore it does not make assumptions about the alignment of STACKPOINTER + STACK_DYNMAIC_OFFSET. The latest patch simply hard-codes that SP + SDO is supposed to be aligned to at least stack slot size (and does that in a very complicated way). Since there is no guarantee that this is the case on all targets, the patch is broken. It may miscalculate a SIZE that is too small in some cases. However, on many targets there is some guarantee about the alignment of SP + SDO even if the actual value of SDO is unknown. On s390x it's always 8-byte-aligned (stack slot size). So the right fix should be to add knowledge about the target's guaranteed alignment of SP + SDO to the function. I'm right now testing a much simpler patch that uses REGNO_POINTER_ALIGN(VIRTUAL_STACK_DYNAMIC_REGNUM) as the alignment. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany