[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Lewis Baker via Phabricator via cfe-commits Fri, 05 Mar 2021 17:41:43 -0800

lewissbaker added inline comments.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4450
+    Value *Extra = Builder.CreateSelect(Cmp, Diff, Zero);
+    return RValue::get(Builder.CreateAdd(SizeCall, Extra));
   }
----------------
ychen wrote:
> rjmccall wrote:
> > ychen wrote:
> > > ychen wrote:
> > > > rjmccall wrote:
> > > > > Okay, so you're implicitly increasing the coroutine size to allow you 
> > > > > to round up to get an aligned frame.  But how do you round back down 
> > > > > to get the actual pointer that you need to delete?  This just doesn't 
> > > > > work.
> > > > > 
> > > > > You really ought to just be using the aligned `operator new` instead 
> > > > > when the required alignment is too high.  If that means checking the 
> > > > > alignment "dynamically" before calling `operator new` / `operator 
> > > > > delete`, so be it.  In practice, it will not be dynamic because 
> > > > > lowering will replace the `coro.align` call with a constant, at which 
> > > > > point the branch will be foldable.
> > > > > 
> > > > > I don't know what to suggest if the aligned `operator new` isn't 
> > > > > reliably available on the target OS.  You could outline a function to 
> > > > > pick the best allocator/deallocator, I suppose.
> > > > Thanks for the review!
> > > > 
> > > > > Okay, so you're implicitly increasing the coroutine size to allow you 
> > > > > to round up to get an aligned frame. But how do you round back down 
> > > > > to get the actual pointer that you need to delete? This just doesn't 
> > > > > work.
> > > > 
> > > > Hmm, you're right that I missed the `delete` part, thanks. The adjusted 
> > > > amount is constant, I could just dial it down in `coro.free`, right?
> > > > 
> > > > >  You really ought to just be using the aligned operator new instead 
> > > > > when the required alignment is too high. If that means checking the 
> > > > > alignment "dynamically" before calling operator new / operator 
> > > > > delete, so be it. In practice, it will not be dynamic because 
> > > > > lowering will replace the coro.align call with a constant, at which 
> > > > > point the branch will be foldable.
> > > >  
> > > > That's my intuition at first. But spec is written in a way suggesting 
> > > > (IMHO) that the aligned version should not be used? What if the user 
> > > > specify their own allocator, then which one they should use?
> > > Sorry, I meant the adjusted amount is coro.align - std::max_align_t,  I 
> > > could subtract it in `coro.free` . I think it should work?
> > No, because the adjustment you have to do in `coro.alloc` isn't just an 
> > addition, it's an addition plus a mask, which isn't reversible.  Suppose 
> > the frame needs to be 32-byte-aligned, but the allocator only promises 
> > 8-byte alignment.  The problem is that when you go to free a frame pointer, 
> > and you see that it's 32-byte-aligned (which, again, it always will be), 
> > the pointer you got from the allocator might be the frame pointer minus any 
> > of 8, 16, or 24 — or it might be exactly the same.  The only way to reverse 
> > that is to store some sort of cookie, either the amount to subtract or even 
> > just the original pointer.
> > 
> > Now, if you could change the entire coroutine ABI, you could make the frame 
> > handle that you pass around be the unadjusted pointer and then just repeat 
> > the adjustment every time you enter the coroutine.  But that doesn't work 
> > because the ABI relies on things like the promise being at a reliable 
> > offset from the frame handle.
> > 
> > I think the best solution would be to figure out a way to use an aligned 
> > allocator, which at worst does this in a more systematic way and at best 
> > can actually just satisfy your requirement directly without any overhead.  
> > If you can't do that, adding an offset to the frame would be best; if you 
> > can't do that, doing it as a cookie is okay.
> > 
> > > That's my intuition at first. But spec is written in a way suggesting 
> > > (IMHO) that the aligned version should not be used? What if the user 
> > > specify their own allocator, then which one they should use?
> > 
> > It seems like a spec bug that this doesn't use aligned allocators even when 
> > they're available.  If there's an aligned allocator available, I think this 
> > should essentially do dynamically what it would normally do statically, 
> > i.e.:
> > 
> > ```
> > void *allocation = alignment > __STDCPP_DEFAULT_NEW_ALIGNMENT__ ? operator 
> > new(size, align_val_t(alignment)) : operator new(size);
> > ```
> > 
> > This would ODR-use both allocation functions, of course.
> > 
> > Maybe it's right to do this cookie thing if we can't rely on an aligned 
> > allocator, like if the promise class provides only an `operator 
> > new(size_t)`.
> > No, because the adjustment you have to do in `coro.alloc` isn't just an 
> > addition, it's an addition plus a mask, which isn't reversible.  Suppose 
> > the frame needs to be 32-byte-aligned, but the allocator only promises 
> > 8-byte alignment.  The problem is that when you go to free a frame pointer, 
> > and you see that it's 32-byte-aligned (which, again, it always will be), 
> > the pointer you got from the allocator might be the frame pointer minus any 
> > of 8, 16, or 24 — or it might be exactly the same.  The only way to reverse 
> > that is to store some sort of cookie, either the amount to subtract or even 
> > just the original pointer.
> 
> I got myself confused. This makes perfect sense.
> 
> > 
> > Now, if you could change the entire coroutine ABI, you could make the frame 
> > handle that you pass around be the unadjusted pointer and then just repeat 
> > the adjustment every time you enter the coroutine.  But that doesn't work 
> > because the ABI relies on things like the promise being at a reliable 
> > offset from the frame handle.
> > 
> > I think the best solution would be to figure out a way to use an aligned 
> > allocator, which at worst does this in a more systematic way and at best 
> > can actually just satisfy your requirement directly without any overhead.  
> > If you can't do that, adding an offset to the frame would be best; if you 
> > can't do that, doing it as a cookie is okay.
> > 
> 
> This is very helpful. I'll explore the adding offset to the frame option 
> first. If it is not plausible, I'll use the cookie method. Thanks!
> 
There was a proposal to extend the coroutine specification with support for the 
align_val_t overloads of operator new() when allocating coroutine frames. 
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2014r0.pdf

Unfortunately this was not adopted at the time as it was proposed late in the 
C++20 cycle and there was not yet any implementation experience.

So for now, if the compiler determines that the frame needs to be aligned to a 
value greater than the default alignment of global operator new it will need to 
overallocate, align the frame within that buffer and store the offset applied 
somewhere so that the it can reconstruct the address of the pointer returned 
from operator new() so that it can pass it to operator delete() on 
coroutine_handle::destroy().

Note that there was also a meeting to discuss ABI for coroutine frames with the 
intent that the major coroutines implementations would all (eventually) agree 
on a compatible coroutine ABI.

The results of this meeting was written up in the doc 
https://docs.google.com/document/d/1t53lAuQNd3rPqN-VByZabwL6PL2Oyl4zdJxm-gQlhfU/edit?usp=sharing

The end result was that we decided to place any padding needed to align the 
promise before the resume/destroy function pointers rather than place that 
padding in-between the function-pointers and the promise. The rationale here 
being that we can then calculate the address of the promise as a constant 
offset from the frame address (typically at an offset of two pointers into the 
frame) rather than the offset being variable depending on the promise type's 
alignment. This should help building of certain tooling / debuggers / walking 
async stack-traces etc. as we don't need to know the exact promise type to be 
able to determine the location of the promise.

The compiler should know exactly how many bytes of padding was added at the 
start of the frame allocation to get to the frame address and so should be able 
to translate the coroutine frame address back to the allocation address before 
destruction - however this also may have an interplay with the support for 
overaligned frames (which may be required due to overaligned local 
variables/args and not only based on the promise-type), so I'm mentioning it 
here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Reply via email to