Hi Richard,
> On 14 Sep 2023, at 11:18, Richard Biener <[email protected]> wrote:
>
> On Wed, Sep 6, 2023 at 5:44 PM FX Coudert <[email protected]> wrote:
>>
>> ping**2 on the revised patch, for Richard or another global reviewer. So far
>> all review feedback is that it’s a step forward, and it’s been widely used
>> for both aarch64-darwin and x86_64-darwin distributions for almost three
>> years now.
>>
>> OK to commit?
>
> I just noticed that ftrampoline-impl isn't Optimize, thus it's not
> streamed with LTO.
I think this is fine, the nested pass runs before LTO streaming and lowers to
the relevant built-ins for the chosen impl.
The builtins are distinct and can co-exist in the linked exe,
> How does mixing different -ftrampoline-impl for different LTO TUs behave?
Assuming that a target can support multiple implementations, then each is
applied local to a single TU. The nested functions are scoped within their
parent and thus should not be candidates for merging by LTO.
For a target that cannot support both, then one or more of the TUs should be
rejected before we even get to LTO.
> How does mis-specifying -ftrampoline-impl at LTO link time compared to
> compile-time behave?
The flag should be a NOP at LTO link time (but I do not think we want to
reject it, that would probably create other issues?)
> Is the state fully reflected during pre-IPA compilation and the flag not
> needed after that?
yes, that is my understanding, nested runs very early.
> It appears so, but did you check?
I actually checked on x86_64-darwin (which does support both) and we see…
here with two tus with nested fns and a third with the main().
$ nm -mapv ./nn.ltrans0.ltrans.o
as expected, two instances of the nested “bar”.
00000000000001a8 (__TEXT,__cstring) non-external lC0
000000000000001f (__TEXT,__text) non-external _bar.0.lto_priv.0
00000000000001d0 (__TEXT,__cstring) non-external lC1
00000000000000ec (__TEXT,__text) non-external _bar.0.lto_priv.1
000000000000007c (__TEXT,__text) external _foo_1
0000000000000149 (__TEXT,__text) external _foo_2
0000000000000000 (__TEXT,__text) external _main
>>> these for heap-based:
(undefined) external ___builtin_nested_func_ptr_created
(undefined) external ___builtin_nested_func_ptr_deleted
>>> this for stack-based.
(undefined) external ___enable_execute_stack
(and the code executes as expected).
> OK if that's a non-issue.
thanks, we'll wait a day or two in case of any follow-on comments,
Iain
P.S. I was investigating some unrelated unwinder issues a couple of weeks ago,
but that did highlight that we have a possibility to avoid the leaks from
longjump if we hang on the forced_unwind() machinery [TODO, tho, not part of
this initial patch]
>
> Thanks,
> Richard.
>
>> FX
>>
>>
>>
>>> Le 5 août 2023 à 16:20, FX Coudert <[email protected]> a écrit :
>>>
>>> Hi Richard,
>>>
>>> Thanks for your feedback. Here is an amended version of the patch, taking
>>> into consideration your requests and the following discussion. There is no
>>> configure option for the libgcc part, and the documentation is amended. The
>>> patch is split into three commits for core, target and libgcc.
>>>
>>> Currently regtesting on x86_64 linux and darwin (it was fine before I split
>>> up into three commits, so I’m re-testing to make sure I didn’t screw
>>> anything up).
>>>
>>> OK to commit?
>>> FX
>>