On Fri, Oct 11, 2013 at 3:27 PM, Richard Sandiford
<rdsandif...@googlemail.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Fri, Oct 11, 2013 at 12:48 PM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>> On Fri, Oct 11, 2013 at 11:43 AM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>> Jakub Jelinek <ja...@redhat.com> writes:
>>>>>> On Fri, Oct 11, 2013 at 10:17:41AM +0200, Richard Biener wrote:
>>>>>>> asm(".alias __sync_synchronize sync_synchronize");
>>>>>>
>>>>>> It is .set, but not everywhere.
>>>>>> /* The MIPS assembler has different syntax for .set. We set it to
>>>>>>    .dummy to trap any errors.  */
>>>>>> #undef SET_ASM_OP
>>>>>> #define SET_ASM_OP "\t.dummy\t"
>>>>>> But perhaps it would require fewer variants than providing inline asm
>>>>>> of the __sync_* builtin by hand for all the targets that need it.
>>>>>
>>>>> Yeah, that's why I prefer the sed approach.  GCC knows how to do exactly
>>>>> what we want, and what syntax to use.  We just need to take its output and
>>>>> change the function name.
>>>>>
>>>>> And like Richard says, parsing top-level asms would be fair game,
>>>>> especially if GCC and binutils ever were integrated (for libgccjit.so).
>>>>> So using top-level asms seems like putting off the inevitable
>>>>> (albeit putting it off further than __asm renaming).
>>>>>
>>>>> Do either of you object to the sed thing?
>>>>
>>>> Well, ideally there would be a way to not lie about symbol names to GCC.
>>>> That is, have a "native" way of telling GCC what to do here (which is
>>>> as far as I understand to emit the code for the builtin-handled $SYM
>>>> in a function with $SYM - provide the out-of-line copy for a builtin).
>>>>
>>>> For the __sync functions it's unfortunate that the library function has
>>>> the same 'name' as the builtin and the builtin doesn't have an alternate
>>>> spelling.  So - can't we just add __builtin__sync_... spellings and use
>>>>
>>>> __sync_synchronize ()
>>>> {
>>>>   __builtin_sync_syncronize ();
>>>> }
>>>>
>>>> ?  (what if __builtin_sync_syncronize expands to a libcall?  if it can't,
>>>> what's the point of the library function?)
>>>
>>> It can't expand to a libcall in nomips16 mode.  It always expands to a
>>> libcall in mips16 mode.  The point is to provide out-of-line nomips16
>>> functions that the mips16 code can call.
>>>
>>>> Why does a simple
>>>>
>>>> __sync_synchronize ()
>>>> {
>>>>   __sync_synchronize ();
>>>> }
>>>>
>>>> not work?  On x86_64 I get from that:
>>>>
>>>> __sync_synchronize:
>>>> .LFB0:
>>>>         .cfi_startproc
>>>>         mfence
>>>>         ret
>>>>         .cfi_endproc
>>>>
>>>> (similar to how you can have a definition of memcpy () and have
>>>> another memcpy inside it inline-expanded)
>>>
>>> Is that with optimisation enabled?  -O2 gives me:
>>>
>>> __sync_synchronize:
>>> .LFB0:
>>>         .cfi_startproc
>>>         .p2align 4,,10
>>>         .p2align 3
>>> .L2:
>>>         jmp     .L2
>>>         .cfi_endproc
>>> .LFE0:
>>>
>>> We do want to compile this stuff with optimisation enabled.
>>
>> I compiled with -O1 only.  Yes, at -O2 I get the infinite loop.
>>
>> Eventually we should simply not build cgraph edges _from_ calls
>> to builtins?  Or disable tail recursion for builtin calls (tail-recursion
>> is what does this optimization).
>>
>> Honza?  For tailr this boils down to symtab_semantically_equivalent_p ().
>> I suppose we don't want to change that but eventually special case
>> builtins in tailr, thus
>>
>> Index: gcc/tree-tailcall.c
>> ===================================================================
>> --- gcc/tree-tailcall.c (revision 203409)
>> +++ gcc/tree-tailcall.c (working copy)
>> @@ -446,7 +446,9 @@ find_tail_calls (basic_block bb, struct
>>    /* We found the call, check whether it is suitable.  */
>>    tail_recursion = false;
>>    func = gimple_call_fndecl (call);
>> -  if (func && recursive_call_p (current_function_decl, func))
>> +  if (func
>> +      && ! DECL_BUILT_IN (func)
>> +      && recursive_call_p (current_function_decl, func))
>>      {
>>        tree arg;
>>
>> which makes -O2 not turn it into an infinite loop (possibly also applies
>> to the original code with the alias declaration?)
>
> If that's OK then I'm certainly happy with it :-)

I'm happy with the tailcall change (if you can do the testing ... ;))

>  FWIW, the alias case
> was first optimised from __sync_synchronize->sync_synchronize, before it
> got converted into a tail call.  That happened very early in the pipeline
> and I suppose would stop the built-in expansion from kicking in,
> even with tail calls disabled.  But if we say that:
>
> foo () { foo (); }
>
> is a supported way of defining out-of-line versions of built-in foo,
> then that's much more convenient than the aliases anyway.

Btw, if it is supported then we should add a testcase that makes sure
we don't regress for this.  (I wouldn't say we should document this
"feature" just in case we want to decide it's not the way we want it
in the future).

Richard.

Reply via email to