On Mon, Nov 5, 2012 at 7:43 PM, Uros Bizjak <ubiz...@gmail.com> wrote:

>>>> As discussed above, modes of loads, generated from __builtin_apply
>>>> have no connection to function return mode. mode-switching.c does
>>>> detect __builtin_apply situation and raises maybe_builtin_apply flag,
>>>> but doesn't use it to short-circuit wrong check. In proposed patch, we
>>>> detect this situation and raise force_late_switch in the same way, as
>>>> SH4 does for its "late" fpscr emission.
>>>
>>> If I understand correctly, we need to insert the vzeroupper because the
>>> function returns double in SSE registers but we generate an OImode load
>>> instead of a DFmode load because of the __builtin_return.  So we're in the
>>> forced_late_switch case but we fail to recognize the tweaked return value
>>> load since the number of registers doesn't match.

Actually, the complication with __builtin_apply/__builtin_return is
that it blindly loads all possible return registers. So, in the PR41993
case, we load the SSE register using an OImode load (which forces the AVX
dirty state), even though we actually return in %eax. Your patch assumes
that the wide load corresponds to the current function's return register,
which is not the case.

Please note that we enter the above code only when the needed mode
of the processed insn is different from MODE_EXIT. So in the x86 case, we
know that this was due to a hard-register load in the "wrong" mode from
__builtin_{apply,return}. My patch also adds an extra condition: the
hard reg must be one of the possible return registers, not necessarily
the current function's return register.
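
To make the difference between the two conditions concrete, here is a toy
model in C. It is only a sketch, not code from mode-switching.c or from
either patch: the register enum, the set of "possible return registers",
and both helper names are assumptions made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical register numbering for illustration; not GCC's. */
enum hard_reg { REG_AX, REG_DX, REG_BX, REG_ST0, REG_XMM0, REG_XMM1, REG_COUNT };

/* Assumed set of registers in which *some* function could return a
   value (loosely modelled on x86-64; the exact set is an assumption). */
static bool possible_return_reg_p(enum hard_reg r)
{
    return r == REG_AX || r == REG_DX || r == REG_ST0
           || r == REG_XMM0 || r == REG_XMM1;
}

/* Stricter condition: the loaded hard register must be the register in
   which the current function returns its value. */
static bool load_matches_current_return(enum hard_reg loaded,
                                        enum hard_reg current_ret)
{
    return loaded == current_ret;
}

/* Looser condition: the loaded hard register only has to be one of the
   possible return registers. */
static bool load_is_possible_return(enum hard_reg loaded)
{
    return possible_return_reg_p(loaded);
}
```

In a PR41993-like situation (the function returns in REG_AX, but the
wide load targets REG_XMM0), load_matches_current_return() is false while
load_is_possible_return() is true, so only the looser condition
recognizes the load.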

Uros.
