> On Oct 21, 2020, at 3:03 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>>
>> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>
>>> +/* Check whether the register REGNO should be zeroed on X86.
>>> + When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>>> + together, no need to zero it again.
>>> + Stack registers (st0-st7) and mm0-mm7 are aliased with each other
>>> + and are very hard to zero individually, so don't zero individual st
>>> + or mm registers at this time. */
>>> +
>>> +static bool
>>> +zero_call_used_regno_p (const unsigned int regno,
>>> + bool all_sse_zeroed)
>>> +{
>>> + return GENERAL_REGNO_P (regno)
>>> + || (!all_sse_zeroed && SSE_REGNO_P (regno))
>>> + || MASK_REGNO_P (regno);
>>> +}
>>> +
>>> +/* Return the machine_mode that is used to zero register REGNO. */
>>> +
>>> +static machine_mode
>>> +zero_call_used_regno_mode (const unsigned int regno)
>>> +{
>>> + /* NB: We only need to zero the lower 32 bits for integer registers
>>> + and the lower 128 bits for vector registers since the destination is
>>> + zero-extended to the full register width. */
>>> + if (GENERAL_REGNO_P (regno))
>>> + return SImode;
>>> + else if (SSE_REGNO_P (regno))
>>> + return V4SFmode;
>>> + else
>>> + return HImode;
>>> +}
>>> +
>>> +/* Generate an rtx to zero all vector registers together if possible,
>>> + otherwise, return NULL. */
>>> +
>>> +static rtx
>>> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>> +{
>>> + if (!TARGET_AVX)
>>> + return NULL;
>>> +
>>> + for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> + if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>>> + || (TARGET_64BIT
>>> + && (REX_SSE_REGNO_P (regno)
>>> + || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>> + return NULL;
>>> +
>>> + return gen_avx_vzeroall ();
>>> +}
>>> +
>>> +/* Generate an rtx to zero all st and mm registers together if possible,
>>> + otherwise, return NULL. */
>>> +
>>> +static rtx
>>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>>> +{
>>> + if (!TARGET_MMX)
>>> + return NULL;
>>> +
>>> + for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> + if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>> + return NULL;
>>> +
>>> + return gen_mmx_emms ();
>>>
>>>
>>> emms is not clearing any register, it only loads x87FPUTagWord with
>>> FFFFH. So I think, the above is useless, as far as register clearing
>>> is concerned.
>>>
>>>
>>> Thanks for the info.
>>>
>>> So, for mm and st registers, should we clear them, and how?
>>>
>>>
>>> I don't know.
>>>
>>> Please note that %mm and %st share the same register file, and
>>> touching %mm registers will block access to %st until emms is emitted.
>>> You can't just blindly load 0 to %st registers, because the register
>>> file can be in MMX mode and vice versa. For 32bit targets, function
>>> can also return a value in the %mm0.
>>>
>>>
>>> If data flow determines that %mm0 does not hold a return value at function
>>> return, can we clear all the %st registers as follows:
>>>
>>> emms
>>> mov %st0, 0
>>> mov %st1, 0
>>> mov %st2, 0
>>> mov %st3, 0
>>> mov %st4, 0
>>> mov %st5, 0
>>> mov %st6, 0
>>> mov %st7, 0
>>
>> The i386 ABI says:
>>
>> -- q --
>> The CPU shall be in x87 mode upon entry to a function. Therefore,
>> every function that uses the MMX registers is required to issue an
>> emms or femms instruction after using MMX registers, before returning
>> or calling another function.
>> -- /q --
>>
>> (The above requirement slightly contradicts its own ABI, since we have
>> 3 MMX argument registers and MMX return register, so the CPU obviously
>> can't be in x87 mode at all function boundaries).
>>
>> So, assuming that the first sentence is not deliberately vague w.r.t
>> function exit, emms should not be needed. However, we are dealing with
>> x87 stack registers that have their own set of peculiarities. It is
>> not possible to load a random register in the way you show. Also,
>> stack should be either empty or one (two in case of complex value
>> return) levels deep at the function return. I think you want a series
>> of 8 or 7(6) fldz insns, followed by a series of fstp insns to clear
>> the stack and mark stack slots empty.
>
> Something like this:
>
> --cut here--
> #include <stdio.h>
>
> long double
> __attribute__ ((noinline))
> test (long double a, long double b)
> {
> long double r = a + b;
>
> asm volatile ("fldz; \
> fldz; \
> fldz; \
> fldz; \
> fldz; \
> fldz; \
> fldz; \
> fstp %%st(0); \
> fstp %%st(0); \
> fstp %%st(0); \
> fstp %%st(0); \
> fstp %%st(0); \
> fstp %%st(0); \
> fstp %%st(0)" : : "X"(r));
> return r;
> }
>
> int
> main ()
> {
> long double a = 1.1, b = 1.2;
>
> long double c = test (a, b);
>
> printf ("%Lf\n", c);
>
> return 0;
> }
> --cut here--

Okay, so:

1. First, compute how many st registers need to be zeroed, num_of_zeroed_st.
2. Then issue (8 - num_of_zeroed_st) fldz to push 0 onto the stack and clear all
the dead stack slots.
3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and leave it
empty.

Is the above understanding correct?
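
To make the counting concrete, here is a rough sketch of the sequence as Uros
describes it: 8 fldz when the x87 stack is empty at return, 7 when one slot
holds the return value, 6 for a complex return, each followed by the same
number of fstp %st(0). This is only an illustration, not part of the patch:
free_st_slots and build_st_zero_seq are hypothetical names, and the count of
live return slots is assumed to be known from data flow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: number of x87 stack slots
   that are free at function return, given how many slots hold the return
   value (0 = none, 1 = scalar, 2 = complex).  */
static int
free_st_slots (int live_return_slots)
{
  assert (live_return_slots >= 0 && live_return_slots <= 2);
  return 8 - live_return_slots;
}

/* Hypothetical helper: write N "fldz" lines followed by N "fstp %st(0)"
   lines into BUF (SIZE bytes).  Return the length of the text written,
   or -1 if BUF is too small.  */
static int
build_st_zero_seq (char *buf, size_t size, int n)
{
  size_t used = 0;

  if (size == 0)
    return -1;
  buf[0] = '\0';

  for (int i = 0; i < 2 * n; i++)
    {
      /* First N iterations push zeros, the last N pop them again.  */
      const char *insn = i < n ? "fldz\n" : "fstp %st(0)\n";
      size_t len = strlen (insn);

      if (used + len + 1 > size)
        return -1;
      memcpy (buf + used, insn, len + 1);
      used += len;
    }

  return (int) used;
}
```

For a function returning a long double, free_st_slots (1) gives 7, which
matches the 7 fldz / 7 fstp pairs in the test case above.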

Another thought:

It looks like it is very complicated to use the st/mm register set correctly, so
I assume it would also be very hard for an attacker to use this set of registers
correctly.

Right?

Thanks.

Qing
>
> Uros.