> On Oct 21, 2020, at 3:03 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> 
> On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>> 
>> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <qing.z...@oracle.com> wrote:
>> 
>>> +/* Check whether the register REGNO should be zeroed on X86.
>>> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>>> +   together, no need to zero it again.
>>> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other
>>> +   and are very hard to zero individually, so don't zero individual st
>>> +   or mm registers at this time.  */
>>> +
>>> +static bool
>>> +zero_call_used_regno_p (const unsigned int regno,
>>> + bool all_sse_zeroed)
>>> +{
>>> +  return GENERAL_REGNO_P (regno)
>>> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
>>> +  || MASK_REGNO_P (regno);
>>> +}
>>> +
>>> +/* Return the machine_mode that is used to zero register REGNO.  */
>>> +
>>> +static machine_mode
>>> +zero_call_used_regno_mode (const unsigned int regno)
>>> +{
>>> +  /* NB: We only need to zero the lower 32 bits for integer registers
>>> +     and the lower 128 bits for vector registers since destinations are
>>> +     zero-extended to the full register width.  */
>>> +  if (GENERAL_REGNO_P (regno))
>>> +    return SImode;
>>> +  else if (SSE_REGNO_P (regno))
>>> +    return V4SFmode;
>>> +  else
>>> +    return HImode;
>>> +}
>>> +
>>> +/* Generate an rtx to zero all vector registers together if possible,
>>> +   otherwise, return NULL.  */
>>> +
>>> +static rtx
>>> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>> +{
>>> +  if (!TARGET_AVX)
>>> +    return NULL;
>>> +
>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>>> +  || (TARGET_64BIT
>>> +      && (REX_SSE_REGNO_P (regno)
>>> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>> +      return NULL;
>>> +
>>> +  return gen_avx_vzeroall ();
>>> +}
>>> +
>>> +/* Generate an rtx to zero all st and mm registers together if possible,
>>> +   otherwise, return NULL.  */
>>> +
>>> +static rtx
>>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>>> +{
>>> +  if (!TARGET_MMX)
>>> +    return NULL;
>>> +
>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>> +      return NULL;
>>> +
>>> +  return gen_mmx_emms ();
>>> 
>>> 
>>> emms is not clearing any register, it only loads x87FPUTagWord with
>>> FFFFH. So I think, the above is useless, as far as register clearing
>>> is concerned.
>>> 
>>> 
>>> Thanks for the info.
>>> 
>>> So, for mm and st registers, should we clear them, and how?
>>> 
>>> 
>>> I don't know.
>>> 
>>> Please note that %mm and %st share the same register file, and
>>> touching %mm registers will block access to %st until emms is emitted.
>>> You can't just blindly load 0 to %st registers, because the register
>>> file can be in MMX mode and vice versa. For 32bit targets, a function
>>> can also return a value in %mm0.
>>> 
>>> 
>>> If data flow determines that %mm0 does not hold a return value, can
>>> we clear all the %st registers as follows:
>>> 
>>> emms
>>> mov %st0, 0
>>> mov %st1, 0
>>> mov %st2, 0
>>> mov %st3, 0
>>> mov %st4, 0
>>> mov %st5, 0
>>> mov %st6, 0
>>> mov %st7, 0
>> 
>> The i386 ABI says:
>> 
>> -- q --
>> The CPU shall be in x87 mode upon entry to a function. Therefore,
>> every function that uses the MMX registers is required to issue an
>> emms or femms instruction after using MMX registers, before returning
>> or calling another function.
>> -- /q --
>> 
>> (The above requirement slightly contradicts its own ABI, since we have
>> 3 MMX argument registers and MMX return register, so the CPU obviously
>> can't be in x87 mode at all function boundaries).
>> 
>> So, assuming that the first sentence is not deliberately vague w.r.t
>> function exit, emms should not be needed. However, we are dealing with
>> x87 stack registers that have their own set of peculiarities. It is
>> not possible to load a random register in the way you show.  Also,
>> the stack should be either empty or one (two in case of complex value
>> return) levels deep at the function return. I think you want a series
>> of 8 or 7(6) fldz insns, followed by a series of fstp insns to clear
>> the stack and mark stack slots empty.
> 
> Something like this:
> 
> --cut here--
> #include <stdio.h>
> 
> long double
> __attribute__ ((noinline))
> test (long double a, long double b)
> {
>  long double r = a + b;
> 
>  asm volatile ("fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0)" : : "X"(r));
>  return r;
> }
> 
> int
> main ()
> {
>  long double a = 1.1, b = 1.2;
> 
>  long double c = test (a, b);
> 
>  printf ("%Lf\n", c);
> 
>  return 0;
> }
> --cut here--


Okay, so,

1. First compute how many st registers need to be zeroed, num_of_zeroed_st.
2. Then issue (8 - num_of_zeroed_st) fldz to push 0 onto the stack to clear all
the dead stack slots.
3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the
stack.

Is the above understanding correct?
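
To make the counting concrete, here is a minimal standalone sketch. It is not part of the patch, and the names build_st_clear_sequence and ST_CLEAR_BUF_SIZE are hypothetical. It assumes the count is driven by the number of x87 stack slots still holding the return value (0, 1, or 2 for a complex return), which gives the "8 or 7(6)" fldz insns mentioned above, and it only builds the instruction text rather than emitting RTL:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Large enough for 8 "fldz\n" plus 8 "fstp %st(0)\n" and the final NUL.  */
#define ST_CLEAR_BUF_SIZE 160

/* Hypothetical helper, not part of the patch: write into BUF the text of
   an x87 sequence that overwrites every free stack slot with +0.0 and
   then pops those slots again so they are tagged empty.
   LIVE_SLOTS is the number of slots that hold the return value:
   0 (nothing returned in x87), 1 (scalar) or 2 (complex).
   Returns the number of fldz (and fstp) instructions written.  */
static int
build_st_clear_sequence (int live_slots, char *buf, size_t size)
{
  assert (live_slots >= 0 && live_slots <= 2);
  assert (buf != NULL && size >= ST_CLEAR_BUF_SIZE);

  int count = 8 - live_slots;
  size_t used = 0;

  buf[0] = '\0';

  /* Push COUNT zeros; this fills every slot that is currently free.  */
  for (int i = 0; i < count; i++)
    used += (size_t) snprintf (buf + used, size - used, "fldz\n");

  /* Pop them again so the stack is back to LIVE_SLOTS levels deep.  */
  for (int i = 0; i < count; i++)
    used += (size_t) snprintf (buf + used, size - used, "fstp %%st(0)\n");

  return count;
}
```

With live_slots == 1 this produces the same 7 fldz / 7 fstp %st(0) sequence as the inline asm in the test case quoted above.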

Another thought is:

It looks like it is very complicated to use the st/mm register set correctly,
so I assume that this set of registers would be very hard for an attacker to
use correctly.
Right?

thanks.

Qing
> 
> Uros.
