On Sun, Mar 4, 2012 at 12:09 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>
>> The current x32 implementation uses LEAs to convert 32bit address to
>> 64bit.  However, we can use addr32 prefix to use 32bit address directly.
>> It improves performance by 5% in SPEC CPU 2K/2006.  All changes are done
>> in x86 backend, except for a smaill unwind library assert change:
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html
>>
>> due to return column size difference.
>>
>> For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit.
>> push/pop only work on word_mode.  Also string instructions take Pmode
>> pointers.
>>
>> I will submit a set of patches to use 32bit Pmode for x32.  This is
>> the first patch to properly use Pmode and word_mode.  It also adds
>> addr32 prefix to string instructions if needed.  OK for trunk?
>
> First round of review comments:
>
> @@ -10252,14 +10260,18 @@ ix86_expand_prologue (void)
>       if (r10_live && eax_live)
>         {
>          t = choose_baseaddr (m->fs.sp_offset - allocate);
> -         emit_move_insn (r10, gen_frame_mem (Pmode, t));
> +         emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
> +                         gen_frame_mem (word_mode, t));
>          t = choose_baseaddr (m->fs.sp_offset - allocate - UNITS_PER_WORD);
> -         emit_move_insn (eax, gen_frame_mem (Pmode, t));
> +         emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
> +                         gen_frame_mem (word_mode, t));
>        }
>       else if (eax_live || r10_live)
>        {
>          t = choose_baseaddr (m->fs.sp_offset - allocate);
> -         emit_move_insn ((eax_live ? eax : r10), gen_frame_mem (Pmode, t));
> +         emit_move_insn (gen_rtx_REG (word_mode,
> +                                      (eax_live ? AX_REG : R10_REG)),
> +                         gen_frame_mem (word_mode, t));
>        }
>     }
>   gcc_assert (m->fs.sp_offset == frame.stack_pointer_offset);
>
> Please just change
>
>      rtx eax = gen_rtx_REG (Pmode, AX_REG);
>
> and
>          r10 = gen_rtx_REG (Pmode, R10_REG);

This is done on purpose.  We manipulate stack using AX and R10 as
scratch registers in Pmode since stack is in Pmode.  But AX and R10
registers have to be saved and restored in word_mode.

> around line 10305 and line 10324. You also have gen_push in Pmode,

In those places, they just want to push a register on stack to save it.
Callers don't care how it is done.  I changed gen_push  to allow
Pmode by always pushing registers in word_mode:

 if (REG_P (arg) && GET_MODE (arg) != word_mode)
    arg = gen_rtx_REG (word_mode, REGNO (arg));

> just following the former line. Please review the whole
> ix86_expand_prologue how AX and R10 are defined and used.

The same issue applies here.

> @@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void)
>        {
>          rtx rax;
>
> -         rax = gen_rtx_REG (Pmode, AX_REG);
> -         emit_move_insn (rax, reg10);
> +         rax = gen_rtx_REG (word_mode, AX_REG);
> +         emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG));
>          use_reg (&call_fusage, rax);
>        }
>
> Same here. Please review how AX, R10 and R11 are defined and used.
> Also, this needs review from split stack author.

I CCed Ian. That is the same issue.  We need some scratch registers
in Pmode to manipulate stack.  But we have to save and restore them
in word_mode, not Pmode.

> @@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct
> ix86_address *out)
>   else
>     disp = addr;                       /* displacement */
>
> +  /* Since address override works only on the (reg) part in fs:(reg),
> +     we can't use it as memory operand.  */
> +  if (Pmode != word_mode && seg == SEG_FS && (base || index))
> +    return 0;
>
> Can you explain the above some more? IMO, if the override works on
> (reg) part, this is just what we want.

When Pmode == SImode, we have

fs segment register == 0x1001

and

base register (SImode) == -1 (0xffffffff).

We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
is 0x1000 + 0xffffffff, not 0x1000 since 0x67 address prefix only applies to
base register to zero-extend 0xffffffff to 64bit.

> @@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
>              gcc_unreachable ();
>            }
>
> -         ix86_print_operand (file, x, 0);
> +         ix86_print_operand (file, x,
> +                             TARGET_64BIT && REG_P (x) ? 'q' : 0);
>          return;
>
> This is too big hammer. You output everything in DImode, so even if
> the address is in fact in SImode, you output it in DImode with an
> addr32 prefix.
>

"%A" is only used in "jmp\t%A0" and there is no "jmp *%eax" instruction in
64bit mode, only "jmp *%rax":

[hjl@gnu-4 tmp]$ cat j.s
        jmp *%eax
        jmp *%rax
[hjl@gnu-4 tmp]$ gcc -c j.s
j.s: Assembler messages:
j.s:1: Error: operand type mismatch for `jmp'
[hjl@gnu-4 tmp]$

It is OK for x32 since the upper 32bits are zero when we are loading "%eax".


-- 
H.J.

Reply via email to