Re: MIPS RDHWR instruction reordering

2006-07-25 Thread Atsushi Nemoto
On Mon, 24 Jul 2006 19:08:41 +0100, Richard Sandiford [EMAIL PROTECTED] wrote:
 You can set up DejaGNU's unix.exp to use rsh and rcp.  It's usually
 a case of creating a board file foo.exp like this:
...
 and then running with make -k check RUNTESTFLAGS='--target_board foo'.

 FWIW, I can run a gcc test run for you on mips64-linux-gnu (and perhaps
 glibc too, depending on the time).  I should have the results by the weekend.

Good advice, thanks!  I'll setup the environment.  It might take some
time since I'm not familiar with dejagnu, so your results will be
appreciated.

  +; Since rdhwr always generate a trap for now, it should not be be put
  +; on delay slot.  It it was on delay slot, the emulation will be
  +; slower.
 
 Andreas has already commented on this, but the tense sounds wrong too.
 Maybe it would be better to have something like:
 
 ; Putting rdhwr in a delay slot would make the kernel's emulation
 ; of it much slower.
 
 right above the can_delay line.  Whatever you feel is best though.

Thank you.  Here is a updated patch with your suggestions.


2006-07-25  Atsushi Nemoto  [EMAIL PROTECTED]

partial PR target/28126
* config/mips/mips.md (tls_get_tp_mode): Set can_delay to no.

Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md (revision 115735)
+++ gcc/config/mips/mips.md (working copy)
@@ -5458,6 +5458,9 @@
   HAVE_AS_TLS  !TARGET_MIPS16
   .set\tpush\;.set\tmips32r2\t\;rdhwr\t%0,$29\;.set\tpop
   [(set_attr type unknown)
+; Since rdhwr always generates a trap for now, putting it in a delay
+; slot would make the kernel's emulation of it much slower.
+   (set_attr can_delay no)
(set_attr mode MODE)])
 
 ; The MIPS Paired-Single Floating Point and MIPS-3D Instructions.


Re: MIPS RDHWR instruction reordering

2006-07-24 Thread Atsushi Nemoto
On 22 Jul 2006 20:58:16 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 OK, patch is approved, with a ChangeLog entry, and assuming it passes
 the testsuite.

Thanks, here is a patch with a ChangeLog entry.

I can cross-build gcc and glibc successfully, but unfortunately I can
not build and run the testsuite natively (in reasonable time) due to
limited CPU/memory resources on my target platform.  Is there good way
to run testsuite on cross environment?


2006-07-24  Atsushi Nemoto  [EMAIL PROTECTED]

partial PR target/28126
* config/mips/mips.md (tls_get_tp_mode): Set can_delay to no.

Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md (revision 115370)
+++ gcc/config/mips/mips.md (working copy)
@@ -5450,6 +5450,9 @@
 ; MIPS 32r2 specification, but we use it on any architecture because
 ; we expect it to be emulated.  Use .set to force the assembler to
 ; accept it.
+; Since rdhwr always generate a trap for now, it should not be be put
+; on delay slot.  It it was on delay slot, the emulation will be
+; slower.
 
 (define_insn tls_get_tp_mode
   [(set (match_operand:P 0 register_operand =v)
@@ -5458,6 +5461,7 @@
   HAVE_AS_TLS  !TARGET_MIPS16
   .set\tpush\;.set\tmips32r2\t\;rdhwr\t%0,$29\;.set\tpop
   [(set_attr type unknown)
+   (set_attr can_delay no)
(set_attr mode MODE)])
 
 ; The MIPS Paired-Single Floating Point and MIPS-3D Instructions.


Re: MIPS RDHWR instruction reordering

2006-07-24 Thread Andreas Schwab
Atsushi Nemoto [EMAIL PROTECTED] writes:

 Index: gcc/config/mips/mips.md
 ===
 --- gcc/config/mips/mips.md   (revision 115370)
 +++ gcc/config/mips/mips.md   (working copy)
 @@ -5450,6 +5450,9 @@
  ; MIPS 32r2 specification, but we use it on any architecture because
  ; we expect it to be emulated.  Use .set to force the assembler to
  ; accept it.
 +; Since rdhwr always generate a trap for now, it should not be be put

s/generate/s/

 +; on delay slot.  It it was on delay slot, the emulation will be

on a (or in a?), s/It/If/

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: MIPS RDHWR instruction reordering

2006-07-24 Thread Richard Sandiford
Thanks for tackling this.

Atsushi Nemoto [EMAIL PROTECTED] writes:
 On 22 Jul 2006 20:58:16 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 OK, patch is approved, with a ChangeLog entry, and assuming it passes
 the testsuite.

 Thanks, here is a patch with a ChangeLog entry.

 I can cross-build gcc and glibc successfully, but unfortunately I can
 not build and run the testsuite natively (in reasonable time) due to
 limited CPU/memory resources on my target platform.  Is there good way
 to run testsuite on cross environment?

You can set up DejaGNU's unix.exp to use rsh and rcp.  It's usually
a case of creating a board file foo.exp like this:

load_generic_config unix
process_multilib_options 
set_board_info hostname 

You can use ssh and scp instead by adding:

set_board_info rsh_prog ssh
set_board_info rcp_prog rcp

and then running with make -k check RUNTESTFLAGS='--target_board foo'.

FWIW, I can run a gcc test run for you on mips64-linux-gnu (and perhaps
glibc too, depending on the time).  I should have the results by the weekend.

 +; Since rdhwr always generate a trap for now, it should not be be put
 +; on delay slot.  It it was on delay slot, the emulation will be
 +; slower.

Andreas has already commented on this, but the tense sounds wrong too.
Maybe it would be better to have something like:

; Putting rdhwr in a delay slot would make the kernel's emulation
; of it much slower.

right above the can_delay line.  Whatever you feel is best though.

Richard


Re: MIPS RDHWR instruction reordering

2006-07-22 Thread Atsushi Nemoto
On 21 Jul 2006 10:06:34 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 I also don't see why revision 108713 would affect this.
 
 But I do note that this version is still bad.  The rdhwr instruction
 is in the branch delay slot, and is therefore always executed.

Yes, and I think rdhwr should not be in delay slot anyway.  Just
avoiding delay slot is quite easy.  Here is a proposal patch.

With this patch, it seems gcc 4.2 generate desired code for now,
though there might be somewhere to fix.

Index: gcc/config/mips/mips.md
===
--- gcc/config/mips/mips.md (revision 115370)
+++ gcc/config/mips/mips.md (working copy)
@@ -5450,6 +5450,9 @@
 ; MIPS 32r2 specification, but we use it on any architecture because
 ; we expect it to be emulated.  Use .set to force the assembler to
 ; accept it.
+; Since rdhwr always generate a trap for now, it should not be be put
+; on delay slot.  It it was on delay slot, the emulation will be
+; slower.
 
 (define_insn tls_get_tp_mode
   [(set (match_operand:P 0 register_operand =v)
@@ -5458,6 +5461,7 @@
   HAVE_AS_TLS  !TARGET_MIPS16
   .set\tpush\;.set\tmips32r2\t\;rdhwr\t%0,$29\;.set\tpop
   [(set_attr type unknown)
+   (set_attr can_delay no)
(set_attr mode MODE)])
 
 ; The MIPS Paired-Single Floating Point and MIPS-3D Instructions.


Re: MIPS RDHWR instruction reordering

2006-07-22 Thread Ian Lance Taylor
Atsushi Nemoto [EMAIL PROTECTED] writes:

 On 21 Jul 2006 10:06:34 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
  I also don't see why revision 108713 would affect this.
  
  But I do note that this version is still bad.  The rdhwr instruction
  is in the branch delay slot, and is therefore always executed.
 
 Yes, and I think rdhwr should not be in delay slot anyway.  Just
 avoiding delay slot is quite easy.  Here is a proposal patch.
 
 With this patch, it seems gcc 4.2 generate desired code for now,
 though there might be somewhere to fix.

This may be right, but I'm not sure that it is.  If it is OK to
unconditionally execute rdhwr, then it should be OK to put it in a
delay slot.  Unless that will break something when rdhwr is emulated.
Or will the emulation code run slower when rdhwr is in a delay slot?
That is, slower than it would if rdhwr were emulated without being in
a delay slot?

Ian


Re: MIPS RDHWR instruction reordering

2006-07-22 Thread Daniel Jacobowitz
On Sat, Jul 22, 2006 at 09:52:44AM -0700, Ian Lance Taylor wrote:
 This may be right, but I'm not sure that it is.  If it is OK to
 unconditionally execute rdhwr, then it should be OK to put it in a
 delay slot.  Unless that will break something when rdhwr is emulated.
 Or will the emulation code run slower when rdhwr is in a delay slot?
 That is, slower than it would if rdhwr were emulated without being in
 a delay slot?

Yes.  Much, much slower.  There's a fast path support for rdhwr (I'm
not sure if it is committed yet but it's definitely floating around)
which only handles the non-branch-delay case.  It will still work in a
delay slot, but it's a much heavier-weight operation.

So, until and unless there is a revision of the MIPS architecture on
which this instruction is not guaranteed to trap, I think we should not
put it in a delay slot.

-- 
Daniel Jacobowitz
CodeSourcery


Re: MIPS RDHWR instruction reordering

2006-07-22 Thread Ian Lance Taylor
Daniel Jacobowitz [EMAIL PROTECTED] writes:

 On Sat, Jul 22, 2006 at 09:52:44AM -0700, Ian Lance Taylor wrote:
  This may be right, but I'm not sure that it is.  If it is OK to
  unconditionally execute rdhwr, then it should be OK to put it in a
  delay slot.  Unless that will break something when rdhwr is emulated.
  Or will the emulation code run slower when rdhwr is in a delay slot?
  That is, slower than it would if rdhwr were emulated without being in
  a delay slot?
 
 Yes.  Much, much slower.  There's a fast path support for rdhwr (I'm
 not sure if it is committed yet but it's definitely floating around)
 which only handles the non-branch-delay case.  It will still work in a
 delay slot, but it's a much heavier-weight operation.
 
 So, until and unless there is a revision of the MIPS architecture on
 which this instruction is not guaranteed to trap, I think we should not
 put it in a delay slot.

OK, patch is approved, with a ChangeLog entry, and assuming it passes
the testsuite.

Thanks.

Ian


Re: MIPS RDHWR instruction reordering

2006-07-21 Thread Atsushi Nemoto
On 19 Jun 2006 16:45:43 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 I'm not sure, because I'm not sure what is hoisting the instruction.
 
 I tried recreating this, but I couldn't.  I get this:
 
 foo:
   .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
   .mask   0x,0
   .fmask  0x,0
   .setnoreorder
   .cpload $25
   .setreorder
   .setnoreorder
   .setnomacro
   beq $4,$0,$L7
   .setpush
   .setmips32r2
   rdhwr   $3,$29
   .setpop
   .setmacro
   .setreorder

FYI, I found that the difference between your result (gcc 4.2) and
mine (gcc 4.1.1) is come from r108713 commit.

With r108712 I got:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setnomacro

lw  $2,%gottprel(x)($28)
.setpush
.setmips32r2
rdhwr   $3,$29
.setpop
addu$2,$2,$3
beq $4,$0,$L4
move$3,$0

lw  $3,0($2)
$L4:
j   $31
move$2,$3

And with r108713 I got:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setnomacro

beq $4,$0,$L7
.setpush
.setmips32r2
rdhwr   $3,$29
.setpop

lw  $2,%gottprel(x)($28)
nop
addu$2,$2,$3
lw  $2,0($2)
j   $31
nop

$L7:
j   $31
move$2,$0

And I can not see why the commit make such a difference...

---
Atsushi Nemoto


Re: MIPS RDHWR instruction reordering

2006-07-21 Thread Ian Lance Taylor
Atsushi Nemoto [EMAIL PROTECTED] writes:

 And with r108713 I got:
 
 foo:
   .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
   .mask   0x,0
   .fmask  0x,0
   .setnoreorder
   .cpload $25
   .setnomacro
   
   beq $4,$0,$L7
   .setpush
   .setmips32r2
   rdhwr   $3,$29
   .setpop
 
   lw  $2,%gottprel(x)($28)
   nop
   addu$2,$2,$3
   lw  $2,0($2)
   j   $31
   nop
 
 $L7:
   j   $31
   move$2,$0
 
 And I can not see why the commit make such a difference...

I also don't see why revision 108713 would affect this.

But I do note that this version is still bad.  The rdhwr instruction
is in the branch delay slot, and is therefore always executed.

Ian


Re: MIPS RDHWR instruction reordering

2006-06-20 Thread Atsushi Nemoto
On 19 Jun 2006 16:45:43 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
 I tried recreating this, but I couldn't.  I get this:
...
 This of course is not ideal, since it unconditionally executes the
 rdhwr instruction.  But it is not the same as what the OP reported.

I used stock gcc 4.1.1 to get the result: mips-linux-gcc -O2 -S foo.c

If I used -O instead of -O2, I got similer (but not same) result:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setnomacro

bne $4,$0,$L2
.setpush
.setmips32r2
rdhwr   $3,$29
.setpop

j   $31
move$2,$0

$L2:
lw  $2,%gottprel(x)($28)
nop
addu$2,$2,$3
lw  $2,0($2)
j   $31
nop

JFYI.


Should I file a bug report?

---
Atsushi Nemoto


Re: MIPS RDHWR instruction reordering

2006-06-20 Thread Ian Lance Taylor
Atsushi Nemoto [EMAIL PROTECTED] writes:

 On 19 Jun 2006 16:45:43 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
  I tried recreating this, but I couldn't.  I get this:
 ...
  This of course is not ideal, since it unconditionally executes the
  rdhwr instruction.  But it is not the same as what the OP reported.
 
 I used stock gcc 4.1.1 to get the result: mips-linux-gcc -O2 -S foo.c

I was using current mainline.

 Should I file a bug report?

Yes, please.  Thanks.

Ian


Re: MIPS RDHWR instruction reordering

2006-06-19 Thread Daniel Jacobowitz
On Fri, Jun 16, 2006 at 02:12:29PM -0700, Ian Lance Taylor wrote:
 The computation of the address of x was moved outside the
 conditional--that is, both the rdhwr and the addu moved.  You'll have
 to figure out why.  gcc shouldn't move instructions outside of a
 conditional unless they are cheap and don't trap.  This instruction
 doesn't trap, but it's not cheap.

What metric gets used for this - rtx_cost?

-- 
Daniel Jacobowitz
CodeSourcery


Re: MIPS RDHWR instruction reordering

2006-06-19 Thread Ian Lance Taylor
Daniel Jacobowitz [EMAIL PROTECTED] writes:

 On Fri, Jun 16, 2006 at 02:12:29PM -0700, Ian Lance Taylor wrote:
  The computation of the address of x was moved outside the
  conditional--that is, both the rdhwr and the addu moved.  You'll have
  to figure out why.  gcc shouldn't move instructions outside of a
  conditional unless they are cheap and don't trap.  This instruction
  doesn't trap, but it's not cheap.
 
 What metric gets used for this - rtx_cost?

I'm not sure, because I'm not sure what is hoisting the instruction.

I tried recreating this, but I couldn't.  I get this:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setreorder
.setnoreorder
.setnomacro
beq $4,$0,$L7
.setpush
.setmips32r2
rdhwr   $3,$29
.setpop
.setmacro
.setreorder

lw  $2,%gottprel(x)($28)
addu$2,$2,$3
lw  $2,0($2)
j   $31
$L7:
.setnoreorder
.setnomacro
j   $31
move$2,$0
.setmacro
.setreorder

This of course is not ideal, since it unconditionally executes the
rdhwr instruction.  But it is not the same as what the OP reported.

This case happens because reorg.c ignores the cost of the instruction
in fill_slots_from_thread.  I believe that reorg.c should not move an
expensive instruction which is only conditionally executed into a
delay slot.  That is probably a bug.

We can see a similar case with this:

int foo(int arg, int x)
{
if (arg)
  return x * x;
return 0;
}

which yields this:

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setreorder
.setnoreorder
.setnomacro
beq $4,$0,$L7
mult$5,$5
.setmacro
.setreorder

mflo$2
j   $31
$L7:
.setnoreorder
.setnomacro
j   $31
move$2,$0
.setmacro
.setreorder

which executes the mult instruction unconditionally which is
probably not desirable since it will tie up the multiplication
pipeline.

Ian


Re: MIPS RDHWR instruction reordering

2006-06-17 Thread Atsushi Nemoto
On 16 Jun 2006 14:12:29 -0700, Ian Lance Taylor [EMAIL PROTECTED] wrote:
  The RDHWR is executed _before_ evaluating the arg value.  For arg ==
  0 case, the RDHWR has no point but just a overhead.  Without -O2, the
  RDHWR is executed _after_ the evaluation, so gcc's optimizer reorder
  the RDHWR instruction.
 
 The computation of the address of x was moved outside the
 conditional--that is, both the rdhwr and the addu moved.  You'll have
 to figure out why.  gcc shouldn't move instructions outside of a
 conditional unless they are cheap and don't trap.  This instruction
 doesn't trap, but it's not cheap.

AFAIK on all MIPS CPU the rdhwr generates an exception (trap).

I tried changing unspec to unspec_volatile in this definition:

(define_insn tls_get_tp_mode
  [(set (match_operand:P 0 register_operand =v)
(unspec:P [(const_int 0)]
  UNSPEC_TLS_GET_TP))]
  HAVE_AS_TLS  !TARGET_MIPS16
  .set\tpush\;.set\tmips32r2\t\;rdhwr\t%0,$29\;.set\tpop
  [(set_attr type unknown)
   (set_attr mode MODE)])

With unspec_volatile, gcc do not move the rdhwr before the branch.
But this change has bad side effects.  For example, if I incremented a
thread local variable, rdhwr is used twice (for load and store).

So I suppose we should tell gcc that rdhwr is not cheap.  But I do not
know how to describe such information in .md file...

---
Atsushi Nemoto


MIPS RDHWR instruction reordering

2006-06-16 Thread Atsushi Nemoto
The RDHWR instruction is used to support TLS on Linux/MIPS.  For now
it is always emulated by kernel (on Reserved Instruction exception
handler), the instruction will be quite expensive.

If I compile this code with gcc 4.1.1 (-O2),

extern __thread int x;
int foo(int arg)
{
if (arg)
return x;
return 0;
}

I got this output.

foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setnomacro

lw  $2,%gottprel(x)($28)
.setpush
.setmips32r2
rdhwr   $3,$29
.setpop
addu$2,$2,$3
beq $4,$0,$L4
move$3,$0

lw  $3,0($2)
$L4:
j   $31
move$2,$3

The RDHWR is executed _before_ evaluating the arg value.  For arg ==
0 case, the RDHWR has no point but just a overhead.  Without -O2, the
RDHWR is executed _after_ the evaluation, so gcc's optimizer reorder
the RDHWR instruction.

I want to make the RDHWR called only if really required.  Is it
possible to prevent such a reording for RDHWR ?  

This is very common case in libc.  On returning from system call, libc
checks the result and write to the errno which is thread-local.  On
successful syscall, we do not need to access errno at all, but gcc
sometimes put RDHWR _before_ checking the result of the syscall.  This
add an additional exception for the syscall for usual case.

I do not know much about gcc's internal.  Any help are welcome.  Thank
you.
---
Atsushi Nemoto


Re: MIPS RDHWR instruction reordering

2006-06-16 Thread Ian Lance Taylor
Atsushi Nemoto [EMAIL PROTECTED] writes:

 If I compile this code with gcc 4.1.1 (-O2),
 
 extern __thread int x;
 int foo(int arg)
 {
   if (arg)
   return x;
   return 0;
 }
 
 I got this output.
 
 foo:
   .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
   .mask   0x,0
   .fmask  0x,0
   .setnoreorder
   .cpload $25
   .setnomacro
   
   lw  $2,%gottprel(x)($28)
   .setpush
   .setmips32r2
   rdhwr   $3,$29
   .setpop
   addu$2,$2,$3
   beq $4,$0,$L4
   move$3,$0
 
   lw  $3,0($2)
 $L4:
   j   $31
   move$2,$3
 
 The RDHWR is executed _before_ evaluating the arg value.  For arg ==
 0 case, the RDHWR has no point but just a overhead.  Without -O2, the
 RDHWR is executed _after_ the evaluation, so gcc's optimizer reorder
 the RDHWR instruction.

The computation of the address of x was moved outside the
conditional--that is, both the rdhwr and the addu moved.  You'll have
to figure out why.  gcc shouldn't move instructions outside of a
conditional unless they are cheap and don't trap.  This instruction
doesn't trap, but it's not cheap.

Ian