[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2008-09-08 Thread rdsandiford at googlemail dot com


--- Comment #12 from rdsandiford at googlemail dot com  2008-09-08 19:48 
---
Subject: Re:  gcc moves an expensive instruction outside of a conditional

daney at gcc dot gnu dot org [EMAIL PROTECTED] writes:
 Can we close this now?

 I think it is fixed.

Sorry, this is still on the back-burner.

I think the underlying problem is still there, even if it doesn't
trigger for the original testcases.

Richard


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2008-09-07 Thread daney at gcc dot gnu dot org


--- Comment #11 from daney at gcc dot gnu dot org  2008-09-07 17:32 ---
Can we close this now?

I think it is fixed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-08-05 Thread anemo at mba dot ocn dot ne dot jp


--- Comment #10 from anemo at mba dot ocn dot ne dot jp  2006-08-05 15:48 
---
(In reply to comment #9)
 Created an attachment (id=12010)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12010action=view) [edit]
 A hackish fix

I tried gcc 4.1 with this patch and compiled glibc 2.4.
With a quick look, I could not find unnecessary rdhwr.  Thanks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-08-03 Thread rsandifo at gcc dot gnu dot org


--- Comment #9 from rsandifo at gcc dot gnu dot org  2006-08-03 21:06 
---
Created an attachment (id=12010)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12010action=view)
A hackish fix

I agree with Kaz that a blockage would be a correct fix here.
I'm just worried about the performance impact.

A simple hack for 4.1 is to model tls_tp_mode as a division
instruction.  The middle-end will then treat it as potentially
trapping and won't execute it speculatively.  The advantages
are that:

  (1) it won't hinder speculation or scheduling of unrelated
  instructions, and

  (2) it still allows the optimisers to remove redundant rdhwrs,
  just as they would for redundant divisions of the same values.

Although this is indeed hackish, we're kind-of relying on the same
infrasturcture to prevent speculative execution of FPU code on targets
where kernel emulation is required.  A cleaner fix would be to get
may_trap_p (and perhaps other functions) to call a target hook
for UNSPECs.

As it happens, we're not really getting (2) anyway.  The use of register
$3 is exposed right at the outset, which stops most optimisers from
touching it.  E.g. something as simple as:

extern __thread int x;
void foo (void) { x++; }

will execute rdhwr twice.  It's simple to fix this by using a pseudo
instead of (reg $3).  This shouldn't cause problems, as we're already
relying on the =v constraint to force the use of the right register.

If we do use a pseudo, the question is what to do about uses in loops.
E.g. if we have:

extern __thread int x;
void foo (int n, int *ptr)
{
  while (n--  0)
if (*ptr++ == 1)
  x++;
}

should we allow the rdhwr to be hoisted?  (It will be if we keep
a non-trapping representation of tls_get_tp_mode, but won't be
if we treat it as trapping.)  I think the answer is that, in the
absence of profiling information, we simply don't know.  There are
going to be some cases where hoisting is exactly the right thing to
do and others where it's exactly the wrong thing.

This makes me wonder if we should compromise, and get the base pointer
lazily.  When we first find that a function needs the base pointer,
we can allocate a function-wide pseudo for it, and make sure that
the pseudo is zeroed at the beginning of the function.  We can then
emit:

(set (pc) (if_then_else (ne (reg bp) 0) (label_ref foo) (pc))
(set (reg bp) ...UNSPEC_TLS_GET_TP...)
foo:

every time.  This in itself is easy to do, but we'd need some way
of telling the optimisers that the result of ...UNSPEC_TLS_GET_TP...
is nonzero, and that subsequent (ne (reg bp) 0) branches will
always be taken.

Random musing, sorry.

Richard


-- 

rsandifo at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |rsandifo at gcc dot gnu dot
   |dot org |org
 Status|NEW |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-07-30 Thread rsandifo at gcc dot gnu dot org


--- Comment #8 from rsandifo at gcc dot gnu dot org  2006-07-30 10:56 
---
Subject: Bug 28126

Author: rsandifo
Date: Sun Jul 30 10:56:07 2006
New Revision: 115819

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=115819
Log:
gcc/
2006-07-25  Atsushi Nemoto  [EMAIL PROTECTED]

PR target/28126 (partial fix)
* config/mips/mips.md (tls_get_tp_mode): Set can_delay to no.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/mips/mips.md


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-07-21 Thread anemo at mba dot ocn dot ne dot jp


--- Comment #7 from anemo at mba dot ocn dot ne dot jp  2006-07-21 16:34 
---
(In reply to comment #6)
 Thanks.  With this patch, gcc 4.1.1 produces expected output.
 It seems gcc 4.2 does not move rdhwr before branch without this patch, but I
 can not see why.

I tried to find the reason of this difference between 4.1.1 and 4.2,
and found that the change happend on r108713.  The changelog is:

 2005-12-17  Danny Berlin [EMAIL PROTECTED]
   Kenneth Zadeck [EMAIL PROTECTED]

   * basic-block.h: Changed basic block numbering so that the entry
   block is 0 and the exit block is 1.  Changed insn iterators so
   that they are tolerant of blocks with no insns.
   * regrename.c (copyprop_hardreg_forward): Changed basic block
   numbering so that the entry block is 0 and the exit block is 1.
   ...

While it is too complex for me, I'm still not sure gcc 4.2 really do not need
the patch in comment #2.

Looking at comment #1, it seems gcc 4.2 still need the patch, but I could not
reproduce the same result by daney with 4.2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-07-13 Thread anemo at mba dot ocn dot ne dot jp


--- Comment #5 from anemo at mba dot ocn dot ne dot jp  2006-07-13 14:42 
---
Created an attachment (id=11881)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11881action=view)
do not put rdhwr instruction on delay slot

With this patch, gcc 4.2 (with -O1, -O2) and gcc 4.1.1 with -O1 produces
expected code.
gcc 4.1 with -O2 still moves rdhwr outside of a conditional.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-07-13 Thread anemo at mba dot ocn dot ne dot jp


--- Comment #6 from anemo at mba dot ocn dot ne dot jp  2006-07-13 14:58 
---
(In reply to comment #2)
 Although I don't know much about the instruction scheduling, I had
 a similar problem on SH and it was workarounded with emitting blockage
 insns.  The patch below might work for you, though I'm not sure if
 it's the right thing to do.

Thanks.  With this patch, gcc 4.1.1 produces expected output.
It seems gcc 4.2 does not move rdhwr before branch without this patch, but I
can not see why.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-07-12 Thread drow at gcc dot gnu dot org


--- Comment #4 from drow at gcc dot gnu dot org  2006-07-12 16:08 ---
(In reply to comment #3)
 Is it OK to add (set_attr can_delay no) for tls_get_tp_mode
 definition?

I think so, with suitable comment.  Some future MIPS architecture may provide a
register for this, in which case the instruction will not trap, and putting it
in a delay slot will be valuable.  But in the mean time, all existing MIPS ISAs
guarantee the trap.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-07-06 Thread anemo at mba dot ocn dot ne dot jp


--- Comment #3 from anemo at mba dot ocn dot ne dot jp  2006-07-06 16:20 
---
Subject: Re:  gcc moves an expensive instruction outside
 of a conditional

One note: I think rdhwr $v1, $29 should not be placed in delay slot
anyway.  The instruction always generate an exception, so if it was in
delay slot the kernel must calculate the target address of the
preceding branch/jump instruction.

Is it OK to add (set_attr can_delay no) for tls_get_tp_mode
definition?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-06-21 Thread daney at gcc dot gnu dot org


--- Comment #1 from daney at gcc dot gnu dot org  2006-06-21 16:33 ---
Confirmed on: mipsel-linux-gcc (GCC) 4.2.0 20060605 (experimental)
cross compiler configured as:
../gcc/configure   --target=mipsel-linux
--with-sysroot=/usr/local/mipsel-linux-test
--prefix=/usr/local/mipsel-linux-test --with-arch=mips32 --with-float=soft
--disable-libmudflap --enable-tls --enable-languages=c

I get a slightly different output for the test case but the rdhwr is still
being hoisted out of the conditional:

.file   1 tls.c
.section .mdebug.abi32
.previous
.abicalls
.text
.align  2
.globl  foo
.entfoo
.type   foo, @function
foo:
.frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
.mask   0x,0
.fmask  0x,0
.setnoreorder
.cpload $25
.setnomacro

lw  $2,%gottprel(x)($28)
.setpush
.setmips32r2
rdhwr   $3,$29
.setpop
beq $4,$0,$L7
addu$2,$2,$3

j   $31
lw  $2,0($2)

$L7:
j   $31
move$2,$0

.setmacro
.setreorder
.endfoo
.ident  GCC: (GNU) 4.2.0 20060605 (experimental)


-- 

daney at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||daney at gcc dot gnu dot org
 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2006-06-21 16:33:02
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126



[Bug target/28126] gcc moves an expensive instruction outside of a conditional

2006-06-21 Thread kkojima at gcc dot gnu dot org


--- Comment #2 from kkojima at gcc dot gnu dot org  2006-06-22 04:16 ---
Although I don't know much about the instruction scheduling, I had
a similar problem on SH and it was workarounded with emitting blockage
insns.  The patch below might work for you, though I'm not sure if
it's the right thing to do.  Ian's comment in the gcc list
  http://gcc.gnu.org/ml/gcc/2006-06/msg00669.html
suggests us that there are more than one problem.

--- ORIG/gcc-4_1-branch/gcc/config/mips/mips.c  2006-06-20 10:53:21.0
+0900
+++ TMP/gcc-4_1-branch/gcc/config/mips/mips.c   2006-06-22 12:49:26.0
+0900
@@ -1995,6 +1995,7 @@ mips_legitimize_tls_address (rtx loc)
 case TLS_MODEL_INITIAL_EXEC:
   tmp1 = gen_reg_rtx (Pmode);
   tmp2 = mips_unspec_address (loc, SYMBOL_GOTTPREL);
+  emit_insn (gen_blockage ());
   if (Pmode == DImode)
{
  emit_insn (gen_tls_get_tp_di (v1));
@@ -2011,6 +2012,7 @@ mips_legitimize_tls_address (rtx loc)

 case TLS_MODEL_LOCAL_EXEC:

+  emit_insn (gen_blockage ());
   if (Pmode == DImode)
emit_insn (gen_tls_get_tp_di (v1));
   else


-- 

kkojima at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||kkojima at gcc dot gnu dot
   ||org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126