The RDHWR instruction is used to support TLS on Linux/MIPS. For now it is always emulated by the kernel (in the Reserved Instruction exception handler), so the instruction is quite expensive.
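To make the cost concrete, here is a rough model in plain C. It is not kernel code and not what gcc emits: emulated_rdhwr, trap_count, tls_block, foo_lazy, foo_hoisted and traps are names made up for this sketch, and the counter only stands in for "one trap into the Reserved Instruction handler per executed RDHWR". The first function reads the thread pointer only on the path that touches TLS; the second reads it up front.

```c
/* Hypothetical stand-in for "rdhwr $3,$29": each call models one
   trap into the kernel's Reserved Instruction handler. */
static unsigned long trap_count;
static int tls_block[1] = { 42 };   /* pretend per-thread storage for x */

static int *emulated_rdhwr(void)
{
    trap_count++;                   /* one emulation trap per RDHWR */
    return tls_block;
}

/* Thread pointer is read only when the TLS variable is needed. */
int foo_lazy(int arg)
{
    if (arg)
        return *emulated_rdhwr();
    return 0;
}

/* Thread pointer is read before the test on arg. */
int foo_hoisted(int arg)
{
    int *tp = emulated_rdhwr();
    if (arg)
        return *tp;
    return 0;
}

/* Number of modeled traps so far. */
unsigned long traps(void)
{
    return trap_count;
}
```

Under this model, foo_lazy(0) leaves traps() unchanged, while foo_hoisted(0) adds one trap even though the value is never used.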
If I compile this code with gcc 4.1.1 (-O2):

    extern __thread int x;
    int foo(int arg)
    {
        if (arg)
            return x;
        return 0;
    }

I get this output:

foo:
    .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .cpload $25
    .set    nomacro
    lw      $2,%gottprel(x)($28)
    .set    push
    .set    mips32r2
    rdhwr   $3,$29
    .set    pop
    addu    $2,$2,$3
    beq     $4,$0,$L4
    move    $3,$0
    lw      $3,0($2)
$L4:
    j       $31
    move    $2,$3

The RDHWR is executed _before_ evaluating the "arg" value. For the arg == 0 case, the RDHWR serves no purpose and is pure overhead. Without -O2, the RDHWR is executed _after_ the evaluation, so it is gcc's optimizer that reorders the RDHWR instruction.

If I use -O instead of -O2, I get:

foo:
    .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .cpload $25
    .set    nomacro
    bne     $4,$0,$L2
    .set    push
    .set    mips32r2
    rdhwr   $3,$29
    .set    pop
    j       $31
    move    $2,$0
$L2:
    lw      $2,%gottprel(x)($28)
    nop
    addu    $2,$2,$3
    lw      $2,0($2)
    j       $31
    nop

This is not desirable either, since the rdhwr in the delay slot is executed unconditionally.

-- 
Summary: gcc moves an expensive instruction outside of a conditional
Product: gcc
Version: 4.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: anemo at mba dot ocn dot ne dot jp
GCC target triplet: mips*-*-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126