[Bug c++/52315] New: [C++11] constexpr object of nested class

2012-02-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52315

 Bug #: 52315
   Summary: [C++11] constexpr object of nested class
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de


I'm not sure whether this is valid/invalid C++11, but it seems a little weird. 
The following works as expected:

struct B
{
  constexpr B (unsigned v) noexcept : val (v) { }
  constexpr unsigned value (void) const noexcept { return val; }

  unsigned val;
};

struct A
{
  int stuff[B (16).value ()];
};


But moving struct B inside of struct A ...

struct A
{
  struct B
  {
constexpr B (unsigned v) noexcept : val (v) { }
constexpr unsigned value (void) const noexcept { return val; }

unsigned val;
  };

  int stuff[ B (16).value () ];
};

... results in error: size of array 'stuff' is not an integral
constant-expression



Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20120220 (experimental) (GCC)


[Bug target/52049] New: SH Target: Inefficient constant address access

2012-01-29 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52049

 Bug #: 52049
   Summary: SH Target: Inefficient constant address access
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


static volatile int* const g_0 = (volatile int*)0x1240;
static volatile int* const g_1 = (volatile int*)0x1244;
static volatile int* const g_2 = (volatile int*)0x1248;
static volatile int* const g_3 = (volatile int*)0x124C;

int test_24 (void)
{
  return *g_0 + *g_1 + *g_2 + *g_3;
}

Compiled with -O1:
mov.w.L31,r1
mov.l@r1+,r0
mov.l@r1+,r3
mov.l@r1+,r2
mov.l@r1,r1
addr3,r0
addr2,r0
rts
addr1,r0
.align 1
.L31:
.short4672

Compiled with -O2, -Os, -O3:
mov.w.L31,r1
mov.l@r1,r0
mov.l@(4,r1),r3
mov.l@(8,r1),r2
add#12,r1  ! why not mov.l @(12,r1),r1 ??
mov.l@r1,r1
addr3,r0
addr2,r0
rts
addr1,r0
.align 1
.L31:
.short4672

This happens always for the last memory access, if the number of contiguous
accesses is  2.


When using non-volatile variables:

static int* const h_0 = (int*)0x1240;
static int* const h_1 = (int*)0x1244;
static int* const h_2 = (int*)0x1248;
static int* const h_3 = (int*)0x124C;

int test_25 (void)
{
  return *h_0 + *h_1 + *h_2 + *h_3;
}

Compiled with -O1,-O2,-Os,-O3:
mov.w.L33,r1
mov.l@r1+,r0
mov.l@r1,r1
addr1,r0
mov.w.L34,r1
mov.l@r1,r1
addr1,r0
mov.w.L35,r1
mov.l@r1,r1
rts
addr1,r0
.align 1
.L33:
.short4672
.L34:
.short4680
.L35:
.short4684

Better:
mov.w.L31,r1
mov.l@r1+,r0
mov.l@r1+,r3
mov.l@r1+,r2
addr3,r0
mov.l@r1,r1
addr2,r0
rts
addr1,r0
.align 1
.L31:
.short4672

I'm not sure whether this is actually a problem of the SH back-end or of some
middle-end passes.  It happens for all sub-targets and regardless of the
endianess.

Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20120129 (experimental) (GCC)


[Bug target/31640] cache block alignment is too aggressive on sh-elf

2011-12-31 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31640

--- Comment #3 from Oleg Endo oleg.e...@t-online.de 2011-12-31 17:24:47 UTC 
---
Created attachment 26208
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26208
Proposed patch

(In reply to comment #0)
 The sh4 port aligns blocks that have no fallthrus and that are either
 frequently executed (JUMP_ALIGN) or preceeded a barrier
 (LABEL_ALIGN_AFTER_BARRIER) on a cache line.
 
 While in theory this help to avoid cache misses if the block slits over 2 
 cache
 lines, in practise this reduces cache locality and lenghten distance between
 blocks.
 The number of issued instructions are also impacted. For example the relative
 indirect address in jump tables needs a byte zero extend instruction if the
 distance occupies 8 bits instead of 7 bits. 
 
 I ran some experiments and benchmarked (eembc) with 2 strategies
 1) -falign-jumps=1
 2) Align the block if the size is bigger than a given threshold. (empirically
 set to 16 bytes, half of the cache line size). See illustrating attached 
 patch.
 
 My conclusion is that in -O3 the performance never degrades (option 2 is a
 little bit better, even improving dhrystone by 3%) when removing this padding.
 And the text size improves by ~15%.

Because of this I would like to propose the following alignment strategies
(unless they are changed by the user with -falign-??? options).

-Os:
  Align everything to 2 byte to get compact code

-O2,-O3:
  Align functions to 4 bytes.
  Align labels and jumps to 2 bytes (to avoid potential code bloat).
  Align loops to 4 bytes.

The attached patch should do that, although not fully tested yet.


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-30 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #8 from Oleg Endo oleg.e...@t-online.de 2011-12-30 21:21:14 UTC 
---
(In reply to comment #7)
 (In reply to comment #3)
  I haven't ran all tests on it yet, but CSiBE shows average code size 
  reduction
  of approx. -0.1% for -m4* with some code size increases in some files.
  Would something like that be OK for stage 3?
 
 Looks good, though not appropriate for stage 3, I think.

The patch passed the testsuite without new failures.  I'll queue it up for
stage 1.


[Bug target/51708] New: SH Target: SHAD / SHLD constant not CSE-ed

2011-12-29 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51708

 Bug #: 51708
   Summary: SH Target: SHAD / SHLD constant not CSE-ed
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
Target: sh*-*-*


If the shift amount is a constant for SHAD / SHLD insns it should be CSE-ed.
This could be done in a similar way as in the movnegt insn.
Example function:

int foo (int* a, int x, int n)
{
  int i;
  int count;

  for (i = 0; i  n; i ++)
count += (*(a + i)  12);

  return count;
}

compiled with -m4 -O3:
cmp/pl  r6
bf  .L11
shll2   r6
add #-4,r6
shlr2   r6
add #1,r6
.L9:
mov.l   @r4+,r1
mov #12,r2! load constant in loop
dt  r6
shldr2,r1 ! use constant in loop
bf/s.L9
add r1,r0
.L11:
rts
nop


better:
cmp/pl  r6
bf  .L11
shll2   r6
add #-4,r6
shlr2   r6
add #1,r6
mov #12,r2! load constant before loop
.L9:
mov.l   @r4+,r1
dt  r6
shldr2,r1 ! use constant in loop
bf/s.L9
add r1,r0
.L11:
rts
nop


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-12-29 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #10 from Oleg Endo oleg.e...@t-online.de 2011-12-30 02:14:00 UTC 
---
(In reply to comment #9)
 (In reply to comment #8)
  Specifying -fno-tree-forwprop doesn't seem to have any effect on these 
  cases.
 
 For that function, -fdump-tree-all shows that the tree loop ivopts
 optimization does it.  Try -fno-tree-forwprop -fno-ivopts.

This makes even worse code.
Anyway, I think this issue should be addressed by the auto_inc_dec pass.

If OK, I'd like to change it from target PR to middle-end PR.


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #6 from Oleg Endo oleg.e...@t-online.de 2011-12-28 15:59:35 UTC 
---
(In reply to comment #3)
 Created attachment 26191 [details]
 Proposed patch to improve some of the issues.
 
 The attached patch removes the useless sequence and still allows the -1
 constant to be CSE-ed for such cases as the example function above.
 
 I haven't ran all tests on it yet, but CSiBE shows average code size reduction
 of approx. -0.1% for -m4* with some code size increases in some files.

Some of the code size increases are caused by the ifcvt.c pass which tries to
transform sequences like:

int test_func_6 (int a, int b, int c)
{
  if (a == 16)
c = 0;
  return b + c;
}

into branch-free code like:
mov r4,r0   ! 45movsi_ie/2[length = 2]
cmp/eq  #16,r0  ! 9 cmpeqsi_t/2[length = 2]
mov #-1,r0  ! 34movsi_ie/3[length = 2]
negcr0,r0   ! 38*negc[length = 2]
neg r0,r0   ! 36negsi2[length = 2]
and r6,r0   ! 37*andsi3_compact/2[length = 2]
rts ! 48*return_i[length = 2]
add r5,r0   ! 14*addsi3_compact[length = 2]

instead of the more compact (and on SH4 most likely better):
movr4,r0   ! 41movsi_ie/2[length = 2]
cmp/eq#16,r0  ! 9cmpeqsi_t/2[length = 2]
bf0f  ! 34*movsicc_t_true/2[length = 4]
mov#0,r6
0:
addr5,r6   ! 14*addsi3_compact[length = 2]
rts ! 44*return_i[length = 2]
movr6,r0   ! 19movsi_ie/2[length = 2]

This particular case is handled in noce_try_store_flag_mask, which does the
transformation if BRANCH_COST = 2, which is true for -m4.  I guess before the
patch ifcvt didn't realize that this transformation can be applied.

I've tried setting BRANCH_COST to 1, which avoids this transformation but
increases overall code size a bit.


[Bug target/51340] SH Target: Make -mfused-madd enabled by default

2011-12-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

--- Comment #4 from Oleg Endo oleg.e...@t-online.de 2011-12-29 00:02:40 UTC 
---
(In reply to comment #3)
 (In reply to comment #2)
  Uhm, yes...
  The title should have been Enable -mfused-madd by -ffast-math
 
 Do you mean something like this?
 
 --- ORIG/trunk/gcc/config/sh/sh.c2011-12-03 10:03:41.0 +0900
 +++ trunk/gcc/config/sh/sh.c2011-12-27 08:33:23.0 +0900
 @@ -838,6 +838,11 @@ sh_option_override (void)
  align_functions = min_align;
  }
 
 +  /* Default to use fmac insn when -ffast-math.  See PR target/29100.  */
 +  if (global_options_set.x_TARGET_FMAC == 0
 +   fast_math_flags_set_p (global_options)
 +TARGET_FMAC = 1;
 +
if (sh_fixed_range_str)
  sh_fix_range (sh_fixed_range_str);
 

Yes, something like that.  Or maybe check flag_unsafe_math_optimizations, as it
is done for FSCA and FSRRA insns in sh.md.

  I don't know the exact semantics for the new patterns.  All I know is that
  rounding is supposed to be done only once after the two operations.  This is
  the case for the SH fmac insn.  Not sure whether this is enough though.
 
 It seems that we can use the fma pattern, though it would be an another issue.

Maybe when trunk is back to stage 1.


[Bug target/51697] New: SH Target: Inefficient DImode comparisons for -Os

2011-12-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51697

 Bug #: 51697
   Summary: SH Target: Inefficient DImode comparisons for -Os
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


For -Os and everything but -m1 DImode comparisons are not optimized properly
which results in redundant SImode comparisons, producing code worse than for
-O1.  
A reduced example:

int test_0 (long long* x)
{
  return *x  0x ? -20 : -40;
}

-Os -m2/-m3/-m4:
mov#0,r2   ! 55movsi_ie/3[length = 2]
tstr2,r2   ! 57cmpeqsi_t/1[length = 2]
bf/s.L12! 58branch_false[length = 2]
mov.l@(4,r4),r3! 12movsi_ie/7[length = 2]
tstr3,r3   ! 59cmpeqsi_t/1[length = 2]
.L12:
bt/s.L11! 14branch_true[length = 2]
mov#-40,r0 ! 5movsi_ie/3[length = 2]
mov#-20,r0 ! 4movsi_ie/3[length = 2]
.L11:
rts
nop ! 65*return_i[length = 4]


-Os -m1:
-O2 -m4:
mov.l   @(4,r4),r1   ! 10movsi_i/5[length = 2]
mov #-40,r0 ! 5movsi_i/3[length = 2]
tst r1,r1   ! 15cmpeqsi_t/1[length = 2]
bt  .L7 ! 16branch_true[length = 2]
mov #-20,r0 ! 4movsi_i/3[length = 2]
.L7:
rts
nop ! 61*return_i[length = 4]


-O1 -m4:
mov.l   @(4,r4),r1  ! 10movsi_ie/7[length = 2]
tst r1,r1   ! 17cmpeqsi_t/1[length = 2]
bt/s.L6 ! 18branch_true[length = 2]
mov #-40,r0 ! 5movsi_ie/3[length = 2]
mov #-20,r0 ! 4movsi_ie/3[length = 2]
.L6:
rts
nop ! 62*return_i[length = 4]


Another example would be:

int test_2 (unsigned long long x)
{
  return x = 0x1LL ? -20 : -40;
}

-Os -m2/-m3/-m4:
mov #0,r2   ! 48movsi_ie/3[length = 2]
mov #-1,r3  ! 49movsi_ie/3[length = 2]
cmp/eq  r2,r4   ! 9cmpgtudi_t[length = 8]
bf/s.Ldi67
cmp/hi  r2,r4
cmp/hi  r3,r5
.Ldi67:
bf/s.L16! 10branch_false[length = 2]
mov #-40,r0 ! 5movsi_ie/3[length = 2]
mov #-20,r0 ! 4movsi_ie/3[length = 2]
.L16:
rts
nop ! 52*return_i[length = 4]


-Os -m1:
tst r4,r4   ! 9cmpeqsi_t/1[length = 2]
mov #-20,r0 ! 4movsi_i/3[length = 2]
bf  .L12! 10branch_false[length = 2]
mov #-40,r0 ! 5movsi_i/3[length = 2]
.L12:
rts
nop ! 56*return_i[length = 4]



The problem does not appear for -m1, only for -Os and -m2*, -m3*, -m4*.


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-12-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #15 from Oleg Endo oleg.e...@t-online.de 2011-12-29 00:34:53 UTC 
---
(In reply to comment #14)
 With trunk rev 181517 I have observed the following problem, which happens 
 when
 compiling for -m2*, -m3*, -m4* and -Os:
 

This is still present as of rev 182713 and seems to be a different issue.
I've created PR51697 for it.


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #2 from Oleg Endo oleg.e...@t-online.de 2011-12-27 21:26:33 UTC 
---
(In reply to comment #1)
 
 BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
 

Yes, very much so.
It is reduced to return b for -m2, -m2e, -m2a, -m3, -m3e
but not for -m1 and -m4*.

The correct test function should be rather:

int test_func_0_NG (int a, int b, int c, int d)
{
  return (a != b || a != d) ? b : c;
}

which is actually OK for all variants except -m1 and -m4*:

cmp/eqr5,r4! 11cmpeqsi_t/3[length = 2]
bf.s.L6! 12branch_false[length = 2]
cmp/eqr7,r5! 14cmpeqsi_t/3[length = 2]
bf.L6! 15branch_false[length = 2]
movr6,r5! 8movsi_i/2[length = 2]
.L6:
rts! 42*return_i[length = 2]
movr5,r0! 23movsi_i/2[length = 2]


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #3 from Oleg Endo oleg.e...@t-online.de 2011-12-27 22:43:11 UTC 
---
Created attachment 26191
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26191
Proposed patch to improve some of the issues.

(In reply to comment #1)
 
 [...]
 
   mov #-1,rn
   negc rn,rm
   tst #255,rm
 
 which is essentially T_reg = T_reg.  Usually combine catches such
 situation, but negc might be too complex for combine.
 For this case, replacing current movnegt expander by insn, splitter
 and peephole something like
 
 [...]

 the above useless sequence could be removed, though we will miss
 the chance that the -1 can be CSE-ed when the cstore value is
 used.  This will cause a bit worse code for the loop like
 
 int
 foo (int *a, int x, int n)
 {
   int i;
   int count;
 
   for (i = 0; i  n; i++)
 count += (*(a + i) != x);
 
   return count;
 }
 

Thanks for your ideas and comments.  It was really useful.

The attached patch removes the useless sequence and still allows the -1
constant to be CSE-ed for such cases as the example function above.

I haven't ran all tests on it yet, but CSiBE shows average code size reduction
of approx. -0.1% for -m4* with some code size increases in some files.
Would something like that be OK for stage 3?


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #4 from Oleg Endo oleg.e...@t-online.de 2011-12-27 23:17:03 UTC 
---
(In reply to comment #1)
 
   return a = 0  b = 0 ? c : d;
 
 x = 0 is expanded to the sequence like
 
   ra = not x
   rb = -31
   rc = ra  (neg rb)
   T = (rc == 0)
   conditional jump
 
 and combine tries to simplify it.  combine simplifies b = 0
 successfully into shll and bt but fails to simplify a = 0.
 It seems that combine doesn't do constant propagation well and
 misses the constant -31.

Another simpler fail:

int test_func_22_NG (int a, int b, int c, int d)
{
  return a = 0;
}

becomes:
not r4,r0   ! 9one_cmplsi2[length = 2]
mov #-31,r1 ! 12movsi_ie/3[length = 2]
rts ! 31*return_i[length = 2]
shldr1,r0   ! 13lshrsi3_d[length = 2]

which could be:
cmp/pzr4
rts
movtr0

From what I could observe, this is caused by the various shift insns which
leads combine to this result.  For example, the shll, branch sequence that
is used instead of cmp/pz, branch is caused by the ashlsi_c insn, which
defines a lt:SI comparison.  Although that is correct, using cmp/pz could
be better, since it does not modify the reg, and on SH4 it is an MT group
insn.  The ashlsi_c insn / lt:SI picking can be avoided by adjusting the 
rtx costs, for instance (just tried it out briefly).

I think a peephole in this case could fix some of the symptoms but not the
actual cause.  I'll see if I can come up with something that works without a
peephole, even though all the shift stuff looks a bit suspicious ;)


[Bug target/51244] SH Target: Inefficient conditional branch

2011-12-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #5 from Oleg Endo oleg.e...@t-online.de 2011-12-28 02:44:05 UTC 
---
(In reply to comment #2)
 (In reply to comment #1)
  
  BTW, OT, (a != b || a != c) ? b : c could be reduced to b, I think.
  
 
 Yes, very much so.
 It is reduced to return b for -m2, -m2e, -m2a, -m3, -m3e
 but not for -m1 and -m4*.

This seems to be due to the following in sh.h:

#define BRANCH_COST(speed_p, predictable_p) \
(TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)


[Bug target/51340] SH Target: Make -mfused-madd enabled by default

2011-12-18 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

--- Comment #2 from Oleg Endo oleg.e...@t-online.de 2011-12-19 01:29:12 UTC 
---
(In reply to comment #1)
 (In reply to comment #0)
  Is there any particular reason why this should not be enabled by
  default for SH targets that support the FMAC insn?
 
 PR29100?

Uhm, yes...
The title should have been Enable -mfused-madd by -ffast-math

 
 BTW, if SH fmac satisfies the semantics for fused multiplication and
 add operation, the fmaf4 instruction pattern would be better now.

I don't know the exact semantics for the new patterns.  All I know is that
rounding is supposed to be done only once after the two operations.  This is
the case for the SH fmac insn.  Not sure whether this is enough though.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-12-11 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #20 from Oleg Endo oleg.e...@t-online.de 2011-12-12 02:11:12 UTC 
---
(In reply to comment #19)
 The results look way better now.  I've tested your latest patch for
 sh4-unknown-linux-gnu and found no new regressions for gcc testsuite.
 CSiBE with -O2 -fpic on that target shows that 144 improvements and
 28 dis-improvements for size on 896 files.  The worst case is
 -4.34783 net/ipv4/ip_forward 704 736
 which looks the case of the high r0 register pressure.  The best one is
 25.7426 arch/testplatform/kernel/traps 10160 8080
 which looks to be very impressive.

That looks nice!  Thanks for checking it out!
I haven't ran CSiBE with -O2 -fpic, only with -Os -mpretend-cmove
-mfused-madd -freg-struct-return.  I will use your params and try to see what
is happening in those cases where it gets worse.  Maybe it can be fixed with
a peephole.

 
/* We want to enable the use of SUBREGs as a means to
   VEC_SELECT a single element of a vector.  */
 +
 +  /* This effectively disallows using GENERAL_REGS for SFmode vector 
 subregs.
 + This can be problematic when SFmode vector subregs need to be accessed
 + on the stack with displacement addressing, as it happens with -O0.
 + Thus we allow the mode change for -O0.  */
if (to == SFmode  VECTOR_MODE_P (from)  GET_MODE_INNER (from) == 
  SFmode)
 -return (reg_classes_intersect_p (GENERAL_REGS, rclass));
 +return optimize ? (reg_classes_intersect_p (GENERAL_REGS, rclass)) : 
 false;


 + Thus we allow the mode change for -O0.  */

.. should be _disallow_ of course... 
Another note that should have went into the comment:

As far as I could observe it, this is mainly triggered by the following in
sh_legitimate_index_p:

+  if (mode == QImode  (unsigned) INTVAL (op)  16)
+return true;

.. probably because this makes the generic m constraint match QImode
displacement addressing and then it tries using it.

 Rather than that, I guess that the QI/HImode disp addressing would
 be an optimization unneeded for -O0 in the first place.  Perhaps
 something like -mpreferdisp option and TARGET_PREFER_DISP macro
 which are enable by default but disable at -O0 might be help.

Yeah, could also be an option. 

 It'll also help some unfortunate anormallies for which those optimizations
 will generate worse codes.

You mean, by giving the user the option to turn off displacement addressing for
e.g. some specific files / modules by specifying -mno-preferdisp or something
like that?  By anomalies do you mean code that gets worse because of too much
pressure on R0 and all the reloads around it, or do you have any other bad use
cases?

BTW, the vector mode handling seems a bit unfinished (see also PR13423).  I was
planning to address that at a later point...


 Maybe.  Implementing it with predicates and constraints would be
 smarter if possible but may be difficult because the register
 allocator handles the m constraint specially.

Yes, the m constraint is an obstacle in this case.
What I've tried out is splitting it into a memory constraint that allows
displacement addressing and another memory constraint that disallows it (Snd,
Sdd - they are added by the patch) and use those in the move / sign extend
insns instead of the generic m constraint.  For example something like that:

(define_insn *extendqisi2
  [(set (match_operand:SI 0 arith_reg_dest =z,r,r)
(sign_extend:SI
 (match_operand:QI 1 general_movsrc_operand Sdd,Snd,r)))]
  TARGET_SH1
  @
mov.b%1,%0
mov.b%1,%0
exts.b%1,%0
  [(set_attr type load,load,arith)])

This basically seems to work. But when there are consecutive loads, reload
would use displacement addressing for the first load, but not for the following
loads because R0 will be already allocated at that point.  Ideally, reload
should take into account that reloading around R0 is in most cases more
efficient than other strategies, especially on SH4.  However, I'm not sure
whether changing reload for this issue is a good idea ;)

Another thing I could try out is to have load/store insns that allow arbitrary
operands in displacement addressing like on SH2A, and split them into two insns
of one load/store and one reg-reg move after reload.  But that would probably
require the R0 clobber in the expander which could make worse code in cases
where displacement addressing is not used, I guess.
Do you think this approach could make sense?

 I think so, though we are in stage 3 and have to wait the trunk returns
 to stage 1 or 2 for committing such changes.

I was afraid you might say something like this :T

 You have the time for implementing HImode support.

Yep, sure.  I've noticed that the latest version of the patch seems to fix some
more testsuite failures.  I will investigate which hunk is responsible for the
fixes so that could be pulled out from the patch.  OK?

 BTW, the changes for white spaces, spells and other clean-ups which

[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-12-10 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #25932|0   |1
is obsolete||

--- Comment #18 from Oleg Endo oleg.e...@t-online.de 2011-12-11 00:24:17 UTC 
---
Created attachment 26046
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26046
Proposed patch to add QImode displacement addressing

This version of the patch was tested against rev 182090 and did not introduce
new failures.

It allows the generation of QImode displacement addressing move insns, but only
if the addresses can be fixed before reload.  If reload needs to load/store
e.g. pseudos on the stack where displacement addressing would be required the
generated code is not as good unfortunately.  One good example for this kind of
scenario is the gcc.dg/compat/struct-by-value-11 test from the testsuite, which
shows that there is still some room for improvement.
Still, on the CSiBE I could observe a code size decrease of -1.2% avg, where
mpeg2dec goes down to -5.65% and libmpeg2/motion_comp down to -20% (-m4 -ml
taken as example).  Due to SH2A's 4 byte QImode displacement insn the benefit
on SH2A code is not that big (-0.2% avg).

There are probably smarter ways of doing what the patch does.  I have also
tried out implementing it with predicates and constraints, few load/store insns
and lots of alternatives in the insns.  However, reload would refuse to select
the displacement addressing due to pressure on R0 in many cases.

Would something like the attached patch be acceptable (after some cleanups)? 
If so, I'd also start adding HImode displacement addressing support.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-11-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #25848|0   |1
is obsolete||

--- Comment #17 from Oleg Endo oleg.e...@t-online.de 2011-11-28 12:26:42 UTC 
---
Created attachment 25932
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25932
Proposed patch to add QImode displacement addressing

An updated version of the patch.  Tested against rev 181404, resolves some
previous failures for SH2A but also introduces the following libstdc++ related
failures:

FAIL: 22_locale/ctype/is/wchar_t/11740.cc (test for excess errors)
FAIL: 22_locale/money_get/get/char/5.cc execution test
FAIL: 22_locale/money_put/put/char/12971.cc execution test
FAIL: 22_locale/money_put/put/char/39168.cc execution test
FAIL: 22_locale/money_put/put/char/4.cc execution test
FAIL: 22_locale/money_put/put/char/5.cc execution test
FAIL: 22_locale/money_put/put/char/6.cc execution test
FAIL: 22_locale/money_put/put/wchar_t/12971.cc execution test
FAIL: 22_locale/money_put/put/wchar_t/39168.cc execution test
FAIL: 22_locale/money_put/put/wchar_t/4.cc execution test
FAIL: 22_locale/money_put/put/wchar_t/5.cc execution test
FAIL: 22_locale/money_put/put/wchar_t/6.cc execution test

Still investigating...


[Bug target/51337] New: SH Target: Various testsuite ICEs for -m2a -O0

2011-11-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51337

 Bug #: 51337
   Summary: SH Target: Various testsuite ICEs for -m2a -O0
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh2a-*-*


There are various testsuite ICEs for -m2a -mb -O0, which look similar to the
following:

gcc.c-torture/compile/2923-1.c:

...: error: insn does not satisfy its constraints:
(insn 142 34 35 (set (mem/c:SI (plus:SI (reg/f:SI 14 r14)
(const_int 36 [0x24])) [0 %sfp+-16 S4 A32])
(reg:SI 150 fpul)) ... {movsi_ie}
 (nil))
...: internal compiler error: in extract_constrain_insn_cached, at recog.c:2052

The problem seems to be the movsi_ie insn, for which reload thinks it is
possible to load the FPUL register from a stack allocated reg using SImode
displacement addressing, which is actually not possible.

The following patch seems to help the issues...

--- gcc/config/sh/sh.c.orig2011-11-28 10:03:04.0 +0900
+++ gcc/config/sh/sh.c2011-11-28 15:09:01.0 +0900
@@ -12432,6 +12432,10 @@ sh_secondary_reload (bool in_p, rtx x, r
   if (rclass != GENERAL_REGS  REG_P (x)
TARGET_REGISTER_P (REGNO (x)))
 return GENERAL_REGS;
+  /* If here fall back to loading FPUL register through general regs. 
+ Happens when FPUL has to be loaded from a reg allocated on the stack.  */
+  if (rclass == FPUL_REGS  !REG_P (x))
+return GENERAL_REGS;
   return NO_REGS;
 }

However, I think the following should be a better way to check the condition in
question...

+  if (rclass == FPUL_REGS  true_regnum (x) == -1)
+return GENERAL_REGS;

I'll submit a patch once testing is completed.


[Bug target/51340] New: SH Target: Make -mfused-madd enabled by default

2011-11-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

 Bug #: 51340
   Summary: SH Target: Make -mfused-madd enabled by default
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


Currently the FMAC insn has to be enabled by specifying the -mfused-madd
option.  Is there any particular reason why this should not be enabled by
default for SH targets that support the FMAC insn?


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-11-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #8 from Oleg Endo oleg.e...@t-online.de 2011-11-28 22:31:44 UTC 
---
(In reply to comment #7)
 The problem is that SH target can't do those simple array accesses
 well at QI/HImode because of the lack of displacement addressing
 for those modes.

In these particular cases, this is true.  Even if there was a working QI/HImode
displacement addressing, it would most likely result in worse code compared to
post-inc / pre-dec addressing opportunities, because of register pressure on
R0.  Apart from that there is no displacement addressing for FPU loads/stores,
which also misses some opportunities:

float test_func_5 (float* p, int c)
{
  float r = 0;
  do
  {
r += *p++;
r += *p++;
r += *p++;
  } while (--c);
  return r;
}

Compiled with -Os -m4-single:

fldi0fr0
.L11:
movr4,r1
fmov.s@r1+,fr1
dtr5
faddfr1,fr0
fmov.s@r1,fr1
movr4,r1
add#8,r1
add#12,r4
faddfr1,fr0
fmov.s@r1,fr1
bf/s.L11
faddfr1,fr0
rts
nop


..could be:
fldi0fr0
.L11:
fmov.s@r4+,fr1
dtr5
faddfr1,fr0
fmov.s@r4+,fr1
faddfr1,fr0
fmov.s@r4+,fr1
bf/s.L11
faddfr1,fr0
rts
nop


Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.


[Bug target/50814] SH Target: SHAD / SHLD instructions not used on SH2A

2011-11-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

--- Comment #2 from Oleg Endo oleg.e...@t-online.de 2011-11-27 22:38:34 UTC 
---
(In reply to comment #1)

 Will be slightly different because sh2a's shadshld are 4-byte
 insns.  Perhaps something like below will work, though I don't
 test it at all.
 

According to the SW manual document rej09b0051_sh2a.pdf the SHAD and SHLD insns
have the same 2-byte format as on SH3:

SHAD Rm, Rn: 01001100
SHLD Rm, Rn: 01001101 

Am I missing something there?


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-11-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #14 from Oleg Endo oleg.e...@t-online.de 2011-11-20 13:04:39 UTC 
---
With trunk rev 181517 I have observed the following problem, which happens when
compiling for -m2*, -m3*, -m4* and -Os:

The function compiled with -m2 -Os

int test_int64_lowword (long long* x)
{
  return *x  0x ? -20 : -40;
}

results in the following code:

mov#0,r2! 42movsi_i/3[length = 2]
tstr2,r2! 44cmpeqsi_t/1[length = 2]
bf.s.L4! 45branch_false[length = 2]
mov.l@(4,r4),r3! 12movsi_i/5[length = 2]
tstr3,r3! 46cmpeqsi_t/1[length = 2]
.L4:
bt.L3! 14branch_true[length = 2]
mov#-20,r0! 4movsi_i/3[length = 2]
rts
nop! 52*return_i[length = 4]
.L3:
rts! 54*return_i[length = 2]
mov#-40,r0! 5movsi_i/3[length = 2]

When compiled with -O1 or -O2 the high word test is completely eliminated
correctly:

mov.l@(4,r4),r1! 10movsi_i/5[length = 2]
tstr1,r1! 17cmpeqsi_t/1[length = 2]
bt.L4! 18branch_true[length = 2]
mov#-20,r0! 4movsi_i/3[length = 2]
rts
nop! 50*return_i[length = 4]
.L4:
rts! 52*return_i[length = 2]
mov#-40,r0! 5movsi_i/3[length = 2]

I'm not sure whether this is actually related to this PR, but have noticed it
with the test cases of this PR.  It seems only to happen for comparison-like
insns.  If I remember correctly, this problem did not exist when I started
working on this PR.


[Bug target/51241] New: SH Target: Unnecessary sign/zero extensions

2011-11-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51241

 Bug #: 51241
   Summary: SH Target: Unnecessary sign/zero extensions
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


Created attachment 25868
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25868
Examples

Under some circumstances unnecessary sign/zero extension insns are generated as
shown in the attached example.

sh-elf-gcc -v
Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 2019 (experimental) (GCC)


[Bug target/51244] New: SH Target: Inefficient conditional branch

2011-11-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

 Bug #: 51244
   Summary: SH Target: Inefficient conditional branch
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


Created attachment 25869
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25869
Examples

It seems that the condition inversion sometimes gets confused, resulting in
unnecessary code bloat.  The attached examples were compiled with -Os but the
same problem happens also at -O2 and -O3.

sh-elf-gcc -v
Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 2019 (experimental) (GCC)


[Bug middle-end/50325] [4.7 Regression] 76 new fails with rev. 177691

2011-11-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50325

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

 CC||oleg.e...@t-online.de

--- Comment #19 from Oleg Endo oleg.e...@t-online.de 2011-11-21 00:48:16 UTC 
---
Hello,

the patch from 181405 also breaks things for the SH target when compiling for
big endian.  For instance when compiling the CSiBE set it bails out with:

/home/code/CSiBE/./src/./teem-1.6.0-src/src/air/754.c: In function
'airFPGen_d':
/home/code/CSiBE/./src/./teem-1.6.0-src/src/air/754.c:283:45: internal compiler
error: in int_mode_for_mode, at stor-layout.c:424

I haven't checked the testsuite results, but I guess they are similar to the
other reports.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-11-17 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #25684|0   |1
is obsolete||

--- Comment #16 from Oleg Endo oleg.e...@t-online.de 2011-11-17 23:58:11 UTC 
---
Created attachment 25848
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25848
Proposed patch to add QImode displacement addressing

Just wanted to share some progress on this issue...

The FPUL case from before seems to be solved, but another issue is left which
is related to vector modes.  For example the 20050113-1.c target test case
compiled with -O0 causes:

20050113-1.c:16:1: internal compiler error: in change_address_1, at
emit-rtl.c:2001
Please submit a full bug report,
with preprocessed source if appropriate.

This is because of the following in function sh_cannot_change_mode_class:

  if (to == SFmode  VECTOR_MODE_P (from)  GET_MODE_INNER (from) == SFmode)
return (reg_classes_intersect_p (GENERAL_REGS, rclass));


IRA's last words on that are:

Reloads for insn # 52
Reload 0: FPUL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine,
secondary_reload_p
reload_reg_rtx: (reg:SI 150 fpul)
Reload 1: reload_in (SI) = (reg:SI 65 fr1 [orig:171 retval ] [171])
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1)
reload_in_reg: (reg:SI 65 fr1 [orig:171 retval ] [171])
reload_reg_rtx: (reg/i:SI 0 r0)
secondary_in_reload = 0
deleting insn with uid = 26.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-11-01 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #13 from Oleg Endo oleg.e...@t-online.de 2011-11-02 00:15:44 UTC 
---
Created attachment 25684
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25684
Proposed patch to add QImode displacement addressing

This should be better now.  I have also put back support for the SH2A
displacement addressing insns.  The CSiBE set compiles fine for -m2a,
-m4-single, -m4a-single, both big and little endian.  However, it introduces
new failures in the testsuite.  I have examined a few of them and one common
problem seems to be of this nature:

udivmod4.c:59:1: error: insn does not satisfy its constraints:
(insn 313 312 194 (set (reg:SI 150 fpul)
(mem/c:SI (plus:SI (reg/f:SI 15 r15)
(const_int 8 [0x8])) [0 %sfp+-72 S4 A32])) udivmod4.c:54 192
{movsi_ie}
 (nil))
udivmod4.c:59:1: internal compiler error: in extract_constrain_insn_cached, at
recog.c:2052

.. when compiled with -O0 -m4-single -mb.

This is mainly caused by the following added lines in sh_legitimate_index_p:

+  if ((GET_MODE_SIZE (mode) == 1)  (unsigned) INTVAL (op)  16)
+return true;

Apparently this makes something believe that loading the FPUL register from a
displacement address is possible, which is of course not the case.  However, I
can't see any connection there...


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-11-01 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #15 from Oleg Endo oleg.e...@t-online.de 2011-11-02 01:38:26 UTC 
---
(In reply to comment #14)
 .ira dump would be your friend, though I suspect that your patch
 triggered off some other reload problem like PR48596.  Could you
 try the change in #5 of that PR?

I just tried it out, but it seems it has no effect in this case.


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-30 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #5 from Oleg Endo oleg.e...@t-online.de 2011-10-30 12:36:51 UTC 
---
(In reply to comment #4)
 
 I'm a bit curious to see what happens if they are changed
 to non-zero for SI/DImode.

..not much actually. I've tried defining both as TARGET_SH1 and compared 
before/after CSiBE results.  No code size changes at all.  Also the micro test
cases do not seem to be affected by that.  Of course I might have missed some
border case...


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-30 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #9 from Oleg Endo oleg.e...@t-online.de 2011-10-30 12:45:30 UTC 
---
(In reply to comment #8)
 (In reply to comment #7)
 
 This problem doesn't require the theoretical/mathematical
 completeness.  There are many inappropriate combinations
 of options which don't get any warning when running compiler
 and configurations.  The important thing is to warn very
 confusing ones from the user's point of view.  So your patch
 in #6 or even one liner in #2 would be OK and enough for
 this PR, I think.

OK.  I think the SH1 check should be left out completely in this case.
If it is included, the default configuration (-m1 -mb) will fail to build
e.g. newlib.  I'd propose catching the SH2A case only, considering the user's
default -mb / -ml configuration.  If somebody configures the default to be -m2a
-ml then it will fail to build, but I guess only very few people will try to do
that anyway.


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-30 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #6 from Oleg Endo oleg.e...@t-online.de 2011-10-30 13:53:51 UTC 
---
(In reply to comment #1)
 GCC makes usual mem accesses into those with post_inc/pre_dec at
 auto_inc_dec pass.  I guess that auto_inc_dec pass can't find
 post_inc insns well in that case because other tree/rtl optimizers
 tweak the code already.  If this is the case, the problem would be
 not target specific.

Looks like so.  For the simple test case...

int test (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

...the load insns are initially expanded as:

(insn 2 5 3 2 (set (reg/v/f:SI 169 [ p ])
(reg:SI 4 r4 [ p ]))
 (nil))


(insn 7 6 8 3 (set (reg:SI 171)
(sign_extend:SI (mem:QI (reg/v/f:SI 169 [ p ]) [0 *p_2(D)+0 S1 A8]))) 
 (nil))


(insn 8 7 9 3 (set (reg:SI 172)
(reg/v/f:SI 169 [ p ]))
 (nil))


(insn 9 8 10 3 (set (reg/f:SI 173)
(plus:SI (reg/v/f:SI 169 [ p ])
(const_int 1 [0x1])))
 (nil))


(insn 10 9 11 3 (set (reg:SI 174)
(sign_extend:SI (mem:QI (reg/f:SI 173) [0 MEM[(char *)p_2(D) + 1B]+0 S1
A8])))
 (nil))

(insn 13 12 14 3 (set (reg/f:SI 177)
(plus:SI (reg/v/f:SI 169 [ p ])
(const_int 2 [0x2])))
 (nil))

(insn 14 13 15 3 (set (reg:SI 178)
(sign_extend:SI (mem:QI (reg/f:SI 177) [0 MEM[(char *)p_2(D) + 2B]+0 S1
A8])))
 (nil))


The auto_inc_dec pass then fails to realize that (reg:SI 177) = (reg:SI 169) +
2 = (reg:SI 173) + 1.

I wonder whether there might be something in the target code that suggests the
early optimizers to do that?  I've tried playing with the TARGET_ADDRESS_COST
hook but it didn't have any effect in this case.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #9 from Oleg Endo oleg.e...@t-online.de 2011-10-27 09:23:47 UTC 
---
Created attachment 25625
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25625
Stripped reload test case from bzip2

(In reply to comment #8)
 
 and RA chooses r1 and r0 as the registers to where memories will
 be loaded.  The problem is we have no direct way to load buf1[4]
 to r1.  In such situation, a secondary reload is needed.  See
 the description of TARGET_SECONDARY_RELOAD in the gcc manual.
 Here is a trial:

Thanks a lot!

 
 The ICE for your testcase went away with it, though I've got
 [...]
 spill_failure, at reload1.c:2118
 
 when bootstrapping.

So this passes the ball to the mov.b (R0,Rn),Rm insn...
The same happens with bzip2 in CSiBE.  I have attached a stripped down
snippet that reproduces the problem.

IRA's last words on that one are:

Using reg 2 for reload 0
Spilling for insn 365.
Using reg 2 for reload 0
Spilling for insn 371.
Using reg 3 for reload 0
Spilling for insn 374.
Using reg 2 for reload 0
Spilling for insn 377.
Using reg 0 for reload 1
reload failure for reload 2

Reloads for insn # 377
Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 15 r15)
(const_int 16 [0x10]))
GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2), can't combine
reload_in_reg: (plus:SI (reg/f:SI 15 r15)
(const_int 16 [0x10]))
Reload 1: reload_in (SI) = (reg:SI 3 r3 [693])
R0_REGS, RELOAD_FOR_INPUT (opnum = 0)
reload_in_reg: (reg:SI 3 r3 [693])
Reload 2: R0_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 2), can't combine,
secondary_reload_p
Reload 3: reload_in (QI) = (mem/c:QI (plus:SI (reg/f:SI 15 r15)
(const_int 16 [0x10]))
[6 %sfp+-60 S1 A32])
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 2), can't combine
reload_in_reg: (subreg:QI (reg/v:SI 177 [ curr ]) 0)
secondary_in_reload = 2


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #10 from Oleg Endo oleg.e...@t-online.de 2011-10-27 21:10:47 UTC 
---
@@ -12430,6 +12453,10 @@ sh_secondary_reload (bool in_p, rtx x, r
   if (rclass != GENERAL_REGS  REG_P (x)
TARGET_REGISTER_P (REGNO (x)))
 return GENERAL_REGS;
+  if (rclass == GENERAL_REGS  mode == QImode
+   MEM_P (x)  GET_CODE (XEXP (x, 0)) == PLUS
+   CONST_INT_P (XEXP (XEXP (x, 0), 1))
+   INTVAL (XEXP (XEXP (x, 0), 1))  16)
+return R0_REGS;
   return NO_REGS;
 }


Makes the stripped down bszip2 test pass, but the same problem pops up
in other CSiBE sources.


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-27 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #11 from Oleg Endo oleg.e...@t-online.de 2011-10-27 21:54:17 UTC 
---
Created attachment 25639
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25639
Stripped reload test case from jpeg-6b


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-26 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #6 from Oleg Endo oleg.e...@t-online.de 2011-10-26 22:36:31 UTC 
---
Created attachment 25621
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25621
Experimental not working patch for mov.b with displacement addressing


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-26 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #7 from Oleg Endo oleg.e...@t-online.de 2011-10-26 23:07:08 UTC 
---
Created attachment 25622
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25622
asmcons and ira pass log for the reload failure of z insn constraint

(In reply to comment #5)
 It seems that clobbering R0 in that expander is simply papering
 over the real problem.  

Yes, it is very much.

 Although the reload issue beyonds me,
 .ira dump file about that impossible insn which doesn't satisfy
 the z constraint would be a starting point.

I've been trying to make some sense out of it since, but I'm a bit clueless at
the moment here.

The following function is a reduced failure example:

int fail (char* buf0, char* buf1)
{
  char a,b,c;

  a = buf0[0] + buf1[1];
  b = buf0[5] + buf1[4];
  c = buf0[15] + buf1[14];
  return a+b-c;
}

In the attached log the problematic pseudo reg is reg:QI 191, where it is
reloaded into r1... 

Reloads for insn # 13
Reload 0: reload_in (QI) = (reg:QI 191 [ MEM[(char *)buf1_4(D) + 4B] ])
reload_out (SI) = (reg:SI 1 r1 [193])
GENERAL_REGS, RELOAD_OTHER (opnum = 0)


Strange thing is that the following variant does not cause the error:

int fail (char* buf0, char* buf1, char* out)
{
  char a,b,c;

  a = buf0[0] + buf1[1];
  b = buf0[5] + buf1[4];
  c = buf0[15] + buf1[14];
  *out = a+b-c;
  return *out;
}


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-23 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #4 from Oleg Endo oleg.e...@t-online.de 2011-10-23 21:56:56 UTC 
---
Created attachment 25582
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25582
Experimental patch for mov.b with displacement addressing

 (In reply to comment #2)
 
 Welcome to the spill-failure-for-class-'R0_REGS' club :-)

The attached patch is an experimental example only.  Please ignore wrong
formatting, comments, and the fact that it cripples some of the SH2A support :}

It adds support for 'mov.b @(disp, Rm), R0' and 'mov.b R0, @(disp, Rn)'
instructions.  I haven't tested it fully, only building the CSiBE
set for -m4-single -ml. 

Although it results in suboptimal code like ...

...
mov.l@(4,r15),r2
movr1,r0
mov.br0,@(5,r2)
rts
movr1,r0! redundant

.. or like ...

mov.b@(1,r4),r0
movr0,r4
mov.b@(2,r5),r0
addr0,r4! better: add r4, r0
movr4,r0! not needed if add operands are swapped
mov.br0,@(5,r6)
rts
movr4,r0! redundant

.. it already shows some code size improvements:

avg: -563.22 / -0.942376 %
max: compiler22804 - 22928 +124 / +0.543764 %
min: OpenTCP-1.0.4   27069 - 25989-1080 / -3.989804 %

top 5 files
mpeg2dec-0.3.1  libmpeg2/motion_comp 
  6044 - 4796 -1248 / -20.648577 %

libpng-1.2.5  pngrtran
  19668 - 18904 -764 / -3.884482 %

linux-2.4.23-pre3-testplatform  arch/testplatform/kernel/traps
  6192 - 5532  -660 / -10.658915 %

lwip-0.5.3.preproc  src/core/tcp_input
  5424 - 5040  -384 / -7.079646 %

libmspack  test/cabextract_md5
  21780 - 21424 -356 / -1.634527 %


The R0 clobber in the movqi expander and the explicit usage of R0 in the 
splits effectively disable some optimizations, but this is the only 
thing I could get to work so far.
I've left the straight forward but non-working patterns as comments in the
patch as a reference.  Basically, without the R0 clobber in the movqi expander
it eventually ends up like that...

error: insn does not satisfy its constraints:
(insn 737 40 42 4 (set (reg:QI 10 r10)
(mem/c:QI (plus:SI (reg:SI 1 r1 [386])
(const_int 1 [0x1])) [0 *D.4946_20+0 S1 A8]))
{*movqi_m_reg_disp_load}
 (nil))

internal compiler error: in reload_cse_simplify_operands, at postreload.c:403

I'm puzzled why the register allocator ignores the constraint z when it
starts to run out of registers.  In the error case above it tries to produce
something like 'mov.b @(1,r1),r10' which of course is impossible.
Any hints are highly appreciated.


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #7 from Oleg Endo oleg.e...@t-online.de 2011-10-20 18:44:32 UTC 
---
(In reply to comment #6)
 (In reply to comment #5)
  I'll send in a patch with a couple of other cosmetic changes later, OK?
 
 Please go for it.

..or maybe just leave it as it is :T
The change I was suggesting opens up another problem with multilib and endian
config / selection.
I think instead of adding / implementing the endian restrictions it would be
more useful to expand little endian support in binutils and drop the endian
restrictions in gcc altogether once binutils fully support it.

What do you think? Would that make more sense in the end?


[Bug target/50814] New: SH Target: SHAD / SHLD instructions not used on SH2A

2011-10-20 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50814

 Bug #: 50814
   Summary: SH Target: SHAD / SHLD instructions not used on SH2A
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh2a-*-*


Although there are some insns (e.g. ashlsi3_sh2a) that are supposed to handle
dynamic shifts on SH2A, somehow the dynamic shift instructions SHAD and SHLD
are never generated, no matter what the shift amount is.


int x_shad_right (int y)
{
  return y  15;
}

mov.l.L6,r1
sts.lpr,@-r15
jsr@r1
nop
movr4,r0
lds.l   @r15+,pr
rts/n

.align 2
.L6:
.long___ashiftrt_r4_15


int x_shad_left (int y)
{
  return y  15;
}

movr4,r0
shll8r0
shlrr0
rts
shll8r0



Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20111020 (experimental) (GCC) 



It is also not clear to me why SH2A seems to require different handling for
dynamic shifts than SH3 or SH4...


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-18 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #3 from Oleg Endo oleg.e...@t-online.de 2011-10-18 21:33:06 UTC 
---
(In reply to comment #2)
 Ah.  One liner
 
 -#define DRIVER_SELF_SPECS %{m2a:%{ml:%eSH2a does not support 
 little-endian}}
 +#define DRIVER_SELF_SPECS %{m2a*:%{ml:%eSH2a does not support
 little-endian}}
 
 should work.

Yep, does work. 
However, I've noticed that it will stop working when the compiler's default
SH CPU and endian configuration is not -m1 -mb.

The code below instead gets the job done but I'm not sure whether it could
be written in a better way.
I've also noticed, that SH1 also supports big endian only, and -m1 -ml is
explicitly excluded in the multilib config in t-sh.

#if TARGET_ENDIAN_DEFAULT == MASK_BIG_ENDIAN
#define IS_LITTLE_ENDIAN_OPTION %{ml:
#else
#define IS_LITTLE_ENDIAN_OPTION %{!mb:
#endif

#if TARGET_CPU_DEFAULT == SELECT_SH1
#define UNSUPPORTED_SH1 IS_LITTLE_ENDIAN_OPTION \
%{m1|!m2*:%{!m3*:%{!m4*:%{!m5*:%eSH1 does not support little-endian}
#else
#define UNSUPPORTED_SH1 IS_LITTLE_ENDIAN_OPTION \
%{m1:%eSH1 does not support little-endian}}
#endif

#if TARGET_CPU_DEFAULT  MASK_HARD_SH2A
#define UNSUPPORTED_SH2A IS_LITTLE_ENDIAN_OPTION \
%{m2a*|!m1:%{!m2*:%{!m3*:%{!m4*:{!m5*:%eSH2a does not support
little-endian}}
#else
#define UNSUPPORTED_SH2A IS_LITTLE_ENDIAN_OPTION \
%{m2a*:%eSH2a does not support little-endian}}
#endif

#define DRIVER_SELF_SPECS UNSUPPORTED_SH1, UNSUPPORTED_SH2A



Actually, SH2E also supports big endian only as per public specification.
But I don't think that it would cause any trouble for anyone.  Of course it
could be excluded as well. :)


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-18 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #5 from Oleg Endo oleg.e...@t-online.de 2011-10-18 22:37:13 UTC 
---
(In reply to comment #4)
 There are no real uses of SH1/SH2/SH2E/SH3E cores anymore, I think.

True. But the SH7020 (SH1) is still listed on digikey for an amazing
collector's price ;)

 I agree that taking care of -m2e is not worth.  Perhaps same for
 -m1.  Anyway, your change looks plausible to me.

I'll send in a patch with a couple of other cosmetic changes later, OK?


[Bug target/50749] SH Target: Post-increment addressing used only for first memory access

2011-10-18 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #3 from Oleg Endo oleg.e...@t-online.de 2011-10-19 00:00:01 UTC 
---
Kaz,

do you happen to know why the following is defined in sh.h?

#define USE_LOAD_POST_INCREMENT(mode)((mode == SImode || mode == DImode) \
   ? 0 : TARGET_SH1)

#define USE_STORE_PRE_DECREMENT(mode)((mode == SImode || mode == DImode) \
   ? 0 : TARGET_SH1)

Is there any (historical) reason to disable post-inc / pre-dec addressing
for SImode / DImode?

Thanks,
Oleg


[Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access

2011-10-16 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

 Bug #: 50749
   Summary: SH Target: Post-increment addressing used only for
first memory access
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


Post-increment addressing is generated only for the first
memory access. Any subsequent memory access does not use 
post-increment addressing.
The following two functions are reduced examples and result
in the same code being generated.
The problem exists with any number of memory accesses  1
and at any optimization level.


int test_0 (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

int test_1 (char* p, int c)
{
  int r = 0;
  r += p[0];
  r += p[1];
  r += p[2];
  return r;
}

compiled with -fno-ivopts -Os -m4-single -ml ... 

movr4,r1
mov.b@r1+,r0
add#2,r4
mov.b@r1,r1
addr1,r0
mov.b@r4,r1
rts
addr1,r0

This could be done better ...

mov.b@r4+,r0
mov.b@r4+,r1
addr1,r0
mov.b@r4+,r1
rts
addr1,r0



Another example with a loop:

int test_func_1 (char* p, int c)
{
  int r = 0;
  do
  {
r += *p++;
r += *p++;
  } while (--c);
  return r;
}

compiled with -fno-ivopts -Os -m4-single -ml ... 

mov#0,r0
.L5:
movr4,r1
mov.b@r1+,r2
dtr5
mov.b@r1,r1
addr2,r0
add#2,r4
bf/s.L5
addr1,r0
rts
nop

would be better as:

mov#0, r0
.L5:
mov.b@r4+, r1
dtr5
mov.b@r4+, r2
addr1, r0
bf/s.L5
addr2, r0

rts
nop




Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20111016 (experimental) (GCC)


[Bug target/50750] New: SH Target: Pre-decrement addressing used only for first memory access

2011-10-16 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50750

 Bug #: 50750
   Summary: SH Target: Pre-decrement addressing used only for
first memory access
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


Pre-decrement addressing is generated only for the first
memory access. Any subsequent memory access does not use 
pre-decrement addressing.
The following is a reduced example.
The problem exists with any number of memory accesses  1
and at any optimization level.


int test_func_2_1 (int* p, int c)
{
  int r = 0;
  do
  {
r += *--p;
r += *--p;
r += *--p;
  } while (--c);
  return r;
}

compiled with -fno-tree-loop-optimize -Os -m4-single -ml:

mov#0,r0
.L11:
movr4,r1
add#-64,r1
mov.l@(60,r1),r2
dtr5
add#-12,r4
addr2,r0
mov.l@(56,r1),r2
mov.l@(52,r1),r1
addr2,r0
bf/s.L11
addr1,r0
rts
nop


would be better as:

mov#0,r0
.L11:
mov.l@-r4,r1
dtr5
mov.l@-r4,r2
addr1,r0
mov.l@-r4,r3
addr2,r0
bf/s.L11
addr3,r0
rts
nop



Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20111016 (experimental) (GCC)


[Bug target/50751] New: SH Target: Displacement addressing does not work for QImode and HImode

2011-10-16 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

 Bug #: 50751
   Summary: SH Target: Displacement addressing does not work for
QImode and HImode
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de
CC: kkoj...@gcc.gnu.org
Target: sh*-*-*


Displacement addressing is only used for SImode but not
for QImode nor HImode. The following example summarizes the 
problem:

struct X
{
  chara, b, c, d;
  short e, f;
  int   g, h;
};


int test_func_4 (X* x)
{
  return x-b + x-e + x-g;
}

compiled with: -Os -m4-single -ml -S

movr4,r1
add#1,r1
mov.b@r1,r0
add#3,r1
mov.w@r1,r1
addr1,r0
mov.l@(8,r4),r1
rts
addr1,r0

would be better as:


mov.b@(1,r4),r0
movr0,r1
mov.w@(4,r4),r0
addr0,r1
mov.l@(8,r4),r0
addr1,r0
rts
nop


Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20111016 (experimental) (GCC)


[Bug target/50751] SH Target: Displacement addressing does not work for QImode and HImode

2011-10-16 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50751

--- Comment #2 from Oleg Endo oleg.e...@t-online.de 2011-10-17 00:37:42 UTC 
---
(In reply to comment #1)
 This is a known issue.  See the comment just before 
 sh.c:sh_legitimate_index_p.
 Unfortunately, I guess this PR might be marked as WONTFIX.

Yeah, I know this has been around for a while.
I'd like to take my chances anyway :)

Cheers,
Oleg


[Bug target/50694] SH Target: SH2A little endian does not actually work

2011-10-13 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

--- Comment #1 from Oleg Endo oleg.e...@t-online.de 2011-10-13 19:54:31 UTC 
---
As it turns out, this is already handled by the line

#define DRIVER_SELF_SPECS %{m2a:%{ml:%eSH2a does not support little-endian}}

in sh.h.

However, it doesn't catch -m2a-nofpu, -m2a-single, -m2a-single-only options.


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-13 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #24412|0   |1
is obsolete||

--- Comment #11 from Oleg Endo oleg.e...@t-online.de 2011-10-13 22:54:54 UTC 
---
Created attachment 25491
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25491
Proposed patch including test case


  3) only zero_extract special cases
 
 looks to be dominant.

Yes.  I've briefly looked through the test sources.  A popular use case
for bit test instructions seem to be single bit tests, which the patch
is basically adding. 

 I see.  I also expect that the experts have some idea for
 this issue.

Hm .. http://gcc.gnu.org/ml/gcc/2011-10/msg00189.html
Eric pointed me to the i386 back-end.  Unfortunately, what I found there is
where I originally started...

;; Combine likes to form bit extractions for some tests.  Humor it.

I.e. it is also coded against the behavior of the combine pass with a bunch
of pattern variations.  I guess that's the way it's supposed to be done then :T

 I don't think that it's too much.  Those numbers can be easily
 collected for CSiBE.  If your patterns are named, you could
 simply add -dap -save-temps to the compiler option which is
 specified when ruining CSiBE's create-config and then get
 the occurrences of testsi_6, for example, with something like
   grep testsi_6 `find . -name *.s -print` | wc -l
 after running the CSiBE size test.

Ah, right!  With the attached latest patch applied to trunk rev 179778
the numbers for 
  -ml -m4-single -Os -mnomacsave -mpretend-cmove -mfused-madd
-freg-struct-return
look something like that:

tstsi_t: 1391
tsthi_t: 4
tstqi_t: 23
tstqi_t_zero: 667
tstsi_t_and_not: 598
tstsi_t_zero_extract_eq: 70
tstsi_t_zero_extract_xor: 923

Notice that the split contributes to the tstsi_t number.
Also, the 3 patterns 
  tstsi_t_zero_extract_xor
  tstsi_t_zero_extract_subreg_xor_little
  tstsi_t_zero_extract_subreg_xor_big

are basically one and the same. On SH4A the subreg variants are required,
because tstsi_t_zero_extract_xor will never match.

I've also added a special case to sh_rtx_costs to detect at least the tstsi_t
pattern. However, the other patterns are not really covered by that and the 
combine pass calculates the cost as a sum of all the operations of the pattern.
I guess the selection of the test instruction could be stimulated a bit more 
by a more accurate costs calculation, but my feeling is that it won't do a lot.


Cheers,
Oleg


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-10 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #9 from Oleg Endo oleg.e...@t-online.de 2011-10-10 23:48:17 UTC 
---
Created attachment 25461
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25461
CSiBE comparisons

(In reply to comment #8)
 
 Another combine pass to reduce size less than 0.3% on one target
 would be not acceptable, I guess.  

I'm sorry, I forgot to mention that it was just a proof of concept hack
of mine, just to see whether it has any chance to work at all.
I think it would be better to change/fix the behavior of the combine pass
in this regard, so that it tries matching combined patterns without
sophisticated transformations. I will try asking on the gcc list about that.

 ~10 new patterns would be
 overkill for that result, though I'm still expecting that a few
 patterns of them were dominant. 

Yep, even if it turned out to be actually only 8 patterns in total, but
still.. I haven't looked at the issue with SH4A but most likely it would add
another one or two patterns... so basically ~10 :)

 Could you get numbers which pattern
 was used in the former option?

I think it would be a bit too much checking out each individual pattern.
Instead I grouped them into what they are effectively doing.
While I was at it, I also added your peephole idea, and a top 10 listing of
the individual files.


[Bug target/50694] New: SH Target: SH2A little endian does not actually work

2011-10-10 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50694

 Bug #: 50694
   Summary: SH Target: SH2A little endian does not actually work
Classification: Unclassified
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de


While testing a patch I have noticed some failures for SH2A and -ml.
The problem is that GCC happily generates SH2A code in little endian, but
when the output is assembled with GAS and it contains a movi20 instruction,
GAS will abort at some is big endian? check. 

Maybe for SH2A the -ml option should either output a warning and do big endian,
or just abort with an error message?
As far as I know, the SH2A does big endian only anyway...

Cheers,
Oleg


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-10-09 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #7 from Oleg Endo oleg.e...@t-online.de 2011-10-09 23:34:45 UTC 
---
(In reply to comment #6)

 Yep, maintenance burden but I don't mean ack/nak for anything.
 If it's enough fruitful, we should take that route.  When it
 gives 5% improvement in the usual working set like as CSiBE,
 hundreds lines would be OK, but if it's ~0.5% or less, it doesn't
 look worth to add many patterns for that.
 
  Isn't there a way to tell the combine pass not to do so, but instead first 
  look
  deeper at what is in the MD?
 
 I don't know how to do it cleanly.
 

I've tried out a couple of things and got some CSiBE numbers based on 
trunk rev 179430. Unfortunately only code size comparisons, no run time 
performance numbers. The tests were compiled with 
-ml -m4-single -Os -mnomacsave -mpretend-cmove -mfused-madd -freg-struct-return

Option 1)
  Use many (~10) patterns in the MD and some cost calculation tuning.
  The last patch required some adaptation, because the combine pass 
  started trying to match things slightly differently. I've also 
  noticed that it requires a special case for one pattern on SH4A...

  size of all modules: 2916390 - 2909026-7364 / -0.252504 %
  avg difference over all modules: -409.11 / -0.273887 %
  max: compiler   22808 - 22812   +4 / +0.017538 %
  min: libpng-1.2.5   99120 - 97804-1316 / -1.327684 %

Option 2)
  I've added another combine pass which has the make_compound_operation
  function turned off. The make_compound_operation function is used to
  produce zero_extract patterns. If the resulting simplified pattern does
  not match anything in the MD, combine reverts the transformation and 
  proceeds with the next insn. That way, it never tries to match the 
  tst #imm pattern in the MD. 
  With this option only ~5 patterns seem to be required and a small
  extension of the costs calculation.

  size of all modules:  2916390 - 2909170-7220 / -0.247566 %
  avg difference over all modules: -401.11 / -0.254423 %
  max: compiler   22808 - 22812   +4 / +0.017538 %
  min: libpng-1.2.5   99120 - 97836-1284 / -1.295400 %

Not so spectacular on average. It highly depends on the type of SW being
compiled, but it hits quite a lot of files in CSiBE.

Option 2 seems more robust even if it seems less effective, what do you think?


[Bug target/49468] SH Target: inefficient integer abs code

2011-09-25 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #24625|0   |1
is obsolete||

--- Comment #7 from Oleg Endo oleg.e...@t-online.de 2011-09-25 12:48:24 UTC 
---
Created attachment 25360
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=25360
Proposed patch

The last version of the patch fails the test
gcc.c-torture/execute/arith-rand-ll.c for -m2a-single -mb and multiple
optimization levels with the following error:

internal compiler error: in change_address_1, at emit-rtl.c:1994


The attached version fixes some of the failures but still fails the test above
with -m2a-single -mb -O2. Other optimization levels work fine.

The problem is caused by the define_insn_and_split *abssi2. It even fails if
the *abssi2 splits into nothing but a simple register copy (movdi) or
comparison insn. 

I'm now testing the patch without the DI abs parts and will submit it if passes
without new failures.

Cheers,
Oleg


[Bug target/49468] SH Target: inefficient integer abs code

2011-06-28 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #24603|0   |1
is obsolete||

--- Comment #6 from Oleg Endo oleg.e...@t-online.de 2011-06-29 01:36:46 UTC 
---
Created attachment 24625
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24625
Proposed patch

Added missing constraints to define_insn_and_split.
Fixed the formatting (hopefully ... there seems to be a problem with the tabs.
I'm using 4 spaces wide tabs)


[Bug target/49468] SH Target: inefficient integer abs code

2011-06-26 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

Oleg Endo oleg.e...@t-online.de changed:

   What|Removed |Added

  Attachment #24560|0   |1
is obsolete||

--- Comment #4 from Oleg Endo oleg.e...@t-online.de 2011-06-26 20:48:19 UTC 
---
Created attachment 24603
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24603
Proposed patch

Thanks for checking.
I missed to tell the expanders that the T bit is clobbered by insns such as
negdi2 and abssi2 / absdi2. The negdi2 expander that I have changed caused the
libstdc++ tests to fail when formatting a -1LL.

Another (reduced) example:


long long x (long long i, int j, long long k)
{
if (j  5)
return -i;

return -k;
}

..ended up as (the j  5 got lost)...

mov.l@(4,r15),r1
clrt
mov.l@r15,r2
negcr1,r1
negcr2,r0
bt/s.L8
clrt
negcr5,r1
negcr4,r0
.L8:
rts
nop


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-06-26 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #5 from Oleg Endo oleg.e...@t-online.de 2011-06-26 22:30:05 UTC 
---
 Yes, that peephole doesn't catch all the patterns which could
 make tst #imm8,r0 use.  Perhaps it would be a good idea to get
 numbers for the test like CSiBE test with the vanilla and new
 insns/peepholes patched compilers.  Something covers 80% of
 the possible cases in the usual working set, it would be enough
 successful for such a micro-optimization, I guess.

I'd also like to see some numbers on those micro-improvements. I'll have a look
at CSiBE.
Anyway, why not just add all the currently known-to-work cases? What are your
concerns regarding that? I can imagine that it is a maintenance burden to keep
all those definitions and special cases in the MD up-to-date (bit rot etc). Do
you have anything other than that in mind? 

It seems that others have been trying to solve the same problem in a very
similar way: http://patchwork.ozlabs.org/patch/58832/ ;)

I've figured that the following pattern works quite well for this particular
case. It seems to catch all the bit patterns. (sh_tst_const_ok and
sh_zero_extract_val are added functions in sh.c)

(define_insn_and_split *tstsi3
  [(set (reg:SI T_REG)
(zero_extract:SI (match_operand:SI 0 arith_reg_operand )
(match_operand:SI 1 const_int_operand)
(match_operand:SI 2 const_int_operand)))]
  TARGET_SH1  sh_tst_const_ok (sh_zero_extract_val (operands[1],
operands[2]))
  #
   1
  [(const_int 0)]
  {
int tstval = sh_zero_extract_val (operands[1], operands[2]);
emit_insn (gen_tstsi (operands[0], GEN_INT (tstval)));
DONE;
  }
)

...however, the problem with that one is that the T bit is inverted afterwards,
and thus the following branch logic (or whatever) gets inverted as well. One
option could be to emit some T bit inversion insn after gen_tstsi and then
optimize it away later on with e.g. a peephole (?), but I haven't tried that
yet.

The actual problem is that the combine pass first sees the andsi, eq, ...
insns and correctly matches them to those in the MD. But instead of continuing
to match patterns from the MD it expands the and insn into something like
zero_extract, which in turn will never be combined back into the tst insn.
Isn't there a way to tell the combine pass not to do so, but instead first look
deeper at what is in the MD?


Regarding the peephole:

 (satisfies_constraint_I08 (operands[1])
   || satisfies_constraint_K08 (operands[1])

I guess this might generate wrong code for e.g. if (x  -2). When x has any
bits[31:1] set this must return true. The code after the peephole optimization
will look only at the lower 8 bits and would possibly return false for x =
0xFF00, which is wrong. So it should be satisfies_constraint_K08 only,
shouldn't it?


 
 Cost patch looks fine to me.  Could you propose it as a separate
 patch on gcc-patches list with an appropriate ChangeLog entry?
 When proposing it, please refer how you've tested it.  Also
 the numbers got with the patch are highly welcome.
 
 BTW, do you have FSF copyright assignment for your GCC work?
 Although the cost patch itself is essentially several lines which
 doesn't require copyright assignment, the other changes you've
 proposed clearly require the paper work, I think.

I'll contact you directly regarding that.


[Bug target/49468] SH Target: inefficient integer abs code

2011-06-19 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

--- Comment #2 from Oleg Endo oleg.e...@t-online.de 2011-06-19 15:48:41 UTC 
---
Created attachment 24561
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24561
Before/After Examples


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-06-19 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #3 from Oleg Endo oleg.e...@t-online.de 2011-06-19 16:42:01 UTC 
---
Thanks for having a look at it Kaz.
Yes, the fiddling with the combine pass is fragile. I've tried out your
peephole idea. It works in most cases but not all the time. E.g. it doesn't
catch the following example:

int test_imm (int i)
{
  return (i  3) ? 20 : 40;
}


mov#3,r1
andr1,r4
tstr4,r4
bt/s.L5
mov#40,r0
mov#20,r0
.L5:
rts
nop


Also the following... 

int test_imm (int* i)
{
   return (*i  255) ? 20 : 40;
}

becomes on little endian (OK):
mov.b@r4,r1
tstr1,r1
bt/s.L10
mov#40,r0
mov#20,r0
.L10:
rts
nop

but on big endian (NG):
mov.l@r4,r1
extu.br1,r1
tstr1,r1
bt/s.L10
mov#40,r0
mov#20,r0
.L10:
rts
nop


I'll give the thing another try.

Regarding the change to the andcosts function, the following should be better:

--- sh.c(revision 175188)
+++ sh.c(working copy)
@@ -243,7 +243,7 @@
 static int flow_dependent_p (rtx, rtx);
 static void flow_dependent_p_1 (rtx, const_rtx, void *);
 static int shiftcosts (rtx);
-static int andcosts (rtx);
+static int and_xor_ior_costs (rtx, int code);
 static int addsubcosts (rtx);
 static int multcosts (rtx);
 static bool unspec_caller_rtx_p (rtx);
@@ -2841,14 +2841,14 @@
 return shift_insns[value];
 }

-/* Return the cost of an AND operation.  */
+/* Return the cost of an AND/XOR/IOR operation.  */

 static inline int
-andcosts (rtx x)
+and_xor_ior_costs (rtx x, int code)
 {
   int i;

-  /* Anding with a register is a single cycle and instruction.  */
+  /* register x register is a single cycle instruction.  */
   if (!CONST_INT_P (XEXP (x, 1)))
 return 1;

@@ -2864,17 +2864,17 @@
 }

   /* These constants are single cycle extu.[bw] instructions.  */
-  if (i == 0xff || i == 0x)
+  if ((i == 0xff || i == 0x)  code == AND)
 return 1;
-  /* Constants that can be used in an and immediate instruction in a single
+  /* Constants that can be used in an instruction with an immediate are a
single
  cycle, but this requires r0, so make it a little more expensive.  */
   if (CONST_OK_FOR_K08 (i))
 return 2;
-  /* Constants that can be loaded with a mov immediate and an and.
+  /* Constants that can be loaded with a mov immediate need one more cycle.
  This case is probably unnecessary.  */
   if (CONST_OK_FOR_I08 (i))
 return 2;
-  /* Any other constants requires a 2 cycle pc-relative load plus an and.
+  /* Any other constant requires an additional 2 cycle pc-relative load.
  This case is probably unnecessary.  */
   return 3;
 }
@@ -3043,7 +3043,9 @@
   return true;

 case AND:
-  *total = COSTS_N_INSNS (andcosts (x));
+case XOR:
+case IOR:
+  *total = COSTS_N_INSNS (and_xor_ior_costs (x, code));
   return true;

 case MULT:


[Bug target/49468] SH Target: inefficient integer abs code

2011-06-19 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

--- Comment #1 from Oleg Endo oleg.e...@t-online.de 2011-06-19 15:29:45 UTC 
---
Created attachment 24560
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24560
Proposed patch

The patch adds explicit handling of abs:SI and abs:DI to the machine
description instead of relying on the default abs handling. On SH4 the
zero-offset branch way is definitely faster. It reduces pressure on EX group
instructions and improves parallel instruction execution. On other than SH4
zero-offset branches are not as fast, but the resulting code should still be
faster than the default branch-free abs code, at least it is more compact. 
The patch also handles the case of neg (abs (...)) by simply inverting the
branch condition.

Feedback appreciated.


[Bug target/49468] New: SH Target: inefficient integer abs code

2011-06-19 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49468

   Summary: SH Target: inefficient integer abs code
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de


The generated code for abs:SI and abs:DI is a bit inefficient:

int abs (int i)
{
  return (i  0) ? -i : i;
}

movr4,r1
shllr1
subcr1,r1
movr1,r0
xorr4,r0
rts
subr1,r0


long long abs (long long i)
{
  return (i  0) ? -i : i;
}

movr4,r3
shllr3
subcr3,r3
movr5,r1
xorr3,r1
movr3,r0
clrt
xorr4,r0
subcr3,r1
rts
subcr3,r0


There is a define_split in sh.md which is supposed to handle the special case
for SH4 but it is not doing anything. The problem has been around since a
couple of GCC 4.x versions. 



sh-elf-gcc -v
Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --without-headers
--disable-nls --disable-werror --enable-lto --with-newlib --with-gnu-as
--with-gnu-ld --with-system-zlib : (reconfigured) ../gcc-trunk/configure
--target=sh-elf --prefix=/usr/local --enable-languages=c,c++ --enable-multilib
--disable-libssp --without-headers --disable-nls --disable-werror --enable-lto
--with-newlib --with-gnu-as --with-gnu-ld --with-system-zlib : (reconfigured)
../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --without-headers
--disable-nls --disable-werror --enable-lto --with-newlib --with-gnu-as
--with-gnu-ld --with-system-zlib : (reconfigured) ../gcc-trunk/configure
--target=sh-elf --prefix=/usr/local --enable-multilib --disable-libssp
--without-headers --disable-nls --disable-werror --enable-lto --with-newlib
--with-gnu-as --with-gnu-ld --with-system-zlib target_alias=sh-elf CFLAGS=-Os
CXXFLAGS=-Os --enable-languages=c,c++,lto --no-create --no-recursion
Thread model: single
gcc version 4.7.0 20110619 (experimental) (GCC)


[Bug target/49305] New: SH Target: internal compiler error: in reload_cse_simplify_operands, at postreload.c:403

2011-06-06 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49305

   Summary: SH Target:  internal compiler error: in
reload_cse_simplify_operands, at postreload.c:403
   Product: gcc
   Version: 4.7.0
Status: UNCONFIRMED
  Severity: major
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de


When building newlib 1.19.0 it fails to build the SH2A lib:

sh-elf-gcc -B/home/y/code/gcc/newlib-1.19.0-build-sh-elf/sh-elf/m2a/newlib/
-isystem
/home/y/code/gcc/newlib-1.19.0-build-sh-elf/sh-elf/m2a/newlib/targ-include
-isystem /home/y/code/gcc/newlib-1.19.0/newlib/libc/include
-B/home/y/code/gcc/newlib-1.19.0-build-sh-elf/sh-elf/m2a/libgloss/sh
-L/home/y/code/gcc/newlib-1.19.0-build-sh-elf/sh-elf/m2a/libgloss/libnosys
-L/home/y/code/gcc/newlib-1.19.0/libgloss/sh  -m2a -DPACKAGE_NAME=\newlib\
-DPACKAGE_TARNAME=\newlib\ -DPACKAGE_VERSION=\1.19.0\
-DPACKAGE_STRING=\newlib\ 1.19.0\ -DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\
-I. -I../../../../../../newlib-1.19.0/newlib/libc/stdio -fno-builtin  -Os
-mnomacsave -mpretend-cmove -mfused-madd -freg-struct-return  -m2a -c
../../../../../../newlib-1.19.0/newlib/libc/stdio/vfwscanf.c -o
lib_a-vfwscanf.o
../../../../../../newlib-1.19.0/newlib/libc/stdio/vfwscanf.c: In function
'__svfwscanf_r':
../../../../../../newlib-1.19.0/newlib/libc/stdio/vfwscanf.c:1454:1: error:
insn does not satisfy its constraints:
(insn 2358 4549 2359 471 (set (reg:SI 0 r0 [1023])
(sign_extend:SI (mem/s:HI (plus:SI (reg:SI 2 r2)
(const_int 12 [0xc])) [5 fp_167(D)-_flags+0 S2 A32])))
../../../../../../newlib-1.19.0/newlib/libc/stdio/vfwscanf.c:1447 164
{*extendqisi2_compact}
 (nil))
../../../../../../newlib-1.19.0/newlib/libc/stdio/vfwscanf.c:1454:1: internal
compiler error: in reload_cse_simplify_operands, at postreload.c:403



sh-elf-gcc -v
Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --without-headers
--disable-nls --disable-werror --enable-lto --with-newlib --with-gnu-as
--with-gnu-ld --with-system-zlib
Thread model: single
gcc version 4.7.0 20110606 (experimental) (GCC)


[Bug target/49263] New: SH Target: underutilized TST #imm, R0 instruction

2011-06-01 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

   Summary: SH Target: underutilized TST #imm, R0 instruction
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: oleg.e...@t-online.de


Created attachment 24411
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24411
test for various integer types and constant values 0...255

The TST #imm, R0 instruction is a bit underutilized on SH targets. For some
bit patterns of the immediate constant it tries to extract the bits in question
by various means and test the result against zero/non-zero and misses the
straight forward instruction.
In particular: 
* one single bit 
* n contiguous bits starting at bit 0

When testing a byte against 0x80 it uses CMP/PZ, which is as good as a TST
instruction (in terms of costs), so this is OK. 

I've spotted this in version around 4.4. It still happens with 4.6 and the
following config:

Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.6.1/lto-wrapper
Target: sh-elf
Configured with: ../gcc-4.6-20110527/configure --target=sh-elf
--prefix=/usr/local --enable-languages=c,c++ --enable-multilib --disable-libssp
--without-headers --disable-nls --disable-werror --enable-lto --with-newlib
--with-gnu-as --with-gnu-ld --with-system-zlib
Thread model: single
gcc version 4.6.1 20110527 (prerelease) (GCC)


[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction

2011-06-01 Thread oleg.e...@t-online.de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #1 from Oleg Endo oleg.e...@t-online.de 2011-06-01 20:42:00 UTC 
---
Created attachment 24412
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24412
Proposed Patch

Although the patch gets the job done, programmer's sense tells me it is fishy,
or at least pretty much brute forced cure of the symptoms, not the cause. It's
my first GCC patch, so any feedback is highly appreciated.

What I did was looking at the RTL, in particular the combine pass, identifying
patterns it failed to find a shortcut (tst insn) for and adding them to the
insn descriptions. 

I also had to expand the costs calculation of the AND instruction to cover AND,
OR and XOR (on SH they are the same anyways), or else the cost of a matched
replacement insn would result in a rejection in the combine pass.