Re: Switch statement optimization

2022-04-19 Thread Martin Liška

On 4/18/22 16:41, Paul Koning via Gcc wrote:

In switch statements with dense case values, the typical result is a jump 
table, which is fast.  If the values are sparse, a tree of compares is 
generated instead.

What if nearly all cases are dense but there are a few outliers?  An example appears in 
the NFS protocol parser, where you get a switch statement with cases for each of the 
opcode values.  All but one are small integers assigned in sequence, but one is 10044.  
So the "sparse" case kicks in and a compare tree is generated for everything.

I can avoid this by putting in special case code for the 10044 case, then all the rest 
ends up being a jump table.  That brings up the question if GCC should recognize such 
scenarios and break up the switch statement into "dense parts" handled by a 
jump table, leaving the sorting between those as a compare tree.


Hello.

We currently support identification of such dense/interesting groups of case 
values (we name them clusters).
See e.g. gcc/testsuite/gcc.dg/tree-ssa/switch-1.c where you can have a decision 
tree that contains
mix of individual cases, jump-tables (JT) and bit-tests (BT).

Can you please share your test-case so that I can analyze it?
You can investigate with -fdump-tree-switchlower1.

Cheers.
Martin



paul





Switch statement optimization

2022-04-18 Thread Paul Koning via Gcc
In switch statements with dense case values, the typical result is a jump 
table, which is fast.  If the values are sparse, a tree of compares is 
generated instead.

What if nearly all cases are dense but there are a few outliers?  An example 
appears in the NFS protocol parser, where you get a switch statement with cases 
for each of the opcode values.  All but one are small integers assigned in 
sequence, but one is 10044.  So the "sparse" case kicks in and a compare tree 
is generated for everything.

I can avoid this by putting in special case code for the 10044 case, then all 
the rest ends up being a jump table.  That brings up the question if GCC should 
recognize such scenarios and break up the switch statement into "dense parts" 
handled by a jump table, leaving the sorting between those as a compare tree.

paul



[Bug tree-optimization/70604] switch statement optimization creates dead code

2016-04-13 Thread jpoimboe at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604

Josh Poimboeuf  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #4 from Josh Poimboeuf  ---
Richard, just realized you weren't on CC for my response to your question.

[Bug tree-optimization/70604] switch statement optimization creates dead code

2016-04-12 Thread jpoimboe at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604

--- Comment #3 from Josh Poimboeuf  ---
Hi Richard, thanks for looking at it!

(In reply to Richard Biener from comment #2)
> Are the cases you still see indirect jumps to only one active case or
> is it just that if()s with multiple cases would have avoided the dead
> code?

I don't quite understand the second part of your question, but maybe this
information will answer it.

With GCC 6 I've only seen one occurrence of this issue, which is the one for
which I posted the assembler and .i file above.  It has an indirect jump which
is hard-coded to use only one entry in the jump table, as shown in the
assembler code above.

Note that the C code has two switch statements, which seem to correspond to the
two "normal" cases of indirect jump table patterns.  But then there's the third
unusual indirect jump, shown above, which also corresponds to one of the two
switch statements -- so there are two jump tables for a single switch
statement, where one of the tables appears to be optimized for a single case. 
Hopefully I'm making sense :-)

With GCC 5, the other occurrences of this issue were very similar, with switch
statements and indirect jumps to a single entry in the jump table.

[Bug tree-optimization/70604] switch statement optimization creates dead code

2016-04-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-04-12
 CC||jamborm at gcc dot gnu.org
  Component|c   |tree-optimization
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
For GCC 6 stronger value-numbering might have fixed most cases.

Are the cases you still see indirect jumps to only one active case or
is it just that if()s with multiple cases would have avoided the dead
code?

I can see

  :
  _139 = [(void *)cmd_84(D) + 930B];
  _141 = conn_140(D)->sess;
  _142 = _141->se_sess;
  _143 = _84(D)->se_cmd;
  transport_init_se_cmd (_143, _ops, _142, 0, 3, 32, _139);
  target_get_sess_cmd (_143, 1);
  switch (_58) , case 1: , case 2: , case 3: ,
case 4: , case 5: , case 6: , case 7: >

:
  goto  ();

:
  goto  ();

:
  goto  ();

:
  goto  ();

:
  goto  ();

:
  _146 = (int) _58;
  printk ("\13Unknown iSCSI TMR Function: 0x%02x\n", _146);
  _351 = iscsit_add_reject_from_cmd (cmd_84(D), 10, 1, buf_55(D)); [tail call]
  goto ;

:

  # prephitmp_14 = PHI <1(18), 2(11), 3(12), 4(13), 5(14), 6(15), 7(16)>
:


with the default case looping back.  switch conversion should probably handle
this but is run too early.  OTOH it seems to be confused by the default
case not falling thru, failing to see the "simple" transform to a

  if (... >= 1 && ... <= 7)
   ;
  else
   old default;

beginning to process the following SWITCH statement
(drivers/target/iscsi/iscsi_target.c:1807) : ---
switch (_74) , case 1: , case 2: , case 3: ,
case 4: , case 5: , case 6: , case 7: >

Bailing out - no common successor to all case label target blocks found

beginning to process the following SWITCH statement
(drivers/target/iscsi/iscsi_target.c:1866) : ---
switch (_74) , case 1: , case 2 ... 5: , case 6:
, case 7: , case 8: >

Bailing out - no common successor to all case label target blocks found



Not sure what the other cases are.

[Bug c/70604] switch statement optimization creates dead code

2016-04-08 Thread jpoimboe at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604

--- Comment #1 from Josh Poimboeuf  ---
Created attachment 38227
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38227=edit
Linux .config

[Bug c/70604] New: switch statement optimization creates dead code

2016-04-08 Thread jpoimboe at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604

Bug ID: 70604
   Summary: switch statement optimization creates dead code
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jpoimboe at redhat dot com
  Target Milestone: ---

Created attachment 38226
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38226=edit
iscsi_target.i.gz

The linux kernel has a new tool named "objtool" which follows all possible code
paths for every .o file, looking for abnormalities.  In rare cases it has
stumbled on a gcc optimization issue related to switch statements, which leaves
some dead code (unreachable instructions) laying around, resulting in increased
code size and confusing assembler code.

Note: For some reason the likelihood of seeing this problem seems to have
diminished from gcc 5.3.1 to gcc 6.0.  We've seen the problem in three separate
object files in the kernel with gcc 5.3.1, but with gcc 6.0 there's only one
known occurrence of this issue.  Further, even in that one case, the size of
the dead code has decreased from several instructions in 5.3.1 to only a single
unreachable instruction in 6.0.

Here are the relevant assembler code excerpts, using gcc 6.0:

iscsit_handle_task_mgt_cmd:

[...snip...]

jmp *.L6212+64(%rip)#
.section.rodata
.align 8
.align 4
.L6212:
.quad   .L6211
.quad   .L6070
.quad   .L6214
.quad   .L6214
.quad   .L6214
.quad   .L6214
.quad   .L6082
.quad   .L6089
.quad   .L6096
.text

[...snip...]

jmp .L6196  #
.L6211:
movl$8, %esi#, _315
.L6213:
movq$.LC48, %rdi#,


There's an indirect jump instruction in .text, along with a jump table in
.rodata, which is a common pattern for switch statements.  But this one's a
little different than the normal pattern: the indirect jump is hard-coded to
use a single entry in the jump table.  The other jump table entries are unused.
 Further, the code referenced in one of the entries (.L6211) is dead code and
completely unreachable from anywhere in the function.

Note that the -fsanitize=unreachable option is enabled, but this seems
unrelated: __builtin_unreachable() doesn't appear to have been used for this
code path, and the unreachable code block doesn't have a call to
__ubsan_handle_builtin_unreachable.

So to summarize, the issues are:

1) dead code in .text (in this case, the function only has one unreachable
instruction)

2) mostly unused (and completely unnecessary) jump table in .rodata

3) unnecessary indirect jump (a direct jump could be used instead)


$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.0.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-linker-hash-style=gnu --enable-plugin --enable-initfini-array
--disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 6.0.0 20160311 (Red Hat 6.0.0-0.17) (GCC) 

$ gcc -Wp,-MD,drivers/target/iscsi/.iscsi_target.o.d  -nostdinc
-I./arch/x86/include -Iarch/x86/include/generated/uapi
-Iarch/x86/include/generated  -Iinclude -I./arch/x86/include/uapi
-Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi
-include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -mno-sse
-mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1
-mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup
-mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time
-maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
-DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1
-DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1
-DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables
-fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0
-Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable
-fomit-frame-pointer -fno-var-tracking-assignments
-Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow
-fconserve-stack -Werror

[Bug testsuite/48809] [4.4 Regression] switch statement optimization error

2011-05-04 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-04 
09:19:10 UTC ---
Author: jakub
Date: Wed May  4 09:19:07 2011
New Revision: 173358

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173358
Log:
Backport from mainline
2011-04-30  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/48809
* tree-switch-conversion.c (build_arrays): Compute tidx in unsigned
type.
(gen_inbound_check): Don't compute index_expr - range_min in utype
again, instead reuse SSA_NAME initialized in build_arrays.
Remove two useless gsi_for_stmt calls.

* gcc.c-torture/execute/pr48809.c: New test.

Added:
branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/execute/pr48809.c
Modified:
branches/gcc-4_4-branch/gcc/ChangeLog
branches/gcc-4_4-branch/gcc/testsuite/ChangeLog
branches/gcc-4_4-branch/gcc/tree-switch-conversion.c


[Bug testsuite/48809] [4.4 Regression] switch statement optimization error

2011-05-04 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution||FIXED

--- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-04 
09:33:21 UTC ---
Fixed.


[Bug testsuite/48809] [4.4/4.5 Regression] switch statement optimization error

2011-05-03 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-03 
16:37:19 UTC ---
Author: jakub
Date: Tue May  3 16:37:12 2011
New Revision: 173328

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173328
Log:
Backport from mainline
2011-04-30  Jakub Jelinek  ja...@redhat.com

PR tree-optimization/48809
* tree-switch-conversion.c (build_arrays): Compute tidx in unsigned
type.
(gen_inbound_check): Don't compute index_expr - range_min in utype
again, instead reuse SSA_NAME initialized in build_arrays.
Remove two useless gsi_for_stmt calls.

* gcc.c-torture/execute/pr48809.c: New test.

Added:
branches/gcc-4_5-branch/gcc/testsuite/gcc.c-torture/execute/pr48809.c
Modified:
branches/gcc-4_5-branch/gcc/ChangeLog
branches/gcc-4_5-branch/gcc/testsuite/ChangeLog
branches/gcc-4_5-branch/gcc/tree-switch-conversion.c


[Bug testsuite/48809] [4.4 Regression] switch statement optimization error

2011-05-03 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

Summary|[4.4/4.5 Regression] switch |[4.4 Regression] switch
   |statement optimization  |statement optimization
   |error   |error

--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-03 
17:17:57 UTC ---
Fixed for 4.5.4+ too.


[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error

2011-04-30 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-30 
06:54:06 UTC ---
Author: jakub
Date: Sat Apr 30 06:54:02 2011
New Revision: 173207

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173207
Log:
PR tree-optimization/48809
* tree-switch-conversion.c (build_arrays): Compute tidx in unsigned
type.
(gen_inbound_check): Don't compute index_expr - range_min in utype
again, instead reuse SSA_NAME initialized in build_arrays.
Remove two useless gsi_for_stmt calls.

* gcc.c-torture/execute/pr48809.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/execute/pr48809.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-switch-conversion.c


[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error

2011-04-30 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-30 
06:55:15 UTC ---
Author: jakub
Date: Sat Apr 30 06:55:11 2011
New Revision: 173208

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173208
Log:
PR tree-optimization/48809
* tree-switch-conversion.c (build_arrays): Compute tidx in unsigned
type.
(gen_inbound_check): Don't compute index_expr - range_min in utype
again, instead reuse SSA_NAME initialized in build_arrays.
Remove two useless gsi_for_stmt calls.

* gcc.c-torture/execute/pr48809.c: New test.

Added:
branches/gcc-4_6-branch/gcc/testsuite/gcc.c-torture/execute/pr48809.c
Modified:
branches/gcc-4_6-branch/gcc/ChangeLog
branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
branches/gcc-4_6-branch/gcc/tree-switch-conversion.c


[Bug c++/48809] [4.4/4.5 Regression] switch statement optimization error

2011-04-30 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

Summary|[4.4/4.5/4.6/4.7|[4.4/4.5 Regression] switch
   |Regression] switch  |statement optimization
   |statement optimization  |error
   |error   |

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-30 
07:35:10 UTC ---
Fixed for 4.6+ so far.


[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error

2011-04-29 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 AssignedTo|unassigned at gcc dot   |jakub at gcc dot gnu.org
   |gnu.org |

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-29 
13:38:08 UTC ---
Created attachment 24145
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24145
gcc46-pr48809.patch

Untested fix.


[Bug c++/48809] New: switch statement optimization error

2011-04-28 Thread andrewcmartin at msn dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

   Summary: switch statement optimization error
   Product: gcc
   Version: 4.5.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: andrewcmar...@msn.com


Created attachment 24131
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24131
C++ source

The code below should return the answer 18.  However, built with g++ v4.5.2
and -O2, it returns -386954936 or some other random value.  Removing the line
case -62:... from the switch statement should have no effect, but the
resulting optimized code will execute correctly.

---

#include iostream
int switch_test(signed char code)
{
  int val = 0;
  switch (code) {
case   0: val = 1;  break;
case   1: val = 2;  break;
case   2: val = 3;  break;
case   3: val = 4;  break;
case   4: val = 5;  break;
case   5: val = 6;  break;
case   6: val = 7;  break;
case   7: val = 8;  break;
case   8: val = 9;  break;
case   9: val = 10; break;
case  10: val = 11; break;
case  11: val = 12; break;
case  12: val = 13; break;
case  13: val = 14; break;
case  14: val = 15; break;
case  15: val = 16; break;
case  16: val = 17; break;
case  98: val = 18; break;
case -62: val = 19; break;
  }
  return val;
}

int main(int argc, char* argv[])
{
  const char input = 98;
  int return_val = switch_test(input);
  std::cout  input:input  std::endl;
  std::cout  return:   return_val  std::endl;
}

---

Command line to trigger the error:

[amartin@jeanie:~]$ g++ -O2 -save-temps switch_test.cc  ./a.out
input:  b
return: -386954936

If we don't optimize we don't get the error:

[amartin@jeanie:~]$ g++ switch_test.cc  ./a.out
input:  b
return: 18


Compiler information:

[amartin@jeanie:~]$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/libexec/gcc/x86_64-unknown-linux-gnu/4.5.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure
--prefix=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/
--with-gmp=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/
--with-mpfr=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/
--with-mpc=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/
--with-ppl=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/
--with-cloog=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --enable-static
--enable-languages=c,c++,objc,obj-c++
LDFLAGS='-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib
-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib64
-I/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/include '
CXXFLAGS='-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib
-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib64
-I/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/include '
CFLAGS='-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib
-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib64
-I/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/include'
Thread model: posix
gcc version 4.5.2 (GCC)

[amartin@jeanie:~]$ uname -a
Linux jeanie 2.6.9-78.0.22.ELsmp #1 SMP Thu Apr 30 19:17:40 EDT 2009 x86_64
x86_64 x86_64 GNU/Linux

I have also observed this error in g++ 4.4.x.


[Bug c++/48809] switch statement optimization error

2011-04-28 Thread andrewcmartin at msn dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #2 from andrewcmartin at msn dot com 2011-04-28 18:33:40 UTC ---
Created attachment 24133
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24133
-save-temps .o file


[Bug c++/48809] switch statement optimization error

2011-04-28 Thread andrewcmartin at msn dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #1 from andrewcmartin at msn dot com 2011-04-28 18:33:10 UTC ---
Created attachment 24132
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24132
-save-temps .ii file


[Bug c++/48809] switch statement optimization error

2011-04-28 Thread andrewcmartin at msn dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

--- Comment #3 from andrewcmartin at msn dot com 2011-04-28 18:34:10 UTC ---
Created attachment 24134
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24134
-save-temps .s file


[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error

2011-04-28 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2011.04.28 18:50:45
 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |4.4.7
Summary|switch statement|[4.4/4.5/4.6/4.7
   |optimization error  |Regression] switch
   ||statement optimization
   ||error
 Ever Confirmed|0   |1

--- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-28 
18:50:45 UTC ---
Confirmed, goes away with -fno-tree-switch-conversion.

extern void abort (void);

int
foo (signed char x)
{
  int y = 0;
  switch (x)
{
case 0: y = 1; break;
case 1: y = 2; break;
case 2: y = 3; break;
case 3: y = 4; break;
case 4: y = 5; break;
case 5: y = 6; break;
case 6: y = 7; break;
case 7: y = 8; break;
case 8: y = 9; break;
case 9: y = 10; break;
case 10: y = 11; break;
case 11: y = 12; break;
case 12: y = 13; break;
case 13: y = 14; break;
case 14: y = 15; break;
case 15: y = 16; break;
case 16: y = 17; break;
case 98: y = 18; break;
case -62: y = 19; break;
}
  return y;
}

int
main ()
{
  signed char x = 98;
  if (foo (x) != 18)
abort ();
}

Will look at it tomorrow.


Re: Switch statement optimization

2006-07-11 Thread Christian Hildner

Joern RENNECKE schrieb:


In http://gcc.gnu.org/ml/gcc/2006-07/msg00156.html, your wrote:

We could use a cheap hash and base start compare / branch trees in 
every hash bin.
Say we have a 16 k range, 200 nodes, and want to keep the hash 
bin/node ratio between 2 and 4.

Let i be the switch argument.  Then we can calculate a 9 bit hash as
(i ^ (x  n))  0x3fff, where n is a value between 5 and 9.  We 
can pick the one which produces the flattest

average search trees.
Note that we no longer need to check that i is in range, as for 
ordinary switch dispatch tables.  


What I suggest is to implement a cost-estimation based decision. This 
should include an architecture dependent modelling that could deliver an 
exact cost (in terms of size) for size-optimization and as well a good 
approximation for speed. For example in my case of 1500 labels inside 
the 16K range I would perfer a branch table instead of a binary compare 
tree. The size of the table is 16K * 4/8 bytes and this is really ok for 
a huge program. The frequently used entries are likely to be in the 
cache, so I expect the whole thing to perform well. However if I would 
optimize it for my case that may not help in general? So all 
modifications should be verified with a common standard benchmark like 
SPEC. Is there someone on the list who wants to support me with these 
tests or may deliver models for common architectures?


Christian



Switch statement optimization

2006-07-10 Thread Christian Hildner

Hi,

while dealing with autogenerated code I found that GCC often outputs compare 
trees instead of branch tables for switch statements. I found that GCC uses the 
density of case labels within their spanned range as the key criterion. If the 
density of labels is smaller than 10% (33% for size optimization) then GCC uses 
compare instruction rather than a branch table. What I am wondering is why no 
real cost estimation is done here. In my case there are around 200 to 1500 
labels within the 16K range, so they are completely resolved to compares. In 
contrast the Intel compiler outputs branch tables more likely.

The question now is whether the factors 10 (3 for size opt) have been found 
using any standard benchmark. And shouldn't at least the factor be optimized 
for a standard benchmark (like the upcoming SPEC CPU 2006 INT) if this simple 
linear method is used for modeling a logarithmic behavior? Or should a cost 
estimation be introduced?

For size optimization there could be a quite accurate architecture dependent 
size estimation.

Opinions, Comments?

Christian

PS: Please CC me.




Re: Switch statement optimization

2006-07-10 Thread Ben Elliston
A paper at this year's GCC Summit talked about this:

  http://www.gccsummit.org/2006/view_abstract.php?content_key=18

You might like to follow up with Edmar (the author of the paper).

Cheers, Ben


Re: Switch statement optimization

2006-07-10 Thread Joern RENNECKE

In http://gcc.gnu.org/ml/gcc/2006-07/msg00156.html, your wrote:


A paper at this year's GCC Summit talked about this:
  http://www.gccsummit.org/2006/view_abstract.php?content_key=18
You might like to follow up with Edmar (the author of the paper).


But that was about optimizing the trees for an uneven probability distribution 
which could be found by profiling.
This does not address the issue what to do when the distribution is mostly 
uniform or unknown.

We could use a cheap hash and base start compare / branch trees in every hash 
bin.
Say we have a 16 k range, 200 nodes, and want to keep the hash bin/node ratio 
between 2 and 4.
Let i be the switch argument.  Then we can calculate a 9 bit hash as
(i ^ (x  n))  0x3fff, where n is a value between 5 and 9.  We can pick 
the one which produces the flattest
average search trees.
Note that we no longer need to check that i is in range, as for ordinary switch 
dispatch tables.

Moreover, on a target that can do effectively multi-way compares like the x86, 
in order
to increase ILP, we can put into the table entry for each hash bin, in addition 
to the
jump address, a value to compare i against before the dispatch jump takes place.
If a cheap hash can be chosen so that there is no more than one entry per hash 
bin, we
can even do a single compare in the dispatch code, go to the default case for 
non-match,
and dispatch right to the handling code if we have a match.