Re: Switch statement optimization
On 4/18/22 16:41, Paul Koning via Gcc wrote: In switch statements with dense case values, the typical result is a jump table, which is fast. If the values are sparse, a tree of compares is generated instead. What if nearly all cases are dense but there are a few outliers? An example appears in the NFS protocol parser, where you get a switch statement with cases for each of the opcode values. All but one are small integers assigned in sequence, but one is 10044. So the "sparse" case kicks in and a compare tree is generated for everything. I can avoid this by putting in special case code for the 10044 case, then all the rest ends up being a jump table. That brings up the question if GCC should recognize such scenarios and break up the switch statement into "dense parts" handled by a jump table, leaving the sorting between those as a compare tree. Hello. We currently support identification of such dense/interesting groups of case values (we name them clusters). See e.g. gcc/testsuite/gcc.dg/tree-ssa/switch-1.c where you can have a decision tree that contains mix of individual cases, jump-tables (JT) and bit-tests (BT). Can you please share your test-case so that I can analyze it? You can investigate with -fdump-tree-switchlower1. Cheers. Martin paul
Switch statement optimization
In switch statements with dense case values, the typical result is a jump table, which is fast. If the values are sparse, a tree of compares is generated instead. What if nearly all cases are dense but there are a few outliers? An example appears in the NFS protocol parser, where you get a switch statement with cases for each of the opcode values. All but one are small integers assigned in sequence, but one is 10044. So the "sparse" case kicks in and a compare tree is generated for everything. I can avoid this by putting in special case code for the 10044 case, then all the rest ends up being a jump table. That brings up the question if GCC should recognize such scenarios and break up the switch statement into "dense parts" handled by a jump table, leaving the sorting between those as a compare tree. paul
[Bug tree-optimization/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 Josh Poimboeuf changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #4 from Josh Poimboeuf --- Richard, just realized you weren't on CC for my response to your question.
[Bug tree-optimization/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 --- Comment #3 from Josh Poimboeuf --- Hi Richard, thanks for looking at it! (In reply to Richard Biener from comment #2) > Are the cases you still see indirect jumps to only one active case or > is it just that if()s with multiple cases would have avoided the dead > code? I don't quite understand the second part of your question, but maybe this information will answer it. With GCC 6 I've only seen one occurrence of this issue, which is the one for which I posted the assembler and .i file above. It has an indirect jump which is hard-coded to use only one entry in the jump table, as shown in the assembler code above. Note that the C code has two switch statements, which seem to correspond to the two "normal" cases of indirect jump table patterns. But then there's the third unusual indirect jump, shown above, which also corresponds to one of the two switch statements -- so there are two jump tables for a single switch statement, where one of the tables appears to be optimized for a single case. Hopefully I'm making sense :-) With GCC 5, the other occurrences of this issue were very similar, with switch statements and indirect jumps to a single entry in the jump table.
[Bug tree-optimization/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2016-04-12 CC||jamborm at gcc dot gnu.org Component|c |tree-optimization Ever confirmed|0 |1 --- Comment #2 from Richard Biener --- For GCC 6 stronger value-numbering might have fixed most cases. Are the cases you still see indirect jumps to only one active case or is it just that if()s with multiple cases would have avoided the dead code? I can see : _139 = [(void *)cmd_84(D) + 930B]; _141 = conn_140(D)->sess; _142 = _141->se_sess; _143 = _84(D)->se_cmd; transport_init_se_cmd (_143, _ops, _142, 0, 3, 32, _139); target_get_sess_cmd (_143, 1); switch (_58) , case 1: , case 2: , case 3: , case 4: , case 5: , case 6: , case 7: > : goto (); : goto (); : goto (); : goto (); : goto (); : _146 = (int) _58; printk ("\13Unknown iSCSI TMR Function: 0x%02x\n", _146); _351 = iscsit_add_reject_from_cmd (cmd_84(D), 10, 1, buf_55(D)); [tail call] goto ; : # prephitmp_14 = PHI <1(18), 2(11), 3(12), 4(13), 5(14), 6(15), 7(16)> : with the default case looping back. switch conversion should probably handle this but is run too early. OTOH it seems to be confused by the default case not falling thru, failing to see the "simple" transform to a if (... >= 1 && ... <= 7) ; else old default; beginning to process the following SWITCH statement (drivers/target/iscsi/iscsi_target.c:1807) : --- switch (_74) , case 1: , case 2: , case 3: , case 4: , case 5: , case 6: , case 7: > Bailing out - no common successor to all case label target blocks found beginning to process the following SWITCH statement (drivers/target/iscsi/iscsi_target.c:1866) : --- switch (_74) , case 1: , case 2 ... 5: , case 6: , case 7: , case 8: > Bailing out - no common successor to all case label target blocks found Not sure what the other cases are.
[Bug c/70604] switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 --- Comment #1 from Josh Poimboeuf --- Created attachment 38227 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38227=edit Linux .config
[Bug c/70604] New: switch statement optimization creates dead code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70604 Bug ID: 70604 Summary: switch statement optimization creates dead code Product: gcc Version: 6.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: jpoimboe at redhat dot com Target Milestone: --- Created attachment 38226 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38226=edit iscsi_target.i.gz The linux kernel has a new tool named "objtool" which follows all possible code paths for every .o file, looking for abnormalities. In rare cases it has stumbled on a gcc optimization issue related to switch statements, which leaves some dead code (unreachable instructions) laying around, resulting in increased code size and confusing assembler code. Note: For some reason the likelihood of seeing this problem seems to have diminished from gcc 5.3.1 to gcc 6.0. We've seen the problem in three separate object files in the kernel with gcc 5.3.1, but with gcc 6.0 there's only one known occurrence of this issue. Further, even in that one case, the size of the dead code has decreased from several instructions in 5.3.1 to only a single unreachable instruction in 6.0. Here are the relevant assembler code excerpts, using gcc 6.0: iscsit_handle_task_mgt_cmd: [...snip...] jmp *.L6212+64(%rip)# .section.rodata .align 8 .align 4 .L6212: .quad .L6211 .quad .L6070 .quad .L6214 .quad .L6214 .quad .L6214 .quad .L6214 .quad .L6082 .quad .L6089 .quad .L6096 .text [...snip...] jmp .L6196 # .L6211: movl$8, %esi#, _315 .L6213: movq$.LC48, %rdi#, There's an indirect jump instruction in .text, along with a jump table in .rodata, which is a common pattern for switch statements. But this one's a little different than the normal pattern: the indirect jump is hard-coded to use a single entry in the jump table. The other jump table entries are unused. Further, the code referenced in one of the entries (.L6211) is dead code and completely unreachable from anywhere in the function. Note that the -fsanitize=unreachable option is enabled, but this seems unrelated: __builtin_unreachable() doesn't appear to have been used for this code path, and the unreachable code block doesn't have a call to __ubsan_handle_builtin_unreachable. So to summarize, the issues are: 1) dead code in .text (in this case, the function only has one unreachable instruction) 2) mostly unused (and completely unnecessary) jump table in .rodata 3) unnecessary indirect jump (a direct jump could be used instead) $ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/6.0.0/lto-wrapper Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --disable-libgcj --with-isl --enable-libmpx --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 6.0.0 20160311 (Red Hat 6.0.0-0.17) (GCC) $ gcc -Wp,-MD,drivers/target/iscsi/.iscsi_target.o.d -nostdinc -I./arch/x86/include -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi -I./include/uapi -Iinclude/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror
[Bug testsuite/48809] [4.4 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-04 09:19:10 UTC --- Author: jakub Date: Wed May 4 09:19:07 2011 New Revision: 173358 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173358 Log: Backport from mainline 2011-04-30 Jakub Jelinek ja...@redhat.com PR tree-optimization/48809 * tree-switch-conversion.c (build_arrays): Compute tidx in unsigned type. (gen_inbound_check): Don't compute index_expr - range_min in utype again, instead reuse SSA_NAME initialized in build_arrays. Remove two useless gsi_for_stmt calls. * gcc.c-torture/execute/pr48809.c: New test. Added: branches/gcc-4_4-branch/gcc/testsuite/gcc.c-torture/execute/pr48809.c Modified: branches/gcc-4_4-branch/gcc/ChangeLog branches/gcc-4_4-branch/gcc/testsuite/ChangeLog branches/gcc-4_4-branch/gcc/tree-switch-conversion.c
[Bug testsuite/48809] [4.4 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-04 09:33:21 UTC --- Fixed.
[Bug testsuite/48809] [4.4/4.5 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-03 16:37:19 UTC --- Author: jakub Date: Tue May 3 16:37:12 2011 New Revision: 173328 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173328 Log: Backport from mainline 2011-04-30 Jakub Jelinek ja...@redhat.com PR tree-optimization/48809 * tree-switch-conversion.c (build_arrays): Compute tidx in unsigned type. (gen_inbound_check): Don't compute index_expr - range_min in utype again, instead reuse SSA_NAME initialized in build_arrays. Remove two useless gsi_for_stmt calls. * gcc.c-torture/execute/pr48809.c: New test. Added: branches/gcc-4_5-branch/gcc/testsuite/gcc.c-torture/execute/pr48809.c Modified: branches/gcc-4_5-branch/gcc/ChangeLog branches/gcc-4_5-branch/gcc/testsuite/ChangeLog branches/gcc-4_5-branch/gcc/tree-switch-conversion.c
[Bug testsuite/48809] [4.4 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Summary|[4.4/4.5 Regression] switch |[4.4 Regression] switch |statement optimization |statement optimization |error |error --- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2011-05-03 17:17:57 UTC --- Fixed for 4.5.4+ too.
[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #6 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-30 06:54:06 UTC --- Author: jakub Date: Sat Apr 30 06:54:02 2011 New Revision: 173207 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173207 Log: PR tree-optimization/48809 * tree-switch-conversion.c (build_arrays): Compute tidx in unsigned type. (gen_inbound_check): Don't compute index_expr - range_min in utype again, instead reuse SSA_NAME initialized in build_arrays. Remove two useless gsi_for_stmt calls. * gcc.c-torture/execute/pr48809.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr48809.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-switch-conversion.c
[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-30 06:55:15 UTC --- Author: jakub Date: Sat Apr 30 06:55:11 2011 New Revision: 173208 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=173208 Log: PR tree-optimization/48809 * tree-switch-conversion.c (build_arrays): Compute tidx in unsigned type. (gen_inbound_check): Don't compute index_expr - range_min in utype again, instead reuse SSA_NAME initialized in build_arrays. Remove two useless gsi_for_stmt calls. * gcc.c-torture/execute/pr48809.c: New test. Added: branches/gcc-4_6-branch/gcc/testsuite/gcc.c-torture/execute/pr48809.c Modified: branches/gcc-4_6-branch/gcc/ChangeLog branches/gcc-4_6-branch/gcc/testsuite/ChangeLog branches/gcc-4_6-branch/gcc/tree-switch-conversion.c
[Bug c++/48809] [4.4/4.5 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Summary|[4.4/4.5/4.6/4.7|[4.4/4.5 Regression] switch |Regression] switch |statement optimization |statement optimization |error |error | --- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-30 07:35:10 UTC --- Fixed for 4.6+ so far.
[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED AssignedTo|unassigned at gcc dot |jakub at gcc dot gnu.org |gnu.org | --- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-29 13:38:08 UTC --- Created attachment 24145 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24145 gcc46-pr48809.patch Untested fix.
[Bug c++/48809] New: switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 Summary: switch statement optimization error Product: gcc Version: 4.5.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: andrewcmar...@msn.com Created attachment 24131 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24131 C++ source The code below should return the answer 18. However, built with g++ v4.5.2 and -O2, it returns -386954936 or some other random value. Removing the line case -62:... from the switch statement should have no effect, but the resulting optimized code will execute correctly. --- #include iostream int switch_test(signed char code) { int val = 0; switch (code) { case 0: val = 1; break; case 1: val = 2; break; case 2: val = 3; break; case 3: val = 4; break; case 4: val = 5; break; case 5: val = 6; break; case 6: val = 7; break; case 7: val = 8; break; case 8: val = 9; break; case 9: val = 10; break; case 10: val = 11; break; case 11: val = 12; break; case 12: val = 13; break; case 13: val = 14; break; case 14: val = 15; break; case 15: val = 16; break; case 16: val = 17; break; case 98: val = 18; break; case -62: val = 19; break; } return val; } int main(int argc, char* argv[]) { const char input = 98; int return_val = switch_test(input); std::cout input:input std::endl; std::cout return: return_val std::endl; } --- Command line to trigger the error: [amartin@jeanie:~]$ g++ -O2 -save-temps switch_test.cc ./a.out input: b return: -386954936 If we don't optimize we don't get the error: [amartin@jeanie:~]$ g++ switch_test.cc ./a.out input: b return: 18 Compiler information: [amartin@jeanie:~]$ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/libexec/gcc/x86_64-unknown-linux-gnu/4.5.2/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ./configure --prefix=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --with-gmp=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --with-mpfr=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --with-mpc=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --with-ppl=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --with-cloog=/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/ --enable-static --enable-languages=c,c++,objc,obj-c++ LDFLAGS='-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib -L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib64 -I/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/include ' CXXFLAGS='-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib -L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib64 -I/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/include ' CFLAGS='-L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib -L/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/lib64 -I/apps/gcc/4.5.2/Linux_2.6.9_x86_64-dynamic/include' Thread model: posix gcc version 4.5.2 (GCC) [amartin@jeanie:~]$ uname -a Linux jeanie 2.6.9-78.0.22.ELsmp #1 SMP Thu Apr 30 19:17:40 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux I have also observed this error in g++ 4.4.x.
[Bug c++/48809] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #2 from andrewcmartin at msn dot com 2011-04-28 18:33:40 UTC --- Created attachment 24133 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24133 -save-temps .o file
[Bug c++/48809] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #1 from andrewcmartin at msn dot com 2011-04-28 18:33:10 UTC --- Created attachment 24132 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24132 -save-temps .ii file
[Bug c++/48809] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 --- Comment #3 from andrewcmartin at msn dot com 2011-04-28 18:34:10 UTC --- Created attachment 24134 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=24134 -save-temps .s file
[Bug c++/48809] [4.4/4.5/4.6/4.7 Regression] switch statement optimization error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48809 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2011.04.28 18:50:45 CC||jakub at gcc dot gnu.org Target Milestone|--- |4.4.7 Summary|switch statement|[4.4/4.5/4.6/4.7 |optimization error |Regression] switch ||statement optimization ||error Ever Confirmed|0 |1 --- Comment #4 from Jakub Jelinek jakub at gcc dot gnu.org 2011-04-28 18:50:45 UTC --- Confirmed, goes away with -fno-tree-switch-conversion. extern void abort (void); int foo (signed char x) { int y = 0; switch (x) { case 0: y = 1; break; case 1: y = 2; break; case 2: y = 3; break; case 3: y = 4; break; case 4: y = 5; break; case 5: y = 6; break; case 6: y = 7; break; case 7: y = 8; break; case 8: y = 9; break; case 9: y = 10; break; case 10: y = 11; break; case 11: y = 12; break; case 12: y = 13; break; case 13: y = 14; break; case 14: y = 15; break; case 15: y = 16; break; case 16: y = 17; break; case 98: y = 18; break; case -62: y = 19; break; } return y; } int main () { signed char x = 98; if (foo (x) != 18) abort (); } Will look at it tomorrow.
Re: Switch statement optimization
Joern RENNECKE schrieb: In http://gcc.gnu.org/ml/gcc/2006-07/msg00156.html, your wrote: We could use a cheap hash and base start compare / branch trees in every hash bin. Say we have a 16 k range, 200 nodes, and want to keep the hash bin/node ratio between 2 and 4. Let i be the switch argument. Then we can calculate a 9 bit hash as (i ^ (x n)) 0x3fff, where n is a value between 5 and 9. We can pick the one which produces the flattest average search trees. Note that we no longer need to check that i is in range, as for ordinary switch dispatch tables. What I suggest is to implement a cost-estimation based decision. This should include an architecture dependent modelling that could deliver an exact cost (in terms of size) for size-optimization and as well a good approximation for speed. For example in my case of 1500 labels inside the 16K range I would perfer a branch table instead of a binary compare tree. The size of the table is 16K * 4/8 bytes and this is really ok for a huge program. The frequently used entries are likely to be in the cache, so I expect the whole thing to perform well. However if I would optimize it for my case that may not help in general? So all modifications should be verified with a common standard benchmark like SPEC. Is there someone on the list who wants to support me with these tests or may deliver models for common architectures? Christian
Switch statement optimization
Hi, while dealing with autogenerated code I found that GCC often outputs compare trees instead of branch tables for switch statements. I found that GCC uses the density of case labels within their spanned range as the key criterion. If the density of labels is smaller than 10% (33% for size optimization) then GCC uses compare instruction rather than a branch table. What I am wondering is why no real cost estimation is done here. In my case there are around 200 to 1500 labels within the 16K range, so they are completely resolved to compares. In contrast the Intel compiler outputs branch tables more likely. The question now is whether the factors 10 (3 for size opt) have been found using any standard benchmark. And shouldn't at least the factor be optimized for a standard benchmark (like the upcoming SPEC CPU 2006 INT) if this simple linear method is used for modeling a logarithmic behavior? Or should a cost estimation be introduced? For size optimization there could be a quite accurate architecture dependent size estimation. Opinions, Comments? Christian PS: Please CC me.
Re: Switch statement optimization
A paper at this year's GCC Summit talked about this: http://www.gccsummit.org/2006/view_abstract.php?content_key=18 You might like to follow up with Edmar (the author of the paper). Cheers, Ben
Re: Switch statement optimization
In http://gcc.gnu.org/ml/gcc/2006-07/msg00156.html, your wrote: A paper at this year's GCC Summit talked about this: http://www.gccsummit.org/2006/view_abstract.php?content_key=18 You might like to follow up with Edmar (the author of the paper). But that was about optimizing the trees for an uneven probability distribution which could be found by profiling. This does not address the issue what to do when the distribution is mostly uniform or unknown. We could use a cheap hash and base start compare / branch trees in every hash bin. Say we have a 16 k range, 200 nodes, and want to keep the hash bin/node ratio between 2 and 4. Let i be the switch argument. Then we can calculate a 9 bit hash as (i ^ (x n)) 0x3fff, where n is a value between 5 and 9. We can pick the one which produces the flattest average search trees. Note that we no longer need to check that i is in range, as for ordinary switch dispatch tables. Moreover, on a target that can do effectively multi-way compares like the x86, in order to increase ILP, we can put into the table entry for each hash bin, in addition to the jump address, a value to compare i against before the dispatch jump takes place. If a cheap hash can be chosen so that there is no more than one entry per hash bin, we can even do a single compare in the dispatch code, go to the default case for non-match, and dispatch right to the handling code if we have a match.