Hi. I have watched Martin's talk on switch lowering improvements
(https://slideslive.com/38902416/switch-lowering-improvements); the last
slide asks about benchmarks that could be used for tuning the switch
statement optimizer. Martin mentioned one common use case - bytecode
interpreters (such as perlbench from SPEC CPU 2006 and 2017). But there
is a caveat with modern bytecode interpreters such as CPython: they use
computed gotos instead of switch statements and also implement the
"threaded code" technique to improve utilization of the CPU's branch
predictor (see this comment for a detailed explanation:
https://github.com/python/cpython/blob/master/Python/ceval.c#L585).
Another common use case involving hot switch statements is lexers and
parsers, either hand-coded or generated by tools such as Ragel and
re2c. For example, the well-known web server Nginx uses several huge
hand-coded switch statements to parse HTTP requests
(http://lxr.nginx.org/source/src/http/ngx_http_parse.c). I found an
isolated benchmark for this parser:
https://natsys-lab.blogspot.ru/2014/11/the-fast-finite-state-machine-for-http.html
(code: https://github.com/natsys/blog/tree/master/http_benchmark).

I hope this can be helpful for performance analysis.

On Fri, Oct 6, 2017 at 4:46 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Martin Liska wrote:
>
> > There are some numbers for cc1plus:
> >
> > $ bloaty ./objdir2/gcc/cc1plus -- ./objdir/gcc/cc1plus
> >     VM SIZE               FILE SIZE
> >  +3.8% +1.11Mi  TOTAL    +1.03Mi  +0.5%
> >
> > insn-attrtab.o:
> >     VM SIZE               FILE SIZE
> >  +214%  +682Ki  .rodata   +682Ki  +214%
> > -50.1% -63.3Ki  .text    -63.3Ki -50.1%
>
> So is that a 3.8% codesize increase or decrease? If an increase,
> I can't see how replacing 63KB of instructions with 682KB of data
> is a good tradeoff... There should be an accurate calculation
> of the density, taking the switch table width into account (really small
> tables can use 1-byte offsets, large tables are typically forced to
> use 4-byte offsets). This may need new target callbacks - I changed
> PARAM_CASE_VALUES_THRESHOLD on AArch64 to get smaller
> code and better performance, since the current density calculations
> are hardcoded and quite wrong for big tables...
>
> Also, what is the codesize difference on SPEC2006/2017? I don't see
> any mention of the performance impact either...
>
> Wilco

-- 
Regards, Mikhail Maltsev