Hi.

I watched Martin's talk on switch lowering improvements (
https://slideslive.com/38902416/switch-lowering-improvements); the last
slide asks about benchmarks that could be used for tuning the switch
statement optimizer. Martin mentioned one common use case - bytecode
interpreters (such as perlbench from SPEC CPU 2006 and 2017). But there is
a caveat with modern bytecode interpreters such as CPython: they use
computed gotos instead of switch statements and also implement the
"threaded code" technique to improve utilization of the CPU's branch
predictor (see this comment for a detailed explanation:
https://github.com/python/cpython/blob/master/Python/ceval.c#L585).
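
To illustrate the technique, here is a minimal, self-contained sketch of
computed-goto dispatch (the opcodes and bytecode are invented for
illustration; this is not CPython's actual interpreter loop). Each opcode
handler ends with its own indirect jump, so the branch predictor can learn
a separate target distribution per handler instead of sharing one branch
for the whole switch. Note that computed gotos (&&label) are a GCC/Clang
extension:

    #include <stdio.h>

    static int run(const unsigned char *code) {
        /* One label address per (hypothetical) opcode. */
        static void *dispatch[] = { &&op_inc, &&op_double, &&op_halt };
        int acc = 0;

        #define DISPATCH() goto *dispatch[*code++]

        DISPATCH();
    op_inc:             /* opcode 0: increment accumulator */
        acc += 1;
        DISPATCH();     /* replicated indirect branch */
    op_double:          /* opcode 1: double accumulator */
        acc *= 2;
        DISPATCH();     /* replicated indirect branch */
    op_halt:            /* opcode 2: stop and return result */
        return acc;
    }

    int main(void) {
        const unsigned char prog[] = { 0, 0, 1, 2 };  /* (0+1+1)*2 */
        printf("%d\n", run(prog));                    /* prints 4 */
        return 0;
    }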

Another common use case involving hot switch statements is lexers and
parsers (either hand-coded or generated by tools such as Ragel and re2c).
For example, the well-known web server Nginx uses several huge hand-coded
switch statements to parse HTTP requests (
http://lxr.nginx.org/source/src/http/ngx_http_parse.c).

I found an isolated benchmark for this parser:
https://natsys-lab.blogspot.ru/2014/11/the-fast-finite-state-machine-for-http.html
(code: https://github.com/natsys/blog/tree/master/http_benchmark). I hope
this can be helpful for performance analysis.
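
For reference, the hand-coded style in question looks roughly like this
(a minimal sketch with invented states, not the actual nginx code): an
outer loop over the input and an inner switch over the current parser
state, which makes that switch the hottest code in the parser:

    #include <stdio.h>
    #include <string.h>

    enum state { sw_start, sw_method, sw_after_method, sw_done };

    /* Returns 0 if the (toy) request line is well-formed, -1 otherwise. */
    static int parse_request_line(const char *p, size_t len) {
        enum state state = sw_start;

        for (size_t i = 0; i < len; i++) {
            char ch = p[i];
            switch (state) {
            case sw_start:
                if (ch < 'A' || ch > 'Z') return -1;
                state = sw_method;
                break;
            case sw_method:
                if (ch >= 'A' && ch <= 'Z') break;  /* stay in state */
                if (ch != ' ') return -1;
                state = sw_after_method;
                break;
            case sw_after_method:
                /* ... a real parser has dozens more states here ... */
                state = sw_done;
                break;
            case sw_done:
                break;
            }
        }
        return state == sw_done ? 0 : -1;
    }

    int main(void) {
        const char *req = "GET /";
        printf("%d\n", parse_request_line(req, strlen(req)));  /* 0 */
        return 0;
    }

The real ngx_http_parse.c contains several such machines with dozens of
states each, which is what makes the switch lowering strategy matter there.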


On Fri, Oct 6, 2017 at 4:46 PM, Wilco Dijkstra <wilco.dijks...@arm.com>
wrote:

> Martin Liska wrote:
>
> > There are some numbers for cc1plus:
> >
> > $ bloaty ./objdir2/gcc/cc1plus -- ./objdir/gcc/cc1plus
> >     VM SIZE                      FILE SIZE
> >   +3.8% +1.11Mi TOTAL          +1.03Mi  +0.5%
>
> > insn-attrtab.o:
> >     VM SIZE                          FILE SIZE
> >   +214%  +682Ki .rodata             +682Ki  +214%
> >  -50.1% -63.3Ki .text              -63.3Ki -50.1%
>
> So is that a 3.8% codesize increase or decrease? If an increase,
> I can't see how replacing 63KB of instructions with 682KB of data
> is a good tradeoff... There should be an accurate calculation
> of the density, taking the switch table width into account (really small
> tables can use 1-byte offsets, large tables are typically forced to
> use 4-byte offsets). This may need new target callbacks - I changed
> PARAM_CASE_VALUES_THRESHOLD on AArch64 to get smaller
> code and better performance since the current density calculations
> are hardcoded and quite wrong for big tables...
>
> Also what is the codesize difference on SPEC2006/2017? I don't see
> any mention of performance impact either...
>
> Wilco
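
To make the density tradeoff concrete, here is a back-of-the-envelope
size model (all byte costs below are illustrative assumptions, not GCC's
actual cost model): a jump table pays for every value in the case range
at the entry width, while a compare-and-branch sequence pays only per
case:

    #include <stdio.h>

    /* Bytes for a jump table covering [min, max]: one entry per value in
       the range (1-byte offsets for small tables, 4-byte for large ones),
       plus an assumed fixed cost for the bounds check and indirect jump. */
    static long table_bytes(long min, long max, int entry_width) {
        return (max - min + 1) * entry_width + 16;
    }

    /* Bytes for a compare-and-branch sequence, assuming ~8 bytes/case. */
    static long branch_bytes(long ncases) {
        return ncases * 8;
    }

    int main(void) {
        /* Dense, small: 10 cases covering a range of 12 values. */
        printf("small: table=%ld vs branches=%ld\n",
               table_bytes(0, 11, 1), branch_bytes(10));
        /* Sparse, large: 100 cases spread over a range of 2000 values. */
        printf("large: table=%ld vs branches=%ld\n",
               table_bytes(0, 1999, 4), branch_bytes(100));
        return 0;
    }

Under these assumptions the small dense table wins easily (28 vs. 80
bytes), while the sparse table with 4-byte offsets loses badly (8016 vs.
800 bytes) - which is exactly why a density heuristic that ignores the
entry width can make poor decisions on big tables.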




-- 
Regards,
   Mikhail Maltsev
