Richard Henderson <r...@twiddle.net> writes: > From: "Emilio G. Cota" <c...@braap.org> > > Optimizations to cross-page chaining and indirect branches make > performance more sensitive to the hit rate of tb_jmp_cache. > The constraint of reserving some bits for the page number > lowers the achievable quality of the hashing function. > > However, user-mode does not have this requirement. Thus, > with this change we use for user-mode a hashing function that > is both faster and of better quality than the previous one. > > Measurements: > > Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0. > > - SPECint06 (test set), x86_64-linux-user. Host: > Intel i7-6700K @ 4.00GHz > > 2.2x > +-+--------------------------------------------------------------------------------------------------------------+-+ > | > | > | jr > | > 2x +jr+multhash > +....................................................+++++...................................+-+ > | jr+hash > |$$$ | > | > |$+$ | > | > ### $ | > 1.8x > +-+......................................................................#|#.$...................................+-+ > | > ++#+# $ | > | > |# # $ | > 1.6x > +-+....................................................................***.#.$....................++$$$..........+-+ > | $$$ > *+* # $ |$+$ | > | ++$$$ ### $ > * * # $ +++|$ $ | > | ++###+$ # # $ > * * # $ ### ****## $ | > 1.4x > +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+ > | *+* # $ * * # $ > * * # $ # # $ * *+# $ | > | * * # $ +++++ * * # $ > * * # $ *** # $ * * # $ ###$$ | > 1.2x > +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+ > | * * # $ *+* # $ * * # $ +++ > * * # $ ++###$$ * * # $ * * # $ * * # $ | > | ***##$$ * * # $ * * # $ * * # $ ***##$$ ++### > * * # $ *** #+$ * * # $ * * # $ * * # $ | > | *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+# > * * # $ * * # $ * * # $ * * # $ * * # $ | > 1x > +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+ > | * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ > * * # $ * * # $ * * # $ * * # $ * * # $ | > | * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ > * * # $ * * # $ * * # $ * * # $ * * # $ | > 0.8x > +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+ > astar bzip2 gcc gobmk h264ref hmmlibquantum mcf > omnetpperlbench sjengxalancbmk hmean > png: http://imgur.com/4UXTrEc > > Here I also tried the hash function suggested by Paolo ("multhash"): > > return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1); > > As you can see it is just as good as the other new function ("hash"), > which is what I ended up going with. > > - SPECint06 (train set), x86_64-linux-user. Host: > Intel i7-6700K @ 4.00GHz > > 2.6x > +-+--------------------------------------------------------------------------------------------------------------+-+ > | > | > | jr > ### | > 2.4x > +jr+hash...........................................................................................#.#...........+-+ > | > # # | > | > # # | > 2.2x > +-+................................................................................................#.#...........+-+ > | > # # | > | > # # | > 2x > +-+................................................................................................#.#...........+-+ > | > **** # | > | > * * # | > 1.8x > +-+.............................................................................................*..*.#...........+-+ > | > +++ * * # | > | > #### #### * * # | > 1.6x > +-+......................................####.............................#..#.****..#..........*..*.#...........+-+ > | +++ #++# > **** # * * # #### * * # | > | ### # # > * * # * * # # # * * # | > 1.4x > +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+ > | *++* # * * # > * * # * * # *** # * * # #### | > | * * # #### * * # > * * # * * # * * # * * # **** # | > 1.2x > +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+ > | ****### * * # * * # * * # > * * # * * # * * # * * # * * # | > | * * # ***### * * # * * # * * # ****## > * * # * * # * * # * * # * * # | > 1x > +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+ > astar bzip2 gcc gobmk h264ref hmmlibquantum mcf > omnetpperlbench sjengxalancbmk hmean > png: http://imgur.com/ArCbHqo > > - NBench, x86_64-linux-user. Host: Intel > i7-6700K @ 4.00GHz > > 1.12x > +-+-------------------------------------------------------------------------------------------------------------+-+ > | > | > | jr +++ > | > 1.1x > +jr+hash...........................................................####.........................................+-+ > | +++#| > # | > | | > #++# | > 1.08x > +-+................................+++................+++.+++..*****..#.........................................+-+ > | | +++ | | * | * > # | > | | | | | *+++* > # | > 1.06x > +-+................................****###.............|...|...*...*..#.........................+++.............+-+ > | *| * |# ****### * * > # | | > | *| *++# *| * |# * * > # #### | > 1.04x > +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+ > | * * # *++*++# * * > # +++#++# | > | * * # * * # * * > # | # # +++#### | > 1.02x > +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+ > | +++ * * # +++ | * * # * * > # +++ *| * # *+++* # | > | +++ | +++ +++ ++++++ * * # *****### * * # * * > # | +++ ++++++ *++* # * * # | > 1x > +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+ > | *****| # *++* |# *****| # * * # * *++# * * # * * > # **** |# * * # * * # * * # | > | * | *| # * *++# * | *++# * * # * * # * * # * * > # *| *++# * * # * * # * * # | > 0.98x > +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+ > | *+++* # * * # * * # * * # * * # * * # * * > # * * # * * # * * # * * # | > | * * # * * # * * # * * # * * # * * # * * > # * * # * * # * * # * * # | > 0.96x > +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+ > ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU > DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean > png: http://imgur.com/ZXFX0hJ > > - NBench, arm-linux-user. Host: Intel > i7-4790K @ 4.00GHz > > 1.3x > +-+-------------------------------------------------------------------------------------------------------------+-+ > | #### > | > | jr # # > +++ | > 1.25x > +jr+hash.....................#..#...........................................####................................+-+ > | # # > # # | > | # # > # # | > 1.2x > +-+..........................#..#...........................................#..#................................+-+ > | # # > # # | > | # # > # # | > 1.15x > +-+..........................#..#...........................................#..#................................+-+ > | # # > #### # # | > | # # # > # # # | > 1.1x > +-+..........................#..#..................................#..#.....#..#................................+-+ > | # # # > # # # +++ | > | # # #### # > # # # #### | > 1.05x > +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+ > | # # # # # # # > # # # +++ # # | > | +++ ***** # #### ***** # # # +++# > # **** # ****### # # | > 1x > +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+ > | * * # * * | * * # * * # * * # **** # * * > # * * # * *### * *++# * * # | > | * * # * *### * * # * * # * * # * * # * * > # * * # * * # * * # * * # | > 0.95x > +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+ > | * * # * * |# * * # * * # * * # * * # * * > # * * # * * # * * # * * # | > | * * # * * |# * * # * * # * * # * * # * * > # * * # * * # * * # * * # | > 0.9x > +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+ > ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU > DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean > png: http://imgur.com/FfD27ey > > Reviewed-by: Richard Henderson <r...@twiddle.net> > Signed-off-by: Emilio G. Cota <c...@braap.org> > Message-Id: <1493263764-18657-12-git-send-email-c...@braap.org> > Signed-off-by: Richard Henderson <r...@twiddle.net> > --- > include/exec/tb-hash.h | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h > index 2c27490..b1fe2d0 100644 > --- a/include/exec/tb-hash.h > +++ b/include/exec/tb-hash.h > @@ -22,6 +22,8 @@ > > #include "exec/tb-hash-xx.h" > > +#ifdef CONFIG_SOFTMMU > + > /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for > addresses on the same page. The top bits are the same. This allows > TLB invalidation to quickly clear a subset of the hash table. */ > @@ -45,6 +47,16 @@ static inline unsigned int > tb_jmp_cache_hash_func(target_ulong pc) > | (tmp & TB_JMP_ADDR_MASK)); > } > > +#else > + > +/* In user-mode we can get better hashing because we do not have a TLB */ > +static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc) > +{ > + return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1); > +} > + > +#endif /* CONFIG_SOFTMMU */ > + > static inline > uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t > flags) > {
I'll note when I've plotted hit-rates against the cache we don't seem to be making a good even use of the cache over time. But I suspect there is more that could be done here. That said the numbers are compelling so: Reviewed-by: Alex Bennée <alex.ben...@linaro.org> -- Alex Bennée