On 2014-12-08 11:44 AM, Martin Nowak wrote:
This is for the most performance critical instructions during GC marking.
If we can come up with some good SIMD this will result in a good speedup.

Yesterday I was surprised to learn that my unsigned wrap trick actually
slowed down some GC benchmarks by 3%. The branch predictor had more
trouble with the single branch because that resulted in a fifty-fifty
chance. There is some correlation between the 2 branch bounds checks and
one of them could be predicted fairly well, resulting in a better
combined result.
Always profile!
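
For readers unfamiliar with the "unsigned wrap trick" mentioned above, here is a minimal sketch in C of the two forms being compared. The function names and the [base, top) convention are illustrative assumptions, not the actual GC code, and the wrap form assumes base <= top.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-branch form: each comparison is a separate, separately
 * predicted branch. Names are hypothetical. */
static bool in_range_two_branches(uintptr_t p, uintptr_t base, uintptr_t top)
{
    if (p < base)
        return false;
    if (p >= top)
        return false;
    return true;
}

/* Unsigned wrap form: when p < base, p - base wraps around to a very
 * large unsigned value, so one comparison covers both bounds.
 * Assumes base <= top. */
static bool in_range_wrap(uintptr_t p, uintptr_t base, uintptr_t top)
{
    return (p - base) < (top - base);
}
```

Both functions return the same answer for every input; which one is faster depends on how predictable the individual comparisons are for the workload, which is why it has to be measured.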

Are you using it for the binary search part (find pool)?

http://cs.nyu.edu/~lerner/spring12/Preso06-SIMDTree.pdf

The savings would be best when the tree (pool info?) is under 64KB, with savings of up to 30%. It could be worth doing. Here's a quick solution:

http://stackoverflow.com/questions/20616605/using-simd-avx-sse-for-tree-traversal
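
To give a rough idea of the comparison step involved, here is a small sketch in C. It is a plain SIMD linear scan over a sorted table, not the k-ary tree layout from the slides, and it makes several assumptions: keys are non-negative 32-bit values (page numbers, say, rather than full 64-bit addresses), the names count_le and find_pool are made up for this example, and a scalar loop is used when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns how many entries of keys[0..n) are <= x.
 * Compares four keys per step when SSE2 is available. */
static size_t count_le(const int32_t *keys, size_t n, int32_t x)
{
    size_t i = 0, count = 0;
#if defined(__SSE2__)
    const __m128i xv = _mm_set1_epi32(x);
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4) {
        __m128i kv = _mm_loadu_si128((const __m128i *)(keys + i));
        /* lanes where key < x or key == x become all ones (-1) */
        __m128i le = _mm_or_si128(_mm_cmplt_epi32(kv, xv),
                                  _mm_cmpeq_epi32(kv, xv));
        /* subtracting -1 adds 1 to that lane's counter */
        acc = _mm_sub_epi32(acc, le);
    }
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    count = (size_t)(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
#endif
    /* scalar tail, and the whole loop on non-SSE2 targets */
    for (; i < n; ++i)
        count += (size_t)(keys[i] <= x);
    return count;
}

/* Hypothetical lookup: index of the last base <= page in a sorted
 * table of pool base pages, or -1 if page is below every pool.
 * The caller still has to check the pool's upper bound. */
static ptrdiff_t find_pool(const int32_t *bases, size_t n, int32_t page)
{
    return (ptrdiff_t)count_le(bases, n, page) - 1;
}
```

With a table of bases {10, 20, 30, 40, 50, 60}, find_pool returns 2 for page 35 and -1 for page 5. There are no data-dependent branches inside the SIMD loop, which is the property that makes this kind of search attractive for small tables.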

Most of the code circulating on the web uses the _mm256 or _mm128 SIMD format, which I have already ported to multiple platforms. I can share it under the Boost license too if you want:

https://github.com/etcimon/botan/blob/master/source/botan/utils/simd/immintrin.d
