Re: The GC and performance, but not what you expect

via Digitalmars-d Fri, 30 May 2014 08:56:37 -0700

On Friday, 30 May 2014 at 09:46:10 UTC, Marco Leise wrote:

simplicity. But as soon as I added a single CAS I was already
over the time that TCMalloc needs. That way I learned that CAS
is not as cheap as it looks and the fastest allocators work
thread local as long as possible.
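The point about keeping allocation thread-local can be illustrated with a minimal C11 sketch. This is not Marco's allocator or TCMalloc. Every name here (`tl_alloc`, `tl_free`, `refill`, the 64-byte block size, the arena and batch sizes) is hypothetical. The fast path pops from a `_Thread_local` free list using plain loads and stores. Only when that list is empty does a thread pay for a CAS, and it amortizes that cost by claiming a whole batch of blocks from a shared arena.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BLOCK_SIZE   64    /* hypothetical fixed block size in bytes */
#define ARENA_BLOCKS 1024  /* hypothetical number of blocks in the shared arena */
#define REFILL_BATCH 32    /* blocks claimed per successful CAS */

/* A block is either raw storage or, while free, a link in a free list. */
typedef union block {
    union block *next;
    unsigned char bytes[BLOCK_SIZE];
} block;

/* Shared arena; threads claim ranges of it by advancing arena_next with CAS. */
static block arena[ARENA_BLOCKS];
static _Atomic size_t arena_next = 0;

/* Per-thread free list: only its owning thread touches it, so no atomics. */
static _Thread_local block *tl_free_list = NULL;

/* Slow path: one successful CAS claims up to REFILL_BATCH blocks. */
static int refill(void) {
    size_t start = atomic_load_explicit(&arena_next, memory_order_relaxed);
    size_t end;
    do {
        if (start >= ARENA_BLOCKS)
            return 0; /* arena exhausted */
        end = start + REFILL_BATCH;
        if (end > ARENA_BLOCKS)
            end = ARENA_BLOCKS;
        /* on failure, start is reloaded with the current value of arena_next */
    } while (!atomic_compare_exchange_weak_explicit(
                 &arena_next, &start, end,
                 memory_order_relaxed, memory_order_relaxed));

    /* The claimed range [start, end) belongs to this thread alone. */
    for (size_t i = start; i < end; i++) {
        arena[i].next = tl_free_list;
        tl_free_list = &arena[i];
    }
    return 1;
}

/* Fast path: a pop from the thread-local list, with no CAS involved. */
void *tl_alloc(void) {
    if (tl_free_list == NULL && !refill())
        return NULL;
    block *n = tl_free_list;
    tl_free_list = n->next;
    return n;
}

/* Freeing pushes onto the calling thread's own list, again without atomics. */
void tl_free(void *p) {
    assert(p != NULL);
    block *n = p;
    n->next = tl_free_list;
    tl_free_list = n;
}
```

The main tuning knob is the batch size. A larger `REFILL_BATCH` means fewer CAS operations per allocation, but more blocks sit idle in one thread's list. In this sketch, a block freed by a different thread simply joins that thread's list. A real allocator would need a policy for handing such blocks back.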


22 cycles latency if on a valid cacheline?
+ overhead of going to memory

Did you try adding an explicit prefetch? Maybe that would help.

Prefetch is expensive on Ivy Bridge (a reciprocal throughput of 43 cycles, versus 0.5 cycles on Haswell). You need enough independent instructions between the PREFETCH and the LOCK CMPXCHG to fill the pipeline. So you probably need to go to ASM and do a lot of testing on different CPUs. Explicit prefetching, lock-free strategies, etc. are tricky to get right. Get them wrong and the result is worse than the naive implementation.
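The "prefetch early, CAS late" ordering described above can be sketched in C. This is an illustration under stated assumptions, not a tuned implementation. It uses the GCC/Clang builtin `__builtin_prefetch` instead of hand-written assembly, and the names `shared_counter` and `add_sum_with_prefetch` are hypothetical. The compiler is also free to schedule the instructions differently. Whether the prefetch actually overlaps with the summing loop therefore has to be checked in the generated code and measured on each target CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter shared between threads and updated with CAS. */
typedef struct {
    _Atomic long value;
} shared_counter;

/* Prefetch the contended cache line early, do independent work while it
   (hopefully) arrives, and only then execute the CAS (LOCK CMPXCHG on x86). */
long add_sum_with_prefetch(shared_counter *c, const long *items, size_t n) {
    assert(items != NULL || n == 0);

    /* GCC/Clang builtin: rw = 1 (prefetch for write), locality = 3 (keep in cache). */
    __builtin_prefetch((const void *)&c->value, 1, 3);

    /* Independent work that does not touch *c; this is what fills the
       pipeline between the prefetch and the locked instruction. */
    long delta = 0;
    for (size_t i = 0; i < n; i++)
        delta += items[i];

    long old = atomic_load_explicit(&c->value, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &c->value, &old, old + delta,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* a failed CAS refreshed old with the current value; retry */
    }
    return old + delta;
}
```

For a plain addition, `atomic_fetch_add` (a single LOCK XADD on x86) would be enough. The CAS loop is shown because it is the general form, needed when the new value depends on the old one in a way a single atomic instruction cannot express.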
