Try the various Linux "perf events" tools, e.g. $ perf record ..., or one of the following for a more focused view.
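For the L1/L2 question in particular, a minimal sketch with perf stat might look like the one below. It assumes perf is installed and that you are allowed to read the counters (root, or a lowered kernel.perf_event_paranoid). The function name measure_cache and the 10 second window are only illustrative. The event names are the generic aliases perf usually exposes; L2 events are CPU-specific, so check `perf list` on your machine and extend EVENTS accordingly.

```shell
#!/bin/sh
# Sketch: count cache events for an already-running process over a fixed window.
# Assumption: these generic aliases are supported by your CPU and kernel;
# run `perf list` to confirm, and to find the model-specific L2 events.
EVENTS="L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses"

# measure_cache <pid> <seconds>  (hypothetical helper, not part of perf)
measure_cache() {
    if [ "$#" -ne 2 ]; then
        echo "usage: measure_cache <pid> <seconds>" >&2
        return 2
    fi
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf not found on PATH" >&2
        return 127
    fi
    # With -p, the trailing command only bounds how long perf stat observes
    # the target process; the counts are for the PID, not for sleep.
    perf stat -e "$EVENTS" -p "$1" -- sleep "$2"
}

# Example: measure_cache "$(pgrep -f Client | head -n 1)" 10
```

Run it once while latency is low and once after the step up, then compare the miss counts relative to the load counts in each case.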
https://github.com/RRZE-HPC/likwid
http://jpbempel.blogspot.co.uk/2013/08/hardware-performance-counters.html

On 13 April 2017 at 17:22, J Crawford <latencyfigh...@mail.com> wrote:

> Hi Martin! Thanks for trying to help out. I'm indeed testing this on
> loopback. Can you give me pointers on how to measure L1 and L2 cache
> hit/miss? I've never done that before. I was able to confirm that it also
> happens on Windows. We are getting close to understanding this mystery.
>
> Thanks!
>
> -JC
>
> On Thursday, April 13, 2017 at 11:17:38 AM UTC-5, Martin Thompson wrote:
>>
>> OSR can be avoided if you put the body of your loops in their own methods
>> so they get normal JIT support, but this is unlikely to explain such a
>> significant step in latency.
>>
>> As Gil mentions, using loopback will give very different results to a real
>> network. The Linux kernel bypasses OSI layer 2 for loopback so no QDiscs.
>> For example, Nagle not only does not apply on loopback, it WILL also
>> increase latency a little when disabled, really!
>>
>> Have you measured L1 and L2 cache hit and miss rates in each case? Even
>> with isolcpus the Intel private caches (L1 & L2) are inclusive with the
>> shared L3 so that if the L3 has to evict lines then they need to go from
>> the corresponding L1/L2 caches. You can use CAT (Cache Allocation
>> Technology), CoD (Cluster on Die), or separate sockets to help avoid this.
>>
>> On Thursday, 13 April 2017 16:01:49 UTC+1, J Crawford wrote:
>>>
>>> Thanks to everyone who threw in some ideas. I was able to prove that it is
>>> *not* a JIT/HotSpot de-optimization.
>>>
>>> First I got the following output when I used "-XX:+PrintCompilation
>>> -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":
>>>
>>> Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000
>>> currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831
>>> zeroReads=2 partialReads=0
>>> *77543 560 % ! 4 Client::run @ -2 (270 bytes) made not entrant*
>>> Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001
>>> currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531
>>> realRead=2363 zeroReads=1 partialReads=
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mechanical-sympathy+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.