OSR can be avoided by putting the body of a loop in its own method, so that it gets compiled through the normal JIT path, but this is unlikely to explain such a significant step in latency.
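A minimal sketch of the extraction idea, assuming a simple accumulator loop; the names (`processBatch`, `handleMessage`) are illustrative, not from the original code under discussion:

```java
public class OsrAvoidance {

    // The loop body lives in its own method; once it is invoked often
    // enough, HotSpot compiles it through the normal method-compilation
    // path, independent of the enclosing loop.
    static long handleMessage(long acc, int i) {
        return acc + (i * 31L);
    }

    // The enclosing loop is now trivial, so whether or not it is
    // OSR-compiled matters much less: the hot work is in handleMessage.
    static long processBatch(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = handleMessage(acc, i);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(processBatch(100_000));
    }
}
```

You can watch the difference with `-XX:+PrintCompilation`: OSR compilations are flagged with `%` (as in the `Client::run @ -2` line quoted below in the thread), while `handleMessage` shows up as an ordinary compilation.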
As Gil mentions, using loopback will give very different results from a real network. The Linux kernel bypasses OSI layer 2 for loopback, so there are no qdiscs. For example, Nagle's algorithm not only does not apply on loopback; disabling it will actually increase latency a little there, really!

Have you measured L1 and L2 cache hit and miss rates in each case? Even with isolcpus, the shared L3 on Intel is inclusive of the private caches (L1 & L2), so if the L3 has to evict lines, those lines must also be evicted from the corresponding L1/L2. You can use CAT (Cache Allocation Technology), CoD (Cluster on Die), or separate sockets to help avoid this.

On Thursday, 13 April 2017 16:01:49 UTC+1, J Crawford wrote:
>
> Thanks to everyone who threw in some ideas. I was able to prove that it is
> *not* a JIT/HotSpot de-optimization.
>
> First I got the following output when I used "-XX:+PrintCompilation
> -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":
>
> Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000
> currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831
> zeroReads=2 partialReads=0
> *77543 560 % ! 4 Client::run @ -2 (270 bytes) made not entrant*
> Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001
> currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531
> realRead=2363 zeroReads=1 partialReads=
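For reference, toggling Nagle from Java is just the `TCP_NODELAY` socket option; a self-contained sketch over loopback (which, per the point above, is exactly where disabling it may not help):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class NagleToggle {

    // Connects a socket pair over loopback, disables Nagle on the client
    // side via TCP_NODELAY, and returns the resulting setting.
    static boolean disableNagle() throws IOException {
        try (ServerSocket server = new ServerSocket(0); // ephemeral port
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            // With Nagle disabled, small writes are sent immediately
            // instead of being coalesced while waiting for ACKs.
            client.setTcpNoDelay(true);
            return client.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY=" + disableNagle());
    }
}
```

When benchmarking, it is worth running the same harness both over loopback and over a real NIC with `TCP_NODELAY` in each state, since the loopback results can invert.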