Thanks to everyone who threw in some ideas. I was able to prove that it is *not* a JIT/HotSpot de-optimization.
First I got the following output when I ran with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":

Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000 currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831 zeroReads=2 partialReads=0
77543 560 % ! 4 Client::run @ -2 (270 bytes) made not entrant
Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001 currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531 realRead=2363 zeroReads=1 partialReads=0

Even a single branch like "this.interval = totalMessagesSent >= WARMUP ? 30000 : 1;" will trigger the "made not entrant". I even tried extracting it into its own method, but still got "made not entrant". So I thought: THAT'S IT! Nope, that was *not* it :)

I got rid of the branch (i.e. the IF) by replacing "this.interval = totalMessagesSent >= WARMUP ? 30000 : 1;" with:

// not a mathematical equivalent, but close enough for our purposes:
// for totalMessagesSent >= WARMUP it returns 30001 instead of 30000
this.interval = (totalMessagesSent / WARMUP * 30000) + 1;

Then I ran it again with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining" and got:

Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99998 currInterval=1 latency=4122 timeToWrite=2476 timeToRead=1626 realRead=756 zeroReads=2 partialReads=0
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99999 currInterval=1 latency=4041 timeToWrite=2387 timeToRead=1630 realRead=760 zeroReads=2 partialReads=0
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=100000 currInterval=1 latency=4223 timeToWrite=2504 timeToRead=1690 realRead=739 zeroReads=2 partialReads=0
Thu Apr 13 10:36:22 EDT 2017 Results: totalMessagesSent=100001 currInterval=30001 latency=10245 timeToWrite=5457 timeToRead=4755 realRead=2223 zeroReads=6 partialReads=0
Thu Apr 13 10:36:45 EDT 2017 Results: totalMessagesSent=100002 currInterval=30001 latency=10908 timeToWrite=4648 timeToRead=6237 realRead=719 zeroReads=7
partialReads=0
Thu Apr 13 10:37:08 EDT 2017 Results: totalMessagesSent=100003 currInterval=30001 latency=10126 timeToWrite=4088 timeToRead=6005 realRead=2077 zeroReads=11 partialReads=0

So: no JIT/HotSpot de-optimization. I've also ruled out thread issues by pinning my thread to an isolated CPU core through affinity and isolcpus. This behavior happens on MacOS and on Linux. I wonder if it also happens on Windows, and in C. That would give us more clues.

Thanks,

-JC

On Thursday, April 13, 2017 at 2:44:49 AM UTC-5, Wojciech Kudla wrote:
>
> I'd also monitor /proc/interrupts and /proc/softirqs for your target cpu
>
> On Thu, 13 Apr 2017, 08:36 Gil Tene, <g...@azul.com> wrote:
>
>> If I read this right, you are running this on localhost (according to the SO
>> code). If that's the case, there is no actual network and no actual TCP
>> stack... UDP or TCP won't make a difference then, and neither will any TCP
>> tweaking. I think this rules out the network, the switch, the NICs, and
>> most of the OS's network stack.
>>
>> Now you're looking at the JVM, the OS scheduling, power management, cache
>> behavior, etc.
>>
>> Some more things to play with to rule out or find some insight:
>>
>> - Rule out de-optimization (you may be de-optimizing when the if
>> (totalMessagesSent == WARMUP) triggers). Do this by examining the
>> -XX:+PrintCompilation output.
>>
>> - Rule out scheduling and cpu migration effects: use isolcpus and pin
>> your processes to specific cores.
>>
>> - How do you know that you actually disabled all power management? I'd
>> monitor cstate and pstate to see what they actually are over time.
>> Cool anecdote: we once had a case where something in the system was
>> mysteriously elevating cstate away from 0 after we set it to 0. We never
>> did find out what it was. The case was "resolved" with a cron job that set
>> cstate to 0 every minute (yuck, I know).
>>
>> - Sweep different interval times in your tests to find out how long
>> the interval needs to be before you see the perf drop. The value at which
>> this effect starts may be an interesting hint.
>>
>>
>> On Wednesday, April 12, 2017 at 12:56:22 PM UTC-7, J Crawford wrote:
>>>
>>> The SO question has the source code of a simple server and client that
>>> demonstrates and isolates the problem. Basically I'm timing the latency of a
>>> ping-pong (client-server-client) message. I start by sending one message
>>> every 1 millisecond. I wait for 200k messages to be sent so that
>>> HotSpot has a chance to optimize the code. Then I change my pause time from
>>> 1 millisecond to 30 seconds. To my surprise, my write and read operations
>>> become considerably slower.
>>>
>>> I don't think it is a JIT/HotSpot problem. I was able to pinpoint the
>>> slowdown to the native JNI calls to write (write0) and read. Even if I
>>> change the pause from 1 millisecond to 1 second, the problem persists.
>>>
>>> I was able to observe that on MacOS and Linux.
>>>
>>> Does anyone here have a clue of what can be happening?
>>>
>>> Note that I'm disabling Nagle's Algorithm with setTcpNoDelay(true).
>>>
>>> SO question with code and output:
>>> http://stackoverflow.com/questions/43377600/socketchannel-why-if-i-write-msgs-quickly-the-latency-of-each-message-is-low-b
>>>
>>> Thanks!
>>>
>>> -JC
>>>
>> --
>> You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
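The branch-elimination trick JC describes can be sketched as a small standalone Java class. This is a minimal reconstruction, not the actual benchmark code: the class name and method names are my own, and WARMUP is assumed to be 100000 based on the log output above.

```java
// Sketch of replacing a warmup branch with branch-free integer arithmetic,
// as described in the post. Names are hypothetical, not from the benchmark.
public class IntervalTrick {

    static final long WARMUP = 100_000; // assumed from the logged message counts

    // Original branchy version: when the branch outcome flips at WARMUP,
    // HotSpot may discard its speculative compilation ("made not entrant").
    static long intervalBranchy(long totalMessagesSent) {
        return totalMessagesSent >= WARMUP ? 30_000 : 1;
    }

    // Branch-free replacement. Not mathematically identical: integer division
    // yields 0 below WARMUP and 1 from WARMUP up to 2*WARMUP-1, so the result
    // is 1 or 30_001 (instead of 30_000) over the range exercised in the test.
    static long intervalBranchless(long totalMessagesSent) {
        return (totalMessagesSent / WARMUP * 30_000) + 1;
    }

    public static void main(String[] args) {
        System.out.println(intervalBranchless(99_999));  // prints 1
        System.out.println(intervalBranchless(100_000)); // prints 30001
        System.out.println(intervalBranchless(100_001)); // prints 30001
    }
}
```

This matches the log above: currInterval stays 1 through message 100000 and becomes 30001 afterward, with no conditional branch for the JIT to speculate on.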