Thanks to everyone who threw in some ideas. I was able to prove that it is *not* a JIT/HotSpot de-optimization.
First I got the following output when I ran with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":

Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000 currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831 zeroReads=2 partialReads=0
77543 560 % ! 4 Client::run @ -2 (270 bytes) made not entrant
Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001 currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531 realRead=2363 zeroReads=1 partialReads=0

Even a single branch like "this.interval = totalMessagesSent >= WARMUP ? 30000 : 1;" will trigger the "made not entrant". I even tried extracting it into its own method, but still got "made not entrant". So I thought: THAT'S IT! Nope, that was *not* it :)

I got rid of the branch (i.e. the IF) by replacing "this.interval = totalMessagesSent >= WARMUP ? 30000 : 1;" with:

// not a mathematical equivalent, but close enough for our purposes:
// for totalMessagesSent >= WARMUP it returns 30001 instead of 30000
this.interval = (totalMessagesSent / WARMUP * 30000) + 1;

Then I ran it again with "-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining" and got:

Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99998 currInterval=1 latency=4122 timeToWrite=2476 timeToRead=1626 realRead=756 zeroReads=2 partialReads=0
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=99999 currInterval=1 latency=4041 timeToWrite=2387 timeToRead=1630 realRead=760 zeroReads=2 partialReads=0
Thu Apr 13 10:36:00 EDT 2017 Results: totalMessagesSent=100000 currInterval=1 latency=4223 timeToWrite=2504 timeToRead=1690 realRead=739 zeroReads=2 partialReads=0
Thu Apr 13 10:36:22 EDT 2017 Results: totalMessagesSent=100001 currInterval=30001 latency=10245 timeToWrite=5457 timeToRead=4755 realRead=2223 zeroReads=6 partialReads=0
Thu Apr 13 10:36:45 EDT 2017 Results: totalMessagesSent=100002 currInterval=30001 latency=10908 timeToWrite=4648 timeToRead=6237 realRead=719 zeroReads=7
partialReads=0
Thu Apr 13 10:37:08 EDT 2017 Results: totalMessagesSent=100003 currInterval=30001 latency=10126 timeToWrite=4088 timeToRead=6005 realRead=2077 zeroReads=11 partialReads=0

So: no JIT/HotSpot de-optimization. I've also ruled out thread issues by pinning my thread to an isolated CPU core through affinity and isolcpus. This behavior happens on MacOS and on Linux. I wonder if it also happens on Windows, and in C. That would give us more clues.

Thanks,

-JC

On Thursday, April 13, 2017 at 2:44:49 AM UTC-5, Wojciech Kudla wrote:
>
> I'd also monitor /proc/interrupts and /proc/softirqs for your target cpu
>
> On Thu, 13 Apr 2017, 08:36 Gil Tene, <g...@azul.com> wrote:
>
>> If I read this right, you are running this on localhost (according to the SO
>> code). If that's the case, there is no actual network and no actual TCP
>> stack... UDP or TCP won't make a difference then, and neither will any TCP
>> tweaking. I think this rules out the network, the switch, the NICs, and
>> most of the OS's network stack.
>>
>> Now you're looking at the JVM, the OS scheduling, power management, cache
>> behavior, etc.
>>
>> Some more things to play with to rule out or find some insight:
>>
>> - Rule out de-optimization (you may be de-optimizing when the if
>> (totalMessagesSent == WARMUP) triggers). Do this by examining the
>> -XX:+PrintCompilation output.
>>
>> - Rule out scheduling and cpu migration effects: use isolcpus and pin
>> your processes to specific cores.
>>
>> - How do you know that you actually disabled all power management? I'd
>> monitor cstate and pstate to see what they actually are over time.
>> Cool anecdote: we once had a case where something in the system was
>> mysteriously elevating cstate away from 0 after we set it to 0. We never
>> did find out what it was. The case was "resolved" with a cron job that set
>> cstate to 0 every minute (yuck, I know).
>>
>> - Sweep different interval times in your tests to find out how long
>> the interval needs to be before you see the perf drop. The value at which
>> this effect starts may be an interesting hint.
>>
>>
>> On Wednesday, April 12, 2017 at 12:56:22 PM UTC-7, J Crawford wrote:
>>>
>>> The SO question has the source code of a simple server and client that
>>> demonstrates and isolates the problem. Basically I'm timing the latency of a
>>> ping-pong (client-server-client) message. I start by sending one message
>>> every 1 millisecond. I wait for 200k messages to be sent so that
>>> HotSpot has a chance to optimize the code. Then I change my pause time from
>>> 1 millisecond to 30 seconds. To my surprise, my write and read operations
>>> become considerably slower.
>>>
>>> I don't think it is a JIT/HotSpot problem. I was able to pinpoint the
>>> slowdown to the native JNI calls to write (write0) and read. Even if I
>>> change the pause from 1 millisecond to 1 second, the problem persists.
>>>
>>> I was able to observe that on MacOS and Linux.
>>>
>>> Does anyone here have a clue of what can be happening?
>>>
>>> Note that I'm disabling Nagle's Algorithm with setTcpNoDelay(true).
>>>
>>> SO question with code and output:
>>> http://stackoverflow.com/questions/43377600/socketchannel-why-if-i-write-msgs-quickly-the-latency-of-each-message-is-low-b
>>>
>>> Thanks!
>>>
>>> -JC
>>>
>> --
>> You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsubscr...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
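The branch-elimination trick JC describes can be sketched as a small standalone Java class. This is a minimal reconstruction, not the actual benchmark code: the class name and method names are my own, and WARMUP is assumed to be 100000 based on the log output above.

```java
// Sketch of replacing a warmup branch with branch-free integer arithmetic,
// as described in the post. Names are hypothetical, not from the benchmark.
public class IntervalTrick {

    static final long WARMUP = 100_000; // assumed from the logged message counts

    // Original branchy version: when the branch outcome flips at WARMUP,
    // HotSpot may discard its speculative compilation ("made not entrant").
    static long intervalBranchy(long totalMessagesSent) {
        return totalMessagesSent >= WARMUP ? 30_000 : 1;
    }

    // Branch-free replacement. Not mathematically identical: integer division
    // yields 0 below WARMUP and 1 from WARMUP up to 2*WARMUP-1, so the result
    // is 1 or 30_001 (instead of 30_000) over the range exercised in the test.
    static long intervalBranchless(long totalMessagesSent) {
        return (totalMessagesSent / WARMUP * 30_000) + 1;
    }

    public static void main(String[] args) {
        System.out.println(intervalBranchless(99_999));  // prints 1
        System.out.println(intervalBranchless(100_000)); // prints 30001
        System.out.println(intervalBranchless(100_001)); // prints 30001
    }
}
```

This matches the log above: currInterval stays 1 through message 100000 and becomes 30001 afterward, with no conditional branch for the JIT to speculate on.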