Re: Why would SocketChannel be slower when sending a single msg instead of 1k msgs after proper warmup?

Martin Thompson Thu, 13 Apr 2017 09:32:29 -0700

Try the various Linux "perf events" tools, e.g. $ perf record ..., or one
of the following to narrow things down.


https://github.com/RRZE-HPC/likwid

http://jpbempel.blogspot.co.uk/2013/08/hardware-performance-counters.html


On 13 April 2017 at 17:22, J Crawford <latencyfigh...@mail.com> wrote:

> Hi Martin! Thanks for trying to help out. I'm indeed testing this on
> loopback. Can you give me pointers on how to measure L1 and L2 cache
> hit/miss? I've never done that before. I was able to confirm that it also
> happens on Windows. We are getting close to understanding this mystery.
>
> Thanks!
>
> -JC
>
> On Thursday, April 13, 2017 at 11:17:38 AM UTC-5, Martin Thompson wrote:
>>
>> OSR can be avoided if you put the body of your loops in their own methods
>> so they get normal JIT support, but this is unlikely to explain such a
>> significant step in latency.
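
A minimal Java sketch of the loop-body extraction suggested above. The class and method names (OsrSketch, processOne, inlineLoop, extractedLoop) are hypothetical and are not taken from the benchmark under discussion; the arithmetic is only a stand-in for real per-message work.

```java
public final class OsrSketch {

    // Hypothetical per-iteration work, pulled out of the loop so that HotSpot
    // can compile it as an ordinary method once it has been invoked often enough.
    public static long processOne(long acc, int i) {
        return acc + (i ^ (i >>> 3));
    }

    // Work written directly inside a long-running loop. If this method is
    // entered only once, the hot loop can only be optimised through
    // on-stack replacement (OSR).
    public static long inlineLoop(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i ^ (i >>> 3));
        }
        return acc;
    }

    // Same computation, but the loop body lives in its own method. The loop
    // itself may still be OSR-compiled; the body gets a regular compilation.
    public static long extractedLoop(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc = processOne(acc, i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Both variants compute the same value; only the shape seen by the JIT differs.
        System.out.println(inlineLoop(1_000_000) == extractedLoop(1_000_000));
    }
}
```

Running this with -XX:+PrintCompilation should show which compilations are OSR ones: they carry a "%" marker, as in the Client::run line quoted further down the thread.
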
>>
>> As Gil mentions, using loopback will give very different results to a real
>> network. The Linux kernel bypasses OSI layer 2 for loopback so no QDiscs.
>> For example, not only does Nagle not apply on loopback, disabling it WILL
>> also increase latency a little, really!
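
For reference, "disabling Nagle" on a Java SocketChannel means setting TCP_NODELAY to true. The sketch below only shows how the option is set and read back, so that both settings can be compared in a benchmark; it measures nothing and does not connect to anything. The class and method names (NoDelaySketch, setAndReadNoDelay) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

public final class NoDelaySketch {

    // Opens an unconnected channel, applies the requested TCP_NODELAY value
    // (true = Nagle disabled) and returns the value the socket reports back.
    public static boolean setAndReadNoDelay(boolean noDelay) {
        try (SocketChannel channel = SocketChannel.open()) {
            channel.setOption(StandardSocketOptions.TCP_NODELAY, noDelay);
            return channel.getOption(StandardSocketOptions.TCP_NODELAY);
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY=" + setAndReadNoDelay(true));
    }
}
```

In a real test the option would be set on the connected client and server channels before the timed section, with one run per setting.
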
>>
>> Have you measured L1 and L2 cache hit and miss rates in each case? Even
>> with isolcpus, the shared L3 on Intel is inclusive of the private caches
>> (L1 & L2), so if the L3 has to evict lines then they also have to go from
>> the corresponding L1/L2 caches. You can use CAT (Cache Allocation
>> Technology), CoD (Cluster on Die), or separate sockets to help avoid this.
>>
>> On Thursday, 13 April 2017 16:01:49 UTC+1, J Crawford wrote:
>>>
>>> Thanks to everyone who threw out some ideas. I was able to prove that it is
>>> *not* a JIT/HotSpot de-optimization.
>>>
>>> First I got the following output when I used "-XX:+PrintCompilation
>>> -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining":
>>>
>>>     Thu Apr 13 10:21:16 EDT 2017 Results: totalMessagesSent=100000
>>> currInterval=1 latency=4210 timeToWrite=2514 timeToRead=1680 realRead=831
>>> zeroReads=2 partialReads=0
>>>       *77543  560 % !   4       Client::run @ -2 (270 bytes)   made not
>>> entrant*
>>>     Thu Apr 13 10:21:39 EDT 2017 Results: totalMessagesSent=100001
>>> currInterval=30000 latency=11722 timeToWrite=5645 timeToRead=4531
>>> realRead=2363 zeroReads=1 partialReads=
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mechanical-sympathy+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
