> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Serguei Osokine
> Sent: Sunday, December 16, 2007 6:15 PM
> To: theory and practice of decentralized computer networks
> Subject: Re: [p2p-hackers] MTU in the real world
> 
> On Tuesday, May 31, 2005 Serguei Osokine wrote:
> > We tried to use UDP to transfer stuff over a gigabit LAN inside
> > the cluster. Pretty soon we discovered that with small (~1500 byte)
> > packets the CPU was the bottleneck, because you can send only so
> > many packets per second, and the resulting throughput was nowhere
> > close to a gigabit.
> > ...
> > (Datagrams smaller than MTU sucked performance-wise when compared to
> > TCP, but that is another story - gigabit cards tend to offload plenty
> > of TCP functionality from the CPU, so it was not that the UDP was
> > particularly bad, but rather that TCP performance was very good.)
> 
>       An update for anyone who still cares after two and a half years:
> it turns out that UDP *was* particularly bad. We have discovered this
> almost as an accident, and it looks like a Windows problem - probably
> in WinSock UDP implementation. 
>
>       As it turned out, the CPU percent vs sending rate chart has a 
> clear 'hockey stick' shape - CPU use is zero until some middle point,
> and then it starts to grow linearly, which is already unexpected by
> itself. What's even more funny, the sendto() call time is always the
> same regardless of the sending rate (controlled by sleeps between 
> sends) and regardless of CPU usage percent, and it is this time that
> is limiting the single thread sending performance. 

Thanks for sharing this information - it is very interesting. So
interesting, in fact, that I had to run a few tests ASAP, which I
just did. Unfortunately, I cannot reproduce your findings:

* the execution time of sendto() on my machine clearly depends on
  the size of the packet, and it is virtually the same for blocking
  and non-blocking sockets.

        bytes           microseconds
        256             25
        1024            87
        4096            345
        16384           1370

* sendto() on non-blocking socket does fail with WOULDBLOCK. The
  larger the packet size, the more frequently it fails. With the
  test code described below, the failure rate was 20% for 1024
  byte packets and nearly 95% for 2048 byte ones.

* CPU usage patterns of sendto() loop in blocking and non-blocking
  cases are virtually the same. 

  The test is a simple busy loop (i.e. without any sleeping) 
  that calls sendto() and (conditionally) select() to wait for 
  socket's writability if sendto() fails with WOULDBLOCK.

  CPU was maxed out when the packet size was between 16 and 256
  bytes. Usage dropped to 70-80% for 512-2048 byte packets, to 30%
  for 4096 byte packets, to 20% for 16K, and to 15% for 64K.

  The network link utilization was close to 100% at all times.
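
The busy loop described above amounts to roughly the following (a
minimal Python sketch, not my actual test harness; the localhost
target, port, and iteration counts here are stand-ins I picked for
illustration):

```python
import select
import socket
import time

def udp_send_test(size, iterations=5000, addr=("127.0.0.1", 9)):
    """Busy-loop sendto() on a non-blocking UDP socket. On WOULDBLOCK,
    wait for writability with select() and retry. Returns the average
    per-send loop time in microseconds and the WOULDBLOCK failure rate."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    payload = b"\x00" * size
    sent = failures = 0
    start = time.perf_counter()
    while sent < iterations:
        try:
            sock.sendto(payload, addr)
            sent += 1
        except BlockingIOError:            # WSAEWOULDBLOCK / EWOULDBLOCK
            failures += 1
            select.select([], [sock], [])  # block until socket is writable
    elapsed = time.perf_counter() - start
    sock.close()
    return elapsed / sent * 1e6, failures / (sent + failures)

for size in (256, 1024, 4096, 16384):
    usec, fail_rate = udp_send_test(size)
    print(f"{size:6d} bytes: {usec:8.1f} us/send, {fail_rate:.1%} WOULDBLOCK")
```

Note the per-send time reported here includes any select() waits, so
it measures loop throughput rather than the bare sendto() call time.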

I tested over 100 MBit cable and 802.11g wireless connections. I also
tested the case where the recipient had a receiving socket open, and
the case where it did not (thus generating a backflow of ICMPs).

Admittedly, I didn't run the test over a one-gig link, but even so,
the discrepancy between your findings and my results is quite odd.

Is there any chance your test machine was running some sort of
third-party firewall or, perhaps, a network monitor? If it was a
socket- or TDI-level filter, that could explain the constant sendto()
execution time and the other observations.

Alex

_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers
