On Sun, Mar 25, 2018 at 2:52 PM, Jameson Nash <vtjn...@gmail.com> wrote:
> There are likely performance reasons the IOCP model is more efficient (e.g. faster / higher bandwidth utilization), even if you have to emulate it via epoll. Even where you could get gradual notifications, you don't want to be filling a buffer one byte at a time. That would be massively wasteful of processor time and communication bandwidth. The underlying layers will generally try to do block memory moves, since those are more efficient (the overhead is amortized over a larger amount of data). You have the beginning of the right idea with "break it into 1-byte write requests", but this is the wrong granularity (too many context switches waste effort).
>
> What you want instead is to provide a large enough block of data in each request to ensure each packet is full, and only get notified when there is at least that much space in the write buffer. Then you can decide how much memory you want to pre-fill vs. how bad it is to miss a deadline. For each chunk, we can use the TCP MTU as a starting point (1500 bytes) and then add two orders of magnitude (let's say N=128kB) to drive the per-packet overhead towards zero and because there are many request blocks in flight on the wire simultaneously(*). Finally, decide how many extra blocks you want libuv to be ready to transmit. For a start, it's probably reasonable just to pick K=2 (this would also let you use a ping-pong buffer strategy rather than a ring buffer, but `malloc` is usually also just fine). Later, if it's not meeting requirements, you can use queuing theory to estimate the optimal thresholds. (It's been a few years since I took a class in networking, so sorry if I'm a bit hand-wavy on some of the details, and let me know if I missed anything in my estimation attempts.)
>
> Finally, to tie this all together: inside the application, the goal would be to perform `uv_write` operations on blocks of size N whenever the pending count (it should be feasible to manage this yourself, or look at the libuv field of outstanding write reqs) is below your chosen threshold on queue length (e.g. K=2).
>
> (*) Another way to derive this is the formula `buffer-size = ping-time * bandwidth`, which more directly and precisely estimates just how much data could theoretically get removed from the buffer by the time the ACK packets return.
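To make that concrete, here is a minimal sketch of the strategy against the libuv API, under a few stated assumptions: `fill_block()` is a hypothetical application routine that produces the next chunk of payload, N and K are the values suggested above, and error handling (including the return value of `uv_write`) is elided.

    /* Sketch: keep at most K write requests of N bytes each queued with libuv,
     * topping the queue back up from the write callback. */
    #include <stdlib.h>
    #include <uv.h>

    #define N (128 * 1024)  /* block size: ~two orders of magnitude above a 1500-byte MTU */
    #define K 2             /* number of blocks kept in flight */

    static unsigned pending;  /* write reqs currently queued with libuv */

    /* Hypothetical application routine: copies up to `cap` bytes of payload
     * into `buf` and returns how many bytes it produced (0 = nothing to send). */
    extern size_t fill_block(char *buf, size_t cap);

    static void on_write(uv_write_t *req, int status);

    static void pump(uv_stream_t *stream) {
        while (pending < K) {
            char *block = malloc(N);
            size_t len = fill_block(block, N);
            if (len == 0) {           /* nothing to send right now; call pump()   */
                free(block);          /* again once the application has more data */
                return;
            }
            uv_write_t *req = malloc(sizeof(*req));
            req->data = block;        /* remember the buffer so the callback can free it */
            uv_buf_t buf = uv_buf_init(block, (unsigned)len);
            uv_write(req, stream, &buf, 1, on_write);  /* error check elided */
            pending++;
        }
    }

    static void on_write(uv_write_t *req, int status) {
        uv_stream_t *stream = req->handle;
        pending--;
        free(req->data);              /* block has been handed off to the kernel */
        free(req);
        if (status == 0)
            pump(stream);             /* top the queue back up to K blocks */
    }

As a worked example of the (*) footnote: at 1 Gbit/s with a 1 ms round trip, `buffer-size = ping-time * bandwidth` comes out to roughly 0.001 s * 125 MB/s ≈ 125 kB, the same ballpark as N = 128 kB.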
Yes, I thought about TCP segment size/etc, but:
- this means the client has to be aware of underlying protocol details (and, as it turns out, there is no way to know precisely how large your TCP segment can be -- the TCP header can have optional fields, and things like VPNs can get in the way too)
- and has to make a correct call about the granularity of buffers being submitted to IOCP
- and even if it is completely correct -- there will be wasted space, as most user-level requests (that need to be sent out) won't fit the buffer perfectly

Now compare this to a situation where the application maintains one ring buffer and simply adds to it, reusing portions released by a fictional "buffer release" notification (see the sketch at the end of this message):
- the application doesn't need to be aware of underlying layer specifics -- as data flows to the network, the buffer gets released at whatever granularity is right in the given circumstances
- there is no wasted space (well, maybe you'd want to align your data to avoid CPU cache line sharing)
- the kernel doesn't need to notify user code about every byte written out -- all it needs to do is raise a flag and maintain a "data written out" counter; once the client finally gets to process the notification, it will reclaim all buffer space already sent out in one go

> On Sat, Mar 24, 2018 at 2:51 PM Michael Kilburn <crusader.m...@gmail.com>
> wrote:
>
>> On Sat, Mar 24, 2018 at 4:51 AM, Ben Noordhuis <i...@bnoordhuis.nl>
>> wrote:
>>
>>> On Thu, Mar 22, 2018 at 11:32 PM, CM <crusader.m...@gmail.com> wrote:
>>> > To be more precise -- I wonder if you can get "gradual" buffer write
>>> > notifications. I.e. as the OS "drains" my buffer (writes it out to the
>>> > network) I'd like to receive notifications indicating how much of the
>>> > buffer was written out. Is this possible with libuv (or Windows IOCP)?
>>>
>>> In general that's not possible. You could hack libuv's UNIX port to
>>> give you that kind of notification but it won't work on Windows.
>>
>> Hmm... Indeed, the Unix "readiness" model (where you get notified of the
>> socket being "ready", write as much as can fit, and wait for the next
>> notification) works very nicely here, but it leads to one extra memory
>> copy (moving data from the user buffer to the socket buffer). The IOCP
>> model -- not so much, but it potentially enables a zero-copy direct-memory
>> protocol (where you register your buffer(s) and the network card reads
>> them directly).
>>
>> So for this to happen on Linux, all I need is to "pierce" the "conversion
>> layer" libuv puts on top of the readiness model so that it works like the
>> IOCP model. On Windows -- the only thing that comes to mind is to break
>> every write request into 1-byte write requests :-)
>>
>> If only we had a "buffer readiness" notification added to IOCP -- similar
>> to how edge-triggered epoll works... I.e. once the NIC sends out some
>> data, it updates a "bytes sent" counter and (unless it is already set)
>> sets the "alarm"; once the app receives the notification about the alarm,
>> it reads the counter, "disarms" the alarm and does something with the
>> (now free) buffer. Very similar to the Unix readiness model, but instead
>> of "socket buffer is ready to receive data" it means "your buffer is no
>> longer needed".
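For what it's worth, here is a purely hypothetical sketch of the ring-buffer side of that idea -- no such notification exists in IOCP or libuv today, and the names (`bytes_sent`, `ring_reclaim`, etc.) are made up for illustration. The only thing the sender would ever do is advance a counter and raise a flag; the application reclaims all released space in one go.

    /* Hypothetical "buffer release" consumer: the (imaginary) sender advances
     * bytes_sent and raises `alarm` whenever data leaves the buffer; the
     * application reclaims everything that was sent in a single step. */
    #include <stdatomic.h>
    #include <stddef.h>

    #define RING_SIZE (1u << 20)   /* 1 MiB ring; power of two keeps wrap-around cheap */

    struct ring {
        char data[RING_SIZE];
        size_t head;               /* next byte the application will fill (fill path omitted) */
        size_t tail;               /* first byte not yet confirmed as sent */
        _Atomic unsigned long long bytes_sent;  /* total bytes sent, advanced by the sender */
        atomic_flag alarm;         /* "some data was sent since you last looked" */
        unsigned long long reclaimed;  /* bytes the application has already accounted for */
    };

    /* Called when the application gets the (hypothetical) notification. */
    static size_t ring_reclaim(struct ring *r) {
        /* Disarm before sampling the counter, so a send that races with us
         * simply re-raises the flag and triggers another notification. */
        atomic_flag_clear(&r->alarm);
        unsigned long long sent = atomic_load(&r->bytes_sent);
        size_t freed = (size_t)(sent - r->reclaimed);
        r->reclaimed = sent;
        r->tail = (r->tail + freed) % RING_SIZE;  /* space is now reusable for new data */
        return freed;
    }

Note the application never learns about individual write completions here -- only that some prefix of its buffer is no longer needed, which is exactly the "buffer is released" contract described above.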