On Sun, Mar 25, 2018 at 2:52 PM, Jameson Nash <vtjn...@gmail.com> wrote:

> There are likely performance reasons the IOCP model is more efficient (e.g.
> faster / higher bandwidth utilization), even if you have to emulate it via
> epoll. Even where you could get gradual notifications, you don't want to be
> filling a buffer one byte at a time. This would be massively wasteful of
> processor time and communication bandwidth. The underlying layers will
> generally try to do block memory moves, since those are more efficient (due
> to dividing the overhead over a larger amount of data). You have the
> beginning of the right idea with "break it into 1-byte write requests", but
> this is the wrong granularity (too many context switches waste effort).
> You want to instead provide a large enough block of data in each request to
> ensure each packet is full, and only get notified when there is at least
> that much space in the write buffer. Then you can decide how much memory
> you want to pre-fill vs. how bad it is to miss a deadline. For each chunk,
> we can estimate the TCP MTU to give us a starting point (1500 bytes) and
> then add two orders of magnitude (let's say N=128kB) to drive the overhead
> error towards zero and because there are many request blocks in flight on
> the wire simultaneously(*). Finally, decide how many extra blocks you want
> libuv to be ready to transmit. For a good start, it's probably reasonable
> just to pick K=two (this would also let it use a ping-pong buffer strategy,
> rather than a ring buffer, but `malloc` is usually also just fine). Later,
> if it's not meeting requirements, you can use queuing theory to estimate
> the optimal thresholds. (it's been a few years since I took a class in
> networking, so sorry if I'm a bit hand-wavy on some of the details, and let
> me know if I missed anything in my estimation attempts)
> Finally, to tie this all together, inside the application, the goal would
> be to perform `uv_write` operations on blocks of size N whenever the
> pending count (should be feasible to manage this yourself, or look at the
> libuv field of outstanding write reqs) is below your chosen threshold on
> queue length (e.g. K=2).
> (*) Another way to derive this is the formula `buffer-size = ping-time *
> bandwidth`, which more directly and precisely estimates just how much data
> could theoretically get removed from the buffer when the ACK packets
> return.
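
For reference, the footnote's formula (buffer-size = ping-time * bandwidth) can be worked through in a few lines of C. This is only a rough sketch: the function names `bdp_bytes` and `blocks_needed` are made up for illustration, and the link figures in the note below (100 Mbit/s, 20 ms round trip) are assumed numbers, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes: roughly how much data can be
 * "in flight" (sent but not yet acknowledged) on a link.
 * Hypothetical helper, not part of libuv. */
static uint64_t bdp_bytes(uint64_t bandwidth_bits_per_sec, uint64_t rtt_ms)
{
    /* bits/s -> bytes/s, then scale by the round-trip time in seconds */
    return bandwidth_bits_per_sec / 8 * rtt_ms / 1000;
}

/* Number of blocks of block_size bytes needed to cover `bytes`,
 * rounded up. Hypothetical helper. */
static uint64_t blocks_needed(uint64_t bytes, uint64_t block_size)
{
    assert(block_size > 0);
    return (bytes + block_size - 1) / block_size;
}
```

With the assumed 100 Mbit/s link and 20 ms round trip, `bdp_bytes(100000000, 20)` gives 250000 bytes, and `blocks_needed(250000, 131072)` gives 2 blocks of N=128 KiB, which happens to agree with the K=2 starting point suggested above.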

Yes, I thought about TCP segment size, etc., but:
- this means the client has to be aware of underlying protocol details (and
as it turns out, there is no way to know precisely how large your TCP segment
can be -- TCP headers can carry optional fields, and things like VPNs can get
in the way too)
- and make the right call about the granularity of the buffers being submitted
- and even if that call is completely correct, there will be wasted space, as
most user-level requests (that need to be sent out) won't fit the buffer

Now compare this to a situation where the application maintains one ring
buffer, simply appends to it, and reuses the portions released by a fictional
"buffer release" notification:
- the application doesn't need to be aware of underlying layer specifics -- as
data flows to the network, buffer space gets released in whatever granularity
is right in the given circumstances
- there is no wasted space (well, maybe you'd want to align your data to avoid
CPU cache line sharing)
- the kernel doesn't need to notify user code about every byte written out --
all it needs to do is raise a flag and maintain a "data written out" counter;
once the client finally gets to process the notification, it reclaims all
buffer space already sent out in one go
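
To make the ring buffer idea concrete, here is a minimal single-threaded sketch in C. Everything in it is hypothetical: `ring_t`, `RING_CAPACITY` and the three functions are illustration-only names, and `ring_release` stands in for the handler of the fictional "buffer release" notification, which neither libuv nor IOCP provides today.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 4096u  /* hypothetical size, chosen for illustration */

typedef struct {
    unsigned char data[RING_CAPACITY];
    size_t head;  /* total bytes ever appended by the application */
    size_t tail;  /* total bytes ever released (already sent out)  */
} ring_t;

/* Bytes that can still be appended without overwriting unsent data. */
static size_t ring_free(const ring_t *r)
{
    return RING_CAPACITY - (r->head - r->tail);
}

/* Append up to len bytes; returns how many were actually accepted. */
static size_t ring_append(ring_t *r, const void *src, size_t len)
{
    const unsigned char *bytes = src;
    size_t avail = ring_free(r);
    size_t n = len < avail ? len : avail;
    for (size_t i = 0; i < n; i++)
        r->data[(r->head + i) % RING_CAPACITY] = bytes[i];
    r->head += n;
    return n;
}

/* Handler for the fictional notification. bytes_sent_total is the
 * cumulative "data written out" counter; everything up to it is
 * reclaimed in one go. Returns the number of bytes reclaimed. */
static size_t ring_release(ring_t *r, size_t bytes_sent_total)
{
    assert(bytes_sent_total >= r->tail && bytes_sent_total <= r->head);
    size_t reclaimed = bytes_sent_total - r->tail;
    r->tail = bytes_sent_total;
    return reclaimed;
}
```

The point of the sketch is that the application never picks a block size: it appends whatever it has, and one call to `ring_release` frees however much the lower layers happened to send since the last notification.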

> On Sat, Mar 24, 2018 at 2:51 PM Michael Kilburn <crusader.m...@gmail.com>
> wrote:
>> On Sat, Mar 24, 2018 at 4:51 AM, Ben Noordhuis <i...@bnoordhuis.nl>
>> wrote:
>>> On Thu, Mar 22, 2018 at 11:32 PM, CM <crusader.m...@gmail.com> wrote:
>>> > To be more precise -- I wonder if you can get "gradual" buffer write
>>> > notifications. I.e. as OS "drains" my buffer (writes it out to the
>>> > network) I'd like to receive notifications indicating how much of the
>>> > buffer was written out. Is this possible with libuv (or Windows IOCP)?
>>> In general that's not possible.  You could hack libuv's UNIX port to
>>> give you that kind of notification but it won't work on Windows.
>> Hmm... Indeed, Unix "readiness" model (where you get notified of socket
>> being "ready", write as much as can fit and wait for next notification)
>> works very nicely here, but it leads to one extra memory copy (moving data
>> from user buffer to socket buffer). IOCP model -- not so much, but it
>> potentially enables zero-copy direct-memory protocol (where you register
>> your buffer(s) and network card reads it directly).
>> So for this to happen on Linux all I need is to "pierce" the "conversion
>> layer" libuv put on top of readiness model for it to work like IOCP model.
>> On Windows -- the only thing that comes to mind is to break every write
>> request into 1-byte write requests :-)
>> If only we had "buffer readiness" notification added to IOCP -- similar
>> to how edge-triggered epoll works... I.e. once NIC sends out some data --
>> it updates "bytes sent" counter and (unless it is already set) sets the
>> "alarm", once app receives notification about alarm -- it'll read counter,
>> "disarm" alarm and do something with (now free) buffer. Very similar to Unix
>> readiness model, but instead of "socket buffer is ready to receive data" it
>> means "your buffer is no longer needed".
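
The counter-plus-alarm scheme described above can be modelled with C11 atomics. This is a sketch of the idea only, not an existing IOCP or libuv facility; `send_progress_t` and the `progress_*` functions are hypothetical names. One deliberate difference from the description: the consumer disarms the alarm before reading the counter, on the assumption that this ordering avoids losing an update that races with the read (the cost is an occasional redundant notification).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_size_t bytes_sent;  /* cumulative bytes written out */
    atomic_bool   alarm;       /* true while a notification is pending */
} send_progress_t;

static void progress_init(send_progress_t *p)
{
    atomic_init(&p->bytes_sent, 0);
    atomic_init(&p->alarm, false);
}

/* Producer side (the NIC/kernel in the description): account for n
 * more bytes sent. Returns true only when the alarm goes from
 * disarmed to armed, i.e. when a notification should be posted. */
static bool progress_report(send_progress_t *p, size_t n)
{
    atomic_fetch_add(&p->bytes_sent, n);
    return !atomic_exchange(&p->alarm, true);
}

/* Consumer side (the application): disarm the alarm, then read the
 * cumulative counter. Everything up to the returned value is free. */
static size_t progress_collect(send_progress_t *p)
{
    atomic_store(&p->alarm, false);
    return atomic_load(&p->bytes_sent);
}
```

However many times the producer calls `progress_report` between two collections, only the first call returns true, so the application sees one notification and reclaims all of the space in a single pass.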
