On Sun, Mar 25, 2018 at 2:52 PM, Jameson Nash <vtjn...@gmail.com> wrote:
> There are likely performance reasons the IOCP model is more efficient (e.g. faster / higher bandwidth utilization), even if you have to emulate it via epoll. Even where you could get gradual notifications, you don't want to be filling a buffer one byte at a time. That would be massively wasteful of processor time and communication bandwidth. The underlying layers will generally try to do block memory moves, since those are more efficient (the overhead is amortized over a larger amount of data). You have the beginning of the right idea with "break it into 1-byte write requests", but this is the wrong granularity (too many context switches waste effort).
>
> What you want instead is to provide a large enough block of data in each request to ensure each packet is full, and only get notified when there is at least that much space in the write buffer. Then you can decide how much memory you want to pre-fill vs. how bad it is to miss a deadline. For each chunk, we can use the TCP MTU as a starting point (1500 bytes) and then add two orders of magnitude (let's say N=128kB) to drive the per-packet overhead towards zero and because there are many request blocks in flight on the wire simultaneously(*). Finally, decide how many extra blocks you want libuv to be ready to transmit. For a start, it's probably reasonable just to pick K=2 (this would also let you use a ping-pong buffer strategy rather than a ring buffer, but `malloc` is usually also just fine). Later, if it's not meeting requirements, you can use queuing theory to estimate the optimal thresholds. (It's been a few years since I took a class in networking, so sorry if I'm a bit hand-wavy on some of the details, and let me know if I missed anything in my estimation attempts.)
>
> Finally, to tie this all together: inside the application, the goal would be to perform `uv_write` operations on blocks of size N whenever the pending count (it should be feasible to manage this yourself, or look at the libuv field of outstanding write reqs) is below your chosen threshold on queue length (e.g. K=2).
>
> (*) Another way to derive this is the formula `buffer-size = ping-time * bandwidth`, which more directly and precisely estimates just how much data could theoretically get removed from the buffer by the time the ACK packets return.
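To make that concrete, here is a minimal sketch of the strategy against the libuv API, under a few stated assumptions: `fill_block()` is a hypothetical application routine that produces the next chunk of payload, N and K are the values suggested above, and error handling (including the return value of `uv_write`) is elided.

    /* Sketch: keep at most K write requests of N bytes each queued with libuv,
     * topping the queue back up from the write callback. */
    #include <stdlib.h>
    #include <uv.h>

    #define N (128 * 1024)  /* block size: ~two orders of magnitude above a 1500-byte MTU */
    #define K 2             /* number of blocks kept in flight */

    static unsigned pending;  /* write reqs currently queued with libuv */

    /* Hypothetical application routine: copies up to `cap` bytes of payload
     * into `buf` and returns how many bytes it produced (0 = nothing to send). */
    extern size_t fill_block(char *buf, size_t cap);

    static void on_write(uv_write_t *req, int status);

    static void pump(uv_stream_t *stream) {
        while (pending < K) {
            char *block = malloc(N);
            size_t len = fill_block(block, N);
            if (len == 0) {           /* nothing to send right now; call pump()   */
                free(block);          /* again once the application has more data */
                return;
            }
            uv_write_t *req = malloc(sizeof(*req));
            req->data = block;        /* remember the buffer so the callback can free it */
            uv_buf_t buf = uv_buf_init(block, (unsigned)len);
            uv_write(req, stream, &buf, 1, on_write);  /* error check elided */
            pending++;
        }
    }

    static void on_write(uv_write_t *req, int status) {
        uv_stream_t *stream = req->handle;
        pending--;
        free(req->data);              /* block has been handed off to the kernel */
        free(req);
        if (status == 0)
            pump(stream);             /* top the queue back up to K blocks */
    }

As a worked example of the (*) footnote: at 1 Gbit/s with a 1 ms round trip, `buffer-size = ping-time * bandwidth` comes out to roughly 0.001 s * 125 MB/s ≈ 125 kB, the same ballpark as N = 128 kB.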
Yes, I thought about TCP segment size/etc, but:
- this means the client has to be aware of underlying protocol details (and, as it turns out, there is no way to know precisely how large your TCP segment can be -- the TCP header can have optional fields, and things like VPNs can get in the way too)
- and has to make a correct call about the granularity of buffers being submitted to IOCP
- and even if it is completely correct -- there will be wasted space, as most user-level requests (that need to be sent out) won't fit the buffer perfectly

Now compare this to a situation where the application maintains one ring buffer and simply adds to it, reusing portions released by a fictional "buffer release" notification (see the sketch at the end of this message):
- the application doesn't need to be aware of underlying layer specifics -- as data flows to the network, the buffer gets released at whatever granularity is right in the given circumstances
- there is no wasted space (well, maybe you'd want to align your data to avoid CPU cache line sharing)
- the kernel doesn't need to notify user code about every byte written out -- all it needs to do is raise a flag and maintain a "data written out" counter; once the client finally gets to process the notification, it will reclaim all buffer space already sent out in one go

> On Sat, Mar 24, 2018 at 2:51 PM Michael Kilburn <crusader.m...@gmail.com>
> wrote:
>
>> On Sat, Mar 24, 2018 at 4:51 AM, Ben Noordhuis <i...@bnoordhuis.nl>
>> wrote:
>>
>>> On Thu, Mar 22, 2018 at 11:32 PM, CM <crusader.m...@gmail.com> wrote:
>>> > To be more precise -- I wonder if you can get "gradual" buffer write
>>> > notifications. I.e. as the OS "drains" my buffer (writes it out to the
>>> > network) I'd like to receive notifications indicating how much of the
>>> > buffer was written out. Is this possible with libuv (or Windows IOCP)?
>>>
>>> In general that's not possible. You could hack libuv's UNIX port to
>>> give you that kind of notification but it won't work on Windows.
>>
>> Hmm... Indeed, the Unix "readiness" model (where you get notified of the
>> socket being "ready", write as much as can fit, and wait for the next
>> notification) works very nicely here, but it leads to one extra memory
>> copy (moving data from the user buffer to the socket buffer). The IOCP
>> model -- not so much, but it potentially enables a zero-copy direct-memory
>> protocol (where you register your buffer(s) and the network card reads
>> them directly).
>>
>> So for this to happen on Linux, all I need is to "pierce" the "conversion
>> layer" libuv puts on top of the readiness model so that it works like the
>> IOCP model. On Windows -- the only thing that comes to mind is to break
>> every write request into 1-byte write requests :-)
>>
>> If only we had a "buffer readiness" notification added to IOCP -- similar
>> to how edge-triggered epoll works... I.e. once the NIC sends out some
>> data, it updates a "bytes sent" counter and (unless it is already set)
>> sets the "alarm"; once the app receives the notification about the alarm,
>> it reads the counter, "disarms" the alarm and does something with the
>> (now free) buffer. Very similar to the Unix readiness model, but instead
>> of "socket buffer is ready to receive data" it means "your buffer is no
>> longer needed".
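For what it's worth, here is a purely hypothetical sketch of the ring-buffer side of that idea -- no such notification exists in IOCP or libuv today, and the names (`bytes_sent`, `ring_reclaim`, etc.) are made up for illustration. The only thing the sender would ever do is advance a counter and raise a flag; the application reclaims all released space in one go.

    /* Hypothetical "buffer release" consumer: the (imaginary) sender advances
     * bytes_sent and raises `alarm` whenever data leaves the buffer; the
     * application reclaims everything that was sent in a single step. */
    #include <stdatomic.h>
    #include <stddef.h>

    #define RING_SIZE (1u << 20)   /* 1 MiB ring; power of two keeps wrap-around cheap */

    struct ring {
        char data[RING_SIZE];
        size_t head;               /* next byte the application will fill (fill path omitted) */
        size_t tail;               /* first byte not yet confirmed as sent */
        _Atomic unsigned long long bytes_sent;  /* total bytes sent, advanced by the sender */
        atomic_flag alarm;         /* "some data was sent since you last looked" */
        unsigned long long reclaimed;  /* bytes the application has already accounted for */
    };

    /* Called when the application gets the (hypothetical) notification. */
    static size_t ring_reclaim(struct ring *r) {
        /* Disarm before sampling the counter, so a send that races with us
         * simply re-raises the flag and triggers another notification. */
        atomic_flag_clear(&r->alarm);
        unsigned long long sent = atomic_load(&r->bytes_sent);
        size_t freed = (size_t)(sent - r->reclaimed);
        r->reclaimed = sent;
        r->tail = (r->tail + freed) % RING_SIZE;  /* space is now reusable for new data */
        return freed;
    }

Note the application never learns about individual write completions here -- only that some prefix of its buffer is no longer needed, which is exactly the "buffer is released" contract described above.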