On Tue, Feb 22, 2011 at 3:26 PM, Nicholas Thomas <n...@bytemark.co.uk> wrote: > On Mon, 2011-02-21 at 20:10 +0000, Stefan Hajnoczi wrote: >> On Mon, Feb 21, 2011 at 12:37 PM, Kevin Wolf <kw...@redhat.com> wrote: >> > Am 18.02.2011 13:55, schrieb Nick Thomas: >> >> +retry: >> >> + if (do_read) { >> >> + ret = recvmsg(sockfd, &msg, 0); >> >> + } else { >> >> + ret = sendmsg(sockfd, &msg, 0); >> >> + } >> >> + >> >> + /* recoverable error */ >> >> + if (ret == -1 && (errno == EAGAIN || errno == EINTR)) { >> >> + goto retry; >> >> + } >> > >> > Make this a do...while loop, no reason for goto here. >> >> This is not asynchronous. >> >> If we're writing and the in-kernel socket buffer is full then we'll spin >> retrying - this burns CPU for no reason. Instead we should get a >> callback when the socket becomes writable again so we can fill it with >> more data. >> >> If we're reading and enough data hasn't arrived yet we will spin >> retrying. Again, we should set up a callback to be notified when the >> socket becomes readable again. >> >> It seems sheepdog suffers from the same problem. > > Here, I'm a little concerned by the possibility of getting reads and > writes from different requests interleaved - so, writing half of the > data for one write request, returning, then getting another write > request through and starting that before the other one is completed. > > Obviously, I 'just' need to write clever code that avoids that ;) > > Since sheepdog has the same issue, it might be worth me looking at this > as a separate issue in both code bases, starting from a patch that gets > nbd to the same kind of state as sheepdog in this area? The intermediate > stage is definitely better than the starting point, after all.
Please try to solve it in the same patch series. This is not a limitation where we're being sub-optimal but well-behaved. Spinning is a bug and introducing it is a problem. ui/vnc.c has a solution to this problem, it has a Buffer abstraction. Data can be copied into the buffer. It then drains the buffer to the socket in vnc_client_write(). For reads the caller says how many bytes they are expecting and their callback will not be invoked until the buffer has filled at least that much. For NBD I would look at something more like QEMUIOVector instead of copying between a Buffer. It should be possible to concatenate iovec arrays that need to be transmitted without any interleaving issues. Stefan