On Tue, Feb 22, 2011 at 3:26 PM, Nicholas Thomas <n...@bytemark.co.uk> wrote:
> On Mon, 2011-02-21 at 20:10 +0000, Stefan Hajnoczi wrote:
>> On Mon, Feb 21, 2011 at 12:37 PM, Kevin Wolf <kw...@redhat.com> wrote:
>> > Am 18.02.2011 13:55, schrieb Nick Thomas:
>> >> +retry:
>> >> +    if (do_read) {
>> >> +        ret = recvmsg(sockfd, &msg, 0);
>> >> +    } else {
>> >> +        ret = sendmsg(sockfd, &msg, 0);
>> >> +    }
>> >> +
>> >> +    /* recoverable error */
>> >> +    if (ret == -1 && (errno == EAGAIN || errno == EINTR)) {
>> >> +        goto retry;
>> >> +    }
>> >
>> > Make this a do...while loop, no reason for goto here.
>>
>> This is not asynchronous.
>>
>> If we're writing and the in-kernel socket buffer is full then we'll spin
>> retrying - this burns CPU for no reason.  Instead we should get a
>> callback when the socket becomes writable again so we can fill it with
>> more data.
>>
>> If we're reading and enough data hasn't arrived yet we will spin
>> retrying.  Again, we should set up a callback to be notified when the
>> socket becomes readable again.
>>
>> It seems sheepdog suffers from the same problem.
>
> Here, I'm a little concerned by the possibility of getting reads and
> writes from different requests interleaved - so, writing half of the
> data for one write request, returning, then getting another write
> request through and starting that before the other one is completed.
>
> Obviously, I 'just' need to write clever code that avoids that ;)
>
> Since sheepdog has the same issue, it might be worth me looking at this
> as a separate issue in both code bases, starting from a patch that gets
> nbd to the same kind of state as sheepdog in this area? The intermediate
> stage is definitely better than the starting point, after all.

Please try to solve it in the same patch series.  This is not a
limitation where we're being sub-optimal but well-behaved.  Spinning
is a bug and introducing it is a problem.

ui/vnc.c has a solution to this problem, it has a Buffer abstraction.
Data can be copied into the buffer.  It then drains the buffer to the
socket in vnc_client_write().  For reads the caller says how many
bytes they are expecting and their callback will not be invoked until
the buffer has filled at least that much.

For NBD I would look at something more like QEMUIOVector instead of
copying between a Buffer.  It should be possible to concatenate iovec
arrays that need to be transmitted without any interleaving issues.

Stefan

Reply via email to