On Mon, 2008-11-03 at 14:40 +0200, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > On Sun, 2008-11-02 at 11:48 +0200, Avi Kivity wrote:
> >
> >> Mark McLoughlin wrote:
> >>> The main patch in this series is 5/6 - it just kills off the
> >>> virtio_net tx mitigation timer and does all the tx I/O in the
> >>> I/O thread.
> >>>
> >> What will it do to small packet, multi-flow loads (simulated by
> >> ping -f -l 30 $external)?
> >
> > It should improve the latency - the packets will be flushed more
> > quickly than the 150us timeout without blocking the guest.
>
> But it will increase overhead, since suddenly we aren't queueing
> anymore. One vmexit per small packet.
Yes, in theory, but the packet copies are acting to mitigate exits,
since we don't re-enable notifications until we're sure the ring is
empty. With copyless, though, we'd have an unacceptable vmexit rate.

> >> Where does the benefit come from?
> >
> > There are two things going on here, I think.
> >
> > First is that the timer affects latency; removing the timeout helps
> > that.
>
> If the timer affects latency, then something is very wrong. We're
> lacking an adjustable window.
>
> The way I see it, the notification window should be adjusted
> according to the current workload. If the link is idle, the window
> should be one packet -- notify as soon as something is queued. As
> the workload increases, the window increases to (safety_factor *
> allowable_latency * packet_rate). The timer is set to
> allowable_latency to catch changes in workload.
>
> For example:
>
> - allowable_latency 1ms (implies 1K vmexits/sec desired)
> - current packet_rate 20K packets/sec
> - safety_factor 0.8
>
> So we request notifications every 0.8 * 20K * 1ms = 16 packets, and
> set the timer to 1ms. Usually we get a notification every 16
> packets, just before timer expiration. If the workload increases, we
> get notifications sooner, so we increase the window. If the workload
> drops, the timer fires and we decrease the window.
>
> The timer should never fire on an all-out benchmark, or in a ping
> test.

Yeah, I do like the sound of this. However, since it requires a new
guest feature, and I don't expect it'll improve the situation over the
proposed patch until we have copyless transmit, I think we should do
this as part of the copyless effort.

One thing I'd worry about with this scheme is all-out receive - e.g.
any delay in returning a TCP ACK to the sending side might cause us to
hit the TCP window size.

> > Second is that currently when we fill up the ring we block the
> > guest vcpu and flush.
> > Thus, while we're copying an entire ring full of packets, the
> > guest isn't making progress. Doing the copying in the I/O thread
> > helps there.
>
> We're hurting our cache, and this won't work well with many nics. At
> the very least this should be done in a dedicated thread.

A thread per nic is doable, but it'd be especially tricky on the
receive side without more "short-cut the one producer, one consumer
case" work.

Cheers,
Mark.