On Mon, 2008-11-03 at 14:40 +0200, Avi Kivity wrote:
> Mark McLoughlin wrote:
> > On Sun, 2008-11-02 at 11:48 +0200, Avi Kivity wrote:
> >   
> >> Mark McLoughlin wrote:
> >>> The main patch in this series is 5/6 - it just kills off the
> >>> virtio_net tx mitigation timer and does all the tx I/O in the
> >>> I/O thread.
> >>>
> >>>   
> >>>       
> >> What will it do to small packet, multi-flow loads (simulated by ping -f 
> >> -l 30 $external)?
> >>     
> >
> > It should improve the latency - the packets will be flushed more quickly
> > than the 150us timeout without blocking the guest.
> >
> >   
> 
> But it will increase overhead, since suddenly we aren't queueing 
> anymore.  One vmexit per small packet.

In theory, yes, but in practice the packet copies act to mitigate exits,
since we don't re-enable notifications until we're sure the ring is
empty.

With copyless, though, we'd have an unacceptable vmexit rate.
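To make that mitigation effect concrete, here's a toy simulation (the model and names are mine, not virtio's): the guest only kicks the host when notifications are enabled, and the host re-enables them only once it has drained the ring, so a sustained burst costs a single vmexit.

```python
def simulate(arrivals, drain_per_step=4):
    """Count guest notifications (vmexits) for a stream of tx packets.

    arrivals: packets the guest queues at each step.
    drain_per_step: how many packets the host copies out per step.
    """
    ring, exits, notify_enabled = 0, 0, True
    for n in arrivals:
        ring += n
        if n and notify_enabled:
            exits += 1              # guest kicks the host: one vmexit
            notify_enabled = False  # host suppresses further kicks
        ring = max(0, ring - drain_per_step)
        if ring == 0:
            notify_enabled = True   # only re-enable once ring is empty
    return exits
```

When the host drains slower than packets arrive, the ring never empties and one exit covers the whole run; when it keeps up (as it would with very cheap copyless transmit), every burst pays an exit, which is the concern above.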

> >> Where does the benefit come from?
> >>     
> >
> > There are two things going on here, I think.
> >
> > First is that the timer affects latency, removing the timeout helps
> > that.
> >   
> 
> If the timer affects latency, then something is very wrong.  We're 
> lacking an adjustable window.
> 
> The way I see it, the notification window should be adjusted according 
> to the current workload.  If the link is idle, the window should be one 
> packet -- notify as soon as something is queued.  As the workload 
> increases, the window increases to (safety_factor * allowable_latency / 
> packet_rate).  The timer is set to allowable_latency to catch changes in 
> workload.
> 
> For example:
> 
> - allowable_latency 1ms (implies 1K vmexits/sec desired)
> - current packet_rate 20K packets/sec
> - safety_factor 0.8
> 
> So we request notifications every 0.8 * 20K/s * 1ms = 16 packets, and set 
> the timer to 1ms.  Usually we get a notification every 16 packets, just 
> before timer expiration.  If the workload increases, we get 
> notifications sooner, so we increase the window.  If the workload drops, 
> the timer fires and we decrease the window.
> 
> The timer should never fire on an all-out benchmark, or in a ping test.

Yeah, I do like the sound of this.

However, since it requires a new guest feature and I don't expect it'll
improve the situation over the proposed patch until we have copyless
transmit, I think we should do this as part of the copyless effort.
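As a rough sketch of the window sizing you describe (the names and the clamping are my own guesses, not an existing interface):

```python
def tx_notify_window(packet_rate, allowable_latency=1e-3, safety_factor=0.8):
    """How many tx packets the guest may queue before it must notify us.

    At low rates this degenerates to 1, i.e. notify as soon as anything
    is queued; the caller would also arm a timer of allowable_latency
    to catch workload changes, shrinking the window when it fires.
    """
    return max(1, int(safety_factor * packet_rate * allowable_latency))

# Your numbers: 20K packets/sec, 1ms budget, 0.8 safety -> 16 packets
```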

One thing I'd worry about with this scheme is all-out receive - e.g. any
delay in returning TCP ACKs to the sending side might cause us to hit
the TCP window limit.

> > Second is that currently, when we fill up the ring, we block the guest
> > vcpu and flush. Thus, while we're copying an entire ring full of packets,
> > the guest isn't making progress. Doing the copying in the I/O thread
> > helps there.
> >   
> 
> We're hurting our cache, and this won't work well with many nics.  At 
> the very least this should be done in a dedicated thread.

A thread per nic is doable, but it'd be especially tricky on the receive
side without more "short-cut the one producer, one consumer case" work.

Cheers,
Mark.
