Mark McLoughlin wrote:
But it will increase overhead, since suddenly we aren't queueing anymore. One vmexit per small packet.

Yes in theory, but the packet copies act to mitigate exits, since we don't re-enable notifications until we're sure the ring is empty.

You mean, the guest and the copy proceed in parallel, and while they do, exits are disabled?

With copyless, though, we'd have an unacceptable vmexit rate.

Right.
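
For concreteness, a rough sketch of the drain-and-re-enable pattern described above; the ring helpers and names are hypothetical, modeled on a virtio-style avail/used ring rather than the actual qemu code:

    /* Hypothetical ring API, standing in for the real virtio avail/used ring. */
    struct vring_state;
    void  vring_disable_notify(struct vring_state *vr);
    void  vring_enable_notify(struct vring_state *vr);
    int   vring_has_avail(struct vring_state *vr);
    void *vring_pop_avail(struct vring_state *vr);
    void  vring_push_used(struct vring_state *vr, void *pkt);
    void  copy_and_transmit(void *pkt);

    /* Keep guest->host notifications disabled while draining; copy each
     * packet; only re-enable once the ring really is empty. */
    static void tx_drain(struct vring_state *vr)
    {
        vring_disable_notify(vr);          /* no exits while we work */
        for (;;) {
            while (vring_has_avail(vr)) {
                void *pkt = vring_pop_avail(vr);
                copy_and_transmit(pkt);    /* the copy runs while the guest
                                              keeps queueing, exit-free */
                vring_push_used(vr, pkt);
            }
            vring_enable_notify(vr);       /* re-enable, then re-check to
                                              close the race with the guest */
            if (!vring_has_avail(vr))
                break;
            vring_disable_notify(vr);
        }
    }
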

If the timer affects latency, then something is very wrong. We're lacking an adjustable window.

The way I see it, the notification window should be adjusted according to the current workload. If the link is idle, the window should be one packet -- notify as soon as something is queued. As the workload increases, the window increases to (safety_factor * packet_rate * allowable_latency). The timer is set to allowable_latency to catch changes in workload.

For example:

- allowable_latency 1ms (implies 1K vmexits/sec desired)
- current packet_rate 20K packets/sec
- safety_factor 0.8

So we request notifications every 0.8 * 20K/sec * 1ms = 16 packets, and set the timer to 1ms. Usually we get a notification every 16 packets, just before timer expiration. If the workload increases, we get notifications sooner, so we increase the window. If the workload drops, the timer fires and we decrease the window.

The timer should never fire on an all-out benchmark, or in a ping test.
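
To make the arithmetic concrete, a minimal sketch of the window computation; the constants and names are assumptions for illustration, not an existing interface:

    #include <stdint.h>

    /* Hypothetical constants matching the example above. */
    #define ALLOWABLE_LATENCY_SEC 0.001   /* 1 ms -> ~1K notifications/sec */
    #define SAFETY_FACTOR         0.8

    /* Notification window in packets for the current packet rate,
     * clamped to 1 so an idle link still notifies on the first packet. */
    static uint32_t notify_window(double pkts_per_sec)
    {
        double w = SAFETY_FACTOR * pkts_per_sec * ALLOWABLE_LATENCY_SEC;
        return w < 1.0 ? 1 : (uint32_t)w;
    }

    /* notify_window(20000.0) == 16, as in the example; the timer is armed
     * at allowable_latency and only fires when the workload drops. */
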

Yeah, I do like the sound of this.

However, since it requires a new guest feature and I don't expect it'll
improve the situation over the proposed patch until we have copyless
transmit, I think we should do this as part of the copyless effort.

Hopefully copyless and this can be done in parallel. I think they have value independently.

One thing I'd worry about with this scheme is all-out receive - e.g. any
delay in returning a TCP ACK to the sending side might cause us to hit
the TCP window limit.

Consider a real NIC, which also has ACK latency determined by the queue length. The proposal doesn't change that, except momentarily when transitioning from high throughput to low throughput.

In any case, latency is never more than allowable_latency (not including time spent in the guest network stack queues, but we aren't responsible for that).

(one day we can add a queue for acks and other high priority stuff, but we have enough on our hands now)

We're hurting our cache, and this won't work well with many NICs. At the very least this should be done in a dedicated thread.

A thread per NIC is doable, but it'd be especially tricky on the receive
side without more "short-cut the one producer, one consumer case" work.

We can start with transmit. I'm somewhat worried about further divergence from qemu mainline (just completed a merge...).
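
For the transmit side, something along these lines is what I'd imagine for a dedicated per-NIC thread; this is only a sketch assuming a simple condition-variable kick handoff, and the names (including the tx_drain() helper from the earlier sketch) are hypothetical:

    #include <pthread.h>
    #include <stdbool.h>

    struct vring_state;
    extern void tx_drain(struct vring_state *vr);

    /* Hypothetical per-NIC transmit state. */
    struct nic_tx {
        pthread_t       thread;
        pthread_mutex_t lock;
        pthread_cond_t  kick;
        bool            pending;
        bool            stop;
        struct vring_state *vr;
    };

    static void *nic_tx_thread(void *opaque)
    {
        struct nic_tx *tx = opaque;

        pthread_mutex_lock(&tx->lock);
        while (!tx->stop) {
            while (!tx->pending && !tx->stop)
                pthread_cond_wait(&tx->kick, &tx->lock);
            tx->pending = false;
            pthread_mutex_unlock(&tx->lock);

            tx_drain(tx->vr);          /* off the vcpu thread, so the copies
                                          no longer pollute the vcpu's cache */

            pthread_mutex_lock(&tx->lock);
        }
        pthread_mutex_unlock(&tx->lock);
        return NULL;
    }

    /* Called from the vcpu/io thread when the guest kicks the tx queue. */
    static void nic_tx_kick(struct nic_tx *tx)
    {
        pthread_mutex_lock(&tx->lock);
        tx->pending = true;
        pthread_cond_signal(&tx->kick);
        pthread_mutex_unlock(&tx->lock);
    }
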

