Mark McLoughlin wrote:
But it will increase overhead, since suddenly we aren't queueing
anymore. One vmexit per small packet.
Yes in theory, but the packet copies act to mitigate exits, since we
don't re-enable notifications until we're sure the ring is empty.
You mean, the guest and the copy proceed in parallel, and while they do,
exits are disabled?
With copyless, though, we'd have an unacceptable vmexit rate.
Right.
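The exit-mitigation pattern being discussed (stay in the host draining the ring with notifications disabled, and only re-enable once the ring looks empty) can be sketched as a toy, single-threaded loop. The struct and function names below are invented for illustration; this is not the actual virtio/vhost code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-producer ring; names are illustrative, not the real virtio API. */
#define RING_SIZE 256

struct ring {
    int buf[RING_SIZE];
    unsigned head, tail;   /* producer advances head, consumer advances tail */
    bool notify_enabled;   /* guest kicks (vmexits) only while this is set */
};

static bool ring_empty(struct ring *r) { return r->head == r->tail; }

static void consume_one(struct ring *r) { r->tail++; }

/*
 * Drain loop run on a guest kick: disable further notifications, copy
 * packets out until the ring looks empty, then re-enable and re-check to
 * close the race with a producer that queued more in the meantime.
 */
static int handle_kick(struct ring *r)
{
    int drained = 0;

    r->notify_enabled = false;
    for (;;) {
        while (!ring_empty(r)) {
            consume_one(r);
            drained++;
        }
        r->notify_enabled = true;
        if (ring_empty(r))          /* nothing slipped in: done */
            break;
        r->notify_enabled = false;  /* more arrived; keep draining */
    }
    return drained;
}
```

The point of the pattern is that one vmexit services a whole burst; the guest only kicks again once the host has fully drained and re-enabled notifications.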
If the timer affects latency, then something is very wrong. We're
lacking an adjustable window.
The way I see it, the notification window should be adjusted according
to the current workload. If the link is idle, the window should be one
packet -- notify as soon as something is queued. As the workload
increases, the window increases to (safety_factor * allowable_latency *
packet_rate). The timer is set to allowable_latency to catch changes in
workload.
For example:
- allowable_latency 1ms (implies 1K vmexits/sec desired)
- current packet_rate 20K packets/sec
- safety_factor 0.8
So we request notifications every 0.8 * 20K * 1ms = 16 packets, and set
the timer to 1ms. Usually we get a notification every 16 packets, just
before timer expiration. If the workload increases, we get
notifications sooner, so we increase the window. If the workload drops,
the timer fires and we decrease the window.
The timer should never fire on an all-out benchmark, or in a ping test.
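A rough sketch of the window computation described above (the function name, parameters, and units are mine, not from any existing implementation):

```c
#include <assert.h>

/*
 * Illustrative adaptive notification window.
 * Units: allowable_latency in microseconds, packet_rate in packets/sec.
 * All names here are hypothetical.
 */
static unsigned notify_window(double safety_factor,
                              unsigned allowable_latency_us,
                              unsigned packet_rate_pps)
{
    /* window (packets) = safety_factor * latency * rate */
    double w = safety_factor * allowable_latency_us * packet_rate_pps / 1e6;

    if (w < 1.0)
        w = 1.0;  /* idle link: notify on the first queued packet */
    return (unsigned)w;
}
```

With the numbers from the example (safety_factor 0.8, allowable_latency 1ms, packet_rate 20K pps) this yields a window of 16 packets; at low packet rates it degenerates to one packet per notification, as desired.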
Yeah, I do like the sound of this.
However, since it requires a new guest feature and I don't expect it'll
improve the situation over the proposed patch until we have copyless
transmit, I think we should do this as part of the copyless effort.
Hopefully copyless and this can be done in parallel. I think they have
value independently.
One thing I'd worry about with this scheme is all-out receive - e.g. any
delay in returning a TCP ACK to the sending side might cause us to hit
the TCP window size limit.
Consider a real NIC, which also has ACK latency determined by its queue
length. The proposal doesn't change that, except momentarily when
transitioning from high throughput to low throughput.
In any case, latency is never more than allowable_latency (not including
time spent in the guest network stack queues, but we aren't responsible
for that).
(one day we can add a queue for acks and other high priority stuff, but
we have enough on our hands now)
We're hurting our cache, and this won't work well with many nics. At
the very least this should be done in a dedicated thread.
A thread per nic is doable, but it'd be especially tricky on the receive
side without more "short-cut the one producer, one consumer case" work.
We can start with transmit. I'm somewhat worried about further
divergence from qemu mainline (just completed a merge...).
--
error compiling committee.c: too many arguments to function