On 07/08/2015 06:50 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 07, 2015 at 04:45:41PM +0800, Jason Wang wrote:
>>
>> On 07/06/2015 11:21 PM, Stefan Hajnoczi wrote:
>>> On Mon, Jul 06, 2015 at 11:32:25AM +0800, Jason Wang wrote:
>>>> On 07/02/2015 08:46 PM, Stefan Hajnoczi wrote:
>>>>> On Tue, Jun 30, 2015 at 04:35:24PM +0800, Jason Wang wrote:
>>>>>> On 06/30/2015 11:06 AM, Fam Zheng wrote:
>>>>>>> virtio_net_receive still does the check by calling
>>>>>>> virtio_net_can_receive; if the device or driver is not ready, the
>>>>>>> packet is dropped.
>>>>>>>
>>>>>>> This is necessary because returning false from can_receive complicates
>>>>>>> things: the peer would disable sending until we explicitly flush the
>>>>>>> queue.
>>>>>>>
>>>>>>> Signed-off-by: Fam Zheng <f...@redhat.com>
>>>>>>> ---
>>>>>>>  hw/net/virtio-net.c | 1 -
>>>>>>>  1 file changed, 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>> index d728233..dbef0d0 100644
>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>> @@ -1503,7 +1503,6 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>>>>>  static NetClientInfo net_virtio_info = {
>>>>>>>      .type = NET_CLIENT_OPTIONS_KIND_NIC,
>>>>>>>      .size = sizeof(NICState),
>>>>>>> -    .can_receive = virtio_net_can_receive,
>>>>>>>      .receive = virtio_net_receive,
>>>>>>>      .link_status_changed = virtio_net_set_link_status,
>>>>>>>      .query_rx_filter = virtio_net_query_rxfilter,
>>>>>> A side effect of this patch is that it will read and then drop packets
>>>>>> if the guest driver is not ok.
>>>>> I think that the semantics of the .can_receive() and .receive() return
>>>>> values are currently incorrect in many NICs.  They have .can_receive()
>>>>> functions that return false for conditions where .receive() would
>>>>> discard the packet.  So what happens is that packets get queued when
>>>>> they should actually be discarded.
>>>> Yes, but they are bugs more or less.
>>>>
>>>>> The purpose of the flow control (queuing) mechanism is to tell the
>>>>> sender to hold off until the receiver has more rx buffers available.
>>>>> It's a short-term thing that doesn't include link down, rx disable, or
>>>>> NIC reset states.
>>>>>
>>>>> Therefore, I think this patch will not introduce a regression.  It is
>>>>> adjusting the code to stop queuing packets when they should actually be
>>>>> dropped.
>>>>>
>>>>> Thoughts?
>>>> I agree there's no functional issue.  But it wastes CPU cycles (consider
>>>> a guest being flooded).  Sometimes it may even be dangerous.  For tap,
>>>> we're probably ok since we have 756ae78b, but for other backends we
>>>> don't.
>>> If the guest uses iptables rules or other mechanisms to drop bogus
>>> packets the cost is even higher than discarding them at the QEMU layer.
>> But that was the guest's choice.
>>
>>> What's more, if you're using link down as a DoS mitigation strategy
>>> then you might as well hot unplug the NIC.
>>>
>>> Stefan
>> I think there are two problems for virtio-net:
>>
>> 1) The mitigation method when the guest driver is ok.  For tx, we have
>> either a timer or a bh; for rx, and only for tap, we have 756ae78b.  We
>> probably need fixes for the other backends.
> I agree.  It's not directly related to the change Fam is proposing,
> though.
>
>> 2) When the driver is not ok, the point is that we should not poll the
>> backend at all (I believe this is one of the main purposes of the main
>> loop).  Things like tap_can_send(), and the commit that drops
>> tap_can_send(), all follow this rule.  But this patch does not, so we
>> end up with:
>>
>> - driver is not ok or no driver: qemu keeps reading and dropping packets.
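
To make that concrete, here is a minimal sketch of the return-value
contract involved (illustrative only, the function name is made up and
this is not the literal hw/net/virtio-net.c code):

    /* Assumes QEMU's net layer types from net/net.h.  With .can_receive
     * removed, the readiness check lives inside the .receive callback:
     * returning the packet length tells the net layer the packet was
     * consumed, so it is silently dropped and the peer keeps sending;
     * returning 0 would instead queue the packet and throttle the peer
     * until an explicit qemu_flush_queued_packets(). */
    static ssize_t receive_sketch(NetClientState *nc,
                                  const uint8_t *buf, size_t size)
    {
        if (!virtio_net_can_receive(nc)) {
            return size;    /* pretend consumed: drop, do not queue */
        }
        /* ... normal path: copy the packet into guest rx buffers ... */
        return size;
    }
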
> It's correct from a functionality perspective.  When rx is disabled or
> the link is down, packets should be dropped.
>
> From a performance perspective, I agree it would be better to drop
> packets earlier in the stack.
>
> If you want tap to drop packets efficiently during link down/receive
> disabled/etc.:
>
> What's the best way to disable tap rx so the host kernel drops packets
> instead of queuing them?

There's no such API currently, especially for an unprivileged qemu.  But
tuntap will drop the packet once it exceeds tx_queue_length, which still
looks much better than the current behaviour.
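
For reference, a privileged process could tune where that kernel-side
dropping kicks in by resizing the tap device's tx queue.  A sketch (a
hypothetical illustration, not something qemu does today; it needs
CAP_NET_ADMIN, which is exactly the problem for an unprivileged qemu):

    #include <net/if.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>

    /* Cap the tap device's tx queue so the kernel starts dropping
     * packets sooner when qemu stops reading from the tap fd.
     * ctl_sock is any AF_INET/SOCK_DGRAM control socket. */
    static int set_tap_txqueuelen(int ctl_sock, const char *ifname,
                                  int qlen)
    {
        struct ifreq ifr;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_qlen = qlen;
        return ioctl(ctl_sock, SIOCSIFTXQLEN, &ifr);
    }
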
>
>> - driver is ok but not enough rx buffers: qemu will disable tap read
>> poll.
> QEMU relies on the kernel's socket receive buffer or netif receive
> queue to hold incoming packets.  When QEMU enables tap read poll again
> it receives the queued packets.
>
> Queuing is necessary so that USB net emulation (the rx buffer is only 1
> packet!) and maybe slirp work.

Yes.

> I think real NICs drop packets when the rx ring is exhausted, but we
> need to keep the queuing behavior unless someone proposes a way to
> eliminate it.

Yes, queueing seems useless if the backend can do it.

So I'm ok with the patch, and we can optimize in the future.

Thanks
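
P.S. For completeness, the rx flow-control handshake we are keeping
looks roughly like this (a simplified, single-queue sketch of the
virtio_net_handle_rx path; multiqueue and error handling omitted):

    /* Called when the guest kicks the rx virtqueue, i.e. it has posted
     * new rx buffers: flush whatever the net layer queued while we had
     * no room, which also lets the peer resume sending (for tap, read
     * poll is re-enabled via the send-completion callback). */
    static void virtio_net_handle_rx(VirtIODevice *vdev, VirtQueue *vq)
    {
        VirtIONet *n = VIRTIO_NET(vdev);

        qemu_flush_queued_packets(qemu_get_queue(n->nic));
    }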