On Fri, May 27, 2016 at 10:56:44AM -0700, Cong Wang wrote: > > Commit 412ca1550cbecb2c ("macvlan: Move broadcasts into a work queue") > > moved processing of all macvlan multicasts into a work queue. This > > causes a noticable performance regression when there is heavy multicast > > traffic on the underlying interface for multicast groups that the > > macvlan subinterfaces are not members of, in which case we end up > > cloning all those packets and then freeing them again from a work queue > > without really doing any useful work with them in between. > > But we only queue up to 1000 packets in our backlog. > > How about adding a quick check before cloning it? > > diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c > index cb01023..1c73d0f 100644 > --- a/drivers/net/macvlan.c > +++ b/drivers/net/macvlan.c > @@ -315,6 +315,9 @@ static void macvlan_broadcast_enqueue(struct > macvlan_port *port, > struct sk_buff *nskb; > int err = -ENOMEM; > > + if (skb_queue_len(&port->bc_queue) >= MACVLAN_BC_QUEUE_LEN) > + return; > + > nskb = skb_clone(skb, GFP_ATOMIC); > if (!nskb) > goto err;
We're not hitting the bc_queue skb limit in our environment, as the machine can keep up with the traffic -- it's just that taking an extra clone of the skb and queueing and running the work queue item to free it again is eating up a lot of cycles. But doing the queue length check before the clone might not be a bad idea? (You'd probably want to atomic_long_inc(&skb->dev->rx_dropped) before returning, though?)