Rusty Russell wrote: > On Friday 14 December 2007 23:12:05 Christian Borntraeger wrote: > >> Rusty, Anthony, Dor, >> >> I need your brain power :-) >> >> On smp guests I have seen a problem with virtio (the version in curent >> Avi's git) which do not occur on single processor guests: >> >> kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:228! >> illegal operation: 0001 [#1] >> Modules linked in: ipv6 >> CPU: 2 Not tainted >> Process swapper (pid: 0, task: 000000000f83e038, ksp: 000000000f877d70) >> Krnl PSW : 0704000180000000 000000000045df2a (vring_restart+0x5a/0x70) >> R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0 EA:3 >> Krnl GPRS: 00000000c0a80101 0000000000000000 000000000eb35000 >> 0000000010005800 000000000045ded0 000000000000192f 000000000eb21000 >> 000000000eb21000 000000000000000e 000000000eb21900 000000000eb21920 >> 000000000f867cb8 0700000000d9b058 0000000000000010 000000000045c06a >> 000000000f867cb8 Krnl Code: 000000000045df1e: e3b0b0700004 lg >> %r11,112(%r11) 000000000045df24: 07fe bcr 15,%r14 >> 000000000045df26: a7f40001 brc 15,45df28 >> >> >000000000045df2a: a7f4ffe1 brc 15,45deec >> >> 000000000045df2e: e31020300004 lg %r1,48(%r2) >> 000000000045df34: a7480000 lhi %r4,0 >> 000000000045df38: 96011001 oi 1(%r1),1 >> 000000000045df3c: a7f4ffef brc 15,45df1a >> Call Trace: >> ([<000000000045c016>] virtnet_poll+0x96/0x42c) >> [<000000000048cda2>] net_rx_action+0xca/0x150 >> [<0000000000137f7a>] __do_softirq+0x9e/0x130 >> [<00000000001105d6>] do_softirq+0xae/0xb4 >> [<0000000000138182>] irq_exit+0x96/0x9c >> [<000000000010d710>] do_extint+0xcc/0xf8 >> [<00000000001135d0>] ext_no_vtime+0x16/0x1a >> [<000000000010a57e>] cpu_idle+0x216/0x238 >> >> >> I think there is a valid code path, triggering this bug: >> >> CPU1 CPU2 >> ----------------------- ----------------------- >> - virtnet_poll found no >> more packets on queue >> - netif_rx_complete allow >> poll to be called >> - vq_ops->restart is called >> - vq Interrupts are enabled >> . <new packets arrive> >> <vcpu is scheduled away> >> . - interrupt is delivered >> . - poll is called >> . - poll work is done >> . - netif_rx_complete >> . - vq_ops->restart is called >> . - check if vq interrupts are >> . enable --> BUG >> > > Shouldn't this be BUG()ing in START_USE()? Or did you disable that because > of previous false positives? > > virtnet_poll() should not be reentering, AFAICT. virtio relies on the caller > to ensure it never calls two virtio functions on the same queue > simultaneously, and virtio_net in turn relies on the core net code to enforce > this. > > So, how did this happen? > > Looks like we can have an interrupt on one CPU, which does: > > if (vq->vq.callback && !vq->vq.callback(&vq->vq)) > vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT; > > The callback schedules the softirq to run, which usually runs on same CPU... > but if somehow it runs somewhere else, it could hit restart before the flag > is set. This means removing the BUG_ON() is bad, because we could restart, > and then set the NO_INTERRUPT flag from the handler, and stall. > > Note that interrupt suppression is advisory, so we can be sloppy on the > setting side. > > To me this points to doing interrupt suppression a different way. If we > have a ->disable_cb() virtio function, and call it before we call > netif_rx_schedule, does that fix it? > > How's this? > Rusty. > > Looks good to me. The only thing is the naming.. Maybe one can find better name than [dis|en]able_cb since it is more like disable interrupts than disable_cb and enable_cb is more like run_callbacks (and enable interrupts).
Actually while looking at it some more, there's might be a performance optimization using your new patch: Up to now, if the tx/rx paths were running out of descriptors they called enable_cb. enable_cb told the host it should trigger an irq the next time it has data for the guest. From now on, enable_cb will run the callbacks inside shortening latency. btw, I tried to apply this patch on my source tree without luck, after doing a manual merge, the guest opssed on the BUG_ON. I think it's because my sources are not aligned with yours. Can you please post a mercurial/git repo? Please specify the relatively repository in case you choose mercurial. Thanks, Dor > --- > virtio: explicit enable_cb/disable_cb rather than callback return. > > It seems that virtio_net wants to disable callbacks (interrupts) before > calling netif_rx_schedule(), so we can't use the return value to do so. > > Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback > now returns void, rather than a boolean. > > Signed-off-by: Rusty Russell <[EMAIL PROTECTED]> > > diff -r 0eabf082c13a drivers/block/virtio_blk.c > --- a/drivers/block/virtio_blk.c Tue Dec 18 13:51:13 2007 +1100 > +++ b/drivers/block/virtio_blk.c Tue Dec 18 15:49:18 2007 +1100 > @@ -36,7 +36,7 @@ struct virtblk_req > struct virtio_blk_inhdr in_hdr; > }; > > -static bool blk_done(struct virtqueue *vq) > +static void blk_done(struct virtqueue *vq) > { > struct virtio_blk *vblk = vq->vdev->priv; > struct virtblk_req *vbr; > @@ -65,7 +65,6 @@ static bool blk_done(struct virtqueue *v > /* In case queue is stopped waiting for more buffers. */ > blk_start_queue(vblk->disk->queue); > spin_unlock_irqrestore(&vblk->lock, flags); > - return true; > } > > static bool do_req(struct request_queue *q, struct virtio_blk *vblk, > diff -r 0eabf082c13a drivers/lguest/lguest_device.c > --- a/drivers/lguest/lguest_device.c Tue Dec 18 13:51:13 2007 +1100 > +++ b/drivers/lguest/lguest_device.c Tue Dec 18 15:49:18 2007 +1100 > @@ -193,7 +193,7 @@ static void lg_notify(struct virtqueue * > * So we provide devices with a "find virtqueue and set it up" function. */ > static struct virtqueue *lg_find_vq(struct virtio_device *vdev, > unsigned index, > - bool (*callback)(struct virtqueue *vq)) > + void (*callback)(struct virtqueue *vq)) > { > struct lguest_device *ldev = to_lgdev(vdev); > struct lguest_vq_info *lvq; > diff -r 0eabf082c13a drivers/net/virtio_net.c > --- a/drivers/net/virtio_net.c Tue Dec 18 13:51:13 2007 +1100 > +++ b/drivers/net/virtio_net.c Tue Dec 18 15:49:18 2007 +1100 > @@ -59,13 +59,12 @@ static inline void vnet_hdr_to_sg(struct > sg_init_one(sg, skb_vnet_hdr(skb), sizeof(struct virtio_net_hdr)); > } > > -static bool skb_xmit_done(struct virtqueue *rvq) > +static void skb_xmit_done(struct virtqueue *rvq) > { > struct virtnet_info *vi = rvq->vdev->priv; > > /* In case we were waiting for output buffers. */ > netif_wake_queue(vi->dev); > - return true; > } > > static void receive_skb(struct net_device *dev, struct sk_buff *skb, > @@ -177,12 +176,12 @@ static void try_fill_recv(struct virtnet > vi->rvq->vq_ops->kick(vi->rvq); > } > > -static bool skb_recv_done(struct virtqueue *rvq) > +static void skb_recv_done(struct virtqueue *rvq) > { > struct virtnet_info *vi = rvq->vdev->priv; > + /* Suppress further interrupts. */ > + rvq->vq_ops->disable_cb(rvq); > netif_rx_schedule(vi->dev, &vi->napi); > - /* Suppress further interrupts. */ > - return false; > } > > static int virtnet_poll(struct napi_struct *napi, int budget) > @@ -208,7 +207,7 @@ again: > /* Out of packets? */ > if (received < budget) { > netif_rx_complete(vi->dev, napi); > - if (unlikely(!vi->rvq->vq_ops->restart(vi->rvq)) > + if (unlikely(!vi->rvq->vq_ops->enable_cb(vi->rvq)) > && netif_rx_reschedule(vi->dev, napi)) > goto again; > } > diff -r 0eabf082c13a drivers/virtio/virtio_pci.c > --- a/drivers/virtio/virtio_pci.c Tue Dec 18 13:51:13 2007 +1100 > +++ b/drivers/virtio/virtio_pci.c Tue Dec 18 15:49:18 2007 +1100 > @@ -190,7 +190,7 @@ static irqreturn_t vp_interrupt(int irq, > > /* the config->find_vq() implementation */ > static struct virtqueue *vp_find_vq(struct virtio_device *vdev, unsigned > index, > - bool (*callback)(struct virtqueue *vq)) > + void (*callback)(struct virtqueue *vq)) > { > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > struct virtio_pci_vq_info *info; > diff -r 0eabf082c13a drivers/virtio/virtio_ring.c > --- a/drivers/virtio/virtio_ring.c Tue Dec 18 13:51:13 2007 +1100 > +++ b/drivers/virtio/virtio_ring.c Tue Dec 18 15:49:18 2007 +1100 > @@ -220,7 +220,17 @@ static void *vring_get_buf(struct virtqu > return ret; > } > > -static bool vring_restart(struct virtqueue *_vq) > +static void vring_disable_cb(struct virtqueue *_vq) > +{ > + struct vring_virtqueue *vq = to_vvq(_vq); > + > + START_USE(vq); > + BUG_ON(vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT); > + vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT; > + END_USE(vq); > +} > + > +static bool vring_enable_cb(struct virtqueue *_vq) > { > struct vring_virtqueue *vq = to_vvq(_vq); > > @@ -254,8 +264,8 @@ irqreturn_t vring_interrupt(int irq, voi > return IRQ_HANDLED; > > pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback); > - if (vq->vq.callback && !vq->vq.callback(&vq->vq)) > - vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT; > + if (vq->vq.callback) > + vq->vq.callback(&vq->vq); > > return IRQ_HANDLED; > } > @@ -266,7 +276,8 @@ static struct virtqueue_ops vring_vq_ops > .add_buf = vring_add_buf, > .get_buf = vring_get_buf, > .kick = vring_kick, > - .restart = vring_restart, > + .disable_cb = vring_disable_cb, > + .enable_cb = vring_enable_cb, > .shutdown = vring_shutdown, > }; > > @@ -274,7 +285,7 @@ struct virtqueue *vring_new_virtqueue(un > struct virtio_device *vdev, > void *pages, > void (*notify)(struct virtqueue *), > - bool (*callback)(struct virtqueue *)) > + void (*callback)(struct virtqueue *)) > { > struct vring_virtqueue *vq; > unsigned int i; > diff -r 0eabf082c13a include/linux/virtio.h > --- a/include/linux/virtio.h Tue Dec 18 13:51:13 2007 +1100 > +++ b/include/linux/virtio.h Tue Dec 18 15:49:18 2007 +1100 > @@ -11,15 +11,13 @@ > /** > * virtqueue - a queue to register buffers for sending or receiving. > * @callback: the function to call when buffers are consumed (can be NULL). > - * If this returns false, callbacks are suppressed until vq_ops->restart > - * is called. > * @vdev: the virtio device this queue was created for. > * @vq_ops: the operations for this virtqueue (see below). > * @priv: a pointer for the virtqueue implementation to use. > */ > struct virtqueue > { > - bool (*callback)(struct virtqueue *vq); > + void (*callback)(struct virtqueue *vq); > struct virtio_device *vdev; > struct virtqueue_ops *vq_ops; > void *priv; > @@ -41,7 +39,9 @@ struct virtqueue > * vq: the struct virtqueue we're talking about. > * len: the length written into the buffer > * Returns NULL or the "data" token handed to add_buf. > - * @restart: restart callbacks after callback returned false. > + * @disable_cb: disable callbacks > + * vq: the struct virtqueue we're talking about. > + * @enable_cb: restart callbacks after disable_cb. > * vq: the struct virtqueue we're talking about. > * This returns "false" (and doesn't re-enable) if there are pending > * buffers in the queue, to avoid a race. > @@ -65,7 +65,8 @@ struct virtqueue_ops { > > void *(*get_buf)(struct virtqueue *vq, unsigned int *len); > > - bool (*restart)(struct virtqueue *vq); > + void (*disable_cb)(struct virtqueue *vq); > + bool (*enable_cb)(struct virtqueue *vq); > > void (*shutdown)(struct virtqueue *vq); > }; > diff -r 0eabf082c13a include/linux/virtio_config.h > --- a/include/linux/virtio_config.h Tue Dec 18 13:51:13 2007 +1100 > +++ b/include/linux/virtio_config.h Tue Dec 18 15:49:18 2007 +1100 > @@ -61,7 +61,7 @@ struct virtio_config_ops > void (*set_status)(struct virtio_device *vdev, u8 status); > struct virtqueue *(*find_vq)(struct virtio_device *vdev, > unsigned index, > - bool (*callback)(struct virtqueue *)); > + void (*callback)(struct virtqueue *)); > void (*del_vq)(struct virtqueue *vq); > }; > > diff -r 0eabf082c13a include/linux/virtio_ring.h > --- a/include/linux/virtio_ring.h Tue Dec 18 13:51:13 2007 +1100 > +++ b/include/linux/virtio_ring.h Tue Dec 18 15:49:18 2007 +1100 > @@ -114,7 +114,7 @@ struct virtqueue *vring_new_virtqueue(un > struct virtio_device *vdev, > void *pages, > void (*notify)(struct virtqueue *vq), > - bool (*callback)(struct virtqueue *vq)); > + void (*callback)(struct virtqueue *vq)); > void vring_del_virtqueue(struct virtqueue *vq); > > irqreturn_t vring_interrupt(int irq, void *_vq); > > ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel