Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-09-29 Thread Michael S. Tsirkin
On Wed, Sep 29, 2021 at 04:08:29PM -0700, Wei Wang wrote:
> On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang  wrote:
> > > > > > >
> > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn 
> > > > > > > > wrote:
> > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn 
> > > > > > > > > > wrote:
> > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang 
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > With the implementation of napi-tx in virtio 
> > > > > > > > > > > > > > > driver, we clean tx
> > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose 
> > > > > > > > > > > > > > > of reducing tx
> > > > > > > > > > > > > > > complete interrupts. But this could introduce a 
> > > > > > > > > > > > > > > race where tx complete
> > > > > > > > > > > > > > > interrupt has been raised, but the handler found 
> > > > > > > > > > > > > > > there is no work to do
> > > > > > > > > > > > > > > because we have done the work in the previous rx 
> > > > > > > > > > > > > > > interrupt handler.
> > > > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting 
> > > > > > > > > > > > > > > with the
> > > > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not 
> > > > > > > > > > > > > > > tainted
> > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > > > [ 3588.017942]  
> > > > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > > > [ 3588.017966]  
> > > > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > > > [ 3588.020374] [<1b9f1da8>] 
> > > > > > > > > > > > > > > vring_interrupt
> > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch adds a new param to struct 
> > > > > > > > > > > > > > > vring_virtqueue, and we set it for
> > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress 
> > > > > > > > > > > > > > > the warning in such
> > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx 
> > > > > > > > > > > > > > > descriptors from rx napi")
> > > > > > > > > > > > > > > Reported-by: Rick Jones 
> > > > > > > > > > > > > > > Signed-off-by: Wei Wang 
> > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > > > only triggers after an interrupt is unhandled 
> > > > > > > > > > > > > > repeatedly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So something causes a storm of useless tx 
> > > > > > > > > > > > > > interrupts here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's find out what it was please. What you are 
> > > > > > > > > > > > > > doing is
> > > > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The traffic that causes this warning is a netperf 
> > > > > > 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-09-29 Thread Michael S. Tsirkin
On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > >  wrote:
> > > >
> > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang  wrote:
> > > > >
> > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn 
> > > > > > > > wrote:
> > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, 
> > > > > > > > > > > > > we clean tx
> > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of 
> > > > > > > > > > > > > reducing tx
> > > > > > > > > > > > > complete interrupts. But this could introduce a race 
> > > > > > > > > > > > > where tx complete
> > > > > > > > > > > > > interrupt has been raised, but the handler found 
> > > > > > > > > > > > > there is no work to do
> > > > > > > > > > > > > because we have done the work in the previous rx 
> > > > > > > > > > > > > interrupt handler.
> > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with 
> > > > > > > > > > > > > the
> > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not 
> > > > > > > > > > > > > tainted
> > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > [ 3588.017942]  
> > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > [ 3588.017966]  
> > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch adds a new param to struct 
> > > > > > > > > > > > > vring_virtqueue, and we set it for
> > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the 
> > > > > > > > > > > > > warning in such
> > > > > > > > > > > > > case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx 
> > > > > > > > > > > > > descriptors from rx napi")
> > > > > > > > > > > > > Reported-by: Rick Jones 
> > > > > > > > > > > > > Signed-off-by: Wei Wang 
> > > > > > > > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > >
> > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > only triggers after an interrupt is unhandled 
> > > > > > > > > > > > repeatedly.
> > > > > > > > > > > >
> > > > > > > > > > > > So something causes a storm of useless tx interrupts 
> > > > > > > > > > > > here.
> > > > > > > > > > > >
> > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > >
> > > > > > > > > > > The traffic that causes this warning is a netperf 
> > > > > > > > > > > tcp_stream with at
> > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets 
> > > > > > > > > > > triggered on the
> > > > > > > > > > > receiving host, which has a lot of rx interrupts firing 
> > > > > > > > > > > on all queues,
> > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > And I think the scenario is: when the tx interrupt gets 
> > > > > > > > > > > fired, it gets
> > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx 
> > > > > > > > > > > interrupts
> > > > > > > > > > > get triggered very close to 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-04-12 Thread Michael S. Tsirkin
On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
>  wrote:
> >
> > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang  wrote:
> > >
> > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > With the implementation of napi-tx in virtio driver, we 
> > > > > > > > > > > clean tx
> > > > > > > > > > > descriptors from rx napi handler, for the purpose of 
> > > > > > > > > > > reducing tx
> > > > > > > > > > > complete interrupts. But this could introduce a race 
> > > > > > > > > > > where tx complete
> > > > > > > > > > > interrupt has been raised, but the handler found there is 
> > > > > > > > > > > no work to do
> > > > > > > > > > > because we have done the work in the previous rx 
> > > > > > > > > > > interrupt handler.
> > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > [ 3588.017942]  
> > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > [ 3588.017966]  
> > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > >
> > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, 
> > > > > > > > > > > and we set it for
> > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the 
> > > > > > > > > > > warning in such
> > > > > > > > > > > case.
> > > > > > > > > > >
> > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors 
> > > > > > > > > > > from rx napi")
> > > > > > > > > > > Reported-by: Rick Jones 
> > > > > > > > > > > Signed-off-by: Wei Wang 
> > > > > > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This description does not make sense to me.
> > > > > > > > > >
> > > > > > > > > > irq X: nobody cared
> > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > >
> > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > >
> > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > just preventing linux from complaining.
> > > > > > > > >
> > > > > > > > > The traffic that causes this warning is a netperf tcp_stream 
> > > > > > > > > with at
> > > > > > > > > least 128 flows between 2 hosts. And the warning gets 
> > > > > > > > > triggered on the
> > > > > > > > > receiving host, which has a lot of rx interrupts firing on 
> > > > > > > > > all queues,
> > > > > > > > > and a few tx interrupts.
> > > > > > > > > And I think the scenario is: when the tx interrupt gets 
> > > > > > > > > fired, it gets
> > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx 
> > > > > > > > > interrupts
> > > > > > > > > get triggered very close to each other, and gets handled in 
> > > > > > > > > one round
> > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which 
> > > > > > > > > calls
> > > > > > > > > virtnet_poll(). However, virtnet_poll() calls 
> > > > > > > > > virtnet_poll_cleantx()
> > > > > > > > > to try to do the work on the corresponding tx queue as well. 
> > > > > > > > > That's
> > > > > > > > > why when tx interrupt handler gets called, it sees no work to 
> > > > > > > > > do.
> > > > > > > > > And the reason for the rx handler to handle the tx work is 
> > > > > > > > > here:
> > > > > > > > > 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-04-12 Thread Michael S. Tsirkin
On Mon, Apr 12, 2021 at 04:14:58PM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" 
> Date: Mon, 12 Apr 2021 18:33:45 -0400
> 
> > On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
> >> OK I started looking at this again. My idea is simple.
> >> A. disable callbacks before we try to drain skbs
> >> B. actually do disable callbacks even with event idx
> >> 
> >> To make B not regress, we need to
> >> C. detect the common case of disable after event triggering and skip the 
> >> write then.
> >> 
> >> I added a new event_triggered flag for that.
> >> Completely untested - but then I could not see the warnings either.
> >> Would be very much interested to know whether this patch helps
> >> resolve the sruprious interrupt problem at all ...
> >> 
> >> 
> >> Signed-off-by: Michael S. Tsirkin 
> > 
> > Hmm a slightly cleaner alternative is to clear the flag when enabling 
> > interrupts ...
> > I wonder which cacheline it's best to use for this.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Please make a fresh new submission if you want to use this approach, thanks.

Absolutely. This is untested so I just sent this idea out for early feedback
and hopefully help with testing on real hardware.
Sorry about being unclear.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-04-12 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Mon, 12 Apr 2021 18:33:45 -0400

> On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
>> OK I started looking at this again. My idea is simple.
>> A. disable callbacks before we try to drain skbs
>> B. actually do disable callbacks even with event idx
>> 
>> To make B not regress, we need to
>> C. detect the common case of disable after event triggering and skip the 
>> write then.
>> 
>> I added a new event_triggered flag for that.
>> Completely untested - but then I could not see the warnings either.
>> Would be very much interested to know whether this patch helps
>> resolve the sruprious interrupt problem at all ...
>> 
>> 
>> Signed-off-by: Michael S. Tsirkin 
> 
> Hmm a slightly cleaner alternative is to clear the flag when enabling 
> interrupts ...
> I wonder which cacheline it's best to use for this.
> 
> Signed-off-by: Michael S. Tsirkin 

Please make a fresh new submission if you want to use this approach, thanks.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-04-12 Thread Michael S. Tsirkin
On Mon, Apr 12, 2021 at 06:08:21PM -0400, Michael S. Tsirkin wrote:
> OK I started looking at this again. My idea is simple.
> A. disable callbacks before we try to drain skbs
> B. actually do disable callbacks even with event idx
> 
> To make B not regress, we need to
> C. detect the common case of disable after event triggering and skip the 
> write then.
> 
> I added a new event_triggered flag for that.
> Completely untested - but then I could not see the warnings either.
> Would be very much interested to know whether this patch helps
> resolve the sruprious interrupt problem at all ...
> 
> 
> Signed-off-by: Michael S. Tsirkin 

Hmm a slightly cleaner alternative is to clear the flag when enabling 
interrupts ...
I wonder which cacheline it's best to use for this.

Signed-off-by: Michael S. Tsirkin 


diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..c23341b18eb5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
return;
 
if (__netif_tx_trylock(txq)) {
+   virtqueue_disable_cb(vq);
free_old_xmit_skbs(sq, true);
__netif_tx_unlock(txq);
}
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..88f0b16b11b8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
+   /* Hint for event idx: already triggered no need to disable. */
+   bool event_triggered;
+
union {
/* Available for split ring */
struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue 
*_vq)
 
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-   if (!vq->event)
+   if (vq->event)
+   /* TODO: this is a hack. Figure out a cleaner value to 
write. */
+   vring_used_event(>split.vring) = 0x0;
+   else
vq->split.vring.avail->flags =
cpu_to_virtio16(_vq->vdev,
vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+   vq->event_triggered = false;
vq->num_added = 0;
vq->packed_ring = true;
vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   /* If device triggered an event already it won't trigger one again:
+* no need to disable.
+*/
+   if (vq->event_triggered)
+   return;
+
if (vq->packed_ring)
virtqueue_disable_cb_packed(_vq);
else
@@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue 
*_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   if (vq->event_triggered)
+   vq->event_triggered = false;
+
return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
 virtqueue_enable_cb_prepare_split(_vq);
 }
@@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   if (vq->event_triggered)
+   vq->event_triggered = false;
+
return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
 virtqueue_enable_cb_delayed_split(_vq);
 }
@@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
if (unlikely(vq->broken))
return IRQ_HANDLED;
 
+   /* Just a hint for performance: so it's ok that this can be racy! */
+   if (vq->event)
+   vq->event_triggered = true;
+
pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
if (vq->vq.callback)
vq->vq.callback(>vq);
@@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+   vq->event_triggered = false;
vq->num_added = 0;
vq->use_dma_api = vring_use_dma_api(vdev);
 #ifdef DEBUG

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-04-12 Thread Michael S. Tsirkin
OK I started looking at this again. My idea is simple.
A. disable callbacks before we try to drain skbs
B. actually do disable callbacks even with event idx

To make B not regress, we need to
C. detect the common case of disable after event triggering and skip the write 
then.

I added a new event_triggered flag for that.
Completely untested - but then I could not see the warnings either.
Would be very much interested to know whether this patch helps
resolve the sruprious interrupt problem at all ...


Signed-off-by: Michael S. Tsirkin 

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..a91a2d6d1ee3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
return;
 
if (__netif_tx_trylock(txq)) {
+   virtqueue_disable_cb(vq);
free_old_xmit_skbs(sq, true);
__netif_tx_unlock(txq);
}
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..213bfe8b6051 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
+   /* Hint for event idx: already triggered no need to disable. */
+   bool event_triggered;
+
union {
/* Available for split ring */
struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue 
*_vq)
 
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-   if (!vq->event)
+   if (vq->event)
+   /* TODO: this is a hack. Figure out a cleaner value to 
write. */
+   vring_used_event(>split.vring) = 0x0;
+   else
vq->split.vring.avail->flags =
cpu_to_virtio16(_vq->vdev,
vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+   vq->event_triggered = false;
vq->num_added = 0;
vq->packed_ring = true;
vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,14 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   /* If device triggered an event already it won't trigger one again:
+* no need to disable.
+*/
+   if (vq->event_triggered) {
+   vq->event_triggered = false;
+   return;
+   }
+
if (vq->packed_ring)
virtqueue_disable_cb_packed(_vq);
else
@@ -2044,6 +2059,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
if (unlikely(vq->broken))
return IRQ_HANDLED;
 
+   /* Just a hint for performance: so it's ok that this can be racy! */
+   if (vq->event)
+   vq->event_triggered = true;
+
pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
if (vq->vq.callback)
vq->vq.callback(>vq);
@@ -2083,6 +2102,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+   vq->event_triggered = false;
vq->num_added = 0;
vq->use_dma_api = vring_use_dma_api(vdev);
 #ifdef DEBUG

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-21 Thread Michael S. Tsirkin
On Thu, Feb 18, 2021 at 01:39:19PM +0800, Jason Wang wrote:
> 
> On 2021/2/10 下午5:14, Michael S. Tsirkin wrote:
> > On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:
> > > On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
> > >  wrote:
> > > > > > > > I have no preference. Just curious, especially if it 
> > > > > > > > complicates the patch.
> > > > > > > > 
> > > > > > > My understanding is that. It's probably ok for net. But we 
> > > > > > > probably need
> > > > > > > to document the assumptions to make sure it was not abused in 
> > > > > > > other drivers.
> > > > > > > 
> > > > > > > Introduce new parameters for find_vqs() can help to eliminate the 
> > > > > > > subtle
> > > > > > > stuffs but I agree it looks like a overkill.
> > > > > > > 
> > > > > > > (Btw, I forget the numbers but wonder how much difference if we 
> > > > > > > simple
> > > > > > > remove the free_old_xmits() from the rx NAPI path?)
> > > > > > The committed patchset did not record those numbers, but I found 
> > > > > > them
> > > > > > in an earlier iteration:
> > > > > > 
> > > > > > [PATCH net-next 0/3] virtio-net tx napi
> > > > > > https://lists.openwall.net/netdev/2017/04/02/55
> > > > > > 
> > > > > > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > > > > > time. For instance:
> > > > > > 
> > > > > >   TCP_RR Latency (us):
> > > > > >   1x:
> > > > > > p50  24   24   21
> > > > > > p99  27   27   27
> > > > > > Gcycles 299  432  308
> > > > > > 
> > > > > > I'm concerned that removing it now may cause a regression report in 
> > > > > > a
> > > > > > few months. That is higher risk than the spurious interrupt warning
> > > > > > that was only reported after years of use.
> > > > > 
> > > > > Right.
> > > > > 
> > > > > So if Michael is fine with this approach, I'm ok with it. But we
> > > > > probably need to a TODO to invent the interrupt handlers that can be
> > > > > used for more than one virtqueues. When MSI-X is enabled, the 
> > > > > interrupt
> > > > > handler (vring_interrup()) assumes the interrupt is used by a single
> > > > > virtqueue.
> > > > Thanks.
> > > > 
> > > > The approach to schedule tx-napi from virtnet_poll_cleantx instead of
> > > > cleaning directly in this rx-napi function was not effective at
> > > > suppressing the warning, I understand.
> > > Correct. I tried the approach to schedule tx napi instead of directly
> > > do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
> > > still happens.
> > Two questions here: is the device using packed or split vqs?
> > And is event index enabled?
> > 
> > I think one issue is that at the moment with split and event index we
> > don't actually disable events at all.
> 
> 
> Do we really have a way to disable that? (We don't have a flag like packed
> virtqueue)
> 
> Or you mean the trick [1] when I post tx interrupt RFC?
> 
> Thanks
> 
> [1] https://lkml.org/lkml/2015/2/9/113

Something like this. Or basically any other value will do,
e.g. move the index back to a value just signalled ...

> 
> > 
> > static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > {
> >  struct vring_virtqueue *vq = to_vvq(_vq);
> > 
> >  if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> >  vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> >  if (!vq->event)
> >  vq->split.vring.avail->flags =
> >  cpu_to_virtio16(_vq->vdev,
> >  
> > vq->split.avail_flags_shadow);
> >  }
> > }
> > 
> > Can you try your napi patch + disable event index?
> > 
> > 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-17 Thread Jason Wang


On 2021/2/10 下午5:14, Michael S. Tsirkin wrote:

On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:

On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
 wrote:

I have no preference. Just curious, especially if it complicates the patch.


My understanding is that. It's probably ok for net. But we probably need
to document the assumptions to make sure it was not abused in other drivers.

Introduce new parameters for find_vqs() can help to eliminate the subtle
stuffs but I agree it looks like a overkill.

(Btw, I forget the numbers but wonder how much difference if we simple
remove the free_old_xmits() from the rx NAPI path?)

The committed patchset did not record those numbers, but I found them
in an earlier iteration:

[PATCH net-next 0/3] virtio-net tx napi
https://lists.openwall.net/netdev/2017/04/02/55

It did seem to significantly reduce compute cycles ("Gcyc") at the
time. For instance:

  TCP_RR Latency (us):
  1x:
p50  24   24   21
p99  27   27   27
Gcycles 299  432  308

I'm concerned that removing it now may cause a regression report in a
few months. That is higher risk than the spurious interrupt warning
that was only reported after years of use.


Right.

So if Michael is fine with this approach, I'm ok with it. But we
probably need to a TODO to invent the interrupt handlers that can be
used for more than one virtqueues. When MSI-X is enabled, the interrupt
handler (vring_interrup()) assumes the interrupt is used by a single
virtqueue.

Thanks.

The approach to schedule tx-napi from virtnet_poll_cleantx instead of
cleaning directly in this rx-napi function was not effective at
suppressing the warning, I understand.

Correct. I tried the approach to schedule tx napi instead of directly
do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
still happens.

Two questions here: is the device using packed or split vqs?
And is event index enabled?

I think one issue is that at the moment with split and event index we
don't actually disable events at all.



Do we really have a way to disable that? (We don't have a flag like 
packed virtqueue)


Or you mean the trick [1] when I post tx interrupt RFC?

Thanks

[1] https://lkml.org/lkml/2015/2/9/113




static void virtqueue_disable_cb_split(struct virtqueue *_vq)
{
 struct vring_virtqueue *vq = to_vvq(_vq);

 if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
 vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
 if (!vq->event)
 vq->split.vring.avail->flags =
 cpu_to_virtio16(_vq->vdev,
 vq->split.avail_flags_shadow);
 }
}

Can you try your napi patch + disable event index?




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-10 Thread Michael S. Tsirkin
On Tue, Feb 09, 2021 at 10:00:22AM -0800, Wei Wang wrote:
> On Tue, Feb 9, 2021 at 6:58 AM Willem de Bruijn
>  wrote:
> >
> > > >>> I have no preference. Just curious, especially if it complicates the 
> > > >>> patch.
> > > >>>
> > > >> My understanding is that. It's probably ok for net. But we probably 
> > > >> need
> > > >> to document the assumptions to make sure it was not abused in other 
> > > >> drivers.
> > > >>
> > > >> Introduce new parameters for find_vqs() can help to eliminate the 
> > > >> subtle
> > > >> stuffs but I agree it looks like a overkill.
> > > >>
> > > >> (Btw, I forget the numbers but wonder how much difference if we simple
> > > >> remove the free_old_xmits() from the rx NAPI path?)
> > > > The committed patchset did not record those numbers, but I found them
> > > > in an earlier iteration:
> > > >
> > > >[PATCH net-next 0/3] virtio-net tx napi
> > > >https://lists.openwall.net/netdev/2017/04/02/55
> > > >
> > > > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > > > time. For instance:
> > > >
> > > >  TCP_RR Latency (us):
> > > >  1x:
> > > >p50  24   24   21
> > > >p99  27   27   27
> > > >Gcycles 299  432  308
> > > >
> > > > I'm concerned that removing it now may cause a regression report in a
> > > > few months. That is higher risk than the spurious interrupt warning
> > > > that was only reported after years of use.
> > >
> > >
> > > Right.
> > >
> > > So if Michael is fine with this approach, I'm ok with it. But we
> > > probably need to a TODO to invent the interrupt handlers that can be
> > > used for more than one virtqueues. When MSI-X is enabled, the interrupt
> > > handler (vring_interrup()) assumes the interrupt is used by a single
> > > virtqueue.
> >
> > Thanks.
> >
> > The approach to schedule tx-napi from virtnet_poll_cleantx instead of
> > cleaning directly in this rx-napi function was not effective at
> > suppressing the warning, I understand.
> 
> Correct. I tried the approach to schedule tx napi instead of directly
> do free_old_xmit_skbs() in virtnet_poll_cleantx(). But the warning
> still happens.

Two questions here: is the device using packed or split vqs?
And is event index enabled?

I think one issue is that at the moment with split and event index we
don't actually disable events at all.

static void virtqueue_disable_cb_split(struct virtqueue *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);

if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
if (!vq->event)
vq->split.vring.avail->flags =
cpu_to_virtio16(_vq->vdev,
vq->split.avail_flags_shadow);
}
}

Can you try your napi patch + disable event index?


-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-09 Thread Willem de Bruijn
> >>> I have no preference. Just curious, especially if it complicates the 
> >>> patch.
> >>>
> >> My understanding is that. It's probably ok for net. But we probably need
> >> to document the assumptions to make sure it was not abused in other 
> >> drivers.
> >>
> >> Introduce new parameters for find_vqs() can help to eliminate the subtle
> >> stuffs but I agree it looks like a overkill.
> >>
> >> (Btw, I forget the numbers but wonder how much difference if we simple
> >> remove the free_old_xmits() from the rx NAPI path?)
> > The committed patchset did not record those numbers, but I found them
> > in an earlier iteration:
> >
> >[PATCH net-next 0/3] virtio-net tx napi
> >https://lists.openwall.net/netdev/2017/04/02/55
> >
> > It did seem to significantly reduce compute cycles ("Gcyc") at the
> > time. For instance:
> >
> >  TCP_RR Latency (us):
> >  1x:
> >p50  24   24   21
> >p99  27   27   27
> >Gcycles 299  432  308
> >
> > I'm concerned that removing it now may cause a regression report in a
> > few months. That is higher risk than the spurious interrupt warning
> > that was only reported after years of use.
>
>
> Right.
>
> So if Michael is fine with this approach, I'm ok with it. But we
> probably need to a TODO to invent the interrupt handlers that can be
> used for more than one virtqueues. When MSI-X is enabled, the interrupt
> handler (vring_interrup()) assumes the interrupt is used by a single
> virtqueue.

Thanks.

The approach to schedule tx-napi from virtnet_poll_cleantx instead of
cleaning directly in this rx-napi function was not effective at
suppressing the warning, I understand.

It should be easy to drop the spurious rate to a little under 99%
percent, if only to suppress the warning. By probabilistically
cleaning in virtnet_poll_cleantx only every 98/100, say. But that
really is a hack.

There does seem to be a huge potential for cycle savings if we can
really avoid the many spurious interrupts.

As scheduling napi_tx from virtnet_poll_cleantx is not effective,
agreed that we should probably look at the more complete solution to
handle both tx and rx virtqueues from the same IRQ.

And revert this explicit warning suppression patch once we have that.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-08 Thread Jason Wang


On 2021/2/9 上午3:08, Willem de Bruijn wrote:

On Sun, Feb 7, 2021 at 10:29 PM Jason Wang  wrote:


On 2021/2/5 上午4:50, Willem de Bruijn wrote:

On Wed, Feb 3, 2021 at 10:06 PM Jason Wang  wrote:

On 2021/2/4 上午2:28, Willem de Bruijn wrote:

On Wed, Feb 3, 2021 at 12:33 AM Jason Wang  wrote:

On 2021/2/2 下午10:37, Willem de Bruijn wrote:

On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:

On 2021/1/29 上午8:21, Wei Wang wrote:

With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  
[ 3588.017989] handlers:
[ 3588.020374] [<1b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new param to struct vring_virtqueue, and we set it for
tx virtqueues if napi-tx is enabled, to suppress the warning in such
case.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones 
Signed-off-by: Wei Wang 
Signed-off-by: Willem de Bruijn 

Please use get_maintainer.pl to make sure Michael and me were cced.

Will do. Sorry about that. I suggested just the virtualization list, my bad.


---
  drivers/net/virtio_net.c | 19 ++-
  drivers/virtio/virtio_ring.c | 16 
  include/linux/virtio.h   |  2 ++
  3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info 
*vi,
  return;
  }

+ /* With napi_tx enabled, free_old_xmit_skbs() could be called from
+  * rx napi handler. Set work_steal to suppress bad irq warning for
+  * IRQ_NONE case from tx complete interrupt handler.
+  */
+ virtqueue_set_work_steal(vq, true);
+
  return virtnet_napi_enable(vq, napi);

Do we need to force the ordering between steal set and napi enable?

The warning only occurs after one hundred spurious interrupts, so not
really.

Ok, so it looks like a hint. Then I wonder how much value do we need to
introduce helper like virtqueue_set_work_steal() that allows the caller
to toggle. How about disable the check forever during virtqueue
initialization?

Yes, that is even simpler.

We still need the helper, as the internal variables of vring_virtqueue
are not accessible from virtio-net. An earlier patch added the
variable to virtqueue itself, but I think it belongs in
vring_virtqueue. And the helper is not a lot of code.

It's better to do this before the allocating the irq. But it looks not
easy unless we extend find_vqs().

Can you elaborate why that is better? At virtnet_open the interrupts
are not firing either.


I think you meant NAPI actually?

I meant interrupt: we don't have to worry about the spurious interrupt
warning when no interrupts will be firing. Until virtnet_open
completes, the device is down.



Ok.






I have no preference. Just curious, especially if it complicates the patch.


My understanding is that. It's probably ok for net. But we probably need
to document the assumptions to make sure it was not abused in other drivers.

Introduce new parameters for find_vqs() can help to eliminate the subtle
stuffs but I agree it looks like a overkill.

(Btw, I forget the numbers but wonder how much difference if we simple
remove the free_old_xmits() from the rx NAPI path?)

The committed patchset did not record those numbers, but I found them
in an earlier iteration:

   [PATCH net-next 0/3] virtio-net tx napi
   https://lists.openwall.net/netdev/2017/04/02/55

It did seem to significantly reduce compute cycles ("Gcyc") at the
time. For instance:

 TCP_RR Latency (us):
 1x:
   p50  24   24   21
   p99  27   27   27
   Gcycles 299  432  308

I'm concerned that removing it now may cause a regression report in a
few months. That is higher risk than the spurious interrupt warning
that was only reported after years of use.



Right.

So if Michael is fine with 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-08 Thread Willem de Bruijn
On Sun, Feb 7, 2021 at 10:29 PM Jason Wang  wrote:
>
>
> On 2021/2/5 上午4:50, Willem de Bruijn wrote:
> > On Wed, Feb 3, 2021 at 10:06 PM Jason Wang  wrote:
> >>
> >> On 2021/2/4 上午2:28, Willem de Bruijn wrote:
> >>> On Wed, Feb 3, 2021 at 12:33 AM Jason Wang  wrote:
>  On 2021/2/2 下午10:37, Willem de Bruijn wrote:
> > On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:
> >> On 2021/1/29 上午8:21, Wei Wang wrote:
> >>> With the implementation of napi-tx in virtio driver, we clean tx
> >>> descriptors from rx napi handler, for the purpose of reducing tx
> >>> complete interrupts. But this could introduce a race where tx complete
> >>> interrupt has been raised, but the handler found there is no work to 
> >>> do
> >>> because we have done the work in the previous rx interrupt handler.
> >>> This could lead to the following warning msg:
> >>> [ 3588.010778] irq 38: nobody cared (try booting with the
> >>> "irqpoll" option)
> >>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> >>> 5.3.0-19-generic #20~18.04.2-Ubuntu
> >>> [ 3588.017940] Call Trace:
> >>> [ 3588.017942]  
> >>> [ 3588.017951]  dump_stack+0x63/0x85
> >>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> >>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> >>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> >>> [ 3588.017957]  handle_irq_event+0x3b/0x60
> >>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> >>> [ 3588.017961]  handle_irq+0x20/0x30
> >>> [ 3588.017964]  do_IRQ+0x50/0xe0
> >>> [ 3588.017966]  common_interrupt+0xf/0xf
> >>> [ 3588.017966]  
> >>> [ 3588.017989] handlers:
> >>> [ 3588.020374] [<1b9f1da8>] vring_interrupt
> >>> [ 3588.025099] Disabling IRQ #38
> >>>
> >>> This patch adds a new param to struct vring_virtqueue, and we set it 
> >>> for
> >>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> >>> case.
> >>>
> >>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> >>> Reported-by: Rick Jones 
> >>> Signed-off-by: Wei Wang 
> >>> Signed-off-by: Willem de Bruijn 
> >> Please use get_maintainer.pl to make sure Michael and me were cced.
> > Will do. Sorry about that. I suggested just the virtualization list, my 
> > bad.
> >
> >>> ---
> >>>  drivers/net/virtio_net.c | 19 ++-
> >>>  drivers/virtio/virtio_ring.c | 16 
> >>>  include/linux/virtio.h   |  2 ++
> >>>  3 files changed, 32 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 508408fbe78f..e9a3f30864e8 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct 
> >>> virtnet_info *vi,
> >>>  return;
> >>>  }
> >>>
> >>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called 
> >>> from
> >>> +  * rx napi handler. Set work_steal to suppress bad irq warning 
> >>> for
> >>> +  * IRQ_NONE case from tx complete interrupt handler.
> >>> +  */
> >>> + virtqueue_set_work_steal(vq, true);
> >>> +
> >>>  return virtnet_napi_enable(vq, napi);
> >> Do we need to force the ordering between steal set and napi enable?
> > The warning only occurs after one hundred spurious interrupts, so not
> > really.
>  Ok, so it looks like a hint. Then I wonder how much value do we need to
>  introduce helper like virtqueue_set_work_steal() that allows the caller
>  to toggle. How about disable the check forever during virtqueue
>  initialization?
> >>> Yes, that is even simpler.
> >>>
> >>> We still need the helper, as the internal variables of vring_virtqueue
> >>> are not accessible from virtio-net. An earlier patch added the
> >>> variable to virtqueue itself, but I think it belongs in
> >>> vring_virtqueue. And the helper is not a lot of code.
> >>
> >> It's better to do this before the allocating the irq. But it looks not
> >> easy unless we extend find_vqs().
> > Can you elaborate why that is better? At virtnet_open the interrupts
> > are not firing either.
>
>
> I think you meant NAPI actually?

I meant interrupt: we don't have to worry about the spurious interrupt
warning when no interrupts will be firing. Until virtnet_open
completes, the device is down.


>
> >
> > I have no preference. Just curious, especially if it complicates the patch.
> >
>
> My understanding is that. It's probably ok for net. But we probably need
> to document the assumptions to make sure it was not abused in other drivers.
>
> Introduce new parameters for find_vqs() can help to eliminate the subtle
> stuffs but I agree it looks like a overkill.
>
> (Btw, I forget the numbers but wonder how much 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-07 Thread Jason Wang


On 2021/2/5 上午4:50, Willem de Bruijn wrote:

On Wed, Feb 3, 2021 at 10:06 PM Jason Wang  wrote:


On 2021/2/4 上午2:28, Willem de Bruijn wrote:

On Wed, Feb 3, 2021 at 12:33 AM Jason Wang  wrote:

On 2021/2/2 下午10:37, Willem de Bruijn wrote:

On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:

On 2021/1/29 上午8:21, Wei Wang wrote:

With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  
[ 3588.017989] handlers:
[ 3588.020374] [<1b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new param to struct vring_virtqueue, and we set it for
tx virtqueues if napi-tx is enabled, to suppress the warning in such
case.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones 
Signed-off-by: Wei Wang 
Signed-off-by: Willem de Bruijn 

Please use get_maintainer.pl to make sure Michael and me were cced.

Will do. Sorry about that. I suggested just the virtualization list, my bad.


---
 drivers/net/virtio_net.c | 19 ++-
 drivers/virtio/virtio_ring.c | 16 
 include/linux/virtio.h   |  2 ++
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info 
*vi,
 return;
 }

+ /* With napi_tx enabled, free_old_xmit_skbs() could be called from
+  * rx napi handler. Set work_steal to suppress bad irq warning for
+  * IRQ_NONE case from tx complete interrupt handler.
+  */
+ virtqueue_set_work_steal(vq, true);
+
 return virtnet_napi_enable(vq, napi);

Do we need to force the ordering between steal set and napi enable?

The warning only occurs after one hundred spurious interrupts, so not
really.

Ok, so it looks like a hint. Then I wonder how much value do we need to
introduce helper like virtqueue_set_work_steal() that allows the caller
to toggle. How about disable the check forever during virtqueue
initialization?

Yes, that is even simpler.

We still need the helper, as the internal variables of vring_virtqueue
are not accessible from virtio-net. An earlier patch added the
variable to virtqueue itself, but I think it belongs in
vring_virtqueue. And the helper is not a lot of code.


It's better to do this before the allocating the irq. But it looks not
easy unless we extend find_vqs().

Can you elaborate why that is better? At virtnet_open the interrupts
are not firing either.



I think you meant NAPI actually?




I have no preference. Just curious, especially if it complicates the patch.



My understanding is that. It's probably ok for net. But we probably need 
to document the assumptions to make sure it was not abused in other drivers.


Introduce new parameters for find_vqs() can help to eliminate the subtle 
stuffs but I agree it looks like a overkill.


(Btw, I forget the numbers but wonder how much difference if we simple 
remove the free_old_xmits() from the rx NAPI path?)


Thanks


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-04 Thread Willem de Bruijn
On Wed, Feb 3, 2021 at 10:06 PM Jason Wang  wrote:
>
>
> On 2021/2/4 上午2:28, Willem de Bruijn wrote:
> > On Wed, Feb 3, 2021 at 12:33 AM Jason Wang  wrote:
> >>
> >> On 2021/2/2 下午10:37, Willem de Bruijn wrote:
> >>> On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:
>  On 2021/1/29 上午8:21, Wei Wang wrote:
> > With the implementation of napi-tx in virtio driver, we clean tx
> > descriptors from rx napi handler, for the purpose of reducing tx
> > complete interrupts. But this could introduce a race where tx complete
> > interrupt has been raised, but the handler found there is no work to do
> > because we have done the work in the previous rx interrupt handler.
> > This could lead to the following warning msg:
> > [ 3588.010778] irq 38: nobody cared (try booting with the
> > "irqpoll" option)
> > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > [ 3588.017940] Call Trace:
> > [ 3588.017942]  
> > [ 3588.017951]  dump_stack+0x63/0x85
> > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > [ 3588.017961]  handle_irq+0x20/0x30
> > [ 3588.017964]  do_IRQ+0x50/0xe0
> > [ 3588.017966]  common_interrupt+0xf/0xf
> > [ 3588.017966]  
> > [ 3588.017989] handlers:
> > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > [ 3588.025099] Disabling IRQ #38
> >
> > This patch adds a new param to struct vring_virtqueue, and we set it for
> > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > case.
> >
> > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > Reported-by: Rick Jones 
> > Signed-off-by: Wei Wang 
> > Signed-off-by: Willem de Bruijn 
>  Please use get_maintainer.pl to make sure Michael and me were cced.
> >>> Will do. Sorry about that. I suggested just the virtualization list, my 
> >>> bad.
> >>>
> > ---
> > drivers/net/virtio_net.c | 19 ++-
> > drivers/virtio/virtio_ring.c | 16 
> > include/linux/virtio.h   |  2 ++
> > 3 files changed, 32 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 508408fbe78f..e9a3f30864e8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct 
> > virtnet_info *vi,
> > return;
> > }
> >
> > + /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> > +  * rx napi handler. Set work_steal to suppress bad irq warning for
> > +  * IRQ_NONE case from tx complete interrupt handler.
> > +  */
> > + virtqueue_set_work_steal(vq, true);
> > +
> > return virtnet_napi_enable(vq, napi);
>  Do we need to force the ordering between steal set and napi enable?
> >>> The warning only occurs after one hundred spurious interrupts, so not
> >>> really.
> >>
> >> Ok, so it looks like a hint. Then I wonder how much value do we need to
> >> introduce helper like virtqueue_set_work_steal() that allows the caller
> >> to toggle. How about disable the check forever during virtqueue
> >> initialization?
> > Yes, that is even simpler.
> >
> > We still need the helper, as the internal variables of vring_virtqueue
> > are not accessible from virtio-net. An earlier patch added the
> > variable to virtqueue itself, but I think it belongs in
> > vring_virtqueue. And the helper is not a lot of code.
>
>
> It's better to do this before the allocating the irq. But it looks not
> easy unless we extend find_vqs().

Can you elaborate why that is better? At virtnet_open the interrupts
are not firing either.

I have no preference. Just curious, especially if it complicates the patch.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-04 Thread Willem de Bruijn
On Wed, Feb 3, 2021 at 6:53 PM Wei Wang  wrote:
>
> On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin  wrote:
> > > >
> > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  wrote:
> > > > > > >
> > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > With the implementation of napi-tx in virtio driver, we clean 
> > > > > > > > > tx
> > > > > > > > > descriptors from rx napi handler, for the purpose of reducing 
> > > > > > > > > tx
> > > > > > > > > complete interrupts. But this could introduce a race where tx 
> > > > > > > > > complete
> > > > > > > > > interrupt has been raised, but the handler found there is no 
> > > > > > > > > work to do
> > > > > > > > > because we have done the work in the previous rx interrupt 
> > > > > > > > > handler.
> > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > "irqpoll" option)
> > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > [ 3588.017942]  
> > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > [ 3588.017966]  
> > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > >
> > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we 
> > > > > > > > > set it for
> > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning 
> > > > > > > > > in such
> > > > > > > > > case.
> > > > > > > > >
> > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from 
> > > > > > > > > rx napi")
> > > > > > > > > Reported-by: Rick Jones 
> > > > > > > > > Signed-off-by: Wei Wang 
> > > > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > > > >
> > > > > > > >
> > > > > > > > This description does not make sense to me.
> > > > > > > >
> > > > > > > > irq X: nobody cared
> > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > >
> > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > >
> > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > just preventing linux from complaining.
> > > > > > >
> > > > > > > The traffic that causes this warning is a netperf tcp_stream with 
> > > > > > > at
> > > > > > > least 128 flows between 2 hosts. And the warning gets triggered 
> > > > > > > on the
> > > > > > > receiving host, which has a lot of rx interrupts firing on all 
> > > > > > > queues,
> > > > > > > and a few tx interrupts.
> > > > > > > And I think the scenario is: when the tx interrupt gets fired, it 
> > > > > > > gets
> > > > > > > coalesced with the rx interrupt. Basically, the rx and tx 
> > > > > > > interrupts
> > > > > > > get triggered very close to each other, and gets handled in one 
> > > > > > > round
> > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > virtnet_poll(). However, virtnet_poll() calls 
> > > > > > > virtnet_poll_cleantx()
> > > > > > > to try to do the work on the corresponding tx queue as well. 
> > > > > > > That's
> > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > >
> > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > hundred such events, since boot, which is a small number compared 
> > > > > > real
> > > > > > interrupt load.
> > > > >
> > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > note_interrupt that applies here.
> > > > >
> > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > long as 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-03 Thread Jason Wang


On 2021/2/4 上午2:28, Willem de Bruijn wrote:

On Wed, Feb 3, 2021 at 12:33 AM Jason Wang  wrote:


On 2021/2/2 下午10:37, Willem de Bruijn wrote:

On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:

On 2021/1/29 上午8:21, Wei Wang wrote:

With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  
[ 3588.017989] handlers:
[ 3588.020374] [<1b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new param to struct vring_virtqueue, and we set it for
tx virtqueues if napi-tx is enabled, to suppress the warning in such
case.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones 
Signed-off-by: Wei Wang 
Signed-off-by: Willem de Bruijn 

Please use get_maintainer.pl to make sure Michael and me were cced.

Will do. Sorry about that. I suggested just the virtualization list, my bad.


---
drivers/net/virtio_net.c | 19 ++-
drivers/virtio/virtio_ring.c | 16 
include/linux/virtio.h   |  2 ++
3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info 
*vi,
return;
}

+ /* With napi_tx enabled, free_old_xmit_skbs() could be called from
+  * rx napi handler. Set work_steal to suppress bad irq warning for
+  * IRQ_NONE case from tx complete interrupt handler.
+  */
+ virtqueue_set_work_steal(vq, true);
+
return virtnet_napi_enable(vq, napi);

Do we need to force the ordering between steal set and napi enable?

The warning only occurs after one hundred spurious interrupts, so not
really.


Ok, so it looks like a hint. Then I wonder how much value do we need to
introduce helper like virtqueue_set_work_steal() that allows the caller
to toggle. How about disable the check forever during virtqueue
initialization?

Yes, that is even simpler.

We still need the helper, as the internal variables of vring_virtqueue
are not accessible from virtio-net. An earlier patch added the
variable to virtqueue itself, but I think it belongs in
vring_virtqueue. And the helper is not a lot of code.



It's better to do this before the allocating the irq. But it looks not 
easy unless we extend find_vqs().


Thanks






___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-03 Thread Michael S. Tsirkin
On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin  wrote:
> >
> > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn  
> > > wrote:
> > > >
> > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  wrote:
> > > > >
> > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > complete interrupts. But this could introduce a race where tx 
> > > > > > > complete
> > > > > > > interrupt has been raised, but the handler found there is no work 
> > > > > > > to do
> > > > > > > because we have done the work in the previous rx interrupt 
> > > > > > > handler.
> > > > > > > This could lead to the following warning msg:
> > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > "irqpoll" option)
> > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > [ 3588.017940] Call Trace:
> > > > > > > [ 3588.017942]  
> > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > [ 3588.017966]  
> > > > > > > [ 3588.017989] handlers:
> > > > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > >
> > > > > > > This patch adds a new param to struct vring_virtqueue, and we set 
> > > > > > > it for
> > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in 
> > > > > > > such
> > > > > > > case.
> > > > > > >
> > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx 
> > > > > > > napi")
> > > > > > > Reported-by: Rick Jones 
> > > > > > > Signed-off-by: Wei Wang 
> > > > > > > Signed-off-by: Willem de Bruijn 
> > > > > >
> > > > > >
> > > > > > This description does not make sense to me.
> > > > > >
> > > > > > irq X: nobody cared
> > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > >
> > > > > > So something causes a storm of useless tx interrupts here.
> > > > > >
> > > > > > Let's find out what it was please. What you are doing is
> > > > > > just preventing linux from complaining.
> > > > >
> > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > and a few tx interrupts.
> > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > get triggered very close to each other, and gets handled in one round
> > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > >
> > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > hundred such events, since boot, which is a small number compared real
> > > > interrupt load.
> > >
> > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > note_interrupt that applies here.
> > >
> > > > Occasionally seeing an interrupt with no work is expected after
> > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > long as this rate of events is very low compared to useful interrupts,
> > > > and total interrupt count is greatly reduced vs not having work
> > > > stealing, it is a net win.
> >
> > Right, but if 99900 out of 10 interrupts were wasted, then it is
> > surely an even greater win to disable interrupts while polling like
> > this.  Might be tricky to detect, disabling/enabling aggressively every
> > time even if there's nothing in the queue is sure to cause lots of cache
> > line bounces, and we don't want to enable callbacks if they were not
> > enabled e.g. by start_xmit ...  Some kind of counter?
> 
> Yes. It was known that the work 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-03 Thread Willem de Bruijn
On Wed, Feb 3, 2021 at 12:33 AM Jason Wang  wrote:
>
>
> On 2021/2/2 下午10:37, Willem de Bruijn wrote:
> > On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:
> >>
> >> On 2021/1/29 上午8:21, Wei Wang wrote:
> >>> With the implementation of napi-tx in virtio driver, we clean tx
> >>> descriptors from rx napi handler, for the purpose of reducing tx
> >>> complete interrupts. But this could introduce a race where tx complete
> >>> interrupt has been raised, but the handler found there is no work to do
> >>> because we have done the work in the previous rx interrupt handler.
> >>> This could lead to the following warning msg:
> >>> [ 3588.010778] irq 38: nobody cared (try booting with the
> >>> "irqpoll" option)
> >>> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> >>> 5.3.0-19-generic #20~18.04.2-Ubuntu
> >>> [ 3588.017940] Call Trace:
> >>> [ 3588.017942]  
> >>> [ 3588.017951]  dump_stack+0x63/0x85
> >>> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> >>> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> >>> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> >>> [ 3588.017957]  handle_irq_event+0x3b/0x60
> >>> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> >>> [ 3588.017961]  handle_irq+0x20/0x30
> >>> [ 3588.017964]  do_IRQ+0x50/0xe0
> >>> [ 3588.017966]  common_interrupt+0xf/0xf
> >>> [ 3588.017966]  
> >>> [ 3588.017989] handlers:
> >>> [ 3588.020374] [<1b9f1da8>] vring_interrupt
> >>> [ 3588.025099] Disabling IRQ #38
> >>>
> >>> This patch adds a new param to struct vring_virtqueue, and we set it for
> >>> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> >>> case.
> >>>
> >>> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> >>> Reported-by: Rick Jones 
> >>> Signed-off-by: Wei Wang 
> >>> Signed-off-by: Willem de Bruijn 
> >>
> >> Please use get_maintainer.pl to make sure Michael and me were cced.
> > Will do. Sorry about that. I suggested just the virtualization list, my bad.
> >
> >>> ---
> >>>drivers/net/virtio_net.c | 19 ++-
> >>>drivers/virtio/virtio_ring.c | 16 
> >>>include/linux/virtio.h   |  2 ++
> >>>3 files changed, 32 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 508408fbe78f..e9a3f30864e8 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct 
> >>> virtnet_info *vi,
> >>>return;
> >>>}
> >>>
> >>> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> >>> +  * rx napi handler. Set work_steal to suppress bad irq warning for
> >>> +  * IRQ_NONE case from tx complete interrupt handler.
> >>> +  */
> >>> + virtqueue_set_work_steal(vq, true);
> >>> +
> >>>return virtnet_napi_enable(vq, napi);
> >>
> >> Do we need to force the ordering between steal set and napi enable?
> > The warning only occurs after one hundred spurious interrupts, so not
> > really.
>
>
> Ok, so it looks like a hint. Then I wonder how much value do we need to
> introduce helper like virtqueue_set_work_steal() that allows the caller
> to toggle. How about disable the check forever during virtqueue
> initialization?

Yes, that is even simpler.

We still need the helper, as the internal variables of vring_virtqueue
are not accessible from virtio-net. An earlier patch added the
variable to virtqueue itself, but I think it belongs in
vring_virtqueue. And the helper is not a lot of code.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-03 Thread Willem de Bruijn
On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin  wrote:
>
> On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn  wrote:
> > >
> > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  wrote:
> > > >
> > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > complete interrupts. But this could introduce a race where tx 
> > > > > > complete
> > > > > > interrupt has been raised, but the handler found there is no work 
> > > > > > to do
> > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > This could lead to the following warning msg:
> > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > "irqpoll" option)
> > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > [ 3588.017940] Call Trace:
> > > > > > [ 3588.017942]  
> > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > [ 3588.017966]  
> > > > > > [ 3588.017989] handlers:
> > > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > >
> > > > > > This patch adds a new param to struct vring_virtqueue, and we set 
> > > > > > it for
> > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > case.
> > > > > >
> > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx 
> > > > > > napi")
> > > > > > Reported-by: Rick Jones 
> > > > > > Signed-off-by: Wei Wang 
> > > > > > Signed-off-by: Willem de Bruijn 
> > > > >
> > > > >
> > > > > This description does not make sense to me.
> > > > >
> > > > > irq X: nobody cared
> > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > >
> > > > > So something causes a storm of useless tx interrupts here.
> > > > >
> > > > > Let's find out what it was please. What you are doing is
> > > > > just preventing linux from complaining.
> > > >
> > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > and a few tx interrupts.
> > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > get triggered very close to each other, and gets handled in one round
> > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > to try to do the work on the corresponding tx queue as well. That's
> > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > And the reason for the rx handler to handle the tx work is here:
> > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > >
> > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > hundred such events, since boot, which is a small number compared real
> > > interrupt load.
> >
> > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > note_interrupt that applies here.
> >
> > > Occasionally seeing an interrupt with no work is expected after
> > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > long as this rate of events is very low compared to useful interrupts,
> > > and total interrupt count is greatly reduced vs not having work
> > > stealing, it is a net win.
>
> Right, but if 99900 out of 10 interrupts were wasted, then it is
> surely an even greater win to disable interrupts while polling like
> this.  Might be tricky to detect, disabling/enabling aggressively every
> time even if there's nothing in the queue is sure to cause lots of cache
> line bounces, and we don't want to enable callbacks if they were not
> enabled e.g. by start_xmit ...  Some kind of counter?

Yes. It was known that the work stealing is more effective in some
workloads than others. But a 99% spurious rate I had not anticipated.

Most interesting is the number of interrupts suppressed as a result of
the feature. That is not captured by this statistic.

In any case, we'll take a step back to better understand behavior. 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-03 Thread Michael S. Tsirkin
On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn  wrote:
> >
> > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang  wrote:
> > >
> > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > interrupt has been raised, but the handler found there is no work to 
> > > > > do
> > > > > because we have done the work in the previous rx interrupt handler.
> > > > > This could lead to the following warning msg:
> > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > "irqpoll" option)
> > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > [ 3588.017940] Call Trace:
> > > > > [ 3588.017942]  
> > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > [ 3588.017966]  
> > > > > [ 3588.017989] handlers:
> > > > > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > > > > [ 3588.025099] Disabling IRQ #38
> > > > >
> > > > > This patch adds a new param to struct vring_virtqueue, and we set it 
> > > > > for
> > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > case.
> > > > >
> > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > Reported-by: Rick Jones 
> > > > > Signed-off-by: Wei Wang 
> > > > > Signed-off-by: Willem de Bruijn 
> > > >
> > > >
> > > > This description does not make sense to me.
> > > >
> > > > irq X: nobody cared
> > > > only triggers after an interrupt is unhandled repeatedly.
> > > >
> > > > So something causes a storm of useless tx interrupts here.
> > > >
> > > > Let's find out what it was please. What you are doing is
> > > > just preventing linux from complaining.
> > >
> > > The traffic that causes this warning is a netperf tcp_stream with at
> > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > and a few tx interrupts.
> > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > get triggered very close to each other, and gets handled in one round
> > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > to try to do the work on the corresponding tx queue as well. That's
> > > why when tx interrupt handler gets called, it sees no work to do.
> > > And the reason for the rx handler to handle the tx work is here:
> > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> >
> > Indeed. It's not a storm necessarily. The warning occurs after one
> > hundred such events, since boot, which is a small number compared real
> > interrupt load.
> 
> Sorry, this is wrong. It is the other call to __report_bad_irq from
> note_interrupt that applies here.
> 
> > Occasionally seeing an interrupt with no work is expected after
> > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > long as this rate of events is very low compared to useful interrupts,
> > and total interrupt count is greatly reduced vs not having work
> > stealing, it is a net win.

Right, but if 99900 out of 10 interrupts were wasted, then it is
surely an even greater win to disable interrupts while polling like
this.  Might be tricky to detect, disabling/enabling aggressively every
time even if there's nothing in the queue is sure to cause lots of cache
line bounces, and we don't want to enable callbacks if they were not
enabled e.g. by start_xmit ...  Some kind of counter?


-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-02 Thread Jason Wang


On 2021/2/2 下午10:37, Willem de Bruijn wrote:

On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:


On 2021/1/29 上午8:21, Wei Wang wrote:

With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  
[ 3588.017989] handlers:
[ 3588.020374] [<1b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new param to struct vring_virtqueue, and we set it for
tx virtqueues if napi-tx is enabled, to suppress the warning in such
case.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones 
Signed-off-by: Wei Wang 
Signed-off-by: Willem de Bruijn 


Please use get_maintainer.pl to make sure Michael and me were cced.

Will do. Sorry about that. I suggested just the virtualization list, my bad.


---
   drivers/net/virtio_net.c | 19 ++-
   drivers/virtio/virtio_ring.c | 16 
   include/linux/virtio.h   |  2 ++
   3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info 
*vi,
   return;
   }

+ /* With napi_tx enabled, free_old_xmit_skbs() could be called from
+  * rx napi handler. Set work_steal to suppress bad irq warning for
+  * IRQ_NONE case from tx complete interrupt handler.
+  */
+ virtqueue_set_work_steal(vq, true);
+
   return virtnet_napi_enable(vq, napi);


Do we need to force the ordering between steal set and napi enable?

The warning only occurs after one hundred spurious interrupts, so not
really.



Ok, so it looks like a hint. Then I wonder how much value do we need to 
introduce helper like virtqueue_set_work_steal() that allows the caller 
to toggle. How about disable the check forever during virtqueue 
initialization?






   }

-static void virtnet_napi_tx_disable(struct napi_struct *napi)
+static void virtnet_napi_tx_disable(struct virtqueue *vq,
+ struct napi_struct *napi)
   {
- if (napi->weight)
+ if (napi->weight) {
   napi_disable(napi);
+ virtqueue_set_work_steal(vq, false);
+ }
   }

   static void refill_work(struct work_struct *work)
@@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
   for (i = 0; i < vi->max_queue_pairs; i++) {
   xdp_rxq_info_unreg(>rq[i].xdp_rxq);
   napi_disable(>rq[i].napi);
- virtnet_napi_tx_disable(>sq[i].napi);
+ virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
   }

   return 0;
@@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device 
*vdev)
   if (netif_running(vi->dev)) {
   for (i = 0; i < vi->max_queue_pairs; i++) {
   napi_disable(>rq[i].napi);
- virtnet_napi_tx_disable(>sq[i].napi);
+ virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
   }
   }
   }
@@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct 
bpf_prog *prog,
   if (netif_running(dev)) {
   for (i = 0; i < vi->max_queue_pairs; i++) {
   napi_disable(>rq[i].napi);
- virtnet_napi_tx_disable(>sq[i].napi);
+ virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
   }
   }

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..f7c5d697c302 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -105,6 +105,9 @@ struct vring_virtqueue {
   /* Host publishes avail event idx */
   bool event;

+ /* Tx side napi work could be done from rx side. */
+ bool work_steal;


So vring_vritqueue is a general structure, let's avoid mentioning
network specific stuffs here. And we need a better name like
"no_interrupt_check"?

And we need a separate patch for virtio core 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-02 Thread Michael S. Tsirkin
On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> With the implementation of napi-tx in virtio driver, we clean tx
> descriptors from rx napi handler, for the purpose of reducing tx
> complete interrupts. But this could introduce a race where tx complete
> interrupt has been raised, but the handler found there is no work to do
> because we have done the work in the previous rx interrupt handler.
> This could lead to the following warning msg:
> [ 3588.010778] irq 38: nobody cared (try booting with the
> "irqpoll" option)
> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> 5.3.0-19-generic #20~18.04.2-Ubuntu
> [ 3588.017940] Call Trace:
> [ 3588.017942]  
> [ 3588.017951]  dump_stack+0x63/0x85
> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> [ 3588.017957]  handle_irq_event+0x3b/0x60
> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> [ 3588.017961]  handle_irq+0x20/0x30
> [ 3588.017964]  do_IRQ+0x50/0xe0
> [ 3588.017966]  common_interrupt+0xf/0xf
> [ 3588.017966]  
> [ 3588.017989] handlers:
> [ 3588.020374] [<1b9f1da8>] vring_interrupt
> [ 3588.025099] Disabling IRQ #38
> 
> This patch adds a new param to struct vring_virtqueue, and we set it for
> tx virtqueues if napi-tx is enabled, to suppress the warning in such
> case.
> 
> Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> Reported-by: Rick Jones 
> Signed-off-by: Wei Wang 
> Signed-off-by: Willem de Bruijn 


This description does not make sense to me.

irq X: nobody cared
only triggers after an interrupt is unhandled repeatedly.

So something causes a storm of useless tx interrupts here.

Let's find out what it was please. What you are doing is
just preventing linux from complaining.



> ---
>  drivers/net/virtio_net.c | 19 ++-
>  drivers/virtio/virtio_ring.c | 16 
>  include/linux/virtio.h   |  2 ++
>  3 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 508408fbe78f..e9a3f30864e8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct 
> virtnet_info *vi,
>   return;
>   }
>  
> + /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> +  * rx napi handler. Set work_steal to suppress bad irq warning for
> +  * IRQ_NONE case from tx complete interrupt handler.
> +  */
> + virtqueue_set_work_steal(vq, true);
> +
>   return virtnet_napi_enable(vq, napi);
>  }
>  
> -static void virtnet_napi_tx_disable(struct napi_struct *napi)
> +static void virtnet_napi_tx_disable(struct virtqueue *vq,
> + struct napi_struct *napi)
>  {
> - if (napi->weight)
> + if (napi->weight) {
>   napi_disable(napi);
> + virtqueue_set_work_steal(vq, false);
> + }
>  }
>  
>  static void refill_work(struct work_struct *work)
> @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   xdp_rxq_info_unreg(>rq[i].xdp_rxq);
>   napi_disable(>rq[i].napi);
> - virtnet_napi_tx_disable(>sq[i].napi);
> + virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
>   }
>  
>   return 0;
> @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device 
> *vdev)
>   if (netif_running(vi->dev)) {
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   napi_disable(>rq[i].napi);
> - virtnet_napi_tx_disable(>sq[i].napi);
> + virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
>   }
>   }
>  }
> @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, 
> struct bpf_prog *prog,
>   if (netif_running(dev)) {
>   for (i = 0; i < vi->max_queue_pairs; i++) {
>   napi_disable(>rq[i].napi);
> - virtnet_napi_tx_disable(>sq[i].napi);
> + virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
>   }
>   }
>  
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..f7c5d697c302 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -105,6 +105,9 @@ struct vring_virtqueue {
>   /* Host publishes avail event idx */
>   bool event;
>  
> + /* Tx side napi work could be done from rx side. */
> + bool work_steal;
> +
>   /* Head of free buffer list. */
>   unsigned int free_head;
>   /* Number we've added since last sync. */
> @@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   vq->notify = notify;
>   vq->weak_barriers = weak_barriers;
>   vq->broken = false;
> + vq->work_steal = false;
>   

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-02 Thread Willem de Bruijn
On Mon, Feb 1, 2021 at 10:09 PM Jason Wang  wrote:
>
>
> On 2021/1/29 上午8:21, Wei Wang wrote:
> > With the implementation of napi-tx in virtio driver, we clean tx
> > descriptors from rx napi handler, for the purpose of reducing tx
> > complete interrupts. But this could introduce a race where tx complete
> > interrupt has been raised, but the handler found there is no work to do
> > because we have done the work in the previous rx interrupt handler.
> > This could lead to the following warning msg:
> > [ 3588.010778] irq 38: nobody cared (try booting with the
> > "irqpoll" option)
> > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > [ 3588.017940] Call Trace:
> > [ 3588.017942]  
> > [ 3588.017951]  dump_stack+0x63/0x85
> > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > [ 3588.017961]  handle_irq+0x20/0x30
> > [ 3588.017964]  do_IRQ+0x50/0xe0
> > [ 3588.017966]  common_interrupt+0xf/0xf
> > [ 3588.017966]  
> > [ 3588.017989] handlers:
> > [ 3588.020374] [<1b9f1da8>] vring_interrupt
> > [ 3588.025099] Disabling IRQ #38
> >
> > This patch adds a new param to struct vring_virtqueue, and we set it for
> > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > case.
> >
> > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > Reported-by: Rick Jones 
> > Signed-off-by: Wei Wang 
> > Signed-off-by: Willem de Bruijn 
>
>
> Please use get_maintainer.pl to make sure Michael and me were cced.

Will do. Sorry about that. I suggested just the virtualization list, my bad.

>
> > ---
> >   drivers/net/virtio_net.c | 19 ++-
> >   drivers/virtio/virtio_ring.c | 16 
> >   include/linux/virtio.h   |  2 ++
> >   3 files changed, 32 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 508408fbe78f..e9a3f30864e8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct 
> > virtnet_info *vi,
> >   return;
> >   }
> >
> > + /* With napi_tx enabled, free_old_xmit_skbs() could be called from
> > +  * rx napi handler. Set work_steal to suppress bad irq warning for
> > +  * IRQ_NONE case from tx complete interrupt handler.
> > +  */
> > + virtqueue_set_work_steal(vq, true);
> > +
> >   return virtnet_napi_enable(vq, napi);
>
>
> Do we need to force the ordering between steal set and napi enable?

The warning only occurs after one hundred spurious interrupts, so not
really.

>
> >   }
> >
> > -static void virtnet_napi_tx_disable(struct napi_struct *napi)
> > +static void virtnet_napi_tx_disable(struct virtqueue *vq,
> > + struct napi_struct *napi)
> >   {
> > - if (napi->weight)
> > + if (napi->weight) {
> >   napi_disable(napi);
> > + virtqueue_set_work_steal(vq, false);
> > + }
> >   }
> >
> >   static void refill_work(struct work_struct *work)
> > @@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
> >   for (i = 0; i < vi->max_queue_pairs; i++) {
> >   xdp_rxq_info_unreg(>rq[i].xdp_rxq);
> >   napi_disable(>rq[i].napi);
> > - virtnet_napi_tx_disable(>sq[i].napi);
> > + virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
> >   }
> >
> >   return 0;
> > @@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device 
> > *vdev)
> >   if (netif_running(vi->dev)) {
> >   for (i = 0; i < vi->max_queue_pairs; i++) {
> >   napi_disable(>rq[i].napi);
> > - virtnet_napi_tx_disable(>sq[i].napi);
> > + virtnet_napi_tx_disable(vi->sq[i].vq, 
> > >sq[i].napi);
> >   }
> >   }
> >   }
> > @@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, 
> > struct bpf_prog *prog,
> >   if (netif_running(dev)) {
> >   for (i = 0; i < vi->max_queue_pairs; i++) {
> >   napi_disable(>rq[i].napi);
> > - virtnet_napi_tx_disable(>sq[i].napi);
> > + virtnet_napi_tx_disable(vi->sq[i].vq, 
> > >sq[i].napi);
> >   }
> >   }
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71e16b53e9c1..f7c5d697c302 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -105,6 +105,9 @@ struct vring_virtqueue {
> >   /* Host publishes avail event idx */
> >   bool event;
> >
> > + /* Tx side napi work could be done from rx side. */
> > + bool work_steal;
>
>
> So vring_vritqueue is a general structure, let's avoid 

Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi

2021-02-01 Thread Jason Wang


On 2021/1/29 上午8:21, Wei Wang wrote:

With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  
[ 3588.017989] handlers:
[ 3588.020374] [<1b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patch adds a new param to struct vring_virtqueue, and we set it for
tx virtqueues if napi-tx is enabled, to suppress the warning in such
case.

Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Reported-by: Rick Jones 
Signed-off-by: Wei Wang 
Signed-off-by: Willem de Bruijn 



Please use get_maintainer.pl to make sure Michael and me were cced.



---
  drivers/net/virtio_net.c | 19 ++-
  drivers/virtio/virtio_ring.c | 16 
  include/linux/virtio.h   |  2 ++
  3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78f..e9a3f30864e8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1303,13 +1303,22 @@ static void virtnet_napi_tx_enable(struct virtnet_info 
*vi,
return;
}
  
+	/* With napi_tx enabled, free_old_xmit_skbs() could be called from

+* rx napi handler. Set work_steal to suppress bad irq warning for
+* IRQ_NONE case from tx complete interrupt handler.
+*/
+   virtqueue_set_work_steal(vq, true);
+
return virtnet_napi_enable(vq, napi);



Do we need to force the ordering between steal set and napi enable?



  }
  
-static void virtnet_napi_tx_disable(struct napi_struct *napi)

+static void virtnet_napi_tx_disable(struct virtqueue *vq,
+   struct napi_struct *napi)
  {
-   if (napi->weight)
+   if (napi->weight) {
napi_disable(napi);
+   virtqueue_set_work_steal(vq, false);
+   }
  }
  
  static void refill_work(struct work_struct *work)

@@ -1835,7 +1844,7 @@ static int virtnet_close(struct net_device *dev)
for (i = 0; i < vi->max_queue_pairs; i++) {
xdp_rxq_info_unreg(>rq[i].xdp_rxq);
napi_disable(>rq[i].napi);
-   virtnet_napi_tx_disable(>sq[i].napi);
+   virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
}
  
  	return 0;

@@ -2315,7 +2324,7 @@ static void virtnet_freeze_down(struct virtio_device 
*vdev)
if (netif_running(vi->dev)) {
for (i = 0; i < vi->max_queue_pairs; i++) {
napi_disable(>rq[i].napi);
-   virtnet_napi_tx_disable(>sq[i].napi);
+   virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
}
}
  }
@@ -2440,7 +2449,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct 
bpf_prog *prog,
if (netif_running(dev)) {
for (i = 0; i < vi->max_queue_pairs; i++) {
napi_disable(>rq[i].napi);
-   virtnet_napi_tx_disable(>sq[i].napi);
+   virtnet_napi_tx_disable(vi->sq[i].vq, >sq[i].napi);
}
}
  
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c

index 71e16b53e9c1..f7c5d697c302 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -105,6 +105,9 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
  
+	/* Tx side napi work could be done from rx side. */

+   bool work_steal;



So vring_vritqueue is a general structure, let's avoid mentioning 
network specific stuffs here. And we need a better name like 
"no_interrupt_check"?


And we need a separate patch for virtio core changes.



+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -1604,6 +1607,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->notify = notify;
vq->weak_barriers = weak_barriers;
vq->broken = false;
+   vq->work_steal = false;
vq->last_used_idx = 0;
vq->num_added = 0;
vq->packed_ring = true;
@@