Public bug reported:

I use a non-QEMU VMM, cloud-hypervisor. It looks like a patch that
introduced a bug was applied. About two weeks later (the commits below
are dated Jul 25 and Aug 9, 2023), a second patch was written upstream to
fix that bug. The second patch was not applied in Ubuntu's release, but
it is present in Greg KH's 5.15 stable branch.

As a result, the kernel does not boot.

Cumulative diff:

```
> git diff  Ubuntu-5.15.0-86.96 Ubuntu-5.15.0-89.99 drivers/net/virtio_net.c
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0351f86494f1..af335f8266c2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3319,6 +3319,8 @@ static int virtnet_probe(struct virtio_device *vdev)
                }
        }

+       _virtnet_set_queues(vi, vi->curr_queue_pairs);
+
        /* serialize netdev register + virtio_device_ready() with ndo_open() */
        rtnl_lock();

@@ -3339,8 +3341,6 @@ static int virtnet_probe(struct virtio_device *vdev)
                goto free_unregister_netdev;
        }

-       virtnet_set_queues(vi, vi->curr_queue_pairs);
-
        /* Assume link up if device can't report link status,
           otherwise get link status from config. */
        netif_carrier_off(dev);
```
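To make the two ordering constraints explicit, here is a small C model of the tail of virtnet_probe() as a list of steps. It is a sketch, not kernel code: the enum labels are hypothetical, and the 5.15.0-86 ordering (virtnet_set_queues() running after rtnl_unlock()) is my reading of the diff context, not something the diff shows directly. spec_ok() encodes the virtio rule cited by the follow-up commit below (no control-queue command before DRIVER_OK); race_free() encodes the race described in the blamed commit (set_channels can run as soon as the netdev is registered and rtnl is released).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the tail of virtnet_probe(); not kernel symbols. */
enum probe_step { RTNL_LOCK, REGISTER, DRIVER_OK, SET_QUEUES, RTNL_UNLOCK };

/* Position of the first occurrence of step s in seq, or -1 if absent. */
static int pos(const enum probe_step *seq, size_t n, enum probe_step s)
{
	for (size_t i = 0; i < n; i++)
		if (seq[i] == s)
			return (int)i;
	return -1;
}

/* Virtio rule cited by the follow-up: no control-queue command
 * (SET_QUEUES) before the driver sets DRIVER_OK. */
static bool spec_ok(const enum probe_step *seq, size_t n)
{
	return pos(seq, n, SET_QUEUES) > pos(seq, n, DRIVER_OK);
}

/* Race from the blamed commit: userspace set_channels needs rtnl and a
 * registered netdev, so it can first run once rtnl is dropped after
 * REGISTER. Probe must therefore set queues before RTNL_UNLOCK. */
static bool race_free(const enum probe_step *seq, size_t n)
{
	return pos(seq, n, SET_QUEUES) < pos(seq, n, RTNL_UNLOCK);
}

/* Assumed orderings; each array lists all five steps exactly once.
 * In v86, SET_QUEUES stands for virtnet_set_queues(), which takes rtnl
 * again in a separate critical section. */
static const enum probe_step v86[] = {
	RTNL_LOCK, REGISTER, DRIVER_OK, RTNL_UNLOCK, SET_QUEUES };
static const enum probe_step v89[] = {
	SET_QUEUES, RTNL_LOCK, REGISTER, DRIVER_OK, RTNL_UNLOCK };
static const enum probe_step fixed[] = {
	RTNL_LOCK, REGISTER, DRIVER_OK, SET_QUEUES, RTNL_UNLOCK };

#define NSTEPS(a) (sizeof(a) / sizeof((a)[0]))
```

Under these assumptions only the follow-up ordering passes both checks: 5.15.0-86 is spec-compliant but racy, while 5.15.0-89 closes the race by sending a control command before DRIVER_OK.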

Blamed Commit:

```
commit 5e0545ef5682562ffef072138d9340ea36a2ebc9
Author: Jason Wang <jasow...@redhat.com>
Date:   Tue Jul 25 03:20:49 2023 -0400

    virtio-net: fix race between set queues and probe

    BugLink: https://bugs.launchpad.net/bugs/2035400

    commit 25266128fe16d5632d43ada34c847d7b8daba539 upstream.

    A race were found where set_channels could be called after registering
    but before virtnet_set_queues() in virtnet_probe(). Fixing this by
    moving the virtnet_set_queues() before netdevice registering. While at
    it, use _virtnet_set_queues() to avoid holding rtnl as the device is
    not even registered at that time.

    Cc: sta...@vger.kernel.org
    Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
    Signed-off-by: Jason Wang <jasow...@redhat.com>
    Acked-by: Michael S. Tsirkin <m...@redhat.com>
    Reviewed-by: Xuan Zhuo <xuanz...@linux.alibaba.com>
    Link: https://lore.kernel.org/r/20230725072049.617289-1-jasow...@redhat.com
    Signed-off-by: Jakub Kicinski <k...@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
    Signed-off-by: Kamal Mostafa <ka...@canonical.com>
    Signed-off-by: Stefan Bader <stefan.ba...@canonical.com>
```

Greg KH's 5.15 stable branch also contains the following follow-up fix,
which appears not to have been applied to Ubuntu's tree:

```
commit 431db3f48c286462ad7453ccdf284f590aafa949
Author: Jason Wang <jasow...@redhat.com>
Date:   Wed Aug 9 23:12:56 2023 -0400

    virtio-net: set queues after driver_ok

    commit 51b813176f098ff61bd2833f627f5319ead098a5 upstream.

    Commit 25266128fe16 ("virtio-net: fix race between set queues and
    probe") tries to fix the race between set queues and probe by calling
    _virtnet_set_queues() before DRIVER_OK is set. This violates virtio
    spec. Fixing this by setting queues after virtio_device_ready().

    Note that rtnl needs to be held for userspace requests to change the
    number of queues. So we are serialized in this way.

    Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe")
    Reported-by: Dragos Tatulea <dtatu...@nvidia.com>
    Acked-by: Michael S. Tsirkin <m...@redhat.com>
    Signed-off-by: Jason Wang <jasow...@redhat.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
```
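The blamed commit says it uses _virtnet_set_queues() to avoid holding rtnl, i.e. virtnet_set_queues() takes rtnl itself. If the follow-up places its call after virtio_device_ready() but before rtnl_unlock() (my reading of its note about rtnl serialization, not verified against the actual hunk), it has to keep the underscore variant: rtnl is a plain, non-recursive mutex, so taking it again inside probe's own rtnl section would self-deadlock. A single-threaded C sketch of that convention, with hypothetical model_* names standing in for the kernel functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of rtnl as a non-recursive lock (hypothetical). */
static bool rtnl_held;
static int curr_queue_pairs = 1;

/* Returns false where the real rtnl_lock() would block forever on itself. */
static bool model_rtnl_lock(void)
{
	if (rtnl_held)
		return false;
	rtnl_held = true;
	return true;
}

static void model_rtnl_unlock(void)
{
	rtnl_held = false;
}

/* Like _virtnet_set_queues(): does not take rtnl. */
static void model__set_queues(int pairs)
{
	curr_queue_pairs = pairs;
}

/* Like virtnet_set_queues(): wraps the underscore variant in rtnl. */
static bool model_set_queues(int pairs)
{
	if (!model_rtnl_lock())
		return false; /* self-deadlock in the kernel */
	model__set_queues(pairs);
	model_rtnl_unlock();
	return true;
}

/* Assumed probe tail after the follow-up: register, DRIVER_OK and set
 * queues inside one rtnl section. use_locking_variant selects which
 * helper is called while rtnl is already held. */
static bool model_probe_tail(int pairs, bool use_locking_variant)
{
	bool ok = true;

	if (!model_rtnl_lock())
		return false;
	/* register_netdevice(dev); virtio_device_ready(vdev); */
	if (use_locking_variant)
		ok = model_set_queues(pairs); /* rtnl already held: fails */
	else
		model__set_queues(pairs);
	model_rtnl_unlock();
	return ok;
}
```

Calling model_probe_tail(4, false) leaves 4 queue pairs and rtnl released; calling it with use_locking_variant set reports the would-be deadlock and leaves the queue count unchanged.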

Boot stack trace:

```
[   28.129660] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [systemd-udevd:165]
[   28.130265] Modules linked in: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd virtio_net(+) net_failover virtio_rng failover virtio_blk
[   28.131396] CPU: 1 PID: 165 Comm: systemd-udevd Not tainted 5.15.0-89-generic #99-Ubuntu
[   28.131997] Hardware name: Cloud Hypervisor cloud-hypervisor, BIOS 0
[   28.132479] RIP: 0010:virtnet_send_command+0x10b/0x170 [virtio_net]
[   28.132951] Code: 0b 83 c1 d8 85 c0 0f 88 d2 6e 00 00 48 8b 7b 08 e8 6a 72 c1 d8 84 c0 75 11 eb 56 48 8b 7b 08 e8 6b 5e c1 d8 84 c0 75 17 f3 90 <48> 8b 7b 08 48 8d b5 6c ff ff ff e8 f5 71 c1 d8 48 85 c0 74 dc 48
[   28.134326] RSP: 0018:ffff9b0c4064f9b8 EFLAGS: 00000246
[   28.134720] RAX: 0000000000000000 RBX: ffff89dfc0d13980 RCX: 0000000000000a20
[   28.135252] RDX: 0000000000000000 RSI: ffff9b0c4064f9bc RDI: ffff89dfc7cc00c0
[   28.135787] RBP: ffff9b0c4064fa50 R08: 0000000000000001 R09: 0000000000000003
[   28.136316] R10: 0000000000000003 R11: 0000000000000002 R12: ffff9b0c4064f9e0
[   28.136851] R13: 0000000000000002 R14: 0000000000000004 R15: ffff89dfc0c49400
[   28.137381] FS:  00007feeba10e8c0(0000) GS:ffff89e0f7d00000(0000) knlGS:0000000000000000
[   28.137981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   28.138408] CR2: 00007feeba90a0f8 CR3: 0000000100258000 CR4: 0000000000350ee0
[   28.138940] Call Trace:
[   28.139129]  <IRQ>
[   28.139291]  ? show_trace_log_lvl+0x1d6/0x2ea
[   28.139627]  ? show_trace_log_lvl+0x1d6/0x2ea
[   28.139957]  ? _virtnet_set_queues+0xbb/0x100 [virtio_net]
[   28.140369]  ? show_regs.part.0+0x23/0x29
[   28.140672]  ? show_regs.cold+0x8/0xd
[   28.140950]  ? watchdog_timer_fn+0x1be/0x220
[   28.141273]  ? lockup_detector_update_enable+0x60/0x60
[   28.141657]  ? __hrtimer_run_queues+0x107/0x230
[   28.142011]  ? clockevents_program_event+0xad/0x130
[   28.142377]  ? hrtimer_interrupt+0x101/0x220
[   28.142698]  ? __sysvec_apic_timer_interrupt+0x61/0xe0
[   28.143084]  ? sysvec_apic_timer_interrupt+0x7b/0x90
[   28.143460]  </IRQ>
```
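The trace fits a busy-wait: RIP is in virtnet_send_command(), called from _virtnet_set_queues(), and the bytes f3 90 immediately before the <48> marker in the Code: line decode to pause, which is what cpu_relax() emits on x86. As far as I can tell, the 5.15 driver spins on virtqueue_get_buf() for the control queue with cpu_relax() and no timeout, leaving only when a reply arrives or the queue is marked broken. If the device does not service the control queue before DRIVER_OK (an assumption about cloud-hypervisor, not verified), that loop never exits. A C mock of the loop, with hypothetical names and an artificial spin limit so the model terminates:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock device. Assumption: it only services the control
 * queue once the driver has set DRIVER_OK, and then answers on the
 * first poll. */
struct mock_dev {
	bool driver_ok;   /* DRIVER_OK status bit set by the driver */
	bool cmd_pending; /* command kicked but not yet consumed */
};

/* Driver side: queue a control command and notify the device. */
static void mock_kick(struct mock_dev *d)
{
	d->cmd_pending = true;
}

/* Stand-in for virtqueue_get_buf() on the control queue: true once the
 * device has consumed the pending command. */
static bool mock_get_buf(struct mock_dev *d)
{
	if (d->cmd_pending && d->driver_ok) {
		d->cmd_pending = false;
		return true;
	}
	return false;
}

/* Model of the busy-wait in virtnet_send_command(). The kernel loop has
 * no limit; max_spins exists only so this model terminates. Returns the
 * number of spins before a reply, or -1 if none arrived (in the kernel,
 * a soft lockup). */
static long mock_send_command(struct mock_dev *d, long max_spins)
{
	mock_kick(d);
	for (long spins = 0; spins < max_spins; spins++) {
		if (mock_get_buf(d))
			return spins;
		/* the kernel executes cpu_relax() (x86 pause) here */
	}
	return -1;
}
```

With driver_ok false, mock_send_command() gives up after max_spins and returns -1; with driver_ok true it returns 0 on the first poll, matching the behaviour expected once the queues are set after virtio_device_ready().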

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Summary changed:

- Ubuntu-5.15.0-89.99 breaks virtio-net spec compatibility
+ Ubuntu-5.15.0-89.99 breaks virtio-net spec and doesn't boot

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045443

Title:
  Ubuntu-5.15.0-89.99 breaks virtio-net spec and doesn't boot

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045443/+subscriptions

