Hi,

We tried to reproduce the issue with the kernel package
'linux-image-unsigned-5.15.0-59-generic_5.15.0-59.65_amd64.deb' from the
-proposed repository. The issue is not observed.

After the link down/up sequence, NVMe controllers nvme4, nvme68, nvme69 and
nvme70 reconnect successfully on the first attempt:

[  793.550538] nvme nvme4: queue 0: timeout request 0x0 type 4
[  793.550544] nvme nvme4: starting error recovery
[  793.552141] nvme nvme4: failed nvme_keep_alive_end_io error=10
[  793.567947] nvme nvme4: Reconnecting in 10 seconds...
[  794.574539] nvme nvme70: queue 0: timeout request 0x0 type 4
[  794.574543] nvme nvme70: starting error recovery
[  794.574544] nvme nvme68: queue 0: timeout request 0x0 type 4
[  794.574548] nvme nvme69: queue 0: timeout request 0x0 type 4
[  794.574549] nvme nvme68: starting error recovery
[  794.574550] nvme nvme69: starting error recovery
[  794.574768] nvme nvme70: failed nvme_keep_alive_end_io error=10
[  794.574793] nvme nvme69: failed nvme_keep_alive_end_io error=10
[  794.574877] nvme nvme68: failed nvme_keep_alive_end_io error=10
[  794.591403] nvme nvme70: Reconnecting in 10 seconds...
[  794.591628] nvme nvme69: Reconnecting in 10 seconds...
[  794.594555] nvme nvme68: Reconnecting in 10 seconds...
[  796.631586] IPv6: ADDRCONF(NETDEV_CHANGE): eno33np0: link becomes ready
[  803.632108] nvme nvme4: creating 64 I/O queues.
[  803.668542] nvme nvme4: mapped 64/0/0 default/read/poll queues.
[  803.671517] nvme nvme4: Successfully reconnected (1 attempt)
[  804.655794] nvme nvme70: queue_size 128 > ctrl sqsize 64, clamping down
[  804.655886] nvme nvme70: creating 64 I/O queues.
[  804.655961] nvme nvme68: queue_size 128 > ctrl sqsize 64, clamping down
[  804.655994] nvme nvme69: queue_size 128 > ctrl sqsize 64, clamping down
[  804.656042] nvme nvme68: creating 64 I/O queues.
[  804.656043] nvme nvme69: creating 64 I/O queues.
[  804.669742] nvme nvme69: mapped 64/0/0 default/read/poll queues.
[  804.669761] nvme nvme70: mapped 64/0/0 default/read/poll queues.
[  804.669773] nvme nvme68: mapped 64/0/0 default/read/poll queues.
[  804.685893] nvme nvme70: Successfully reconnected (1 attempt)
[  804.702605] nvme nvme69: Successfully reconnected (1 attempt)
[  804.722602] nvme nvme68: Successfully reconnected (1 attempt)
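
For longer logs, the same check can be scripted. The following is a minimal sketch, not part of the original verification: it assumes the kernel message wording shown in the log above, and the names reconnect_summary and SAMPLE are hypothetical. It reports, per controller, the attempt count from the last "Successfully reconnected" message, or None if the controller entered error recovery and has not reconnected since.

```python
import re

# Patterns follow the kernel message wording quoted in the log above.
RECOVERY = re.compile(r"nvme (nvme\d+): starting error recovery")
RECONNECTED = re.compile(
    r"nvme (nvme\d+): Successfully reconnected \((\d+) attempt"
)


def reconnect_summary(log_text):
    """Return {controller: attempts or None} reflecting the latest event.

    A controller maps to None while it is in error recovery and to the
    reported attempt count once a successful reconnect is logged.
    """
    summary = {}
    for line in log_text.splitlines():
        match = RECOVERY.search(line)
        if match:
            summary[match.group(1)] = None  # recovery started, not yet back
            continue
        match = RECONNECTED.search(line)
        if match:
            summary[match.group(1)] = int(match.group(2))
    return summary


# A few lines copied from the log above, used here as sample input.
SAMPLE = """\
[  793.550544] nvme nvme4: starting error recovery
[  794.574543] nvme nvme70: starting error recovery
[  803.671517] nvme nvme4: Successfully reconnected (1 attempt)
[  804.685893] nvme nvme70: Successfully reconnected (1 attempt)
"""

if __name__ == "__main__":
    for ctrl, attempts in sorted(reconnect_summary(SAMPLE).items()):
        state = "not reconnected" if attempts is None else f"{attempts} attempt(s)"
        print(f"{ctrl}: {state}")
```

Any controller still mapped to None at the end of the log would indicate the failure described in this bug.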


** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1989990

Title:
  [SRU] Ubuntu 22.04 - NVMe TCP - Host fails to reconnect to target
  after link down/link up sequence

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  In Progress

Bug description:
  [Impact]
  An Ubuntu 22.04 host fails to reconnect to the NVMe TCP target after a
  link down event if the number of queues has changed after the link went
  down.

  [Fix]
  The following upstream patch set addresses the issue.

  1. nvmet: Expose max queues to configfs
     https://git.infradead.org/nvme.git/commit/2c4282742d049e2a5ab874e2b359a2421b9377c2

  2. nvme-tcp: Handle number of queue changes
     https://git.infradead.org/nvme.git/commit/516204e486a19d03962c2757ef49782e6c1cacf4

  3. nvme-rdma: Handle number of queue changes
     https://git.infradead.org/nvme.git/commit/e800278c1dc97518eab1970f8f58a5aad52b0f86

  Patch 2 above addresses the reconnect failure in the NVMe TCP
  scenario.

  In addition, the following patch fixes an error code parsing issue in
  the reconnect sequence.

  nvme-fabrics: parse nvme connect Linux error codes
  https://git.infradead.org/nvme.git/commit/ec9e96b5230148294c7abcaf3a4c592d3720b62d

  [Test Plan]
  1.  Boot into the Ubuntu 22.04 kernel without the fix.

  2.  Establish a connection to the NVMe TCP target.

  3.  Take the NIC link down and bring it back up after 10 seconds. While
  the NIC link is down, increase the number of queues assigned to the
  controller on the target.

  4.  Observe that the connection to the target is lost and that, after
  the link comes up, the host controller tries to re-establish the
  connection.

  5.  With the patch, reconnection succeeds with the higher number of
  queues.
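
The link toggle in step 3 can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: the interface name eno33np0 is taken from the log earlier in this message, the helper names link_toggle_commands and render are hypothetical, and the target-side queue change is left out because the bug report does not say how the target is configured. The helper only builds the commands and does not run them.

```python
import shlex


def link_toggle_commands(iface, down_seconds=10):
    """Build the host-side commands for step 3 without executing them.

    The target-side change (increasing the controller's queue count while
    the link is down) is intentionally not covered here.
    """
    return [
        ["ip", "link", "set", "dev", iface, "down"],
        ["sleep", str(down_seconds)],
        ["ip", "link", "set", "dev", iface, "up"],
    ]


def render(commands):
    """Join each argument list into one shell-quoted line for review."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # eno33np0 is the interface seen in the log above; adjust as needed.
    print(render(link_toggle_commands("eno33np0")))
```

Printing the commands first keeps the sketch safe to try on a machine whose only network path is the interface under test.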

  [Where problems could occur]

  Regression risk is low to medium.

  [Other Info]

  Test Kernel Source
  https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/lp_1989990_nvme_tcp

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1989990/+subscriptions

