** Description changed:

  [Impact]
  Ubuntu 22.04 host fails to reconnect successfully to the NVMe TCP target 
after link down event if the number of queues have changed post link down.
  
  [Fix]
  Following upstream patch set helps address the issue.
  
  1.
  nvmet: Expose max queues to configfs
  
https://git.infradead.org/nvme.git/commit/2c4282742d049e2a5ab874e2b359a2421b9377c2
  
  2.
  nvme-tcp: Handle number of queue changes
  
https://git.infradead.org/nvme.git/commit/516204e486a19d03962c2757ef49782e6c1cacf4
  
  3.
  nvme-rdma: Handle number of queue changes
  
https://git.infradead.org/nvme.git/commit/e800278c1dc97518eab1970f8f58a5aad52b0f86
  
  The patch in Point 2 above helps address the failure to reconnect in
  NVMe TCP scenario.
  
  Also, following patch addresses error code parsing issue in the
  reconnect sequence.
  
  nvme-fabrics: parse nvme connect Linux error codes
  
https://git.infradead.org/nvme.git/commit/ec9e96b5230148294c7abcaf3a4c592d3720b62d
  
  [Test Plan]
+ 1.  Boot into Ubuntu 22.04 kernel without fix.
  
- 1. Boot into Ubuntu 22.04 kernel without fix.
+ 2.  Establish connection to NVMe TCP target.
  
- 2. Establish connection to powerstore and create more than 70 NVMe
- controllers ( >64 controllers)
+ 3.  Toggle NIC link and bring link up after 10 seconds. When the NIC
+ link is down, on the target increase the number of queues assigned to
+ the controller.
  
- nvme connect -t tcp -a <target address> -n <target nqn> -D
- 
- Observe that nvme controllers > 64 get assigned 8 queues.
- 
- 3. Delete few controllers so that total number of controllers becomes <
- 64. This results in higher number of queues becoming available to
- remaining NVMe controllers.
- 
- nvme disconnect -d <nvme controller>
- 
- 4. Toggle NIC link and bring link up after 10 seconds.
- 
- 5. Observe that connection to target is lost and after link comes up,
+ 4.  Observe that connection to target is lost and after link comes up,
  controller from host tries to re-establish connection.
  
- 6. With patch, reconnection succeeds with higher number of queues.
+ 5.  With patch, reconnection succeeds with higher number of queues
  
  [Where problems could occur]
  
  Regression risk is low to medium.
  
  [Other Info]
  
  Test Kernel Source
  
  
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/lp_1989990_nvme_tcp

** Information type changed from Private to Public

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1989990

Title:
  [SRU] Ubuntu 22.04 - NVMe TCP - Host fails to reconnect to target
  after link down/link up sequence

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Jammy:
  In Progress

Bug description:
  [Impact]
  An Ubuntu 22.04 host fails to reconnect to the NVMe TCP target after a
  link-down event if the number of queues has changed while the link
  was down.

  [Fix]
  The following upstream patch set addresses the issue.

  1.
  nvmet: Expose max queues to configfs
  
https://git.infradead.org/nvme.git/commit/2c4282742d049e2a5ab874e2b359a2421b9377c2

  2.
  nvme-tcp: Handle number of queue changes
  
https://git.infradead.org/nvme.git/commit/516204e486a19d03962c2757ef49782e6c1cacf4

  3.
  nvme-rdma: Handle number of queue changes
  
https://git.infradead.org/nvme.git/commit/e800278c1dc97518eab1970f8f58a5aad52b0f86

  The patch in point 2 above addresses the failure to reconnect in the
  NVMe TCP scenario.

  In addition, the following patch fixes an error code parsing issue in
  the reconnect sequence.

  nvme-fabrics: parse nvme connect Linux error codes
  
https://git.infradead.org/nvme.git/commit/ec9e96b5230148294c7abcaf3a4c592d3720b62d

  [Test Plan]
  1.  Boot into the Ubuntu 22.04 kernel without the fix.

  2.  Establish a connection to the NVMe TCP target.

  3.  Toggle the NIC link: take it down, then bring it back up after 10
  seconds. While the link is down, increase the number of queues
  assigned to the controller on the target.

  4.  Observe that the connection to the target is lost and that, after
  the link comes up, the host controller tries to re-establish the
  connection.

  5.  With the patch, reconnection succeeds with the higher number of
  queues.
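
  The host-side part of steps 2 to 4 could be scripted roughly as
  follows. This is only a sketch: the target address, NQN and interface
  name are placeholder assumptions, and the queue count still has to be
  raised on the target by whatever means that target provides while the
  link is down. Setting DRY_RUN=1 prints the commands instead of running
  them.

```shell
#!/bin/sh
# Sketch of the host-side test steps. All default values below are
# placeholders (assumptions), not values taken from the bug report.
TARGET_ADDR="${TARGET_ADDR:-192.0.2.10}"                  # hypothetical target IP
TARGET_NQN="${TARGET_NQN:-nqn.2022-09.example:subsys1}"   # hypothetical NQN
IFACE="${IFACE:-eth0}"                                    # hypothetical host NIC
DOWN_SECS="${DOWN_SECS:-10}"                              # link-down time from step 3

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 2: connect to the NVMe TCP target.
connect_target() {
    run nvme connect -t tcp -a "$TARGET_ADDR" -n "$TARGET_NQN"
}

# Step 3: take the link down, wait, bring it back up.
# Increase the queue count on the target during the wait.
toggle_link() {
    run ip link set dev "$IFACE" down
    run sleep "$DOWN_SECS"
    run ip link set dev "$IFACE" up
}

# Step 4: inspect controller state after the link is back.
show_state() {
    run nvme list-subsys
}
```

  Sourcing the file and calling connect_target, toggle_link and
  show_state in that order (as root) follows the plan above; the
  kernel log on the host can then be checked for the reconnect result.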

  [Where problems could occur]

  Regression risk is low to medium.

  [Other Info]

  Test Kernel Source

  
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/lp_1989990_nvme_tcp



