[Kernel-packages] [Bug 1989990] Re: [SRU] Ubuntu 22.04 - NVMe TCP - Host fails to reconnect to target after link down/link up sequence

2023-01-16 Narendra K
Hi,

We tried to reproduce the issue with the kernel package
'linux-image-unsigned-5.15.0-59-generic_5.15.0-59.65_amd64.deb' from the
-proposed repository. The issue was not observed.

After the link down/up sequence, NVMe controllers nvme4, nvme68, nvme69
and nvme70 reconnect successfully.

[  793.550538] nvme nvme4: queue 0: timeout request 0x0 type 4
[  793.550544] nvme nvme4: starting error recovery
[  793.552141] nvme nvme4: failed nvme_keep_alive_end_io error=10
[  793.567947] nvme nvme4: Reconnecting in 10 seconds...
[  794.574539] nvme nvme70: queue 0: timeout request 0x0 type 4
[  794.574543] nvme nvme70: starting error recovery
[  794.574544] nvme nvme68: queue 0: timeout request 0x0 type 4
[  794.574548] nvme nvme69: queue 0: timeout request 0x0 type 4
[  794.574549] nvme nvme68: starting error recovery
[  794.574550] nvme nvme69: starting error recovery
[  794.574768] nvme nvme70: failed nvme_keep_alive_end_io error=10
[  794.574793] nvme nvme69: failed nvme_keep_alive_end_io error=10
[  794.574877] nvme nvme68: failed nvme_keep_alive_end_io error=10
[  794.591403] nvme nvme70: Reconnecting in 10 seconds...
[  794.591628] nvme nvme69: Reconnecting in 10 seconds...
[  794.594555] nvme nvme68: Reconnecting in 10 seconds...
[  796.631586] IPv6: ADDRCONF(NETDEV_CHANGE): eno33np0: link becomes ready
[  803.632108] nvme nvme4: creating 64 I/O queues.
[  803.668542] nvme nvme4: mapped 64/0/0 default/read/poll queues.
[  803.671517] nvme nvme4: Successfully reconnected (1 attempt)
[  804.655794] nvme nvme70: queue_size 128 > ctrl sqsize 64, clamping down
[  804.655886] nvme nvme70: creating 64 I/O queues.
[  804.655961] nvme nvme68: queue_size 128 > ctrl sqsize 64, clamping down
[  804.655994] nvme nvme69: queue_size 128 > ctrl sqsize 64, clamping down
[  804.656042] nvme nvme68: creating 64 I/O queues.
[  804.656043] nvme nvme69: creating 64 I/O queues.
[  804.669742] nvme nvme69: mapped 64/0/0 default/read/poll queues.
[  804.669761] nvme nvme70: mapped 64/0/0 default/read/poll queues.
[  804.669773] nvme nvme68: mapped 64/0/0 default/read/poll queues.
[  804.685893] nvme nvme70: Successfully reconnected (1 attempt)
[  804.702605] nvme nvme69: Successfully reconnected (1 attempt)
[  804.722602] nvme nvme68: Successfully reconnected (1 attempt)
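
To check a captured log for this outcome, a minimal Python sketch can pair each controller's "starting error recovery" line with a later "Successfully reconnected" line. It assumes the dmesg output was saved as plain text in the format quoted above. The function name `reconnect_summary` is hypothetical and is not part of any Ubuntu or nvme-cli tooling.

```python
import re

# Matches nvme lines in the dmesg format quoted above, for example:
# "[  793.550544] nvme nvme4: starting error recovery"
NVME_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvme\s+(?P<ctrl>nvme\d+):\s+(?P<msg>.*)$"
)


def reconnect_summary(log_text):
    """Return {controller: reconnected?} for every controller that logged
    "starting error recovery" in log_text."""
    status = {}
    for raw in log_text.splitlines():
        match = NVME_LINE.match(raw.strip())
        if match is None:
            continue  # not an nvme line, e.g. the IPv6 ADDRCONF message
        ctrl, msg = match.group("ctrl"), match.group("msg")
        if msg == "starting error recovery":
            status[ctrl] = False  # recovery started, reconnect not yet seen
        elif msg.startswith("Successfully reconnected") and ctrl in status:
            status[ctrl] = True
    return status
```

When run over the full log above, every controller that entered recovery (nvme4, nvme68, nvme69 and nvme70) maps to True. A controller that never logs a successful reconnect stays False.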


** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1989990

Title:
  [SRU] Ubuntu 22.04 - NVMe TCP - Host fails to reconnect to target
  after link down/link up sequence

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  In Progress

Bug description:
  [Impact]
  An Ubuntu 22.04 host fails to reconnect to the NVMe TCP target after a
  link-down event if the number of queues has changed while the link was
  down.

  [Fix]
  The following upstream patch set helps address the issue:

  1. nvmet: Expose max queues to configfs
     https://git.infradead.org/nvme.git/commit/2c4282742d049e2a5ab874e2b359a2421b9377c2

  2. nvme-tcp: Handle number of queue changes
     https://git.infradead.org/nvme.git/commit/516204e486a19d03962c2757ef49782e6c1cacf4

  3. nvme-rdma: Handle number of queue changes
     https://git.infradead.org/nvme.git/commit/e800278c1dc97518eab1970f8f58a5aad52b0f86

  The patch in point 2 above helps address the failure to reconnect in
  the NVMe TCP scenario.
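
  The report does not spell out how patch 2 works. As understood here (an assumption to verify against the commit itself), the host handles reconnect in three steps. It first starts only the I/O queues present in both the existing tag set and the new controller. It then resizes the tag set to the new queue count. Finally, it starts any newly added queues. The toy Python model below only computes that ordering. The function and parameter names are hypothetical, and this is not the kernel code.

```python
def reconnect_queue_plan(tagset_hw_queues, new_io_queues):
    """Toy model of I/O queue start-up order on reconnect (hypothetical helper).

    tagset_hw_queues: I/O queues the host's tag set had before the link drop.
    new_io_queues:    I/O queues the target grants on reconnect.

    Returns (started_before_resize, tagset_size_after, started_after_resize).
    I/O queue IDs start at 1 because queue 0 is the admin queue.
    """
    # Start only queues that exist both in the old tag set and on the target.
    common = min(tagset_hw_queues, new_io_queues)
    started_before = list(range(1, common + 1))
    # Resize the tag set to the number of queues the target now offers.
    tagset_after = new_io_queues
    # Start any queues that only exist after the resize.
    started_after = list(range(common + 1, new_io_queues + 1))
    return started_before, tagset_after, started_after
```

  For example, growing from 2 to 4 queues gives ([1, 2], 4, [3, 4]), and shrinking from 4 to 2 gives ([1, 2], 2, []). In this model the host never starts a queue the target no longer offers, and never touches a queue before the tag set has room for it.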

  In addition, the following patch addresses an error-code parsing issue
  in the reconnect sequence:

  nvme-fabrics: parse nvme connect Linux error codes
  https://git.infradead.org/nvme.git/commit/ec9e96b5230148294c7abcaf3a4c592d3720b62d

  [Test Plan]
  1.  Boot into an Ubuntu 22.04 kernel without the fix.

  2.  Establish a connection to the NVMe TCP target.

  3.  Bring the NIC link down and bring it back up after 10 seconds.
  While the link is down, increase the number of queues assigned to the
  controller on the target.

  4.  Observe that the connection to the target is lost and that, after
  the link comes up, the host controller tries to re-establish the
  connection.

  5.  With the patch, the reconnection succeeds with the higher number of
  queues.
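
  The host side of step 3 can be scripted. The minimal Python sketch below assumes the link is toggled with iproute2's `ip link set dev <iface> down|up`. It uses `eno33np0`, the interface in the log above, only as an example, and the helper name is hypothetical. The target-side change is not scripted. On a Linux nvmet target it is presumably made through the `attr_qid_max` configfs attribute that patch 1 exposes, but the report does not say how it was done.

```python
import subprocess
import time


def toggle_link(iface, down_seconds=10, dry_run=True):
    """Step 3 of the test plan, host side: take iface down, wait, bring it up.

    With dry_run=True (the default) nothing is executed and the iproute2
    commands are only returned, so the sequence can be reviewed first.
    """
    down = ["ip", "link", "set", "dev", iface, "down"]
    up = ["ip", "link", "set", "dev", iface, "up"]
    if not dry_run:
        subprocess.run(down, check=True)
        # Window in which the target-side queue increase is made by hand.
        time.sleep(down_seconds)
        subprocess.run(up, check=True)
    return [down, up]
```

  To actually toggle the link, run `toggle_link("eno33np0", dry_run=False)` as root from a console that does not depend on that NIC.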

  [Where problems could occur]

  Regression risk is low to medium.

  [Other Info]

  Test kernel source:
  https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/lp_1989990_nvme_tcp

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1989990/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1989990] [NEW] Ubuntu 22.04 - NVMe TCP - Host fails to reconnect to target after link down/link up sequence

2022-09-16 Narendra K
Private bug reported:

An Ubuntu 22.04 host fails to reconnect to the NVMe TCP target after a
link-down event if the number of queues has changed while the link was
down.

The following upstream patch set helps address the issue:

1. nvmet: Expose max queues to configfs
https://git.infradead.org/nvme.git/commit/2c4282742d049e2a5ab874e2b359a2421b9377c2

2. nvme-tcp: Handle number of queue changes
https://git.infradead.org/nvme.git/commit/516204e486a19d03962c2757ef49782e6c1cacf4

3. nvme-rdma: Handle number of queue changes
https://git.infradead.org/nvme.git/commit/e800278c1dc97518eab1970f8f58a5aad52b0f86


The patch in point 2 above helps address the failure to reconnect in
the NVMe TCP scenario.

In addition, the following patch addresses an error-code parsing issue
in the reconnect sequence:

nvme-fabrics: parse nvme connect Linux error codes
https://git.infradead.org/nvme.git/commit/ec9e96b5230148294c7abcaf3a4c592d3720b62d

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: New

** Information type changed from Public to Private


Title:
  Ubuntu 22.04 - NVMe TCP - Host fails to reconnect to target after
  link down/link up sequence

Status in linux package in Ubuntu:
  New



[Kernel-packages] [Bug 1965241] Re: Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment events

2022-07-20 Narendra K
Basic sanity tests show positive results with the 5.15.0-43.46 kernel
from the -proposed repository.

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1965241

Title:
  Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort
  Containment events

Status in dellserver:
  New
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:

  [Impact]
  Recovery from Downstream Port Containment (DPC) events fails, and the
  NVMe endpoint is not accessible in some scenarios.

  [Fix]

  These DPC fixes help handle some of the failure cases of Downstream
  Port Containment events.

  Upstream kernel patches to be included in Ubuntu 22.04 and Ubuntu
  20.04.5:

  Already in Jammy as of Ubuntu-5.15.0-1.1
  PCI/portdrv: Enable Bandwidth Notification only if port supports it
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc6&id=00823dcbdd415c868390feaca16f0265101efab4

  PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc6&id=ea401499e943c307e6d44af6c2b4e068643e7884

  3134689f98   PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()

  [Test Case]

  1. Disable the memory space of the NVMe endpoint device.
  2. Issue I/O to the device.
  3. Observe dmesg. It shows that an EDR event is generated, the link is
  contained, and the NVMe device is recovered.
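
  The report does not say how step 1 was done. One common way (an assumption here) is to clear the Memory Space Enable bit, bit 1 of the endpoint's PCI Command register, for example with pciutils' `setpci`. The Python sketch below only computes the new register value and the corresponding command line. The helper names and the bus/device/function address are placeholders.

```python
# Bit 1 of the 16-bit PCI Command register (offset 0x04) is Memory Space Enable.
PCI_COMMAND_MEMORY = 0x0002


def disable_memory_space(command):
    """Return the Command register value with Memory Space Enable cleared."""
    return command & ~PCI_COMMAND_MEMORY & 0xFFFF


def setpci_args(bdf, current_command):
    """Hypothetical helper: setpci arguments that write the cleared value.

    setpci takes values in hexadecimal, hence the %04x formatting.
    """
    return ["setpci", "-s", bdf, "COMMAND=%04x" % disable_memory_space(current_command)]
```

  Suppose `setpci -s <BDF> COMMAND` reads back a current value of 0x0406. In that case, the sketch produces `COMMAND=0404`, which leaves every other Command bit unchanged.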

  [Other Info]
  https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/jammy/+ref/test_dpc_1965241

To manage notifications about this bug go to:
https://bugs.launchpad.net/dellserver/+bug/1965241/+subscriptions




[Kernel-packages] [Bug 1965241] Re: Include DPC Fixes in Ubuntu 22.04 and 20.04

2022-04-26 Narendra K
Michael,

We tried the test kernel from comment #5. In the sanity tests, basic
functionality works as expected.

On a system where the NVMe endpoint is connected to a root port:

1. When an EDR event occurs, the link is contained and the system does not crash.
2. The config space of the NVMe endpoint device is restored.

The DPC functionality does not work as expected if CONFIG_PCIE_EDR is
not enabled.
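
Because the result depends on CONFIG_PCIE_EDR, it is worth confirming the option before testing. This is a minimal Python sketch that assumes the kernel configuration is available as text. Ubuntu normally installs it as /boot/config-<release>. The helper name is hypothetical.

```python
def option_enabled(config_text, option):
    """Return True if option is set to y or m in kernel .config-style text."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or e.g. "# CONFIG_PCIE_EDR is not set"
        key, sep, value = line.partition("=")
        if sep and key == option:
            return value in ("y", "m")
    return False
```

For example, after `import platform`, call `option_enabled(open("/boot/config-" + platform.release()).read(), "CONFIG_PCIE_EDR")` on the system under test.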

Test Case:

1. Disable the memory space of the NVMe endpoint device.
2. Issue I/O to the device.
3. Observe dmesg. It shows that an EDR event is generated, the link is
contained, and the NVMe device is recovered.


Title:
  Include DPC Fixes in Ubuntu 22.04 and 20.04

Status in dellserver:
  New
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Jammy:
  In Progress

Bug description:
  SRU Justification:

  [Impact]
  Recovery from Downstream Port Containment (DPC) events fails, and the
  NVMe endpoint is not accessible in some scenarios.

  [Fix]

  These DPC fixes help handle some of the failure cases of Downstream
  Port Containment events.

  Upstream kernel patches to be included in Ubuntu 22.04 and Ubuntu
  20.04.5:

  Already in Jammy as of Ubuntu-5.15.0-1.1
  PCI/portdrv: Enable Bandwidth Notification only if port supports it
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc6&id=00823dcbdd415c868390feaca16f0265101efab4

  Not yet pulled
  PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.17-rc6&id=ea401499e943c307e6d44af6c2b4e068643e7884

  [Test Case]

  [Other Info]
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1965241


