Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
Hi Laurence,

Thanks a lot for your quick test!

On Fri, May 11, 2018 at 5:59 AM, Laurence Oberman wrote:
> On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
> > On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> > > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > > > [...]
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Thu, 2018-05-10 at 18:28 +0800, Ming Lei wrote:
> On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> > On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > > [...]
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Sat, May 05, 2018 at 07:11:33PM -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> > [...]
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Wed, May 09, 2018 at 01:46:09PM +0800, jianchao.wang wrote:
> Hi Ming,
>
> I did some tests on my local setup.
>
> [ 598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller
>
> This should be a timeout on nvme_reset_dev->nvme_wait_freeze.
>
> [ 598.828743] nvme nvme0: EH 1: before shutdown
> [ 599.013586] nvme nvme0: EH 1: after shutdown
> [ 599.137197] nvme nvme0: EH 1: after recovery
>
> EH 1 has marked the state LIVE.
>
> [ 599.137241] nvme nvme0: failed to mark controller state 1
>
> So EH 0 failed to mark the state LIVE, and the card was removed.
> This should not be expected by nested EH.

Right.

> [ 599.137322] nvme nvme0: Removing after probe failure status: 0
> [ 599.326539] nvme nvme0: EH 0: after recovery
> [ 599.326760] nvme0n1: detected capacity change from 128035676160 to 0
> [ 599.457208] nvme nvme0: failed to set APST feature (-19)
>
> nvme_reset_dev should identify whether it is nested.

The above should be caused by a race between updates of the controller
state; I hope I can find some time this week to investigate it further.

Also, maybe we can change to not remove the controller until nested EH
has been tried enough times.

Thanks,
Ming
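P.S. To make the race concrete, below is a simplified model of the
guarded state transition (loosely patterned on nvme_change_ctrl_state()
in drivers/nvme/host/core.c; the names and the transition table here
are illustrative, not the real code). Once the inner EH 1 has moved
RESETTING -> LIVE, the outer EH 0's own attempt to mark LIVE finds no
valid transition and fails, and today that failure escalates to
removing the controller:

/* illustrative only: LIVE is reachable only from RESETTING */
enum eh_state { EH_RESETTING, EH_LIVE, EH_DELETING };

static bool eh_change_state(enum eh_state *cur, enum eh_state new_state)
{
	bool changed;

	switch (new_state) {
	case EH_LIVE:
		/* a nested EH that already went LIVE makes the
		 * outer EH's transition fail here */
		changed = (*cur == EH_RESETTING);
		break;
	default:
		changed = true;
		break;
	}

	if (changed)
		*cur = new_state;
	return changed;
}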
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
Hi Ming,

I did some tests on my local setup.

[ 598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller

This should be a timeout on nvme_reset_dev->nvme_wait_freeze.

[ 598.828743] nvme nvme0: EH 1: before shutdown
[ 599.013586] nvme nvme0: EH 1: after shutdown
[ 599.137197] nvme nvme0: EH 1: after recovery

EH 1 has marked the state LIVE.

[ 599.137241] nvme nvme0: failed to mark controller state 1

So EH 0 failed to mark the state LIVE, and the card was removed.
This should not be expected by nested EH.

[ 599.137322] nvme nvme0: Removing after probe failure status: 0
[ 599.326539] nvme nvme0: EH 0: after recovery
[ 599.326760] nvme0n1: detected capacity change from 128035676160 to 0
[ 599.457208] nvme nvme0: failed to set APST feature (-19)

nvme_reset_dev should identify whether it is nested.

Thanks,
Jianchao
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Sat, May 05, 2018 at 07:51:22PM -0400, Laurence Oberman wrote:
> 3rd and 4th attempts were slightly better, but clearly not dependable:
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)   [failed]
>     runtime    ...  81.188s
>     --- tests/block/011.out        2018-05-05 18:01:14.268414752 -0400
>     +++ results/nvme0n1/block/011.out.bad  2018-05-05 19:44:48.848568687 -0400
>     @@ -1,2 +1,3 @@
>      Running block/011
>     +tests/block/011: line 47: echo: write error: Input/output error
>      Test complete
>
> This one passed:
>
> [root@segstorage1 blktests]# ./check block/011
> block/011 => nvme0n1 (disable PCI device while doing I/O)   [passed]
>     runtime  81.188s  ...  43.400s
>
> I will capture a vmcore next time it panics and give some information
> after analyzing the core.

We definitely should never panic, but I am not sure this blktest can be
reliable on IO errors: the test is disabling memory space enabling and
bus master without the driver's knowledge, and it does this repeatedly
in a tight loop. If the test happens to disable the device while the
driver is trying to recover from the previous iteration, the recovery
will surely fail, so I think IO errors may possibly be expected.

As far as I can tell, the only way you'll actually get it to succeed is
if the test's subsequent "enable" happens to hit in conjunction with the
driver's reset pci_enable_device_mem(), such that the pci_dev's
enable_cnt is > 1, which prevents the disabling for the remainder of
the test's looping.

I still think this is a very good test, but we might be able to make it
more deterministic about what actually happens to the PCI device.
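To spell out the enable_cnt behavior described above, here is a toy
model (all names are made up; the real refcounting lives in the PCI
core). The test's "disable" write only drops a reference, so once the
driver's reset has taken a second reference, the device stays enabled
for the remainder of the loop:

#include <stdbool.h>
#include <stdio.h>

/* toy stand-in for struct pci_dev's enable counting */
struct toy_pci_dev {
	int enable_cnt;
	bool enabled;
};

static void toy_enable(struct toy_pci_dev *d)
{
	if (d->enable_cnt++ == 0)
		d->enabled = true;	/* only the first enable does real work */
}

static void toy_disable(struct toy_pci_dev *d)
{
	if (--d->enable_cnt == 0)
		d->enabled = false;	/* only the last reference disables */
}

int main(void)
{
	struct toy_pci_dev dev = { 0, false };

	toy_enable(&dev);	/* test re-enables the device            */
	toy_enable(&dev);	/* driver reset: pci_enable_device_mem() */
	toy_disable(&dev);	/* test's next "disable" iteration       */

	/* enable_cnt is still 1, so the device remains enabled */
	printf("enable_cnt=%d enabled=%d\n", dev.enable_cnt, dev.enabled);
	return 0;
}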
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Sat, 2018-05-05 at 19:31 -0400, Laurence Oberman wrote:
> On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> > [...]
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Sat, 2018-05-05 at 19:11 -0400, Laurence Oberman wrote:
> [...]
Re: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
On Sat, 2018-05-05 at 21:58 +0800, Ming Lei wrote:
> Hi,
>
> [...]

Hello Ming

I have a two node NUMA system here running your kernel tree
4.17.0-rc3.ming.nvme+

[root@segstorage1 ~]# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 3 5 6 8 11 13 14
node 0 size: 63922 MB
node 0 free: 61310 MB
node 1 cpus: 1 2 4 7 9 10 12 15
node 1 size: 64422 MB
node 1 free: 62372 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

I ran block/011

[root@segstorage1 blktests]# ./check block/011
block/011 => nvme0n1 (disable PCI device while doing I/O)   [failed]
    runtime    ...  106.936s
    --- tests/block/011.out        2018-05-05 18:01:14.268414752 -0400
    +++ results/nvme0n1/block/011.out.bad  2018-05-05 19:07:21.028634858 -0400
    @@ -1,2 +1,36 @@
     Running block/011
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
    +fio: ioengines.c:289: td_io_queue: Assertion `(io_u->flags & IO_U_F_FLIGHT) == 0' failed.
    ...
    (Run 'diff -u tests/block/011.out results/nvme0n1/block/011.out.bad' to see the entire diff)

[ 1421.738551] run blktests block/011 at 2018-05-05 19:05:34
[ 1452.676351] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[ 1452.718221] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[ 1452.718239] nvme nvme0: EH 0: before shutdown
[ 1452.760890] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[ 1452.760894] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[... the same "controller is down; will reset" line repeats many times; log truncated ...]
[PATCH V4 0/7] nvme: pci: fix & improve timeout handling
Hi,

The 1st patch introduces blk_quiesce_timeout() and blk_unquiesce_timeout()
for NVMe; meantime it fixes blk_sync_queue().

The 2nd patch covers timeout for admin commands used to recover the
controller, to avoid a possible deadlock.

The 3rd and 4th patches avoid waiting for freeze on queues which aren't
frozen.

The last 4 patches fix several races w.r.t. the NVMe timeout handler,
and finally make blktests block/011 pass. Meantime the NVMe PCI timeout
mechanism becomes much more robust than before.

gitweb:
	https://github.com/ming1/linux/commits/v4.17-rc-nvme-timeout.V4

V4:
	- fix nvme_init_set_host_mem_cmd()
	- use the nested EH model, and run both nvme_dev_disable() and
	  resetting in the same context

V3:
	- fix one new race related to freezing in patch 4;
	  nvme_reset_work() may hang forever without this patch
	- rewrite the last 3 patches, and avoid breaking nvme_reset_ctrl*()

V2:
	- fix draining timeout work, so there is no need to change the
	  return value from .timeout()
	- fix race between nvme_start_freeze() and nvme_unfreeze()
	- cover timeout for admin commands running in EH

Ming Lei (7):
  block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout()
  nvme: pci: cover timeout for admin commands running in EH
  nvme: pci: only wait freezing if queue is frozen
  nvme: pci: freeze queue in nvme_dev_disable() in case of error
    recovery
  nvme: core: introduce 'reset_lock' for sync reset state and reset
    activities
  nvme: pci: prepare for supporting error recovery from resetting
    context
  nvme: pci: support nested EH

 block/blk-core.c         |  21 +++-
 block/blk-mq.c           |   9 ++
 block/blk-timeout.c      |   5 +-
 drivers/nvme/host/core.c |  46 ++-
 drivers/nvme/host/nvme.h |   5 +
 drivers/nvme/host/pci.c  | 304 ---
 include/linux/blkdev.h   |  13 ++
 7 files changed, 356 insertions(+), 47 deletions(-)

Cc: Jianchao Wang
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: linux-n...@lists.infradead.org
Cc: Laurence Oberman
--
2.9.5
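P.S. A rough sketch of how the two new helpers from patch 1 are meant
to pair around controller shutdown. This is a hypothetical caller for
illustration only (locking of the namespaces list is elided, and the
helpers are assumed to take a request_queue); the real wiring is in
patches 1 and 7:

/* hypothetical caller, not from the series itself */
static void example_eh_disable(struct nvme_dev *dev)
{
	struct nvme_ns *ns;

	/* keep the timeout handler from racing with the shutdown */
	list_for_each_entry(ns, &dev->ctrl.namespaces, list)
		blk_quiesce_timeout(ns->queue);

	nvme_dev_disable(dev, false);

	/* timeout handling is safe to run again during recovery */
	list_for_each_entry(ns, &dev->ctrl.namespaces, list)
		blk_unquiesce_timeout(ns->queue);
}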