Re: [PATCH 1/7] cxlflash: Yield to active send threads

2018-05-16 Matthew R. Ochs
On Fri, May 11, 2018 at 02:04:46PM -0500, Uma Krishnan wrote:
> The following Oops may be encountered if the device is reset (e.g. during
> EEH recovery) while there is heavy I/O traffic:
> 
> 59:mon> t
> [c000200db64bb680] c00809264c40 cxlflash_queuecommand+0x3b8/0x500 [cxlflash]
> [c000200db64bb770] c090d3b0 scsi_dispatch_cmd+0x130/0x2f0
> [c000200db64bb7f0] c090fdd8 scsi_request_fn+0x3c8/0x8d0
> [c000200db64bb900] c067f528 __blk_run_queue+0x68/0xb0
> [c000200db64bb930] c067ab80 __elv_add_request+0x140/0x3c0
> [c000200db64bb9b0] c068daac blk_execute_rq_nowait+0xec/0x1a0
> [c000200db64bba00] c068dbb0 blk_execute_rq+0x50/0xe0
> [c000200db64bba50] c06b2040 sg_io+0x1f0/0x520
> [c000200db64bbaf0] c06b2e94 scsi_cmd_ioctl+0x534/0x610
> [c000200db64bbc20] c0926208 sd_ioctl+0x118/0x280
> [c000200db64bbcc0] c069f7ac blkdev_ioctl+0x7fc/0xe30
> [c000200db64bbd20] c0439204 block_ioctl+0x84/0xa0
> [c000200db64bbd40] c03f8514 do_vfs_ioctl+0xd4/0xa00
> [c000200db64bbde0] c03f8f04 SyS_ioctl+0xc4/0x130
> [c000200db64bbe30] c000b184 system_call+0x58/0x6c
> 
> When there is no room to send an I/O request, the cached room value is
> refreshed by reading the memory-mapped command room register of the AFU.
> Because the AFU register mapping is torn down and re-established during a
> reset, this read can race with the reset and lead to the Oops above.
> 
> During a device reset, the AFU should not be unmapped until all active
> send threads have quiesced. An atomic counter, cmds_active, is already used
> to track internal AFU commands so that they can be quiesced during reset.
> The same counter can also track the active send threads.
> 
> Signed-off-by: Uma Krishnan 

Acked-by: Matthew R. Ochs 



[PATCH 1/7] cxlflash: Yield to active send threads

2018-05-11 Uma Krishnan
The following Oops may be encountered if the device is reset (e.g. during
EEH recovery) while there is heavy I/O traffic:

59:mon> t
[c000200db64bb680] c00809264c40 cxlflash_queuecommand+0x3b8/0x500 [cxlflash]
[c000200db64bb770] c090d3b0 scsi_dispatch_cmd+0x130/0x2f0
[c000200db64bb7f0] c090fdd8 scsi_request_fn+0x3c8/0x8d0
[c000200db64bb900] c067f528 __blk_run_queue+0x68/0xb0
[c000200db64bb930] c067ab80 __elv_add_request+0x140/0x3c0
[c000200db64bb9b0] c068daac blk_execute_rq_nowait+0xec/0x1a0
[c000200db64bba00] c068dbb0 blk_execute_rq+0x50/0xe0
[c000200db64bba50] c06b2040 sg_io+0x1f0/0x520
[c000200db64bbaf0] c06b2e94 scsi_cmd_ioctl+0x534/0x610
[c000200db64bbc20] c0926208 sd_ioctl+0x118/0x280
[c000200db64bbcc0] c069f7ac blkdev_ioctl+0x7fc/0xe30
[c000200db64bbd20] c0439204 block_ioctl+0x84/0xa0
[c000200db64bbd40] c03f8514 do_vfs_ioctl+0xd4/0xa00
[c000200db64bbde0] c03f8f04 SyS_ioctl+0xc4/0x130
[c000200db64bbe30] c000b184 system_call+0x58/0x6c

When there is no room to send an I/O request, the cached room value is
refreshed by reading the memory-mapped command room register of the AFU.
Because the AFU register mapping is torn down and re-established during a
reset, this read can race with the reset and lead to the Oops above.
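To make the race concrete, here is a minimal user-space sketch of a cached command-room counter that falls back to re-reading a register once the cache runs out. The names struct hwq_model and take_room() are hypothetical, a plain pointer stands in for the memory-mapped AFU register, and the caller is assumed to serialize access (e.g. with a lock). This is an illustration of the pattern, not the cxlflash implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one hardware send queue (not driver code). */
struct hwq_model {
	long room;                     /* cached free command slots */
	const volatile long *cmd_room; /* stand-in for the mapped room register */
};

/*
 * Take one command slot. Returns true if the command may be sent now,
 * false if the caller should report "host busy" and retry later.
 * Assumes the caller serializes access to hwq (e.g. holds a lock).
 */
static bool take_room(struct hwq_model *hwq)
{
	if (hwq->room > 0) {		/* fast path: use a cached slot */
		hwq->room--;
		return true;
	}

	/*
	 * Slow path: refresh the cache from the "register". This is the read
	 * that is unsafe if a reset unmaps the register window concurrently.
	 */
	assert(hwq->cmd_room != NULL);
	long fresh = *hwq->cmd_room;
	if (fresh <= 0) {
		hwq->room = 0;
		return false;
	}
	hwq->room = fresh - 1;		/* one slot is consumed by this command */
	return true;
}
```

Only the slow path touches the mapping, which is why the failure shows up under heavy I/O, when the cached room is exhausted often enough for a sender to be mid-read when a reset swaps the register window.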

During a device reset, the AFU should not be unmapped until all active
send threads have quiesced. An atomic counter, cmds_active, is already used
to track internal AFU commands so that they can be quiesced during reset.
The same counter can also track the active send threads.
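The quiesce scheme described above can be modeled in user space with C11 atomics and POSIX threads. In this sketch, struct afu_model, send_one(), reset_afu() and run_demo() are hypothetical names, malloc()/free() stand in for mapping and unmapping the AFU registers, and the reset path simply spins with sched_yield(); the real driver's reset and wait logic may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

enum { STATE_NORMAL, STATE_RESET };

/* Hypothetical model of the AFU state shared by senders and reset. */
struct afu_model {
	atomic_int state;       /* STATE_NORMAL or STATE_RESET */
	atomic_int cmds_active; /* senders currently using the mapping */
	_Atomic(long *) room;   /* stand-in for the mapped room register */
	atomic_long sent;       /* commands that were "sent" */
};

/* Returns 0 if the command was sent, -1 if the caller should retry. */
static int send_one(struct afu_model *afu)
{
	/* Announce ourselves first, then check the state (see note below). */
	atomic_fetch_add(&afu->cmds_active, 1);
	if (atomic_load(&afu->state) != STATE_NORMAL) {
		atomic_fetch_sub(&afu->cmds_active, 1);
		return -1;
	}

	long *room = atomic_load(&afu->room);
	assert(room != NULL);	/* reset must not unmap while we are active */
	if (*room > 0)		/* "read" the register, then "send" */
		atomic_fetch_add(&afu->sent, 1);

	atomic_fetch_sub(&afu->cmds_active, 1);
	return 0;
}

/* Quiesce senders, then "unmap" and "remap" the register window. */
static void reset_afu(struct afu_model *afu)
{
	atomic_store(&afu->state, STATE_RESET);
	while (atomic_load(&afu->cmds_active) != 0)
		sched_yield();	/* a driver would more likely sleep here */

	free(atomic_exchange(&afu->room, NULL));	/* "unmap" */
	long *fresh = malloc(sizeof(*fresh));
	if (fresh == NULL)
		abort();
	*fresh = 64;
	atomic_store(&afu->room, fresh);		/* "remap" */
	atomic_store(&afu->state, STATE_NORMAL);
}

static void *sender(void *arg)
{
	for (int i = 0; i < 100000; i++)
		(void)send_one(arg);	/* busy results are dropped in this model */
	return NULL;
}

/* Run nthreads (1..8) senders while doing `resets` resets; returns sends. */
static long run_demo(int nthreads, int resets)
{
	struct afu_model afu;
	pthread_t tids[8];
	long *room = malloc(sizeof(*room));

	assert(nthreads >= 1 && nthreads <= 8);
	if (room == NULL)
		abort();
	*room = 64;
	atomic_init(&afu.state, STATE_NORMAL);
	atomic_init(&afu.cmds_active, 0);
	atomic_init(&afu.room, room);
	atomic_init(&afu.sent, 0);

	for (int t = 0; t < nthreads; t++)
		if (pthread_create(&tids[t], NULL, sender, &afu) != 0)
			abort();
	for (int r = 0; r < resets; r++)
		reset_afu(&afu);
	for (int t = 0; t < nthreads; t++)
		pthread_join(tids[t], NULL);

	assert(atomic_load(&afu.cmds_active) == 0);
	free(atomic_load(&afu.room));
	return atomic_load(&afu.sent);
}
```

In this model the sender increments cmds_active before it checks the state. With sequentially consistent atomics, a reset that has stored STATE_RESET either observes the sender in the counter and waits, or the sender observes STATE_RESET and backs off, so the "register" is never freed under an active reader.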

Signed-off-by: Uma Krishnan 
---
 drivers/scsi/cxlflash/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index a24d7e6..dad2be6 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -616,6 +616,7 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp)
 		rc = 0;
 		goto out;
 	default:
+		atomic_inc(&afu->cmds_active);
 		break;
 	}
 
@@ -641,6 +642,7 @@ static int cxlflash_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scp)
 	memcpy(cmd->rcb.cdb, scp->cmnd, sizeof(cmd->rcb.cdb));
 
 	rc = afu->send_cmd(afu, cmd);
+	atomic_dec(&afu->cmds_active);
 out:
 	return rc;
 }
-- 
2.1.0