On 07/12/2013 12:00 PM, Ren Mingxin wrote:
> Hi, Hannes:
> 
> On 07/12/2013 02:09 PM, Hannes Reinecke wrote:
>> On 07/12/2013 06:14 AM, Ren Mingxin wrote:
>>> On 07/01/2013 10:24 PM, Hannes Reinecke wrote:
>>>> With the original SCSI EH I got:
>>>> # time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s
>>>>
>>>> real    2m22.657s
>>>> user    0m0.013s
>>>> sys    0m0.145s
>>>>
>>>> With this patchset I got:
>>>> # time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
>>>> 4096+0 records in
>>>> 4096+0 records out
>>>> 16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s
>>>>
>>>> real    0m52.163s
>>>> user    0m0.012s
>>>> sys    0m0.145s
>>>>
>>>> Test was to disable RSCN on the target port, disable the
>>>> target port, and then start the 'dd' command as indicated.
>>>
>>> Do you mean disabling RSCN/port is enough? I'm afraid I couldn't
>>> reproduce the problem by your steps. Both with and without your
>>> patchset are the same 'dd' result: 27s. Please let me know where I
>>> neglected or mistook:
>>>
>>> 1) I made a dm-multipath target 'dm-0' whose grouping policy was
>>>     failover;
>>> 2) Disable RSCN/port via brocade fc switch:
>>>     SW300:root>  portcfg rscnsupr 15 --enable; portDisable 15
>>> 3) Start the 'dd' command:
>>>     # time dd if=/dev/zero of=/dev/dm-0 bs=4k count=4k oflag=direct
>>>     dd: writing `/dev/sde': Input/output error
>>>     1+0 records in
>>>     0+0 records out
>>>     0 bytes (0 B) copied, 27.8588 s, 0.0 kB/s
>>>
>>>     real    0m27.860s
>>>     user    0m0.001s
>>>     sys     0m0.000s
>>
>> You are aware that you have to disable RSCNs on the _target_ port,
>> right?
>> Disabling RSCNs on the _initiator_ ports is a well-tested case, and
>> the one which actually makes sense (and is even implemented in
>> QLogic switches).
>> Disabling RSCNs for the _target_ port, OTOH, has a very questionable
>> nature (hence QLogic switches don't even allow you to do this).
> 
> You're right. By disabling RSCNs on target port, I've reproduced this
> problem. Thank you so much. But I've encountered the bug I said
> before. I'll test again with your new patchset once you send.
> 

Could you check with the attached patch? That should convert it to
delayed_work and avoid this issue.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>From 2d0851bea02e3d15bf403e6800cc7164f2e211f2 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <h...@suse.de>
Date: Fri, 12 Jul 2013 12:06:19 +0200
Subject: [PATCH] scsi: use delayed_work to trigger aborts

Command aborts are invoked from an interrupt, so we cannot use
cancel_work_sync() when an outstanding abort needs to be cancelled.
So convert the mechanism to delayed_work, which can be cancelled
even from an interrupt context.

Signed-off-by: Hannes Reinecke <h...@suse.de>

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a62ae84..13a3418 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -297,7 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask)
 
 		cmd->device = dev;
 		INIT_LIST_HEAD(&cmd->list);
-		INIT_WORK(&cmd->abort_work, scmd_eh_abort_handler);
+		INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
 		spin_lock_irqsave(&dev->list_lock, flags);
 		list_add_tail(&cmd->list, &dev->cmd_list);
 		spin_unlock_irqrestore(&dev->list_lock, flags);
@@ -354,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 	list_del_init(&cmd->list);
 	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
 
+	cancel_delayed_work_sync(&cmd->abort_work);
+
 	__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 28c272f..cd443b2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -187,7 +187,7 @@ void
 scmd_eh_abort_handler(struct work_struct *work)
 {
 	struct scsi_cmnd *scmd =
-		container_of(work, struct scsi_cmnd, abort_work);
+		container_of(work, struct scsi_cmnd, abort_work.work);
 	struct scsi_device *sdev = scmd->device;
 	unsigned long flags;
 	int rtn;
@@ -254,7 +254,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "scmd %p abort timeout\n", scmd));
-		cancel_work_sync(&scmd->abort_work);
+		cancel_delayed_work(&scmd->abort_work);
 		return BLK_EH_NOT_HANDLED;
 	}
 
@@ -279,7 +279,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
 	SCSI_LOG_ERROR_RECOVERY(3,
 		scmd_printk(KERN_INFO, scmd,
 			    "scmd %p abort scheduled\n", scmd));
-	schedule_work(&scmd->abort_work);
+	schedule_delayed_work(&scmd->abort_work, HZ / 100);
 	return BLK_EH_SCHEDULED;
 }
 EXPORT_SYMBOL_GPL(scsi_abort_command);
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index a2f062e..e3137e3 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -55,7 +55,7 @@ struct scsi_cmnd {
 	struct scsi_device *device;
 	struct list_head list;  /* scsi_cmnd participates in queue lists */
 	struct list_head eh_entry; /* entry for the host eh_cmd_q */
-	struct work_struct abort_work;
+	struct delayed_work abort_work;
 	int eh_eflags;		/* Used by error handlr */
 
 	/*

Reply via email to