Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ

2017-04-19 Thread Christoph Hellwig
Looks like I won't get the major error status changes into 4.12,
so let's go with these patches for now:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ

2017-04-03 Thread Christoph Hellwig
I'm planning to introduce new block-layer specific status codes ASAP,
so I'd prefer not to add new errno special cases.

I'll port your patches to the new code and will send them out with
my series in a few days, though.


[PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ

2017-04-03 Thread Dmitry Monakhov
It is quite easy to get an unrecovered read error (URE) after a power
failure, along with a scary message. A URE happens when the drive's internal
CRC check fails because a sector was only partially updated. Most people
interpret such a message as "My drive is dying", which is a reasonable
assumption if dmesg is full of complaints from the disks and read(2) returns
EIO. In fact this error is not fatal. One can fix it easily by rewriting the
affected sector.
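
As an illustration of the "rewrite the affected sector" remedy, here is a minimal userspace sketch in C. It assumes the behaviour proposed in this patch (read(2)/pread(2) failing with EILSEQ for a URE). The helper name read_or_rewrite() and the replacement buffer are hypothetical and not part of the patch; where the replacement data comes from (a backup, a RAID peer, zeros) is up to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Reads len bytes at offset off. If the read fails with EILSEQ
 * (assumed here to mean "unrecovered read error"), the range is
 * rewritten from the caller-supplied replacement buffer.
 *
 * Returns  0 if the read succeeded,
 *          1 if the read hit EILSEQ and the rewrite succeeded,
 *         -1 on any other failure (errno is left as set by the
 *            failing call; a short read or write also yields -1).
 */
static int read_or_rewrite(int fd, void *buf, size_t len, off_t off,
                           const void *replacement)
{
    ssize_t ret = pread(fd, buf, len, off);

    if (ret == (ssize_t)len)
        return 0;               /* data is readable, nothing to do */

    if (ret < 0 && errno == EILSEQ) {
        /* Bad data on the medium: overwrite it with known contents. */
        ret = pwrite(fd, replacement, len, off);
        return ret == (ssize_t)len ? 1 : -1;
    }

    return -1;                  /* EIO and friends: not handled here */
}
```

For the rewrite to reach the medium, the caller would typically open the device with O_DIRECT or O_SYNC, or call fsync(2) afterwards. Any error other than EILSEQ is deliberately left to the caller.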

So we have to handle a URE as follows:
- Return EILSEQ to signal to the caller that this is a bad-data problem
- Do not retry the command, because a retry is useless.
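
The two rules above can be sketched as a standalone C function that maps a SCSI sense triple to an errno. This is only an illustration that can be compiled outside the kernel; the names SK_MEDIUM_ERROR, is_unrecovered_read_error() and sense_to_errno() are hypothetical, and the real change lives in scsi_io_completion() in the diff below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Sense key value for MEDIUM ERROR (0x03). */
#define SK_MEDIUM_ERROR 0x03

/*
 * Hypothetical helper: true if the sense data describes an
 * unrecovered read error (ASC 0x11) with one of the ASCQ values
 * handled by the patch.
 */
static bool is_unrecovered_read_error(uint8_t sense_key, uint8_t asc,
                                      uint8_t ascq)
{
    if (sense_key != SK_MEDIUM_ERROR || asc != 0x11)
        return false;

    switch (ascq) {
    case 0x00: /* URE */
    case 0x04: /* URE, auto reallocate failed */
    case 0x0B: /* URE, recommend reassignment */
    case 0x0C: /* URE, recommend rewrite the data */
        return true;
    default:
        return false;
    }
}

/*
 * Hypothetical helper: the errno a failed read would carry.
 * URE -> -EILSEQ (bad data, do not retry); anything else -> -EIO.
 */
static int sense_to_errno(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
    return is_unrecovered_read_error(sense_key, asc, ascq) ? -EILSEQ : -EIO;
}
```

Other ASCQ values under ASC 0x11 keep the generic -EIO result in this sketch, matching the explicit case list in the patch.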



### Test case
# The test uses two HDDs: sdb and sdc
# Write phase
# Let fio work for ~100 sec and then cut the power
fio --ioengine=libaio --direct=1 --rw=write --bs=1M --iodepth=16 \
--time_based=1 --runtime=600 --filesize=1G --size=1T \
--name /dev/sdb --name /dev/sdc

# Check phase, after the system comes back up
fio --ioengine=libaio --direct=1 --group_reporting --rw=read --bs=1M \
--iodepth=16 --size=1G --filesize=1G \
--name=/dev/sdb --name /dev/sdc

More info about URE probability here:
https://plus.google.com/101761226576930717211/posts/Pctq7kk1dLL

Signed-off-by: Dmitry Monakhov 
---
 drivers/scsi/scsi_lib.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d7..59d64ad 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -961,6 +961,19 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			/* See SSC3rXX or current. */
 			action = ACTION_FAIL;
 			break;
+		case MEDIUM_ERROR:
+			if (sshdr.asc == 0x11) {
+				/* Handle unrecovered read error */
+				switch (sshdr.ascq) {
+				case 0x00: /* URE */
+				case 0x04: /* URE auto reallocate failed */
+				case 0x0B: /* URE recommend reassignment */
+				case 0x0C: /* URE recommend rewrite the data */
+					action = ACTION_FAIL;
+					error = -EILSEQ;
+					break;
+				}
+			}
 		default:
 			action = ACTION_FAIL;
 			break;
-- 
2.9.3