[PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ

2017-04-03 Thread Dmitry Monakhov
It is quite easily to get URE after power failure and get scary message.
URE is happens due to internal drive crc mismatch due to partial sector
update. Most people interpret such message as "My drive is dying", which
isreasonable assumption if your dmesg is full of complain from disks and
read(2) return EIO. In fact this error is not fatal. One can fix it easily
by rewriting affected sector.

So we have to handle URE like follows:
- Return EILSEQ to signall caller that this is bad data related problem
- Do not retry command, because this is useless.



### Test case
#Test uses two HDD: disks sdb sdc
#Write_phase
# let fio work ~100sec and then cut the power
fio --ioengine=libaio --direct=1 --rw=write --bs=1M --iodepth=16 \
--time_based=1 --runtime=600 --filesize=1G --size=1T \
--name /dev/sdb --name /dev/sdc

# Check_phase after system goes back
fio --ioengine=libaio --direct=1 --group_reporting --rw=read --bs=1M \
--iodepth=16 --size=1G --filesize=1G
--name=/dev/sdb --name /dev/sdc

More info about URE probability here:
https://plus.google.com/101761226576930717211/posts/Pctq7kk1dLL

Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>
---
 drivers/scsi/scsi_lib.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 19125d7..59d64ad 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -961,6 +961,19 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned 
int good_bytes)
/* See SSC3rXX or current. */
action = ACTION_FAIL;
break;
+   case MEDIUM_ERROR:
+   if (sshdr.asc == 0x11) {
+   /* Handle unrecovered read error */
+   switch (sshdr.ascq) {
+   case 0x00: /* URE */
+   case 0x04: /* URE auto reallocate failed */
+   case 0x0B: /* URE recommend reassignment*/
+   case 0x0C: /* URE recommend rewrite the data */
+   action = ACTION_FAIL;
+   error = -EILSEQ;
+   break;
+   }
+   }
default:
action = ACTION_FAIL;
break;
-- 
2.9.3



[PATCH 2/2] block: Improve error handling verbosity

2017-04-03 Thread Dmitry Monakhov
EILSEQ is returned due to internal csum error on disk/fabric,
let's add special message to distinguish it from others. Also dump
original numerical error code.

Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>
---
 block/blk-core.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 071a998..8eab846 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2576,13 +2576,16 @@ bool blk_update_request(struct request *req, int error, 
unsigned int nr_bytes)
case -ENODATA:
error_type = "critical medium";
break;
+   case -EILSEQ:
+   error_type = "bad data";
+   break;
case -EIO:
default:
error_type = "I/O";
break;
}
-   printk_ratelimited(KERN_ERR "%s: %s error, dev %s, sector 
%llu\n",
-  __func__, error_type, req->rq_disk ?
+   printk_ratelimited(KERN_ERR "%s: %s error (%d), dev %s, sector 
%llu\n",
+  __func__, error_type, error, req->rq_disk ?
   req->rq_disk->disk_name : "?",
   (unsigned long long)blk_rq_pos(req));
 
-- 
2.9.3



Re: [PATCH 0/3] Improve block device testing coverage

2017-03-31 Thread Dmitry Monakhov
Eryu Guan <eg...@redhat.com> writes:

> On Fri, Mar 31, 2017 at 10:43:19AM +0300, Dmitry Monakhov wrote:
>> Christoph Hellwig <h...@infradead.org> writes:
>> 
>> > On Thu, Mar 30, 2017 at 05:19:01PM +0400, Dmitry Monakhov wrote:
>> >> During LSFMM we have discussed how to test lower-backend of linux 
>> >> IO-stack.
>> >> Common opinion was that xfstests is the most obvious solution which cover
>> >> most of use cases filesystem care about.
>> >> 
>> >> I'm working on integration T10-DIF/DIF data integrity features to ext4,
>> >> for that reason we need to be shure that linux integrity framework is
>> >> in working state, which is currently broken in several places.
>> >> 
>> >> In fact, it is relatively simple to add basic coverage tests for basic
>> >> IO operations over virtual device with integrity support. All we need
>> >> is to add lio target support.
>> >
>> > First:  Thanks for adding block layer testing!
>> >
>> > Second: even more so than Darrick's blockdev fallocate test this is
>> > the wrong place.  If I run xfstests I want to test my file system,
>> > not random block device features.  Please start a proper block device
>> > testsuite instead, possibly by copy and pasting code from xfstests.
>> Fair enough. I also not happy to place blkdev feature  to tests/generic
>> namespace. But altearnative to fork xfstests infrastructure to dedicated
>> test-framework only for blkdevice seems not very good. Because fork is
>> always pain. I already maintain one internal fork of xfstests which
>> tests our Vituozzo's speciffic features.
>> 
>> May be it would be reasonable idea to add didicated namespace
>> 'tests/blockdev' in xfstests, and move all blkdev related tests here?
>> IMHO this is good idea. Because filesystem relay on some basic
>> features from blkdev which should be tested explicitly, because
>> implicit testing is too hard to debug/investigation.
>
> I'm not sure if xfstests is the right place for blockdev tests,
> especially for the pure blockdev level features (at least Darrick's
> blockdev tests are testing fallocate(2) interface).
Another good example may be a bug with dirty page cache after blkdiscard
https://lkml.org/lkml/2017/3/22/789 . This simple bug  result in crappy
fsimage if mkfs relay on discard_zeroes_data behaviour.
So IMHO basic blkdev test coverage is important filesystem testing. i.e.
important for xfstests.
>
> But yeah, a new tests/blockdev dir would be good if we eventually decide
> adding blockdev tests to xfstests, so they're not run by default when
> people want to test filesystems.
>
> Thanks,
> Eryu


Re: [PATCH 0/3] Improve block device testing coverage

2017-03-31 Thread Dmitry Monakhov
Christoph Hellwig <h...@infradead.org> writes:

> On Thu, Mar 30, 2017 at 05:19:01PM +0400, Dmitry Monakhov wrote:
>> During LSFMM we have discussed how to test lower-backend of linux IO-stack.
>> Common opinion was that xfstests is the most obvious solution which cover
>> most of use cases filesystem care about.
>> 
>> I'm working on integration T10-DIF/DIF data integrity features to ext4,
>> for that reason we need to be shure that linux integrity framework is
>> in working state, which is currently broken in several places.
>> 
>> In fact, it is relatively simple to add basic coverage tests for basic
>> IO operations over virtual device with integrity support. All we need
>> is to add lio target support.
>
> First:  Thanks for adding block layer testing!
>
> Second: even more so than Darrick's blockdev fallocate test this is
> the wrong place.  If I run xfstests I want to test my file system,
> not random block device features.  Please start a proper block device
> testsuite instead, possibly by copy and pasting code from xfstests.
Fair enough. I also not happy to place blkdev feature  to tests/generic
namespace. But altearnative to fork xfstests infrastructure to dedicated
test-framework only for blkdevice seems not very good. Because fork is
always pain. I already maintain one internal fork of xfstests which
tests our Vituozzo's speciffic features.

May be it would be reasonable idea to add didicated namespace
'tests/blockdev' in xfstests, and move all blkdev related tests here?
IMHO this is good idea. Because filesystem relay on some basic
features from blkdev which should be tested explicitly, because
implicit testing is too hard to debug/investigation.

>
> That's how I started the test suite for qemu's block layer for example.
Do you mean qemu/tests/qemu-iotests ?





[PATCH 2/3] add test: generic/420 check information lead for lio-fileio

2017-03-30 Thread Dmitry Monakhov
Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>
---
 tests/generic/420 | 77 +++
 tests/generic/420.out |  2 ++
 tests/generic/group   |  1 +
 3 files changed, 80 insertions(+)
 create mode 100755 tests/generic/420
 create mode 100644 tests/generic/420.out

diff --git a/tests/generic/420 b/tests/generic/420
new file mode 100755
index 000..592703d
--- /dev/null
+++ b/tests/generic/420
@@ -0,0 +1,77 @@
+#! /bin/bash
+# FS QA Test 420
+#
+# Regression test for  information leak for lio-fileio target
+# BUG: If image  file is less than virtual blockdev then read() may return
+#  unitilized data.
+#
+#---
+# Copyright (c) 2017 Dmitry Monakhov <dmonak...@openvz.org>
+# All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+   _liotgt_cleanup
+   rm -rf $TEST_DIR/$$
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/liotarget
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_test
+_require_liotarget
+
+mkdir -p $TEST_DIR/$$ || _fail "Can not make test dir"
+
+cfg_path=$(_liotgt_create_fileio  420-test.img $TEST_DIR/$$/420-test.img 128M)
+dev=$(_liotgt_attach_target /backstores/fileio/420-test.img)
+
+$XFS_IO_PROG -f -c "truncate 1M" $TEST_DIR/$$/420-test.img >> $seqres.full
+
+$XFS_IO_PROG -c "pwrite -S 0xa0 -b 512 512 1k" -d $dev >>$seqres.full 2>&1 || \
+_fail "pwrite failed"
+
+$XFS_IO_PROG -c "pwrite -S 0xab -b 512 2M 1k" -d $dev >>$seqres.full 2>&1 || \
+_fail "pwrite failed"
+
+md5sum $dev | sed -e "s,$dev,loopdev,g"
+# success, all done
+status=0
+exit
diff --git a/tests/generic/420.out b/tests/generic/420.out
new file mode 100644
index 000..deed5da
--- /dev/null
+++ b/tests/generic/420.out
@@ -0,0 +1,2 @@
+QA output created by 420
+f5cfb0d8d7bfadbc2127f4f6ca4a0eba  loopdev
diff --git a/tests/generic/group b/tests/generic/group
index 0781f35..1261547 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -422,3 +422,4 @@
 417 auto quick shutdown log
 418 auto rw
 419 auto quick encrypt
+420 auto blockdev liotarget
-- 
2.9.3



[PATCH 1/3] add lio-target helpers

2017-03-30 Thread Dmitry Monakhov
Linux-IO Target is very good framework for testing block backend.
It is more flexible than scsi_debug.

http://linux-iscsi.org/wiki/LIO
Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>
---
 common/config|   2 +
 common/liotarget | 111 +++
 2 files changed, 113 insertions(+)
 create mode 100644 common/liotarget

diff --git a/common/config b/common/config
index 59041a3..cfe7913 100644
--- a/common/config
+++ b/common/config
@@ -212,6 +212,8 @@ export XZ_PROG="`set_prog_path xz`"
 export FLOCK_PROG="`set_prog_path flock`"
 export LDD_PROG="`set_prog_path ldd`"
 export TIMEOUT_PROG="`set_prog_path timeout`"
+export TARGETCLI_PROG="`set_prog_path targetcli`"
+export TARGETCTL_PROG="`set_prog_path targetctl`"
 
 # use 'udevadm settle' or 'udevsettle' to wait for lv to be settled.
 # newer systems have udevadm command but older systems like RHEL5 don't.
diff --git a/common/liotarget b/common/liotarget
new file mode 100644
index 000..f821692
--- /dev/null
+++ b/common/liotarget
@@ -0,0 +1,111 @@
+#!/bin/bash
+#
+# Copyright (c) 2017 Virtuozzo Inc
+# All Rights Reserved.
+#
+# Written by Dmitry Monakhov <dmonak...@openvz.org>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#
+#
+# Functions for Linux-IO Target manipulation
+#
+
+_require_liotarget()
+{
+   which $TARGETCLI_PROG >>$seqres.full 2>&1  || \
+   _notrun "this test requires 'targetcli' tool"
+   which $TARGETCLI_PROG >>$seqres.full 2>&1  || \
+   _notrun "this test requires 'targetcli' tool"
+
+   $TARGETCLI_PROG ls /backstores/ramdisk >>$seqres.full 2>&1 ||\
+   _notrun "kernel compiled w/o CONFIG_TARGET_CORE"
+   $TARGETCLI_PROG ls /backstores/fileio >>$seqres.full 2>&1 ||\
+   _notrun "kernel compiled w/o CONFIG_TCM_FILEIO"
+   $TARGETCLI_PROG ls /loopback >>$seqres.full 2>&1 ||\
+   _notrun "kernel compiled w/o CONFIG_LOOPBACK_TARGET"
+}
+
+_liotgt_create_fileio()
+{
+   local name=$1
+   local img=$2
+   local sz=$3
+
+   $TARGETCLI_PROG /backstores/fileio create \
+   name=$name file_or_dev=$img  size=$sz >>$seqres.full ||\
+   _fail "Can not create /backstores/fileio/$name"
+
+   local cfg_path=`ls -d /sys/kernel/config/target/core/fileio_*/$name`
+   [ -d "$cfg_path" ] || _fail "Bad config path"
+   echo $cfg_path
+}
+
+_liotgt_set_attribute()
+{
+   local path=$1
+   local attr_name=$2
+   local attr_val=$3
+
+   [ -f $path/attrib/$attr_name ] || _fail "Bad attribute $attr_name"
+   echo "echo $attr_val >  $path/attrib/$attr_name " >>$seqres.full
+   echo $attr_val >  $path/attrib/$attr_name
+}
+
+_liotgt_create_loopback()
+{
+   local out=`$TARGETCLI_PROG /loopback/ create 2>&1`
+   [ $? -eq 0 ] || _fail "Can not create loopback target"
+   echo $out >>$seqres.full
+
+   local naa=`echo $out | gawk '{ print $3 }'`
+   echo ${naa:0:20} >>$seqres.full
+
+   echo ${naa:0:20}
+}
+
+_liotgt_find_dev()
+{
+   local found=""
+   local name=$1
+   local drives=`find /sys/devices -type f -name model`
+   for d in $drives;do
+   local dir=`dirname $d`
+   local vendor=`cat $dir/vendor`
+   local model=`cat $dir/model`
+   if [ "${vendor//[[:space:]]/}" == "LIO-ORG" ] && \
+  [ "${model//[[:space:]]/}" == "$name" ]; then
+   found=/dev/$(ls $dir/block)
+   break
+   fi
+   done
+   [ -z "$found" ] && _fail "Can not find device with backend $name"
+   echo "$found"
+}
+
+_liotgt_attach_target()
+{
+   local path=$1
+   local name=`basename $path`
+   local naa=$(_liotgt_create_loopback)
+
+   $TARGETCLI_PROG /loopback/$naa/luns create $path >>$seqres.full 2>&1 ||\
+   _fail "Can not attach $path to tcm_loop"
+   _liotgt_find_dev $name
+}
+
+_liotgt_cleanup()
+{
+   $TARGETCTL_PROG clear
+}
-- 
2.9.3



[PATCH 3/3] add test: generic/421 basic blockdev T10-DIF-TYPE1 integrity checks

2017-03-30 Thread Dmitry Monakhov
Test create virtual block device via lio-targed infastructure and
perform basic IO operations with data corruption detection.

Temprorally mark is as dangerous, because currently it trigger BUG_ON
inside blkdev_issue_flush

BTW: I use 'dd' to test read from corrupted image instead of xfs_io
because even if pread failed, xfs_io still exit with success, BUG?

Signed-off-by: Dmitry Monakhov <dmonak...@openvz.org>
---
 tests/generic/421 | 136 ++
 tests/generic/421.out |  16 ++
 tests/generic/group   |   1 +
 3 files changed, 153 insertions(+)
 create mode 100644 tests/generic/421
 create mode 100644 tests/generic/421.out

diff --git a/tests/generic/421 b/tests/generic/421
new file mode 100644
index 000..de1ba73
--- /dev/null
+++ b/tests/generic/421
@@ -0,0 +1,136 @@
+#! /bin/bash
+# FS QA Test 421
+#
+# Check basic T10-DIF integrity features for a block device
+#
+# DIF/DIX TYPE: T10-DIF-TYPE1-CRC
+# Kernel docs: Documentation/block/data-integrity.txt
+#---
+# Copyright (c) 2017 Dmitry Monakhov <dmonak...@openvz.org>
+# All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+   _liotgt_cleanup
+   rm -rf $TEST_DIR/$$
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/liotarget
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs generic
+_supported_os Linux
+_require_test
+_require_liotarget
+
+mkdir -p $TEST_DIR/$$ || _fail "Can not make test dir"
+
+# Create virtual block device
+cfg_path=$(_liotgt_create_fileio  t10-dif-t1 $TEST_DIR/$$/t10-dif-t1 32M)
+
+_liotgt_set_attribute $cfg_path  pi_prot_type 1
+_liotgt_set_attribute $cfg_path  pi_prot_format 1
+
+dev=$(_liotgt_attach_target /backstores/fileio/t10-dif-t1)
+
+echo "T0: Test basic IO"
+$XFS_IO_PROG -c "pwrite -S 0xa0 -b 4M 0 16M" -d $dev >>$seqres.full 2>&1 || \
+_fail "pwrite failed"
+
+$XFS_IO_PROG -c "pwrite -S 0xa1 -b 1k -V8  20M 32k" -d $dev >>$seqres.full 
2>&1 || \
+_fail "pwrite failed"
+
+$XFS_IO_PROG -c "pwrite -S 0xa2 -b 1k -V8  2M 32k" -f $dev >>$seqres.full 2>&1 
|| \
+_fail "pwrite failed"
+
+$XFS_IO_PROG -c "pwrite -S 0xa3 -b 4k  1536000 8k" -f $dev >>$seqres.full 2>&1 
|| \
+_fail "pwrite failed"
+
+$XFS_IO_PROG -c "fsync" -d $dev >>$seqres.full 2>&1 || _fail "fsync failed"
+
+echo "Check that buffered and direct read works"
+dd if=$dev bs=4k  2>$seqres.full | md5sum
+dd if=$dev bs=4M iflag=direct 2>$seqres.full | md5sum
+
+echo "Check csum corruption detection"
+# LIO-fileio store t10 DIF data in separate file ${IMG}.protection
+# struct t10_pi_tuple {
+#  __be16 guard_tag;   /* Checksum */
+#   __be16 app_tag; /* Opaque storage */
+#   __be32 ref_tag; /* Target LBA or indirect LBA */
+#}
+# Play with 3000'th sector -> t10_pi_tuple offset == 3000 * 8 == 24000
+#
+echo "T1: Corrupt guard_tag, next read should fail"
+$XFS_IO_PROG -c "pwrite -S 0xde -b2 24000 2 -w" \
+-f $TEST_DIR/$$/t10-dif-t1.protection >>$seqres.full 2>&1
+dd if=$dev bs=1M count=2 iflag=direct >>$seqres.full 2>&1 &&
+_fail "read should fail on 3000'th sector"
+
+echo "T2: Check that unaffected blocks are still readable"
+dd if=$dev bs=1M count=1 iflag=direct >>$seqres.full 2>&1 || _fail "read 
failed"
+
+echo "T3: Rewrite corrupted sector and check that read works"
+$XFS_IO_PROG -c "pwrite -S 0xa3 -b 4k 1536000 4k" -d $dev >>$seqres.full 2>&1 
|| \
+_fail "pwrite failed"
+dd if=$dev bs=2M count=1 iflag=direct >>$seqres.f

[PATCH 0/3] Improve block device testing coverage

2017-03-30 Thread Dmitry Monakhov
During LSFMM we have discussed how to test lower-backend of linux IO-stack.
Common opinion was that xfstests is the most obvious solution which cover
most of use cases filesystem care about.

I'm working on integration T10-DIF/DIF data integrity features to ext4,
for that reason we need to be shure that linux integrity framework is
in working state, which is currently broken in several places.

In fact, it is relatively simple to add basic coverage tests for basic
IO operations over virtual device with integrity support. All we need
is to add lio target support.

TOC:
add lio target helpers
add test: generic/420 check information lead for lio-fileio
add test: generic/421 basic blockdev T10-DIF-TYPE1 IO


HDD Unrecovered readerror issue

2016-07-20 Thread Dmitry Monakhov

Drive:WDC WD1003FZEX-00MK2A0
I have got this in logs:

ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/a0:a0:f0:c0:c5/00:00:04:00:00/40 tag 20 ncq 81920 in res 
41/40:00:88:c1:c5/00:00:04:00:00/00 Emask 0x409 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:0:0: [sda] Sense Key : Medium Error[current] [descriptor]
sd 0:0:0:0: [sda] Add. Sense: Unrecovered readerror - auto reallocate failed
sd 0:0:0:0: [sda] CDB: Read(10) 28 00 04 c5 c0 f0 00 00 a0 00
blk_update_request: I/O error, dev sda, sector 80069000
ata1: EH complete

I can reproduce this easily
#xfs_io -c "pread $((80069000/2))k 4k" -d  /dev/sda
pread64: Input/output error
##Got EIO
##Smartctl also detect this
#smartctl -t short /dev/sda
#smartctl -l selftest /dev/sda

Short offline   Completed: read failure   90%  4682 80069000

But once I rewrite this block, problem goes away.
#xfs_io -c "pwrite -S 0x0 $((80069000/2))k 4k" -d  /dev/sda

Now I can read it w/o any errors and smartctl is happy
#smartctl -t short /dev/sda
#smartctl -l selftest /dev/sda
Num  Test_DescriptionStatus  Remaining
LifeTime(hours)  LBA_of_first_error
# 1  Short offline   Completed without error   00%  4683 -

So my disk is not dead right? Why the hell HDD fail read from very beginning
Is this because HDD firmware detect internal crcXX sum corruption?
How this can happen? Is this because of power failure?
AFAIK standard guarantees that sector will be updated atomically.
But it happens! Please guide me how to fix such problems in general.


signature.asc
Description: PGP signature