Package: linux-image-3.16.0-0.bpo.4-amd64
Version: 3.16.7-ckt2-1~bpo70+1
Severity: important

Dear Maintainer,

   * What led up to the situation?

One of my RAID1 arrays sporadically degrades during the checkarray cron job:

Jan 4 00:57:01 nihlus /USR/SBIN/CRON[4367]: (root) CMD (if [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi)
Jan 4 00:57:01 nihlus kernel: [ 3932.435274] md: data-check of RAID array md0
Jan 4 00:57:01 nihlus kernel: [ 3932.455356] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 4 00:57:01 nihlus kernel: [ 3932.469160] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
Jan 4 00:57:01 nihlus kernel: [ 3932.524839] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 4 00:57:01 nihlus kernel: [ 3932.569568] md: using 128k window, over a total of 262132k.
Jan 4 00:57:03 nihlus kernel: [ 3934.473794] md: md0: data-check done.
Jan 4 00:57:03 nihlus kernel: [ 3934.491622] md: data-check of RAID array md2
Jan 4 00:57:03 nihlus kernel: [ 3934.510850] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 4 00:57:03 nihlus mdadm[2289]: RebuildFinished event detected on md device /dev/md/0
Jan 4 00:57:03 nihlus kernel: [ 3934.541334] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 4 00:57:03 nihlus kernel: [ 3934.587243] md: using 128k window, over a total of 1952201680k.
[...]
Jan 4 03:35:35 nihlus kernel: [13446.203438] sd 1:0:0:0: [sdb] Unhandled error code
Jan 4 03:35:35 nihlus kernel: [13446.225179] sd 1:0:0:0: [sdb]
Jan 4 03:35:35 nihlus kernel: [13446.239316] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jan 4 03:35:35 nihlus kernel: [13446.265222] sd 1:0:0:0: [sdb] CDB:
Jan 4 03:35:36 nihlus kernel: [13446.280916] Write(10): 2a 00 00 d8 67 08 00 00 20 00
Jan 4 03:35:36 nihlus kernel: [13446.303438] end_request: I/O error, dev sdb, sector 14182152
Jan 4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Disk failure on sdb3, disabling device.
Jan 4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Operation continuing on 1 devices.
Jan 4 03:35:36 nihlus kernel: [13446.401456] md: md2: data-check interrupted.
Jan 4 03:35:36 nihlus kernel: [13446.467913] RAID1 conf printout:
Jan 4 03:35:36 nihlus kernel: [13446.467920] --- wd:1 rd:2
Jan 4 03:35:36 nihlus kernel: [13446.467925] disk 0, wo:0, o:1, dev:sda3
Jan 4 03:35:36 nihlus kernel: [13446.467929] disk 1, wo:1, o:0, dev:sdb3
Jan 4 03:35:36 nihlus kernel: [13446.492871] RAID1 conf printout:
Jan 4 03:35:36 nihlus kernel: [13446.492878] --- wd:1 rd:2
Jan 4 03:35:36 nihlus kernel: [13446.492883] disk 0, wo:0, o:1, dev:sda3
Jan 4 03:35:36 nihlus mdadm[2289]: Fail event detected on md device /dev/md/2
Jan 4 03:35:36 nihlus postfix/pickup[4968]: 3kFPGJ1mCzz1n: uid=0 from=<root>
Jan 4 03:35:36 nihlus postfix/cleanup[5060]: 3kFPGJ1mCzz1n: message-id=<3kfpgj1mcz...@spectre.leuxner.net>
Jan 4 03:35:36 nihlus mdadm[2289]: FailSpare event detected on md device /dev/md/2, component device /dev/sdb3
Jan 4 03:35:36 nihlus mdadm[2289]: RebuildFinished event detected on md device /dev/md/2

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[2](F)
      1952201680 blocks super 1.2 [2/1] [U_]

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      1048564 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      262132 blocks super 1.2 [2/2] [UU]

# smartctl -i /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA DT01ACA200
Serial Number:    [redacted]
LU WWN Device Id: 5 000039 ff3e05ac0
Firmware Version: MX4OABB0
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Jan 4 18:50:18 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

# smartctl -l selftest /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%             1490  -
# 2  Extended offline    Completed without error        00%              822  -
# 3  Short offline       Completed without error        00%              812  -

   * What exactly did you do (or not do) that was effective (or
     ineffective)?

# mdadm --manage /dev/md2 --remove /dev/sdb3
# mdadm --manage /dev/md2 --add /dev/sdb3

   * What was the outcome of this action?

The array rebuilt without _any_ errors. The drive has never gone offline
during normal operation, and its SMART self-tests report no errors. It is
only removed from the array sporadically, during the checkarray job, when
a driver timeout occurs.

   * What outcome did you expect instead?

The array should not degrade during the checkarray cron job.

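For anyone trying to catch the same condition from a script, the degraded state visible in the /proc/mdstat output above ([2/1] [U_] for md2) can be detected roughly as follows. This is only a minimal sketch: the function name list_degraded is made up for illustration, and it assumes the mdstat layout shown in this report (an "mdN :" header line followed by a status line ending in a [U_]-style field).

```shell
#!/bin/sh
# list_degraded (hypothetical helper): read mdstat-formatted text on stdin
# and print the name of every md array whose member-status field, such as
# [U_], contains "_" (a missing or failed member).
list_degraded() {
    awk '
        # Header line of an array, e.g. "md2 : active raid1 ...".
        /^md[0-9]+ :/ { name = $1 }
        # Status field made only of U and _ with at least one "_".
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Usage on a live system:
#   list_degraded < /proc/mdstat
```

Fed the mdstat contents quoted in this report, the function prints md2 and nothing else. It only reports the current state; it says nothing about why the member was kicked out.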
-- System Information:
Debian Release: 7.7
  APT prefers stable
  APT policy: (1001, 'stable'), (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-0.bpo.4-amd64
Locale: LANG=en_US.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Archive: https://lists.debian.org/20150104182356.35893.74154.report...@nihlus.leuxner.net