Hi,

One of my drives died earlier in a fairly emphatic way: not only did
it throw I/O errors and get removed as a device by the kernel, but it
was also making audible grinding/screeching noises until I
hot-unplugged it.

Feb 14 18:29:36 specialbrew kernel: [27576156.070961] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 14 18:29:37 specialbrew kernel: [27576157.215312] ata6.00: hard resetting link
Feb 14 18:29:37 specialbrew kernel: [27576157.555369] ata6.00: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:37 specialbrew kernel: [27576157.560028] ata6.01: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576157.915797] ata6.01: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:38 specialbrew kernel: [27576157.920591] ata6.02: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576158.275759] ata6.02: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:38 specialbrew kernel: [27576158.280603] ata6.03: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576158.603658] ata6.03: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:38 specialbrew kernel: [27576158.608844] ata6.04: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576158.947805] ata6.04: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576158.953058] ata6.05: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576159.291801] ata6.05: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576159.297143] ata6.06: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576159.639850] ata6.06: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576159.645411] ata6.07: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576159.971581] ata6.07: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576159.977251] ata6.08: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576160.303533] ata6.08: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576160.310056] ata6.09: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576160.635541] ata6.09: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576160.641371] ata6.10: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576160.967639] ata6.10: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576160.973591] ata6.11: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576161.299570] ata6.11: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576161.305670] ata6.12: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576161.631589] ata6.12: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576161.637725] ata6.13: hard resetting link
Feb 14 18:29:42 specialbrew kernel: [27576161.963597] ata6.13: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:42 specialbrew kernel: [27576161.969538] ata6.14: hard resetting link
Feb 14 18:29:42 specialbrew kernel: [27576162.295657] ata6.14: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:42 specialbrew kernel: [27576162.303094] ata6.00: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.310674] ata6.01: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.317928] ata6.02: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.326589] ata6.04: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.337178] ata6.05: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.344438] ata6.06: configured for UDMA/100
Feb 14 18:29:43 specialbrew kernel: [27576163.607145] ata6.03: hard resetting link
Feb 14 18:29:44 specialbrew kernel: [27576163.935962] ata6.03: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:44 specialbrew kernel: [27576163.942835] ata6.03: limiting SATA link speed to 1.5 Gbps
Feb 14 18:29:49 specialbrew kernel: [27576168.939422] ata6.03: hard resetting link
Feb 14 18:29:49 specialbrew kernel: [27576169.264031] ata6.03: SATA link down (SStatus 0 SControl 310)
Feb 14 18:29:49 specialbrew kernel: [27576169.270519] ata6.03: disabled
Feb 14 18:29:49 specialbrew kernel: [27576169.276874] end_request: I/O error, dev sdh, sector 0
Feb 14 18:29:49 specialbrew kernel: [27576169.282908] btrfs_dev_stat_print_on_error: 965 callbacks suppressed
Feb 14 18:29:49 specialbrew kernel: [27576169.282929] ata6: EH complete
Feb 14 18:29:49 specialbrew kernel: [27576169.294246] BTRFS: bdev /dev/sdh errs: wr 125, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.300987] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.307016] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.312976] BTRFS: bdev /dev/sdh errs: wr 126, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.319049] ata6.03: detaching (SCSI 5:3:0:0)
Feb 14 18:29:49 specialbrew kernel: [27576169.319433] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.319443] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.319448] BTRFS: bdev /dev/sdh errs: wr 127, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.319521] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.319523] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.319526] BTRFS: bdev /dev/sdh errs: wr 128, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.426264] sd 5:3:0:0: [sdh] Synchronizing SCSI cache
Feb 14 18:29:49 specialbrew kernel: [27576169.432734] sd 5:3:0:0: [sdh]
Feb 14 18:29:49 specialbrew kernel: [27576169.438653] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 18:29:49 specialbrew kernel: [27576169.444590] sd 5:3:0:0: [sdh] Stopping disk
Feb 14 18:29:49 specialbrew kernel: [27576169.450961] sd 5:3:0:0: [sdh] START_STOP FAILED
Feb 14 18:29:49 specialbrew kernel: [27576169.456838] sd 5:3:0:0: [sdh]
Feb 14 18:29:49 specialbrew kernel: [27576169.462622] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 14 18:30:21 specialbrew kernel: [27576201.178630] BTRFS: bdev /dev/sdh errs: wr 128, rd 8, flush 2, corrupt 0, gen 0
Feb 14 18:30:21 specialbrew kernel: [27576201.309583] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:30:21 specialbrew kernel: [27576201.315761] BTRFS: bdev /dev/sdh errs: wr 129, rd 8, flush 2, corrupt 0, gen 0
Feb 14 18:30:21 specialbrew kernel: [27576201.322086] BTRFS: lost page write due to I/O error on /dev/sdh

…and those BTRFS messages are still appearing even though the system
no longer has a /dev/sdh.
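Since the per-device counters keep climbing, it can help to see only the latest values rather than scrolling through every repeat. A minimal sketch of a log filter, assuming each message sits on a single line (as in /var/log/kern.log, not wrapped as in an email) and uses the `BTRFS: bdev ... errs:` format shown above; the function name `latest_btrfs_errs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent BTRFS error counters logged for
# each device, reading a kernel log (e.g. /var/log/kern.log) on stdin.
# Assumes one message per line in the format shown above:
#   ... BTRFS: bdev /dev/sdh errs: wr 129, rd 8, flush 2, corrupt 0, gen 0
latest_btrfs_errs() {
    # sed keeps only "errs:" lines, reduced to "<device> <counters>".
    # awk lets later lines overwrite earlier ones, so only the last line per
    # device survives; output order across devices is unspecified.
    sed -n 's/.*BTRFS: bdev \([^ ]*\) errs: \(.*\)$/\1 \2/p' |
        awk '{ last[$1] = $0 } END { for (d in last) print last[d] }'
}

# Usage:
#   latest_btrfs_errs < /var/log/kern.log
```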

Now:

$ sudo btrfs fi sh /srv/tank
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 6 FS bytes used 1.57TiB
        devid    3 size 1.82TiB used 383.00GiB path /dev/sdg
        devid    4 size 1.82TiB used 384.00GiB path /dev/sdf
        devid    5 size 2.73TiB used 1.25TiB path /dev/sdk
        devid    6 size 1.82TiB used 347.00GiB path /dev/sdj
        devid    7 size 2.73TiB used 464.00GiB path /dev/sde
        *** Some devices missing
$ sudo btrfs dev usage /srv/tank
/dev/sde, ID: 7
   Device size:             2.73TiB
   Data,RAID1:            464.00GiB
   Unallocated:             2.28TiB

/dev/sdf, ID: 4
   Device size:             1.82TiB
   Data,RAID1:            383.00GiB
   Metadata,RAID1:          1.00GiB
   Unallocated:             1.44TiB

/dev/sdg, ID: 3
   Device size:             1.82TiB
   Data,RAID1:            382.00GiB
   Metadata,RAID1:          1.00GiB
   Unallocated:             1.45TiB

/dev/sdh, ID: 2
   Device size:               0.00B
   Data,RAID1:            383.00GiB
   Metadata,RAID1:          1.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.44TiB

/dev/sdj, ID: 6
   Device size:             1.82TiB
   Data,RAID1:            347.00GiB
   Unallocated:             1.48TiB

/dev/sdk, ID: 5
   Device size:             2.73TiB
   Data,RAID1:              1.25TiB
   Metadata,RAID1:          3.00GiB
   System,RAID1:           32.00MiB
   Unallocated:             1.48TiB
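The "Total devices 6" header versus the five devid lines actually listed is what "*** Some devices missing" reflects. A minimal sketch for spotting that from a script, assuming the btrfs-progs v4.4 `fi show` layout above for a single filesystem, where a missing device simply gets no devid line (other versions may print missing devices differently); the function name `btrfs_missing_count` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs filesystem show <mountpoint>` output on
# stdin and print how many devices are expected but not listed.
# Assumes the btrfs-progs v4.4 layout for a single filesystem, where
# missing devices get no "devid" line.
btrfs_missing_count() {
    awk '
        /Total devices/ { total = $3 }   # "Total devices 6 FS bytes used ..."
        $1 == "devid"   { seen++ }       # one line per device that is present
        END             { print total - seen }
    '
}

# Usage (needs root, as above):
#   sudo btrfs fi sh /srv/tank | btrfs_missing_count
```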

So, ideally I'd like to remove the missing device sdh (ID 2) so that
I have redundant copies of the data again until I can insert a new
drive. But "remove" doesn't seem to want to work:

$ sudo btrfs dev remove /dev/sdh /srv/tank
ERROR: not a block device: /dev/sdh
$ sudo btrfs dev remove 2 /srv/tank
ERROR: not a block device: 2
$ btrfs --version
btrfs-progs v4.4

I suspect my kernel may be too old, as it is a Debian backports
kernel on wheezy (linux-image-3.16.0-0.bpo.4-amd64
3.16.7-ckt20-1+deb8u3~bpo70+1).

If I upgrade the kernel then should one of those remove commands
above work?

I would rather not reboot just now if I can achieve redundancy in
some other way. Would a rebalance like:

$ sudo btrfs balance -f -v -sdevid=2 -mdevid=2 /srv/tank

reconstruct redundant copies elsewhere?

With this btrfs-progs and kernel version, will a later "btrfs
replace start -r /dev/sdh /dev/sdl" work without me rebooting into a
newer kernel, even though /dev/sdh doesn't exist as a device to the
kernel right now?

Any information/advice appreciated.

Cheers,
Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
