On Mon, Jun 20, 2016 at 01:30:11PM -0600, Chris Murphy wrote:
> On Mon, Jun 20, 2016 at 1:11 PM, Zygo Blaxell
> <ce3g8...@umail.furryterror.org> wrote:
> > On Mon, Jun 20, 2016 at 11:13:51PM +0500, Roman Mamedov wrote:
> >> On Sun, 19 Jun 2016 23:44:27 -0400
> Seems difficult at best due to this:
> >>The normal 'device delete' operation got about 25% of the way in,
> then got stuck on some corrupted sectors and aborting with EIO.
>
> In effect it's like a 2 disk failure for a raid5 (or it's
> intermittently a 2 disk failure but always at least a 1 disk failure).
> That's not something md raid recovers from. Even manual recovery in
> such a case is far from certain.
>
> Perhaps Roman's advice is also a question about the cause of this
> corruption? I'm wondering this myself. That's the real problem here as
> I see it. Losing a drive is ordinary. Additional corruptions happening
> afterward is not. And are those corrupt sectors hardware corruptions,
> or Btrfs corruptions at the time the data was written to disk, or
> Btrfs being confused as it's reading the data from disk?
>
> For me the critical question is what does "some corrupted sectors" mean?

On other raid5 arrays, I would observe a small amount of corruption every
time there was a system crash (some of which were triggered by disk
failures, some not). It looked like any writes in progress at the time
of the failure would be damaged. In the past I would just mop up the
corrupt files (they were always the last extents written, easy to find
with find-new or scrub) and have no further problems.

In the earlier cases there were no new instances of corruption after the
initial failure event and manual cleanup.

Now that I have dug a little deeper into this, I do see one fairly
significant piece of data:

        root@host:~# btrfs dev stat /data | grep -v ' 0$'
        [/dev/vdc].corruption_errs 16774
        [/dev/vde].write_io_errs 121
        [/dev/vde].read_io_errs 4
        [devid:8].read_io_errs 16

Prior to the failure of devid:8, vde had 121 write errors and 4 read
errors (these counter values are months old and the errors were long
since repaired by scrub). The 16774 corruption errors on vdc are all new
since the devid:8 failure, though.

> --
> Chris Murphy
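
For anyone wanting to repeat the "which counters are new" comparison
above without eyeballing it, here is a minimal Python sketch. It only
parses text in the `[device].counter value` shape shown in the output
above; it does not call btrfs itself. The function names
(nonzero_counters, new_errors) and the "before" snapshot are
hypothetical, made up for illustration. The snapshot holds only the old
vde values mentioned in the text and assumes every other counter started
at zero, which the thread does not actually state for devid:8.

```python
import re
from collections import defaultdict

# Sample text copied from the `btrfs dev stat` output quoted above.
SAMPLE = """\
[/dev/vdc].corruption_errs 16774
[/dev/vde].write_io_errs 121
[/dev/vde].read_io_errs 4
[devid:8].read_io_errs 16
"""

# Matches lines such as "[/dev/vdc].corruption_errs 16774".
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(text):
    """Return {device: {counter: value}} for every counter above zero."""
    result = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group("value")) > 0:
            result[match.group("dev")][match.group("counter")] = int(match.group("value"))
    return dict(result)


def new_errors(before, after):
    """Return only the counters that grew between two snapshots, as deltas."""
    delta = defaultdict(dict)
    for dev, counters in after.items():
        for name, value in counters.items():
            old = before.get(dev, {}).get(name, 0)  # missing counter treated as 0
            if value > old:
                delta[dev][name] = value - old
    return dict(delta)


if __name__ == "__main__":
    # Assumed "before" snapshot: only the months-old vde counters from the text.
    before = {"/dev/vde": {"write_io_errs": 121, "read_io_errs": 4}}
    for dev, counters in new_errors(before, nonzero_counters(SAMPLE)).items():
        for name, grown in counters.items():
            print(f"{dev} {name} +{grown}")
```

Saving the stats output periodically and diffing snapshots this way
separates stale, already-repaired errors from ones that appeared after a
given event, which is the distinction that matters here.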