On Wed, Aug 19, 2015 at 1:22 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>
> Timothy Normand Miller wrote on 2015/08/18 22:55 -0400:
>>
>> On Tue, Aug 18, 2015 at 10:48 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>>>
>>> Timothy Normand Miller wrote on 2015/08/18 22:46 -0400:
>>>>
>>>> On Tue, Aug 18, 2015 at 9:32 PM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>>>>>
>>>>> Hi Timothy,
>>>>>
>>>>> Although I have replied to the bugzilla, IMHO it's more appropriate to
>>>>> discuss it in mail list, as it's not a kernel bug.
>>>>>
>>>>
>>>> All four devices were online. The "missing" one was a drive that
>>>> died, which was replaced by a new one, but btrfs wouldn't finish the
>>>> deletion of the missing device.
>>>>
>>> By replaced, did you mean "btrfs replace"? Or just change the physical
>>> disk without using "btrfs replace"?
>>
>> Here's what happened:
>>
>> - A drive started throwing bad sectors. Somehow this caused metadata
>>   on other drives to get messed up.
>
> Did that cause any huge damage?

It seems that metadata was damaged on all drives.

>
>> - I took that drive offline and mounted degraded (it's a 4-drive RAID1)
>> - I did a "btrfs add" on a new drive and then a "btrfs delete missing"
>> - The replacement drive failed during the replacement operation, and
>>   everything went to crap.
>> - With some help, I got a kernel patch that allowed me to mount the
>>   original three drives with TWO missing devices.
>
> So the original 3 drives are still OK,
> original bad one is missing, and the newly add one is also missing?
>
> That sounds quite repairable.

Nothing I tried would run to completion. There were always errors.

>
>> - I added a brand new drive and then did "delete missing" again. This
>>   time, the first "delete missing" was successful, but it didn't fully
>>   balance the drives, and there was another missing device, so I had to
>>   do a "delete missing" again, and that failed.
>>
>> I wanted to get this back online and restored from a backup, but I was
>> willing to keep it this way if people wanted to probe at, in case we
>> can uncover any btrfs bugs. So it was suggested to get a metadata
>> image, but that ran into some kind of bug in btrfs-image.
>
> If btrfs-image doesn't work, you can also try btrfs-debug-tree.
> IIRC, debug-tree should be more robust than btrfs-image.
>
> BTW, have you tried btrfsck on it? Does it also cause the infinite loop?
>
> I'll also try to reproduce it and investigate the codes directly.

Well, I had to get things back online, so I've restored from backup.
I do have what limited metadata image I could get from btrfs-image.

>
> Thanks,
> Qu
>
>>
>> Currently, I'm restoring from backup, but I have at least a partial
>> metadata dump.

--
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/
Open Graphics Project