Re: [dm-devel] [PATCH 08/15] dm mpath: merge do_end_io_bio into multipath_end_io_bio
On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote:
> This simplifies the code and especially the error passing a bit and
> will help with the next patch.
> 
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
>  drivers/md/dm-mpath.c | 42 ++++++++++++++++--------------------------
>  1 file changed, 16 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> index 3df056b73b66..b1cb0273b081 100644
> --- a/drivers/md/dm-mpath.c
> +++ b/drivers/md/dm-mpath.c
> @@ -1510,24 +1510,26 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone,
>  	return r;
>  }
>  
> -static int do_end_io_bio(struct multipath *m, struct bio *clone,
> -			 int error, struct dm_mpath_io *mpio)
> +static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error)
>  {
> +	struct multipath *m = ti->private;
> +	struct dm_mpath_io *mpio = get_mpio_from_bio(clone);
> +	struct pgpath *pgpath = mpio->pgpath;
>  	unsigned long flags;
>  
> -	if (!error)
> -		return 0;	/* I/O complete */
> +	BUG_ON(!mpio);

You dereferenced mpio already above.

Regards,
Martin

>  
> -	if (noretry_error(error))
> -		return error;
> +	if (!error || noretry_error(error))
> +		goto done;
>  
> -	if (mpio->pgpath)
> -		fail_path(mpio->pgpath);
> +	if (pgpath)
> +		fail_path(pgpath);
>  
>  	if (atomic_read(&m->nr_valid_paths) == 0 &&
>  	    !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
>  		dm_report_EIO(m);
> -		return -EIO;
> +		error = -EIO;
> +		goto done;
>  	}
>  
>  	/* Queue for the daemon to resubmit */
> @@ -1539,28 +1541,16 @@ static int do_end_io_bio(struct multipath *m, struct bio *clone,
>  	if (!test_bit(MPATHF_QUEUE_IO, &m->flags))
>  		queue_work(kmultipathd, &m->process_queued_bios);
>  
> -	return DM_ENDIO_INCOMPLETE;
> -}
> -
> -static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error)
> -{
> -	struct multipath *m = ti->private;
> -	struct dm_mpath_io *mpio = get_mpio_from_bio(clone);
> -	struct pgpath *pgpath;
> -	struct path_selector *ps;
> -	int r;
> -
> -	BUG_ON(!mpio);
> -
> -	r = do_end_io_bio(m, clone, error, mpio);
> -	pgpath = mpio->pgpath;
> +	error = DM_ENDIO_INCOMPLETE;
> +done:
>  	if (pgpath) {
> -		ps = &pgpath->pg->ps;
> +		struct path_selector *ps = &pgpath->pg->ps;
> +
>  		if (ps->type->end_io)
>  			ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes);
>  	}
>  
> -	return r;
> +	return error;
>  }
>  
>  /*

-- 
Dr. Martin Wilck <mwi...@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
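For readers following the review comment, here is a minimal userspace sketch of the ordering issue: a NULL check only helps if it runs before the first dereference. The struct and function names (mpio_sketch, pgpath_sketch, path_id_checked) are made up for illustration and are not part of dm-mpath; assert() merely stands in for the kernel's BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pgpath and struct dm_mpath_io. */
struct pgpath_sketch {
	int id;
};

struct mpio_sketch {
	struct pgpath_sketch *pgpath;
};

/*
 * Return the id of the path attached to io, or -1 if there is none.
 * The check on io comes first; io->pgpath is read only afterwards.
 * Initializing pgpath from io->pgpath in its declaration (as the
 * reviewed hunk does) would dereference io before the check runs.
 */
int path_id_checked(const struct mpio_sketch *io)
{
	const struct pgpath_sketch *pgpath;

	assert(io != NULL);	/* userspace analog of BUG_ON(!mpio) */
	pgpath = io->pgpath;	/* safe: io was checked above */

	return pgpath ? pgpath->id : -1;
}
```

In the patch itself, the equivalent change would presumably be to declare pgpath without an initializer and assign it after the BUG_ON(); that is my reading of the comment, not something stated in the thread.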
A story of btrfs corruption and recovery
In April 2014, I reported a btrfs corruption on the linux-btrfs mailing
list (http://www.spinics.net/lists/linux-btrfs/msg33318.html). 8 months
later, I am happy to say that I've been able to recover the data with a
combination of persistence and luck. I want to share some of my insight
with this list in the hope that it may be useful in future cases. I also
did some work on the btrfs tools to be able to better understand what
was wrong; I will submit the additions and changes I made for review
later.

1. The history

I had created this file system in late 2012 when I installed OpenSUSE
12.2 on a friend's laptop. btrfs was still unstable at that time, I
imagine you saying. That's easy to say in hindsight. OpenSUSE's
installer offered btrfs as a tier-1 choice, as far as I remember.
Articles written at the time (e.g.
http://rainbowtux.blogspot.de/2012/09/to-btrfs-or-not-to-btrfs.html)
suggest that I wasn't the only person considering it worth a serious
try. Today I wish I hadn't incautiously put my friend's /home on that
FS, too - I've certainly paid for that carelessness.

So, /home was subvolume 263 in this file system. Complicating matters
further, I had created encrypted home file systems using ecryptfs on top
of btrfs.

2. The disaster

It all went well until April 14, 2014. On that day, the laptop suddenly
crashed. OpenSUSE kernel 3.4.11-2.16 was running at the time of the
crash. Subsequent reboot attempts failed. I described the phenomena in
my posting to linux-btrfs, desperately hoping someone would give me an
easy recipe for recovery. It didn't happen. I got the recommendation to
use a newer version of the kernel and btrfs tools, but they didn't get
me any further. Whatever tool I tried, /home appeared to be completely
empty. I had to dig deeper.

3. The quest

After quite some time, I found the hint, looking at the root of the
/home subvolume, which was a level 2 node:

    # ./btrfs-debug-tree -b 980717568 /dev/XX
    node 980717568 level 2 items 78 free 43 generation 39637 owner 263
            key (256 INODE_ITEM 0) block 1012207616 (247121) gen 35754

Looking at the supposed level-1 subnode at 1012207616, I found that it
contained data of the wrong level (0), owner (2 - the extent tree), and
generation:

    leaf 1012207616 items 26 free space 1967 generation 39622 owner 2
            item 0 key (8266870784 EXTENT_ITEM 12288) itemoff 3942 itemsize 53

So, the tree was massively corrupted at this crucial point; the top
inode of the subvolume couldn't be found, explaining why /home had
appeared empty on every recovery attempt.

I looked at the other children of the children of the tree root, and was
pleasantly surprised that these didn't look bad; I saw inodes and
directory entries of ecryptfs-encrypted home directories, as I had
expected.

The obvious next thing to try was to look for previous generations of
the root of the /home subvolume, hoping they weren't corrupted. I
started with the super block root backups, with no luck. Later I went
back all the way from generation 39637 to 38081 (the oldest copy of this
root node I could find), but it was just as corrupted as the last one -
they all pointed to the same wrong level 1 block 1012207616.

I began to wonder whether the all-important level 1 and leaf meta data
of this part of the file system had survived anywhere at all. I hacked
together a tool to search for a specific btrfs key in all of the meta
data, and used it to search for the key 256-1-0 of subvolume 263 (the
first inode of the /home file system). Luckily, I found exactly one copy
of a leaf containing this key, and a handful of level 1 nodes referring
to it.

At this point I didn't yet dare to even think of repairing the file
system. Rather, I took additional debugging steps.
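To make the key search more concrete, here is a rough sketch of how one metadata block could be tested for a key such as 256-1-0 in tree 263. This is not the tool mentioned above, only an illustration. It assumes the standard btrfs on-disk layout (101-byte block header, 25-byte leaf items, little-endian integers) and skips checksum verification; the function name find_key_in_leaf is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets in the btrfs block header; all integers little-endian. */
enum {
	HDR_OWNER   = 88,	/* u64: id of the tree the block belongs to */
	HDR_NRITEMS = 96,	/* u32: number of items in the block */
	HDR_LEVEL   = 100,	/* u8: 0 means leaf */
	HDR_SIZE    = 101,
	ITEM_SIZE   = 25	/* 17-byte key + u32 data offset + u32 data size */
};

/* Read an nbytes-wide little-endian unsigned integer. */
static uint64_t get_le(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = nbytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return the slot of key (objectid, type, offset) in the block at blk,
 * or -1 if the block is not a leaf of tree `owner` or lacks the key.
 */
int find_key_in_leaf(const uint8_t *blk, size_t blksize, uint64_t owner,
		     uint64_t objectid, uint8_t type, uint64_t offset)
{
	assert(blk != NULL);
	if (blksize < HDR_SIZE)
		return -1;
	if (blk[HDR_LEVEL] != 0 || get_le(blk + HDR_OWNER, 8) != owner)
		return -1;

	uint64_t nritems = get_le(blk + HDR_NRITEMS, 4);
	if (nritems > (blksize - HDR_SIZE) / ITEM_SIZE)
		return -1;	/* implausible item count, treat as garbage */

	for (uint64_t i = 0; i < nritems; i++) {
		const uint8_t *item = blk + HDR_SIZE + i * ITEM_SIZE;

		if (get_le(item, 8) == objectid && item[8] == type &&
		    get_le(item + 9, 8) == offset)
			return (int)i;
	}
	return -1;
}
```

A scanner along these lines would read the device in block-size steps (4096 bytes fits the "items 78 free 43" node shown above) and call find_key_in_leaf(buf, 4096, 263, 256, 1, 0) on each block. A real tool would also want to verify the block checksum and fsid before trusting the header.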
One strange thing I found was that among the 603 top (level 2) copies of
/home's root node, there were several instances with the same generation
number:

    node 1037123584 level 2 items 78 free 43 generation 39636 owner 263
    node 1041215488 level 2 items 78 free 43 generation 39636 owner 263
    node 980566016 level 2 items 78 free 43 generation 39636 owner 263
    node 980717568 level 2 items 78 free 43 generation 39637 owner 263

Looking at the details of these blocks, I found that the various
level-2-gen-39636-owner-263 blocks were actually different. I have no
idea if this can happen under any circumstances, but it gave me another
hint towards the final solution.

Out of the generation 39636 roots listed above, only the last one showed
the original corruption I described - the others actually had reasonable
data in slot 1. My first hope that these root copies might actually be
healthy was quickly destroyed - a tree dump showed other errors. But,
and that was key, these other corruptions were at different points of
the tree. Taking the three gen-39636 roots together, I was able to find
sane data for every part of the tree. I was lucky insofar as the total
number of corruptions I needed to fix turned out to be so low that it was doable
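The idea of combining several damaged root copies can be sketched roughly as follows. This is an illustration only, not the procedure actually used. It assumes that slot i covers the same key range in every copy (plausible here, since all copies have 78 items) and that a child counts as sane when its header level, owner and generation match what the parent expects. The owner test is a simplification, since blocks shared with snapshots may legitimately carry another owner. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header fields found in the block a key pointer refers to. */
struct child_hdr {
	uint64_t generation;
	uint64_t owner;
	uint8_t level;
};

/* One slot of a root copy: what the parent expects and what is on disk. */
struct slot {
	uint64_t expected_gen;	/* generation stored in the key pointer */
	struct child_hdr found;	/* header read from the pointed-to block */
};

static int child_is_sane(const struct slot *s, uint8_t parent_level,
			 uint64_t owner)
{
	return s->found.level + 1 == parent_level &&
	       s->found.owner == owner &&
	       s->found.generation == s->expected_gen;
}

/*
 * roots[r * nslots + i] describes slot i of root copy r. For each slot,
 * choice[i] receives the index of the first copy whose child looks sane,
 * or -1 if no copy does. Returns the number of unresolved slots.
 */
int pick_sane_children(const struct slot *roots, size_t ncopies,
		       size_t nslots, uint8_t parent_level, uint64_t owner,
		       int *choice)
{
	int unresolved = 0;

	assert(roots != NULL && choice != NULL);
	for (size_t i = 0; i < nslots; i++) {
		choice[i] = -1;
		for (size_t r = 0; r < ncopies; r++) {
			if (child_is_sane(&roots[r * nslots + i],
					  parent_level, owner)) {
				choice[i] = (int)r;
				break;
			}
		}
		if (choice[i] < 0)
			unresolved++;
	}
	return unresolved;
}
```

A return value of 0 corresponds to the situation described above: every part of the tree has sane data in at least one of the root copies. The same check would have to be repeated one level further down before anything is written back.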
Re: Lost /home subvolume after btrfs crash
Chris,

> OpenSUSE 12.3 is using kernel 3.7 which is also old for this sort of
> recovery attempt. Even openSUSE 13.1 is at 3.11.6 which might work in
> a bind, but if it doesn't, inevitably someone will suggest you use
> something even newer.

Thanks for your reply, I appreciate it a lot.

> Current stable is 3.14.1, I suggest giving 3.13 or 3.14 a shot at
> this with -o ro,recovery as a first step and see if it at least
> mounts.

I will. Note that with 12.3, which was the most recent media I had at
hand at the time, the FS was actually mountable at first (-o
ro,recovery). But there was no /home and later attempts to actually
access data in other subvolumes failed with the messages in my debug
material.

> And an old kernel implies old btrfs-progs too, which is where the
> code for btrfsck and btrfs restore is contained. So that needs to be
> at least v 3.12. And hopefully you didn't use --repair with btrfsck
> yet.

I did, but I made a block-level copy of the device before.

Thanks again
Martin
Lost /home subvolume after btrfs crash
Hello,

I have a broken btrfs file system on a laptop. Debug material is
available here: https://www.dropbox.com/sh/utv8b3qd0do6a04/zTwGQCrN9x

Most importantly, the /home subvolume is lost. All attempts to recover
data from it (btrfs-restore, mount -o recovery, btrfsck) have failed so
far (/home is simply empty in the btrfs-restore output), although I was
able to recover most of the other subvolumes and the main FS. The btrfs
tools would typically segfault with failed assertions when analyzing the
FS.

Grepping through the entire volume shows that (at least parts of) /home
are still available, but they seem to be disconnected from the main tree
somehow, or the meta data is so corrupt that the tools bail out trying
to find files under it.

Making matters worse, the important data was encrypted using ecryptfs,
so I need to recover the ciphertext first and then find a way to recover
the plaintext (I do have the passphrase).

The crash happened with a rather old OpenSUSE 12.2 kernel (3.4.11-2.16).
The user says she was just surfing the web normally when the crash
occurred (no screenshot of the original crash, unfortunately). On the
next boot, the btrfs root file system couldn't be mounted any more.
After that I booted an OpenSUSE 12.3 rescue DVD and created the debug
material linked above.

Any hints on how to retrieve files from the /home subvolume of this
corrupt file system would be highly appreciated.

Best regards
Martin