Re: csum errors in VirtualBox VDI files
On Sat, Mar 26, 2016 at 7:30 PM, Kai Krakow wrote:
> Both filesystems on this PC show similar corruption now - but they are
> connected to completely different buses (SATA3 bcache + 3x SATA2
> backing store bcache{0,1,2}, and USB3 without bcache = sde), use
> different compression (compress=lzo vs. compress-force=zlib), but
> similar redundancy scheme (draid=0,mraid=1 vs. draid=single,mraid=dup).
> A hardware problem would induce completely random errors on these
> paths.
>
> Completely different hardware shows similar problems - but that system
> is currently not available to me, and will stay there for a while
> (it's a non-production installation at my workplace). Why would similar
> errors show up here, if it'd be a hardware error of the first system?

Then there's something about the particular combination of mount options you're using with the workload that's inducing this, if it's reproducing on two different systems. What's the workload, and what's the full history of the mount options? It looks like it started life as compress=lzo, then later compress-force=zlib, and then after that the addition of space_cache=v2?

Hopefully Qu has some advice on what's next. It might not be a bad idea to get a btrfs-image going.

--
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: csum errors in VirtualBox VDI files
On Sat, Mar 26, 2016 at 7:50 PM, Kai Krakow wrote:
>
> # now let's wait for the backup to mount the FS and look at dmesg:
>
> [21375.606479] BTRFS info (device sde1): force zlib compression
> [21375.606483] BTRFS info (device sde1): using free space tree

You're using space_cache=v2. Are you aware the new free space tree option sets a read-only incompat feature flag on the file system? You've got quite a few non-default mount options on this backup volume. Hopefully Qu has some idea what to try next, or whether you're better off just starting over with a new file system.

> I only saw unreliable behavior with 4.4.5, 4.4.6, and 4.5.0, tho the
> problem may exist longer in my FS.
>
> $ sudo btrfs-show-super /dev/sde1
> superblock: bytenr=65536, device=/dev/sde1
> -
> csum                    0xcc976d97 [match]
> bytenr                  65536
> flags                   0x1
>                         ( WRITTEN )
> magic                   _BHRfS_M [match]
> fsid                    1318ec21-c421-4e36-a44a-7be3d41f9c3f
> label                   usb-backup
> generation              50814
> root                    1251250159616
> sys_array_size          129
> chunk_root_generation   50784
> root_level              1
> chunk_root              2516518567936
> chunk_root_level        1
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             2000397864960
> bytes_used              1860398493696
> sectorsize              4096
> nodesize                16384
> leafsize                16384
> stripesize              4096
> root_dir                6
> num_devices             1
> compat_flags            0x0
> compat_ro_flags         0x1
> incompat_flags          0x169
>                         ( MIXED_BACKREF |
>                           COMPRESS_LZO |
>                           BIG_METADATA |
>                           EXTENDED_IREF |
>                           SKINNY_METADATA )
> csum_type               0
> csum_size               4
> cache_generation        50208
> uuid_tree_generation    50742
> dev_item.uuid           9008d5a0-ac7b-4505-8193-27428429f953
> dev_item.fsid           1318ec21-c421-4e36-a44a-7be3d41f9c3f [match]
> dev_item.type           0
> dev_item.total_bytes    2000397864960
> dev_item.bytes_used     1912308039680
> dev_item.io_align       4096
> dev_item.io_width       4096
> dev_item.sector_size    4096
> dev_item.devid          1
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0
>
> BTW: btrfsck thinks that the space tree is invalid every time it is
> run, no matter if cleanly unmounted, uncleanly unmounted, or "btrfsck
> --repair" and then run a second time.

--
Chris Murphy
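The two flag fields in the superblock dump above can be decoded mechanically, which confirms Chris's point: compat_ro_flags 0x1 is the free space tree (space_cache=v2) flag, and incompat_flags 0x169 matches the five names printed. A small sketch, with flag values taken from the BTRFS_FEATURE_* definitions in the kernel's fs/btrfs/ctree.h of that era:

```python
# Decode btrfs superblock feature flag fields.
# Bit values follow the kernel's BTRFS_FEATURE_INCOMPAT_* /
# BTRFS_FEATURE_COMPAT_RO_* definitions (fs/btrfs/ctree.h, ~4.5 era).

INCOMPAT_FLAGS = {
    0x1:   "MIXED_BACKREF",
    0x2:   "DEFAULT_SUBVOL",
    0x4:   "MIXED_GROUPS",
    0x8:   "COMPRESS_LZO",
    0x20:  "BIG_METADATA",
    0x40:  "EXTENDED_IREF",
    0x80:  "RAID56",
    0x100: "SKINNY_METADATA",
}
COMPAT_RO_FLAGS = {
    0x1: "FREE_SPACE_TREE",
}

def decode(value, table):
    """Return the names of all feature bits set in value, low bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]

# Values from the btrfs-show-super output above:
print(decode(0x169, INCOMPAT_FLAGS))
# -> ['MIXED_BACKREF', 'COMPRESS_LZO', 'BIG_METADATA', 'EXTENDED_IREF',
#     'SKINNY_METADATA']
print(decode(0x1, COMPAT_RO_FLAGS))
# -> ['FREE_SPACE_TREE']
```

The decoded incompat list is exactly what the dump prints in parentheses, and the compat_ro bit is the read-only-compat flag the free space tree sets.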
Re: RAID-1 refuses to balance large drive
For those curious as to the result: the reduction to single and restoration to RAID1 did indeed balance the array. It was extremely slow, of course, on a 12TB array. I did not bother doing this with the metadata. I also stopped the conversion to single when it had freed up enough space on the 2 smaller drives, because at that point it was moving stuff onto the big drive, which seemed sub-optimal considering what was to come.

In general, obviously, I hope the long-term goal is to not need this, indeed not to need manual balance at all. I would hope the goal is to just be able to add and remove drives, tell the system what type of redundancy you need, and let it figure out the rest. But I know this is an FS in development.

I've actually come to feel that when it comes to personal drive arrays, we actually need something much smarter than today's filesystems. The truth is, for example, that once my infrequently accessed files, such as old photo and video archives, have a solid backup made, there is not actually a need to keep them redundantly at all, except for speed, while the much smaller volume of frequently accessed files needs that (or even extra redundancy not for safety but for extra speed, and of course a cache on an SSD is even better). This requires not just the filesystem and OS to get smarter about this, but even the apps. It may happen some day -- no matter how cheap storage gets, we keep coming up with ways to fill it.

Thanks for the help.
Re: csum errors in VirtualBox VDI files
On Sat, 26 Mar 2016 20:30:35 +0100, Kai Krakow wrote:
> On Wed, 23 Mar 2016 12:16:24 +0800, Qu Wenruo wrote:
>
> > Kai Krakow wrote on 2016/03/22 19:48 +0100:
> > > On Tue, 22 Mar 2016 16:47:10 +0800, Qu Wenruo wrote:
> > > > [...]
> > [...]
> [...]
> > >
> > > Apparently, that system does not boot now due to errors in the bcache
> > > b-tree. That being so, it may well be some bcache error and not
> > > btrfs' fault. I couldn't catch the output, I was in a hurry. It said
> > > "write error" and had some backtrace. I will come back to this later.
> > >
> > > Let's go to the system I currently care about (that one with the
> > > always breaking VDI file):
> > >
> [...]
> [...]
> > >
> > > After the error occurred?
> > >
> > > Yes, some text about the extent being compressed and btrfs repair
> > > doesn't currently handle that case (I tried --repair as I'm
> > > having a backup). I simply decided not to investigate that
> > > further at that point but to delete and restore the affected file
> > > from backup. However, this is the message from dmesg (tho I
> > > didn't catch the backtrace):
> > >
> > > btrfs_run_delayed_refs:2927: errno=-17 Object already exists
> >
> > That's nice, at least we have some clue.
> >
> > It's almost surely a bug: either in the btrfs kernel code, which
> > doesn't handle delayed refs well (low possibility), or a corrupted
> > fs which creates something the kernel can't handle (I bet that's
> > the case).
>
> [kernel 4.5.0 gentoo, btrfs-progs 4.4.1]
>
> Well, this time it hit me on the USB backup drive which uses no bcache
> and no other fancy options except compress-force=zlib. Apparently,
> I've only got a (real) screenshot which I'm going to link here:
>
> https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0
>
> The same drive has no problems except "bad metadata crossing stripe
> boundary" - but a lot of them. This drive was never converted; it was
> freshly generated several months ago.
> [...]
I finally got copy data:

# before mounting, let's check the FS:
$ sudo btrfsck /dev/disk/by-label/usb-backup
Checking filesystem on /dev/disk/by-label/usb-backup
UUID: 1318ec21-c421-4e36-a44a-7be3d41f9c3f
checking extents
bad metadata [156041216, 156057600) crossing stripe boundary
bad metadata [181403648, 181420032) crossing stripe boundary
bad metadata [392167424, 392183808) crossing stripe boundary
bad metadata [783482880, 783499264) crossing stripe boundary
bad metadata [784924672, 784941056) crossing stripe boundary
bad metadata [130151612416, 130151628800) crossing stripe boundary
bad metadata [162826813440, 162826829824) crossing stripe boundary
bad metadata [162927083520, 162927099904) crossing stripe boundary
bad metadata [619740659712, 619740676096) crossing stripe boundary
bad metadata [619781947392, 619781963776) crossing stripe boundary
bad metadata [619795644416, 619795660800) crossing stripe boundary
bad metadata [619816091648, 619816108032) crossing stripe boundary
bad metadata [620011388928, 620011405312) crossing stripe boundary
bad metadata [890992459776, 890992476160) crossing stripe boundary
bad metadata [891022737408, 891022753792) crossing stripe boundary
bad metadata [891101773824, 891101790208) crossing stripe boundary
bad metadata [891301199872, 891301216256) crossing stripe boundary
bad metadata [1012219314176, 1012219330560) crossing stripe boundary
bad metadata [1017202409472, 1017202425856) crossing stripe boundary
bad metadata [1017365397504, 1017365413888) crossing stripe boundary
bad metadata [1020764422144, 1020764438528) crossing stripe boundary
bad metadata [1251103342592, 1251103358976) crossing stripe boundary
bad metadata [1251144695808, 1251144712192) crossing stripe boundary
bad metadata [1251147055104, 1251147071488) crossing stripe boundary
bad metadata [1259271225344, 1259271241728) crossing stripe boundary
bad metadata [1266223611904, 1266223628288) crossing stripe boundary
bad metadata [1304750063616, 130475008) crossing stripe boundary
bad metadata [1304790106112, 1304790122496) crossing stripe boundary
bad metadata [1304850792448, 1304850808832) crossing stripe boundary
bad metadata [1304869928960, 1304869945344) crossing stripe boundary
bad metadata [1305089540096, 1305089556480) crossing stripe boundary
bad metadata [1309561651200, 1309561667584) crossing stripe boundary
bad metadata [1309581443072, 1309581459456) crossing stripe boundary
bad metadata [1309583671296, 1309583687680) crossing stripe boundary
bad metadata [1309942808576, 1309942824960) crossing stripe boundary
bad metadata [1310050549760, 1310050566144) crossing stripe boundary
bad metadata [1313031585792, 1313031602176) crossing stripe boundary
bad metadata [1313232912384, 1313232928768) crossing stripe boundary
bad metadata [1555210764288, 1555210780672) crossing stripe boundary
bad metadata [1555395182592,
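For readers unfamiliar with this fsck message, the condition it reports can be sketched as follows. This is a simplified illustration, not the actual btrfs-progs code: the real check also has to account for how logical addresses map into chunks, which is why some of the ranges quoted above can look 64KiB-aligned and still be flagged. The idea is that a 16KiB metadata tree block (the nodesize from the superblock dump) must not straddle a 64KiB stripe boundary:

```python
STRIPE_LEN = 64 * 1024   # BTRFS_STRIPE_LEN
NODESIZE   = 16 * 1024   # nodesize reported by btrfs-show-super above

def crosses_stripe(start: int, length: int = NODESIZE) -> bool:
    """True if the extent [start, start+length) spans a 64KiB
    stripe boundary, i.e. its first and last bytes fall into
    different stripes."""
    return start // STRIPE_LEN != (start + length - 1) // STRIPE_LEN

# A 16KiB tree block starting 8KiB before a stripe boundary crosses it:
print(crosses_stripe(STRIPE_LEN - 8192))   # True
# One aligned to a stripe start fits entirely inside the stripe:
print(crosses_stripe(STRIPE_LEN))          # False
```

Such blocks are a problem for scrub in particular, which works stripe by stripe; that is why fsck reports them even when the filesystem otherwise behaves.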
Re: csum errors in VirtualBox VDI files
On Sat, 26 Mar 2016 15:04:13 -0600, Chris Murphy wrote:
> On Sat, Mar 26, 2016 at 2:28 PM, Chris Murphy wrote:
> > On Sat, Mar 26, 2016 at 1:30 PM, Kai Krakow wrote:
> >> Well, this time it hit me on the USB backup drive which uses no
> >> bcache and no other fancy options except compress-force=zlib.
> >> Apparently, I've only got a (real) screenshot which I'm going to
> >> link here:
> >>
> >> https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0
> >
> > This is a curious screen shot. It's a dracut pre-mount shell, so
> > nothing should be mounted yet. And btrfs check only works on an
> > unmounted file system. And yet the bottom part of the trace shows a
> > Btrfs volume being made read only, as if it was mounted read-write
> > and is still mounted. Huh?
>
> Wait. You said no bcache, and yet in this screen shot it shows 'btrfs
> check /dev/bcache2 ...' right before the back trace.
>
> This thread is confusing. You're talking about two different btrfs
> volumes intermixed, one uses bcache, the other doesn't, yet they both
> have corruption. I think it's hardware related: bad cable, bad RAM,
> bad power, something.

No, it's not; it's tested. That system ran rock stable until somewhere in the 4.4 kernel series (probably). It ran high loads without problems (loadavg >50), it ran huge concurrent IO copies without problems, it survived unintentional reboots without FS corruption, and it ran VirtualBox VMs without problems.

And the system still runs almost without problems: except for the "object already exists" error which forced my rootfs RO, I did not even notice that the FS has corruption: nothing in dmesg, everything fine. There's just VirtualBox crashing a VM now, and I see csum errors in that very VDI file - even after recovering the file from backup, it happens again and again. Qu mentioned that this may be a follow-up of other corruption - and tada: yes, there are lots of them now (my last check was back in the 4.1 or 4.2 series).

But because I can still rsync all my important files, I'd like to get my backup drive into a sane state again first.

Both filesystems on this PC show similar corruption now - but they are connected to completely different buses (SATA3 bcache + 3x SATA2 backing store bcache{0,1,2}, and USB3 without bcache = sde), use different compression (compress=lzo vs. compress-force=zlib), but similar redundancy scheme (draid=0,mraid=1 vs. draid=single,mraid=dup). A hardware problem would induce completely random errors on these paths.

Completely different hardware shows similar problems - but that system is currently not available to me, and will stay there for a while (it's a non-production installation at my workplace). Why would similar errors show up here, if it'd be a hardware error of the first system?

Meanwhile, I conclude we can rule out bcache or hardware - three file systems show similar errors:

1. bcache on Crucial MX100 SATA3, 3x SATA2 backing HDD
2. bcache on Samsung Evo 850 SATA2, 1x SATA1 backing HDD
3. 1x plain USB3 btrfs (no bcache)

Not even the SSD hardware is in common, just system configuration in general (Gentoo kernel, rootfs on btrfs) and workload (I do lots of similar things on both machines). I need to grab the errors for machine setup 2 - though I can't do that currently, that system is offline and will be for a while.

--
Regards,
Kai

Replies to list-only preferred.
Re: csum errors in VirtualBox VDI files
On Sat, 26 Mar 2016 14:28:22 -0600, Chris Murphy wrote:
> On Sat, Mar 26, 2016 at 1:30 PM, Kai Krakow wrote:
>
> > Well, this time it hit me on the USB backup drive which uses no
> > bcache and no other fancy options except compress-force=zlib.
> > Apparently, I've only got a (real) screenshot which I'm going to
> > link here:
> >
> > https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0
>
> This is a curious screen shot. It's a dracut pre-mount shell, so
> nothing should be mounted yet. And btrfs check only works on an
> unmounted file system. And yet the bottom part of the trace shows a
> Btrfs volume being made read only, as if it was mounted read-write and
> is still mounted. Huh?

It's a pre-mount shell because I wanted to check the rootfs from there. I mounted it once (and unmounted) before checking (that's bcache{0,1,2}). Yeah, you can get there forcibly by using rd.break=pre-mount - and yeah, nothing "should" be mounted unless I did so previously. But I cut that away as it contained errors unrelated to this problem and would be even more confusing.

The file system that failed then was the one I had just mounted to put the stdout of btrfsck on (sde1). That one showed these (screenshotted) kernel console logs just in the middle of typing the command - so a few seconds after mounting.

What may be confusing to you: I use more than one btrfs. ;-)

bcache{0,1,2} = my rootfs (plus subvolumes)
sde = my USB3 backup drive (or whatever the kernel assigns)

Both run btrfs. The bcache*'s have their own problems currently; I'd like to set those aside first and get the backup drive back in good shape. The latter seems easier to fix.

--
Regards,
Kai

Replies to list-only preferred.
Re: RAID Assembly with Missing Empty Drive
On Sat, Mar 26, 2016 at 3:01 PM, John Marrett wrote:
>> Well, off hand it seems like the missing 2.73TB has nothing on it at
>> all, and doesn't need to be counted as missing. The other missing is
>> counted, and should have all of its data replicated elsewhere. But
>> then you're running into csum errors. So something still isn't right,
>> we just don't understand what it is.
>
> I'm not sure what we can do to get a better understanding of these
> errors; that said, it may not be necessary if replace helps, more
> below.
>
>> Btrfs replace has been around for a while. 'man btrfs replace' - the
>> command takes the form 'btrfs replace start' plus three required
>> pieces of information. You should be able to infer the missing devid
>> using 'btrfs fi show'; looks like it's 6.
>
> I was looking under btrfs device, sorry about that. I do have the
> command. I tried replace and it seemed more promising than the last
> attempt; it wrote enough data to the new drive to overflow and break
> my overlay. I'm trying it without the overlay on the destination
> device; I'll report back later with the results.
>
> I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove
> this check:
>
> https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770
>
> I can switch to whatever kernel though as desired. Would you prefer a
> mainline ubuntu packaged kernel? Straight from kernel.org?

Things are a lot more deterministic for developers and testers if you're using something current. It might not matter in this case that you're using 4.2, but all you have to do is look at the git pulls in the list archives to see many hundreds, often over 1000, btrfs changes per kernel cycle. So lots and lots of fixes have happened since 4.2. And any bugs found in 4.2 don't really matter, because you'd have to try to reproduce them in 4.4.6 or 4.5, and then the fix would go into 4.6 before it'd get backported - and 4.2 won't be getting backports done by upstream. That's why list folks always suggest using something so recent. Again, in this case it might not matter; I don't read or understand every single commit.

If you do want to use a newer one, I'd build against kernel.org, just because the developers only use that base. And use 4.4.6 or 4.5.

It's reasonable to keep the overlay on the existing devices, but remove the overlay for the replacement so that you're directly writing to it. If that blows up with 4.2 you can still start over with a newer kernel. *shrug*

--
Chris Murphy
Re: csum errors in VirtualBox VDI files
On Sat, Mar 26, 2016 at 2:28 PM, Chris Murphy wrote:
> On Sat, Mar 26, 2016 at 1:30 PM, Kai Krakow wrote:
>
>> Well, this time it hit me on the USB backup drive which uses no bcache
>> and no other fancy options except compress-force=zlib. Apparently, I've
>> only got a (real) screenshot which I'm going to link here:
>>
>> https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0
>
> This is a curious screen shot. It's a dracut pre-mount shell, so
> nothing should be mounted yet. And btrfs check only works on an
> unmounted file system. And yet the bottom part of the trace shows a
> Btrfs volume being made read only, as if it was mounted read-write and
> is still mounted. Huh?

Wait. You said no bcache, and yet in this screen shot it shows 'btrfs check /dev/bcache2 ...' right before the back trace.

This thread is confusing. You're talking about two different btrfs volumes intermixed, one uses bcache, the other doesn't, yet they both have corruption. I think it's hardware related: bad cable, bad RAM, bad power, something.

--
Chris Murphy
Re: RAID Assembly with Missing Empty Drive
> Well, off hand it seems like the missing 2.73TB has nothing on it at
> all, and doesn't need to be counted as missing. The other missing is
> counted, and should have all of its data replicated elsewhere. But
> then you're running into csum errors. So something still isn't right,
> we just don't understand what it is.

I'm not sure what we can do to get a better understanding of these errors; that said, it may not be necessary if replace helps, more below.

> Btrfs replace has been around for a while. 'man btrfs replace' - the
> command takes the form 'btrfs replace start' plus three required
> pieces of information. You should be able to infer the missing devid
> using 'btrfs fi show'; looks like it's 6.

I was looking under btrfs device, sorry about that. I do have the command. I tried replace and it seemed more promising than the last attempt; it wrote enough data to the new drive to overflow and break my overlay. I'm trying it without the overlay on the destination device; I'll report back later with the results.

I'm running ubuntu linux-image-4.2.0-34-generic with a patch to remove this check:

https://github.com/torvalds/linux/blob/master/fs/btrfs/super.c#L1770

I can switch to whatever kernel though as desired. Would you prefer a mainline ubuntu packaged kernel? Straight from kernel.org?

-JohnF
Re: Possible Raid Bug
On Sat, Mar 26, 2016 at 8:00 AM, Stephen Williams wrote:
> I know this is quite a rare occurrence for home use but for data center
> use this is something that will happen A LOT.
> This really should be placed in the wiki while we wait for a fix. I can
> see a lot of sys admins crying over this. Maybe on the gotchas page?

While it's not a data loss bug, it might be viewed as an uptime bug, because the dataset is stuck being ro and hence unmodifiable until a restore to a rw volume is complete.

Since we can ro mount a volume, some way to safely make it a seed device could be useful. All that's needed to make it rw is adding even a small USB stick, for example, and now at least ro snapshots can be taken and data migrated off the volume. A larger device that's used for rw would allow this raid to be brought back online. And then once the new array is up and has most data restored, a short downtime to get the latest incremental changes sent over.

Yeah, the alternative to this is a cluster, and you just consider this one brick a loss and move on. But most regular users don't do clusters, even with big (for them) storage.

--
Chris Murphy
Re: Possible Raid Bug
On Sat, Mar 26, 2016 at 5:51 AM, Patrik Lundquist wrote:
> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    0
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    0
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> We didn't inherit the /dev/sde error count. Is that a bug?

I'm not sure where this information is stored. Presumably in the fs metadata? So when mounted degraded the counter is zeroed; is that what's going on?

--
Chris Murphy
Re: RAID Assembly with Missing Empty Drive
On Sat, Mar 26, 2016 at 6:15 AM, John Marrett wrote:
> Chris,
>
>> Post 'btrfs fi usage' for the filesystem. That may give some insight
>> into what's expected to be on all the missing drives.
>
> Here's the information. I believe that the missing we see in most
> entries is the failed and absent drive; only the unallocated shows two
> missing entries, the 2.73 TB is the missing but empty device. I don't
> know if there's a way to prove it however.
>
> ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
> Overall:
>     Device size:          15.45TiB
>     Device allocated:     12.12TiB
>     Device unallocated:    3.33TiB
>     Device missing:        5.46TiB
>     Used:                 10.93TiB
>     Free (estimated):      2.25TiB  (min: 2.25TiB)
>     Data ratio:               2.00
>     Metadata ratio:           2.00
>     Global reserve:      512.00MiB  (used: 0.00B)
>
> Data,RAID1: Size:6.04TiB, Used:5.46TiB
>    /dev/sda    2.61TiB
>    /dev/sdb    1.71TiB
>    /dev/sdc    1.72TiB
>    /dev/sdd    1.72TiB
>    /dev/sdf    1.71TiB
>    missing     2.61TiB
>
> Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
>    /dev/sda    8.00GiB
>    /dev/sdb    2.00GiB
>    /dev/sdc    3.00GiB
>    /dev/sdd    4.00GiB
>    /dev/sdf    3.00GiB
>    missing     8.00GiB
>
> System,RAID1: Size:32.00MiB, Used:880.00KiB
>    /dev/sda    32.00MiB
>    missing     32.00MiB
>
> Unallocated:
>    /dev/sda    111.49GiB
>    /dev/sdb     98.02GiB
>    /dev/sdc     98.02GiB
>    /dev/sdd     98.02GiB
>    /dev/sdf     98.02GiB
>    missing     111.49GiB
>    missing       2.73TiB
>
> I tried to remove missing; the first 'remove missing' only removes the
> 2.73TiB missing entry seen above. All the other missing entries
> remain.

Well, off hand it seems like the missing 2.73TB has nothing on it at all, and doesn't need to be counted as missing. The other missing is counted, and should have all of its data replicated elsewhere. But then you're running into csum errors. So something still isn't right, we just don't understand what it is.

> I can't "replace", it's not a valid command on my btrfs tools version;
> I upgraded btrfs this morning in order to have the btrfs fi usage
> command.

Btrfs replace has been around for a while. 'man btrfs replace' - the command takes the form 'btrfs replace start <srcdev|devid> <targetdev> <mount>', i.e. three required pieces of information. You should be able to infer the missing devid using 'btrfs fi show'; looks like it's 6.

> ubuntu@btrfs-recovery:~$ sudo btrfs version
> btrfs-progs v4.0
> ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
> ii btrfs-tools    4.0-2    amd64    Checksumming Copy on Write Filesystem utilities

I would use something newer, but btrfs replace is in 4.0. But I also don't see in this thread what kernel version you're using.

--
Chris Murphy
Re: csum errors in VirtualBox VDI files
On Sat, Mar 26, 2016 at 1:30 PM, Kai Krakow wrote:
> Well, this time it hit me on the USB backup drive which uses no bcache
> and no other fancy options except compress-force=zlib. Apparently, I've
> only got a (real) screenshot which I'm going to link here:
>
> https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0

This is a curious screen shot. It's a dracut pre-mount shell, so nothing should be mounted yet. And btrfs check only works on an unmounted file system. And yet the bottom part of the trace shows a Btrfs volume being made read only, as if it was mounted read-write and is still mounted. Huh?

Chris Murphy
Re: csum errors in VirtualBox VDI files
On Wed, 23 Mar 2016 12:16:24 +0800, Qu Wenruo wrote:
> Kai Krakow wrote on 2016/03/22 19:48 +0100:
> > On Tue, 22 Mar 2016 16:47:10 +0800, Qu Wenruo wrote:
> >
> >> Hi,
> >>
> >> Kai Krakow wrote on 2016/03/22 09:03 +0100:
> [...]
> >>
> >> When it goes RO, it must have some warning in kernel log.
> >> Would you please paste the kernel log?
> >
> > Apparently, that system does not boot now due to errors in the bcache
> > b-tree. That being so, it may well be some bcache error and not
> > btrfs' fault. I couldn't catch the output, I was in a hurry. It said
> > "write error" and had some backtrace. I will come back to this later.
> >
> > Let's go to the system I currently care about (that one with the
> > always breaking VDI file):
> >
> [...]
> >> Does btrfs check report anything wrong?
> >
> > After the error occurred?
> >
> > Yes, some text about the extent being compressed and that btrfs repair
> > doesn't currently handle that case (I tried --repair as I'm having a
> > backup). I simply decided not to investigate that further at that
> > point but to delete and restore the affected file from backup.
> > However, this is the message from dmesg (tho I didn't catch the
> > backtrace):
> >
> > btrfs_run_delayed_refs:2927: errno=-17 Object already exists
>
> That's nice, at least we have some clue.
>
> It's almost surely a bug: either in the btrfs kernel code, which
> doesn't handle delayed refs well (low possibility), or a corrupted fs
> which creates something the kernel can't handle (I bet that's the case).

[kernel 4.5.0 gentoo, btrfs-progs 4.4.1]

Well, this time it hit me on the USB backup drive which uses no bcache and no other fancy options except compress-force=zlib. Apparently, I've only got a (real) screenshot which I'm going to link here:

https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0

The same drive has no problems except "bad metadata crossing stripe boundary" - but a lot of them. This drive was never converted; it was freshly generated several months ago.

---8<---
$ sudo btrfsck /dev/disk/by-label/usb-backup
Checking filesystem on /dev/disk/by-label/usb-backup
UUID: 1318ec21-c421-4e36-a44a-7be3d41f9c3f
checking extents
bad metadata [156041216, 156057600) crossing stripe boundary
bad metadata [181403648, 181420032) crossing stripe boundary
bad metadata [392167424, 392183808) crossing stripe boundary
bad metadata [783482880, 783499264) crossing stripe boundary
bad metadata [784924672, 784941056) crossing stripe boundary
bad metadata [130151612416, 130151628800) crossing stripe boundary
bad metadata [162826813440, 162826829824) crossing stripe boundary
bad metadata [162927083520, 162927099904) crossing stripe boundary
bad metadata [619740659712, 619740676096) crossing stripe boundary
bad metadata [619781947392, 619781963776) crossing stripe boundary
bad metadata [619795644416, 619795660800) crossing stripe boundary
bad metadata [619816091648, 619816108032) crossing stripe boundary
bad metadata [620011388928, 620011405312) crossing stripe boundary
bad metadata [890992459776, 890992476160) crossing stripe boundary
bad metadata [891022737408, 891022753792) crossing stripe boundary
bad metadata [891101773824, 891101790208) crossing stripe boundary
bad metadata [891301199872, 891301216256) crossing stripe boundary
[...]
--->8---

My main drive (which this thread was about) has a huge number of different problems according to btrfsck. Repair doesn't work: it says something about overlapping extents and that it needs careful thought. I wanted to catch the output when the above problem occurred. So I'd like to defer that until later and first fix my backup drive. If I lose my main drive, I simply restore from backup. It is very old anyway (still using 4k node size). Only downside: it takes 24+ hours to restore.

--
Regards,
Kai

Replies to list-only preferred.
Re: Possible Raid Bug
Can confirm that you only get one chance to fix the problem before the array is dead. I know this is quite a rare occurrence for home use, but for data center use this is something that will happen A LOT. This really should be placed in the wiki while we wait for a fix. I can see a lot of sysadmins crying over this.

--
Stephen Williams
steph...@veryfast.biz

On Sat, Mar 26, 2016, at 11:51 AM, Patrik Lundquist wrote:
> So with the lessons learned:
>
> # mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde
>
> # mount /dev/sdb /mnt; dmesg | tail
> # touch /mnt/test1; sync; btrfs device usage /mnt
>
> Only raid10 profiles.
>
> # echo 1 >/sys/block/sde/device/delete
>
> We lost a disk.
>
> # touch /mnt/test2; sync; dmesg | tail
>
> We've got write errors.
>
> # btrfs device usage /mnt
>
> No 'single' profiles because we haven't remounted yet.
>
> # reboot
> # wipefs -a /dev/sde; reboot
>
> # mount -o degraded /dev/sdb /mnt; dmesg | tail
> # btrfs device usage /mnt
>
> Still only raid10 profiles.
>
> # touch /mnt/test3; sync; btrfs device usage /mnt
>
> Now we've got 'single' profiles. Replace now or get hosed.
>
> # btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    0
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    0
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> We didn't inherit the /dev/sde error count. Is that a bug?
>
> # btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft
> -sconvert=raid10,soft -vf /mnt; dmesg | tail
>
> # btrfs device usage /mnt
>
> Back to only 'raid10' profiles.
>
> # umount /mnt; mount /dev/sdb /mnt; dmesg | tail
>
> # btrfs device stats /mnt
>
> [/dev/sde].write_io_errs    11
> [/dev/sde].read_io_errs     0
> [/dev/sde].flush_io_errs    2
> [/dev/sde].corruption_errs  0
> [/dev/sde].generation_errs  0
>
> The old counters are back. That's good, but wtf?
>
> # btrfs device stats -z /dev/sde
>
> Gives /dev/sde a clean bill of health. Won't warn when mounting again.
Re: [PATCH v8 10/27] btrfs: dedupe: Add basic tree structure for on-disk dedupe method
On 03/25/2016 11:11 PM, Chris Mason wrote:
> On Fri, Mar 25, 2016 at 09:59:39AM +0800, Qu Wenruo wrote:
>> Chris Mason wrote on 2016/03/24 16:58 -0400:
>>> Are you storing the entire hash, or just the parts not represented
>>> in the key?  I'd like to keep the on-disk part as compact as
>>> possible for this part.
>>
>> Currently, it's the entire hash. More details can be checked in
>> another mail.
>>
>> Although it's OK for me to truncate the duplicated last 8 bytes
>> (64 bits), I still quite like the current implementation, as one
>> memcpy() is simpler.
>
> [ sorry FB makes urls look ugly, so I delete them from replies ;) ]
>
> Right, I saw that but wanted to reply to the specific patch. One of
> the lessons learned from the extent allocation tree and file extent
> items is that they are just too big. Let's save those bytes, it'll
> add up.

OK, I'll reduce the duplicated last 8 bytes.

I'll also remove the "length" member, as it can always be fetched from
dedupe_info->block_size. The length was used to verify whether we are in
the transition to a new dedupe block size, but since we later switched
to a full sync_fs(), such behavior is not needed any more.

>>> +
>>> +/*
>>> + * Objectid: bytenr
>>> + * Type: BTRFS_DEDUPE_BYTENR_ITEM_KEY
>>> + * offset: Last 64 bit of the hash
>>> + *
>>> + * Used for bytenr <-> hash search (for free_extent)
>>> + * all its content is hash.
>>> + * So no special item struct is needed.
>>> + */
>>> +
>>>
>>> Can we do this instead with a backref from the extent? It'll save
>>> us a huge amount of IO as we delete things.
>>
>> That's the original implementation from Liu Bo.
>>
>> The problem is, it changes the data backref rules (originally, only
>> EXTENT_DATA items can cause data backrefs), and would make dedupe
>> INCOMPAT rather than the current RO_COMPAT.
>>
>> So I really don't like to change the data backref rule.
>
> Let me reread this part; the cost of maintaining the second index is
> dramatically higher than adding a backref. I do agree that it's nice
> to be able to delete the dedupe trees without impacting the rest, but
> over the long term I think we'll regret the added balances.
Thanks for pointing out the problem. Yes, I didn't even consider this
fact.

But, on the other hand, such a removal only happens when we remove the
*last* reference of the extent. So for the medium-to-high dedupe rate
case, such a routine is not that frequent, which will reduce the impact.
(Which is quite different from the non-dedupe case.)

And for the low dedupe rate case, why use dedupe anyway? In that case,
compression would be much more appropriate if the user just wants to
reduce disk usage, IMO.

Another reason I don't want to touch the delayed-ref code is that it has
already caused us quite some pain. We were fighting with delayed refs
from the beginning. The delayed ref, especially the ability to run
delayed refs asynchronously, is the biggest problem we met. That's why
we added the ability to increase a data ref while holding
delayed_refs->lock in patch 5, and then used a long lock-and-try-inc
method to search the hash in patch 6. Any modification to delayed refs
can easily lead to new bugs (yes, I have proved it several times
myself). So I chose to use the current method.

>> If we only want to reduce on-disk space, just trashing the hash and
>> making DEDUPE_BYTENR_ITEM carry no data would be good enough, as
>> (bytenr, DEDUPE_BYTENR_ITEM) can locate the hash uniquely.
>>
>> In fact no code really checks the hash of a dedupe bytenr item; they
>> all just swap objectid and offset, reset the type, and search for
>> DEDUPE_HASH_ITEM. So it's OK to omit the hash.
>
> If we have to go with the second index, I do agree here.
>
> For the second index, the big problem is the cost of the btree
> operations. We're already pretty expensive in terms of the cost of
> deleting an extent; with dedupe it's 2x higher, with dedupe + extra
> index it's 3x higher.

The good news is, we only delete the hash bytenr and its ref at the last
de-reference. And in the normal (medium-to-high dedupe rate) case, that's
not a frequent operation IMHO.

Thanks,
Qu
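For readers following along, the two-index layout under discussion (and the "swap objectid and offset, reset the type" lookup Qu describes) can be sketched abstractly. This is an illustrative toy model, not the kernel implementation: the item-type names follow the patch, but the class, helper names, and use of SHA-256 are assumptions made for the sketch.

```python
# Toy model of the two dedupe indexes: a hash item keyed by the last
# 64 bits of the hash (offset = bytenr), and a reverse bytenr item
# keyed by (bytenr, last 64 bits of hash) carrying no payload.
import hashlib

DEDUPE_HASH_ITEM = "DEDUPE_HASH_ITEM"
DEDUPE_BYTENR_ITEM = "DEDUPE_BYTENR_ITEM"

def last_u64(digest: bytes) -> int:
    # The last 8 bytes of the hash, as used in the key.
    return int.from_bytes(digest[-8:], "little")

class DedupeTree:
    def __init__(self):
        self.items = {}  # (objectid, type, offset) -> item payload

    def insert(self, data: bytes, bytenr: int):
        digest = hashlib.sha256(data).digest()
        h64 = last_u64(digest)
        # Hash item: objectid = last 64 bits of hash, offset = bytenr.
        self.items[(h64, DEDUPE_HASH_ITEM, bytenr)] = digest
        # Bytenr item: no payload needed, since (bytenr, type) plus the
        # hash tail in the offset locate the hash item uniquely.
        self.items[(bytenr, DEDUPE_BYTENR_ITEM, h64)] = b""

    def hash_for_bytenr(self, bytenr: int) -> bytes:
        # "Swap objectid and offset, reset the type" as described above.
        for (obj, typ, off) in self.items:
            if typ == DEDUPE_BYTENR_ITEM and obj == bytenr:
                return self.items[(off, DEDUPE_HASH_ITEM, bytenr)]
        raise KeyError(bytenr)

    def free_extent(self, bytenr: int):
        # On the last de-reference, both index entries must be deleted --
        # the extra btree deletion cost Chris is concerned about.
        for (obj, typ, off) in list(self.items):
            if typ == DEDUPE_BYTENR_ITEM and obj == bytenr:
                del self.items[(off, DEDUPE_HASH_ITEM, bytenr)]
                del self.items[(obj, typ, off)]
```

The sketch also shows why the bytenr item's payload can be dropped: the lookup never reads it, it only pivots the key to find the hash item.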
-chris
Re: RAID Assembly with Missing Empty Drive
Chris,

> Post 'btrfs fi usage' for the filesystem. That may give some insight
> what's expected to be on all the missing drives.

Here's the information. I believe the "missing" we see in most entries is
the failed and absent drive; only the Unallocated section shows two
missing entries, and the 2.73TiB one is the missing-but-empty device. I
don't know if there's a way to prove it, however.

ubuntu@btrfs-recovery:~$ sudo btrfs fi usage /mnt
Overall:
    Device size:                  15.45TiB
    Device allocated:             12.12TiB
    Device unallocated:            3.33TiB
    Device missing:                5.46TiB
    Used:                         10.93TiB
    Free (estimated):              2.25TiB   (min: 2.25TiB)
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB   (used: 0.00B)

Data,RAID1: Size:6.04TiB, Used:5.46TiB
   /dev/sda        2.61TiB
   /dev/sdb        1.71TiB
   /dev/sdc        1.72TiB
   /dev/sdd        1.72TiB
   /dev/sdf        1.71TiB
   missing         2.61TiB

Metadata,RAID1: Size:14.00GiB, Used:11.59GiB
   /dev/sda        8.00GiB
   /dev/sdb        2.00GiB
   /dev/sdc        3.00GiB
   /dev/sdd        4.00GiB
   /dev/sdf        3.00GiB
   missing         8.00GiB

System,RAID1: Size:32.00MiB, Used:880.00KiB
   /dev/sda       32.00MiB
   missing        32.00MiB

Unallocated:
   /dev/sda      111.49GiB
   /dev/sdb       98.02GiB
   /dev/sdc       98.02GiB
   /dev/sdd       98.02GiB
   /dev/sdf       98.02GiB
   missing       111.49GiB
   missing         2.73TiB

I tried to remove missing; the first "remove missing" only removes the
2.73TiB missing entry seen above. All the other missing entries remain.
I can't use "replace"; it's not a valid command in my btrfs-tools
version. I upgraded btrfs this morning in order to have the "btrfs fi
usage" command.
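A back-of-the-envelope check (my arithmetic, not from the thread) supports John's reading of the output: the failed drive accounts for the allocated "missing" chunks plus the 111.49GiB unallocated tail, which together come to about 2.73TiB, and adding the empty 2.73TiB device gives the reported "Device missing: 5.46TiB". The values below are transcribed from the usage output above:

```python
# Sanity-check the "missing" entries in the btrfs fi usage output.
TiB = 1024 ** 4
GiB = 1024 ** 3

missing_data      = 2.61 * TiB          # Data,RAID1 on the failed drive
missing_metadata  = 8.00 * GiB          # Metadata,RAID1 on the failed drive
missing_system    = 32.00 / 1024 * GiB  # System,RAID1 (32MiB)
missing_unalloc_1 = 111.49 * GiB        # unallocated tail of the failed drive
missing_unalloc_2 = 2.73 * TiB          # the missing-but-empty drive

failed_drive = (missing_data + missing_metadata
                + missing_system + missing_unalloc_1)
total_missing = failed_drive + missing_unalloc_2

# Rounding in the tool's output means these only match approximately,
# but they land on one failed 2.73TiB drive and 5.46TiB missing total.
print(round(failed_drive / TiB, 2), round(total_missing / TiB, 2))
```

So the numbers are internally consistent with exactly two absent ~2.73TiB devices, one carrying data and one empty.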
ubuntu@btrfs-recovery:~$ sudo btrfs version
btrfs-progs v4.0
ubuntu@btrfs-recovery:~$ dpkg -l | grep btrfs
ii  btrfs-tools    4.0-2    amd64    Checksumming Copy on Write Filesystem utilities

For those interested in my recovery techniques, here's how I rebuild the
overlay loop devices. Be careful: these scripts make certain assumptions
that may not be accurate for your system.

On Client:

sudo umount /mnt
sudo /etc/init.d/open-iscsi stop

On Server:

/etc/init.d/iscsitarget stop
loop_devices=$(losetup -a | grep overlay | tr ":" " " | awk ' { printf $1 " " } END { print "" } ')
for fn in /dev/mapper/sd??; do dmsetup remove $fn; done
for ln in $loop_devices; do losetup -d $ln; done
cd /home/ubuntu
rm sd*overlay
for device in sda3 sdb3 sdc1 sdd1 sde1 sdf1; do
    dev=/dev/$device
    ovl=/home/ubuntu/$device-overlay
    truncate -s512M $ovl
    newdevname=$device
    size=$(blockdev --getsize "$dev")
    loop=$(losetup -f --show -- "$ovl")
    echo Setting up loop for $dev using overlay $ovl on loop $loop for target $newdevname
    printf '%s\n' "0 $size snapshot $dev $loop P 8" | dmsetup create "$newdevname"
done

Start the targets:

/etc/init.d/iscsitarget start

On Client:

sudo /etc/init.d/open-iscsi start

-JohnF
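The core of John's overlay trick is the device-mapper snapshot table line fed to dmsetup: sectors 0 through the device's full size are mapped to a "snapshot" target, with the real device as the read-only origin, the loop-backed sparse file as a persistent ("P") copy-on-write store, and an 8-sector chunk size, so all writes land in the overlay file and the original disk stays untouched. A small sketch (the helper name and the example device/size are hypothetical, for illustration only):

```python
def snapshot_table(origin_dev: str, cow_loop: str, size_sectors: int,
                   chunk_sectors: int = 8) -> str:
    # dm-snapshot table format:
    #   <start> <length> snapshot <origin> <cow-device> <P|N> <chunksize>
    # "P" makes the exception store persistent across reboots.
    return f"0 {size_sectors} snapshot {origin_dev} {cow_loop} P {chunk_sectors}"

# The same line the script pipes into `dmsetup create` (example values):
print(snapshot_table("/dev/sda3", "/dev/loop0", 5860533168))
# -> 0 5860533168 snapshot /dev/sda3 /dev/loop0 P 8
```

This is why the recovery is safe to retry: blowing away the overlay files and loop devices, as the script above does, discards all writes and returns the array to its pristine state.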
Re: Possible Raid Bug
So with the lessons learned:

# mkfs.btrfs -m raid10 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# mount /dev/sdb /mnt; dmesg | tail
# touch /mnt/test1; sync; btrfs device usage /mnt

Only raid10 profiles.

# echo 1 >/sys/block/sde/device/delete

We lost a disk.

# touch /mnt/test2; sync; dmesg | tail

We've got write errors.

# btrfs device usage /mnt

No 'single' profiles because we haven't remounted yet.

# reboot
# wipefs -a /dev/sde; reboot

# mount -o degraded /dev/sdb /mnt; dmesg | tail
# btrfs device usage /mnt

Still only raid10 profiles.

# touch /mnt/test3; sync; btrfs device usage /mnt

Now we've got 'single' profiles. Replace now or get hosed.

# btrfs replace start -B 4 /dev/sde /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   0
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   0
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

We didn't inherit the /dev/sde error count. Is that a bug?

# btrfs balance start -dconvert=raid10,soft -mconvert=raid10,soft -sconvert=raid10,soft -vf /mnt; dmesg | tail

# btrfs device usage /mnt

Back to only 'raid10' profiles.

# umount /mnt; mount /dev/sdb /mnt; dmesg | tail

# btrfs device stats /mnt

[/dev/sde].write_io_errs   11
[/dev/sde].read_io_errs    0
[/dev/sde].flush_io_errs   2
[/dev/sde].corruption_errs 0
[/dev/sde].generation_errs 0

The old counters are back. That's good, but wtf?

# btrfs device stats -z /dev/sde

Give /dev/sde a clean bill of health. Won't warn when mounting again.
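The trap this transcript demonstrates, and that Stephen warns about above, is that new writes on a degraded raid10 mount fall back to 'single'-profile chunks, after which you get only one chance to replace and re-convert before the array becomes unmountable read-write. The observed behavior can be modeled abstractly; this is a toy simulation of the transcript, not btrfs code, and the class and method names are invented for the sketch (the 'soft' convert skips chunks already in the target profile, as in the balance above):

```python
# Toy model of chunk profiles on a degraded btrfs raid10 array.
class Array:
    def __init__(self, ndevs: int):
        self.ndevs = ndevs
        self.missing = 0
        self.chunks = []  # profile of each allocated chunk

    def write_new_chunk(self):
        # raid10 needs at least 4 devices; with fewer present, new
        # chunk allocations fall back to the 'single' profile.
        alive = self.ndevs - self.missing
        self.chunks.append("raid10" if alive >= 4 else "single")

    def replace_missing(self):
        # btrfs replace brings the array back to full strength.
        self.missing = 0

    def balance_soft_convert(self, target: str = "raid10"):
        # -dconvert=raid10,soft: convert only chunks not already at
        # the target profile.
        self.chunks = [target if c != target else c for c in self.chunks]

a = Array(4)
a.write_new_chunk()        # test1 on the healthy array: raid10 chunk
a.missing = 1              # echo 1 > /sys/block/sde/device/delete
a.write_new_chunk()        # test3 on the degraded mount: 'single' chunk
print(a.chunks)            # ['raid10', 'single']
a.replace_missing()        # btrfs replace start -B 4 /dev/sde /mnt
a.balance_soft_convert()   # the soft-convert balance from the transcript
print(a.chunks)            # ['raid10', 'raid10']
```

The model simplifies (real degraded writes reuse existing raid10 chunks until a new chunk must be allocated), but it captures why the replace-then-soft-balance sequence in the transcript is the correct repair order.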