Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs
The trouble is yet unresolved, symptoms are as they were, but I've diagnosed a
step further. Maybe you can help me advance the diagnosis or better pose my
question among debian experts, related to adjusting the building of initrd.

On Thu, 18 Oct 2018 00:08:08 -0700 Qu Wenruo wrote

> > > Still looks like a initramfs problem [rather] than btrfs problem.

Yes, but linux-btrfs list still knows better than I how best to proceed,
mainly how to distill the trouble description using proper terms, also lending
broader understanding of which named modules serve which device-activation or
storage-access purpose.

> > > In the busybox environment, have you tried listing /dev to see if that
> > > external device is found?

External usb attached drives are definitely not found by a newly launched
kernel, and particulars of why are still not self evident. Boot loader grub2
all along still has no trouble accessing -- presumably it's not able to
leverage raid1 redundancy in btrfs but does have access to the ext mirror
device and takes notice in passing of matching UUID's.

> By default, btrfs must see *all* devices to mount RAID1/10/5/6/0. Unless
> you're using "degraded" mount option.
> You could argue it's a bad decision, but still you have the choice.

Yes did manage to mount it degraded, just to see that content demirrored is
also unclobbered, however can't finish the job by such means; hopefully in the
process no metadata or other junk was written unintentionally (forgot to mount
readonly - degraded)

The task now seems to be finishing resolving which modules can bring in the
rest of the critical infrastructure to allow access to the drives that had
taken no customized bother to bring online, prior to rootfs raid1 conversion.
A recently found item of great interest is module "autofs4" which has userland
friends such as systemd; it's present in cindy(LMDE3) which boots fine in
spite of deriving from stretch, and was absent in stretch & buster which no
longer boot.

> > When manually trying to mount in busybox, it gives a similar error about
> > missing the same external device, by its UUID_SUB
> Then it's still something wrong about the initramfs. From your description,
> it looks pretty like the lack of external disk driver is the root cause.

Agreed *something* missing in initrd and-or module deployment is the cause of
failing access to ext usb3 drive enclosure devices, but am still tracking down
which other missing blobs may be of concern; since busybox won't allow "lsblk"
"lshw" or "lsusb" -- only "blkid" or "ls /dev" work to detect devices -- my
confidence and expediency in tracking are reduced; haven't yet happened upon
the debian feature that lets more nonkernel programs and libraries--not just
modules-- be built into initrd.

As an aside probably not germane, the grub.cfg "linux" line to load the kernel
in some cases has "ro" readonly option and others not; what does the
difference signify? How to make informed choice whether to use ro? Why mount a
real disk rootfs without write access, before corruption's detected? Would
that not potentially cripple some vital features such as logging?

So far it's clear that uas, usb_storage, & autofs4 may be built into initrd
and then load ok, but they're not enough to restore normal systemd launch.
Setting "MODULES=dep" in /etc/initramfs-tools/conf.d/driver-policy seems not
smart enough for building in all necessary objects. Ideas welcome.

regards-
TP
Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs
On 2018/10/18 2:16 PM, Tony Prokott wrote:
> On Wed, 17 Oct 2018 17:57:25 -0700 Qu Wenruo wrote
> ...
> > > But after chrooting to update-initramfs and cataloging resulting image
> > > content, usb_storage and uas were present under /lib/modules/xxx already,
> > > and failing systems still just busybox without a real rootfs rather than
> > > launch systemd; even tried kernel option "rootwait" which had no effect on
> > > access to ext storage; udev still seems not to have noticed the ext drives
> > > once busybox had control.
> >
> > Still looks like a initramfs problem other than btrfs problem.
> >
> > In the busybox environment, have you tried listing /dev to see if that
> > external device is found?
>
> agreed that initramfs smells bad, but it hadn't been a problem until btrfs
> mounts (external-raid) had to rely on the usb channel;

By default, btrfs must see *all* devices to mount RAID1/10/5/6/0.
Unless you're using "degraded" mount option.
You could argue it's a bad decision, but still you have the choice.

> in busybox, ext drives/partitions are all missing from /dev; can't tell why
> so, ahci and usb modules are loaded afaict
>
> > Since you have a busybox environment, have you checked if "btrfs" command
> > lives in the initramfs?
>
> yes btrfs command works from busybox
>
> > IIRC at least you need the following things/abilities to boot:
> >
> > 1) usb and sata drivers
> >    Means you could see both devices in the busybox environment under /dev
> >
> > 2) "Btrfs" command
> >    Mostly for scan
> >
> > Then you could try the following commands under busybox environment:
> >
> > # btrfs device scan
> > # mount
>
> "btrfs dev scan" runs but doesn't indicate recognizing any; since raid1
> conversion, ext drives are required for any btrfs mounts to be seen whole.
> When manually trying to mount in busybox, it gives a similar error about
> missing external device by UUID_SUB

Then it's still something wrong about the initramfs.
From your description, it looks pretty like the lack of external disk driver
is the root cause.

Could you try "btrfs fi show" to see which device is missing?
I strongly suspect it's the external device; if so, then it is the
mkinitramfs driver causing the problem.

> > If it works, it may mean you're missing "btrfs device scan" during boot
> > so kernel can't see all RAID1 disks for btrfs and failed to boot.
> >
> > Please refer to your distribution initramfs creation tool to see how to
> > add that scan. (Some distro has special hook for btrfs to handle such
> > case).
>
> may have to tweak the /etc/initramfs-tools/initramfs.conf or modules list;
> MODULES=dep setting might act better than MODULES=most
> will look into this further to see about contrasting block device modules
> between cindy and the others

Sorry I can't help much in this case, as I'm not a Debian user.

But in Archlinux, it's pretty easy by just adding a 'block' hook.
And in that case, Archlinux will install all kernel modules under the
'drivers/usb/storage' directory.

For details, you could refer to this file:
https://git.archlinux.org/mkinitcpio.git/tree/install/block

Thanks,
Qu

> appreciate the timely response-
> TP
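As a rough aid for reading "btrfs fi show" output in the busybox shell, the sketch below compares the "Total devices N" count with the number of devid lines actually listed for each filesystem. The function name count_btrfs_devices is made up for this note, and it assumes the output layout shown elsewhere in this thread (a "Label:" line, a "Total devices" line, then one "devid" line per visible device); it also assumes the initramfs busybox provides awk.

```shell
# Compare the advertised device count with the devices actually listed.
# Input: `btrfs filesystem show` output on stdin.
# Output: one line per filesystem: <label> expected=<N> seen=<M>
count_btrfs_devices() {
    awk '
        # Print the summary for the filesystem parsed so far, if any.
        function report() {
            if (have) printf "%s expected=%d seen=%d\n", label, total, seen
        }
        $1 == "Label:"                   { report(); label = $2; total = 0; seen = 0; have = 1 }
        $1 == "Total" && $2 == "devices" { total = $3 }
        $1 == "devid"                    { seen++ }
        END                              { report() }
    '
}
```

Typical use would be `btrfs fi show | count_btrfs_devices`; a filesystem whose "seen" count is lower than "expected" is the one waiting on a device the initramfs has not brought up.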
Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs
On Wed, 17 Oct 2018 17:57:25 -0700 Qu Wenruo wrote
...
> > But after chrooting to update-initramfs and cataloging resulting image
> > content, usb_storage and uas were present under /lib/modules/xxx already,
> > and failing systems still just busybox without a real rootfs rather than
> > launch systemd; even tried kernel option "rootwait" which had no effect on
> > access to ext storage; udev still seems not to have noticed the ext drives
> > once busybox had control.
>
> Still looks like a initramfs problem other than btrfs problem.
>
> In the busybox environment, have you tried listing /dev to see if that
> external device is found?

agreed that initramfs smells bad, but it hadn't been a problem until btrfs
mounts (external-raid) had to rely on the usb channel; in busybox, ext
drives/partitions are all missing from /dev; can't tell why so, ahci and usb
modules are loaded afaict

> Since you have a busybox environment, have you checked if "btrfs" command
> lives in the initramfs?

yes btrfs command works from busybox

> IIRC at least you need the following things/abilities to boot:
>
> 1) usb and sata drivers
>    Means you could see both devices in the busybox environment under /dev
>
> 2) "Btrfs" command
>    Mostly for scan
>
> Then you could try the following commands under busybox environment:
> # btrfs device scan
> # mount

"btrfs dev scan" runs but doesn't indicate recognizing any; since raid1
conversion, ext drives are required for any btrfs mounts to be seen whole.
When manually trying to mount in busybox, it gives a similar error about
missing external device by UUID_SUB

> If it works, it may mean you're missing "btrfs device scan" during boot
> so kernel can't see all RAID1 disks for btrfs and failed to boot.
>
> Please refer to your distribution initramfs creation tool to see how to
> add that scan. (Some distro has special hook for btrfs to handle such case).

may have to tweak the /etc/initramfs-tools/initramfs.conf or modules list;
MODULES=dep setting might act better than MODULES=most

will look into this further to see about contrasting block device modules
between cindy and the others

appreciate the timely response-
TP
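One way to check an initramfs image for specific modules is sketched below. It reads a file listing on stdin (for instance from lsinitramfs, the listing tool shipped with Debian's initramfs-tools) and prints the requested module names that do not appear in it. The function name missing_modules is hypothetical, and which modules a given USB3 enclosure really needs is an assumption to be verified against the working cindy image.

```shell
# Print each requested kernel module that is absent from an initramfs listing.
# stdin: file listing, one path per line (e.g. output of lsinitramfs)
# args:  module names to look for
# '-' and '_' are treated as equivalent, since a file named usb-storage.ko
# loads as the module usb_storage.
missing_modules() {
    listing=$(tr '-' '_')
    for mod in "$@"; do
        name=$(printf '%s' "$mod" | tr '-' '_')
        # Match .../<name>.ko, optionally followed by a compression suffix.
        if ! printf '%s\n' "$listing" | grep -Eq "/${name}\.ko(\.[a-z0-9]+)?\$"; then
            printf '%s\n' "$mod"
        fi
    done
}
```

For example, `lsinitramfs /boot/initrd.img-$(uname -r) | missing_modules uas usb_storage autofs4` would print nothing if all three are present. Running the same check against the cindy image and a failing stretch or buster image gives a quick contrast for any candidate module.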
Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs
On 2018/10/18 12:38 AM, Tony Prokott wrote:
> Good day. My technical trouble seems to be beyond the scope of active helpers
> on debian's irc support channel. Reasonable supposition that it's quite
> particular to the development stage of btrfs infrastructure on 4.17.xxx
> backport kernels and userland tools available on debian 9.5 stretch as well
> as buster, the testing suite to be released in the next several months as
> 10.0 stable.
>
> > / # uname -a; lsb_release -a
> > Linux localhost 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 (2018-08-27) x86_64 GNU/Linux
> > Distributor ID: LinuxMint
> > Description: LMDE 3 Cindy
> > Release: 3
> > Codename: cindy
> >
> > / # btrfs --version
> > btrfs-progs v4.7.3
> >
> > / # btrfs fi sh
> > Label: 'sys' uuid: [snip]
> >   Total devices 2 FS bytes used 24.07GiB
> >   devid    1 size 401.59GiB used 26.03GiB path /dev/sda2
> >   devid    2 size 401.76GiB used 26.03GiB path /dev/sdc1
> >
> > / # btrfs fi df /
> > Data, RAID1: total=24.00GiB, used=23.27GiB
> > System, RAID1: total=32.00MiB, used=16.00KiB
> > Metadata, RAID1: total=2.00GiB, used=820.00MiB
> > GlobalReserve, single: total=69.17MiB, used=0.00B
> >
> > / # btrfs su li -ta /
> > ID   gen     top level  path
> > --   ---     ---------  ----
> > 260  115103  5          /d9
> > 261  115103  5          /d10
> > 262  123876  5          /home
> > 263  115148  261        /d10/@
> > 264  115136  261        /d10/@home
> > 443  123874  447        /md3/@
> > 444  123876  447        /md3/@home
> > 447  115103  5          /md3
> > 451  115144  260        /d9/@
> > 452  115136  260        /d9/@home
>
> Providing no dmesg content so far, as it doesn't bear on the kind of
> difficulty in question. My system requires expert help now to restore
> bootability to 2 of its OS installations; it has a btrfs root file system in
> subvolumes for stretch, buster, and LMDE3(cindy) which derives directly from
> stretch and so has most core elements if not cfg defaults in common; even
> kernel versions are alike, besides buster. subvolid=262 is a /home fs shared
> among linux distros; 451, 263, and 443 are rootfs for stretch, buster and
> cindy respectively.
>
> All 3 installations had been booting and running fine when data block group
> profile was "single" on an internal sata HDD /dev/sda2; then an external usb3
> drive enclosure's sata HDD partition /dev/sdc1, also of size ~0.4TiB, was
> added and balanced as btrfs "raid1"; raid conversion did not damage subvolume
> content or filesystem integrity afaict, but rather rendered stretch and
> buster unbootable (more to follow), whereas cindy carried on without hiccup.
>
> At first it seemed as though the initrd's might be missing a module or so, to
> allow access to external drives -- i.e. grub starts the unbootable
> kernel/initrd but drops to busybox prompt right away without starting
> external drives, referring to allegedly "missing" btrfs device's UUID_SUB.
>
> But after chrooting to update-initramfs and cataloging resulting image
> content, usb_storage and uas were present under /lib/modules/xxx already, and
> failing systems still just busybox without a real rootfs rather than launch
> systemd; even tried kernel option "rootwait" which had no effect on access to
> ext storage; udev still seems not to have noticed the ext drives once busybox
> had control.

Still looks like an initramfs problem rather than a btrfs problem.

In the busybox environment, have you tried listing /dev to see if that
external device is found?

> I could list all initrd modules present in cindy & absent for others, but
> need better knowledge than my reasonable guesses of what's required to make
> btrfs volume companion devices cooperate at boot time, as initrd transitions
> to steady state rootfs.

Since you have a busybox environment, have you checked if the "btrfs" command
lives in the initramfs?

IIRC at least you need the following things/abilities to boot:

1) usb and sata drivers
   Means you could see both devices in the busybox environment under /dev

2) "Btrfs" command
   Mostly for scan

Then you could try the following commands under the busybox environment:

# btrfs device scan
# mount

If it works, it may mean you're missing "btrfs device scan" during boot,
so the kernel can't see all RAID1 disks for btrfs and fails to boot.

Please refer to your distribution's initramfs creation tool to see how to
add that scan. (Some distros have a special hook for btrfs to handle such a
case.)

Thanks,
Qu

> What would be a more practical diagnostic? Could stretch & buster initrd's
> somehow be failing to do a btrfs device scan at the proper moment? Not so
> interested in giving up on btrfs software raid so early in the game.
>
> thanks in advance-
> TP [not a list subscriber]
Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs
Good day. My technical trouble seems to be beyond the scope of active helpers
on debian's irc support channel. Reasonable supposition that it's quite
particular to the development stage of btrfs infrastructure on 4.17.xxx
backport kernels and userland tools available on debian 9.5 stretch as well
as buster, the testing suite to be released in the next several months as
10.0 stable.

> / # uname -a; lsb_release -a
> Linux localhost 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 (2018-08-27) x86_64 GNU/Linux
> Distributor ID: LinuxMint
> Description: LMDE 3 Cindy
> Release: 3
> Codename: cindy
>
> / # btrfs --version
> btrfs-progs v4.7.3
>
> / # btrfs fi sh
> Label: 'sys' uuid: [snip]
>   Total devices 2 FS bytes used 24.07GiB
>   devid    1 size 401.59GiB used 26.03GiB path /dev/sda2
>   devid    2 size 401.76GiB used 26.03GiB path /dev/sdc1
>
> / # btrfs fi df /
> Data, RAID1: total=24.00GiB, used=23.27GiB
> System, RAID1: total=32.00MiB, used=16.00KiB
> Metadata, RAID1: total=2.00GiB, used=820.00MiB
> GlobalReserve, single: total=69.17MiB, used=0.00B
>
> / # btrfs su li -ta /
> ID   gen     top level  path
> --   ---     ---------  ----
> 260  115103  5          /d9
> 261  115103  5          /d10
> 262  123876  5          /home
> 263  115148  261        /d10/@
> 264  115136  261        /d10/@home
> 443  123874  447        /md3/@
> 444  123876  447        /md3/@home
> 447  115103  5          /md3
> 451  115144  260        /d9/@
> 452  115136  260        /d9/@home

Providing no dmesg content so far, as it doesn't bear on the kind of
difficulty in question. My system requires expert help now to restore
bootability to 2 of its OS installations; it has a btrfs root file system in
subvolumes for stretch, buster, and LMDE3(cindy) which derives directly from
stretch and so has most core elements if not cfg defaults in common; even
kernel versions are alike, besides buster. subvolid=262 is a /home fs shared
among linux distros; 451, 263, and 443 are rootfs for stretch, buster and
cindy respectively.

All 3 installations had been booting and running fine when data block group
profile was "single" on an internal sata HDD /dev/sda2; then an external usb3
drive enclosure's sata HDD partition /dev/sdc1, also of size ~0.4TiB, was
added and balanced as btrfs "raid1"; raid conversion did not damage subvolume
content or filesystem integrity afaict, but rather rendered stretch and
buster unbootable (more to follow), whereas cindy carried on without hiccup.

At first it seemed as though the initrd's might be missing a module or so, to
allow access to external drives -- i.e. grub starts the unbootable
kernel/initrd but drops to busybox prompt right away without starting
external drives, referring to allegedly "missing" btrfs device's UUID_SUB.

But after chrooting to update-initramfs and cataloging resulting image
content, usb_storage and uas were present under /lib/modules/xxx already, and
failing systems still just busybox without a real rootfs rather than launch
systemd; even tried kernel option "rootwait" which had no effect on access to
ext storage; udev still seems not to have noticed the ext drives once busybox
had control.

I could list all initrd modules present in cindy & absent for others, but
need better knowledge than my reasonable guesses of what's required to make
btrfs volume companion devices cooperate at boot time, as initrd transitions
to steady state rootfs.

What would be a more practical diagnostic? Could stretch & buster initrd's
somehow be failing to do a btrfs device scan at the proper moment? Not so
interested in giving up on btrfs software raid so early in the game.

thanks in advance-
TP [not a list subscriber]
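To confirm that a single-to-raid1 conversion like the one described here covered every block group type, the "btrfs fi df" output can be scanned for profiles other than RAID1. The sketch below does that on stdin; the function name non_raid1_groups is made up for illustration, and GlobalReserve is skipped on the assumption that it stays "single", as it does in the output posted above.

```shell
# List block group types whose profile is not RAID1.
# stdin: `btrfs filesystem df <mountpoint>` output
# Output: "<type> <profile>" for each non-RAID1 line; nothing if all is RAID1.
non_raid1_groups() {
    awk -F'[,:]' '
        # GlobalReserve is reported as "single" even after conversion.
        $1 == "GlobalReserve" { next }
        NF >= 2 {
            profile = $2
            gsub(/ /, "", profile)      # strip the space after the comma
            if (profile != "RAID1") print $1 " " profile
        }
    '
}
```

With the output quoted in this message, `btrfs fi df / | non_raid1_groups` would print nothing, which matches the report that Data, System and Metadata are all RAID1.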
RE: Unable to boot
Hello Chris,

Thanks for your response.
I tried the steps you gave me, but still no luck.
Each time I try to mount (normally, -o recovery, -o ro,recovery) I have the
following error:

[root@localhost liveuser]# mount /dev/md127 /tmp/hdd
mount: wrong fs type, bad option, bad superblock on /dev/md127,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so.

For the simple mount command the dmesg is : http://pastebin.com/TiPR7U2j
For mount -o recovery option, the dmesg is : http://pastebin.com/NURDTeYf
For mount -o ro,recovery options, the dmesg is : http://pastebin.com/UUmdWGgE

Thank you,
George Pochiscan
Support Engineer
Mobile: +40731831489
Phone: +40213225757
Fax: +40213222522
george.pochis...@sphs.ro
www.spearheadsystems.ro
64 I.P. Pavlov Street, 1st District
Bucharest, Romania
IT innovation at its finest.

From: Chris Murphy li...@colorremedies.com
Sent: Friday, May 2, 2014 22:41
To: George Pochiscan
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Unable to boot

On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:

> Hello,
>
> I have a problem with a server with Fedora 20 and BTRFS. This server had
> frequent hard restarts before the filesystem got corrupt and we are unable
> to boot it.
>
> We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
> It had Debian installed (i don't know the version) and right now i'm using
> fedora 20 live to try to rescue the system.

Fedora 20 Live has kernel 3.11.10 and btrfs-progs
0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb without
knowing exactly what the problem and solution is, is to try a much newer
kernel and btrfs-progs, like a Fedora Rawhide live media. These are built
daily, but don't always succeed so you can go here to find the latest of
everything:

https://apps.fedoraproject.org/releng-dash/

Find Fedora Live Desktop or Live KDE and click on details. Click the green
link under descendants livecd. And then under Output listing you'll see an
ISO you can download, the one there right now is
Fedora-Live-Desktop-x86_64-rawhide-20140502.iso - but of course this changes
daily.

You might want to boot with kernel parameter slub_debug=- (that's a minus
symbol) because all but Monday built Rawhide kernels have a bunch of kernel
debug options enabled which makes it quite slow.

> When we try btrfsck /dev/md127 i have a lot of checksum errors, and the
> output is:
>
> Checking filesystem on /dev/md127
> UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
> checking extents
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> Csum didn't match
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> Csum didn't match
> -
> extent buffer leak: start 1006686208 len 4096
> found 32039247396 bytes used err is -22
> total csum bytes: 41608612
> total tree bytes: 388857856
> total fs tree bytes: 310124544
> total extent tree bytes: 22016000
> btree space waste bytes: 126431234
> file data blocks allocated: 47227326464
>  referenced 42595635200
> Btrfs v3.12

I suggest a recent Rawhide build. And I suggest just trying to mount the file
system normally first, and post anything that appears in dmesg. And if the
mount fails, then try mount option -o recovery, and also post any dmesg
messages from that too, and note whether or not it mounts. Finally if that
doesn't work either then see if -o ro,recovery works and what kernel messages
you get.

> When i attempt to repair i have the following error:
> -
> Backref 1005817856 parent 5 root 5 not found in extent tree
> backpointer mismatch on [1005817856 4096]
> owner ref check failed [1006686208 4096]
> repaired damaged extent references
> Failed to find [1000525824, 168, 4096]
> btrfs unable to find ref byte nr 1000525824 parent 0 root 1 owner 1 offset 0
> btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
> Aborted

You really shouldn't use --repair right off the bat, it's not a recommended
early step, you should try normal mounting with newer kernels first, then
recovery mount options first. Sometimes the repair option makes things worse.
I'm not sure what its safety status is as of v3.14.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Fedora includes btrfs-zero-log already so depending on the kernel messages
you might try that before
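The mount sequence Chris recommends (plain mount, then -o recovery, then -o ro,recovery, checking dmesg after each failure) can be written as a small helper. This is only a sketch: try_mounts and the MOUNT_CMD variable are names invented here so the sequence can be rehearsed with a stand-in command, and they are not part of mount or btrfs-progs. The recovery option is the one used in this 2014 thread; newer kernels call the same behaviour usebackuproot.

```shell
# Try progressively more forgiving mount options; stop at the first success.
# $1: block device, $2: mount point
# MOUNT_CMD (hypothetical hook) replaces the real mount command when set.
try_mounts() {
    dev=$1
    mnt=$2
    for opts in "" "recovery" "ro,recovery"; do
        if [ -z "$opts" ]; then
            if ${MOUNT_CMD:-mount} "$dev" "$mnt"; then
                echo "mounted with default options"
                return 0
            fi
        else
            if ${MOUNT_CMD:-mount} -o "$opts" "$dev" "$mnt"; then
                echo "mounted with -o $opts"
                return 0
            fi
        fi
        # Each failure is worth a look at the kernel log before moving on.
        echo "mount failed (options: ${opts:-none}); check dmesg | tail" >&2
    done
    return 1
}
```

On the system in this thread the call would be `try_mounts /dev/md127 /tmp/hdd`, run as root. Saving `dmesg | tail` after each failed step gives the per-attempt logs that were posted here as pastebin links.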
Re: Unable to boot
On Mon, May 05, 2014 at 03:04:05PM +, George Pochiscan wrote:
> Hello Chris,
>
> Thanks for your response.
> I tried the steps you gave me, but still no luck.
> Each time i try to mount ( normally, -o recovery, -o ro,recovery) i have
> the following error:
>
> [root@localhost liveuser]# mount /dev/md127 /tmp/hdd
> mount: wrong fs type, bad option, bad superblock on /dev/md127,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
>
> For the simple mount command the dmesg is : http://pastebin.com/TiPR7U2j
> For mount -o recovery option, the dmesg is : http://pastebin.com/NURDTeYf
> For mount -o ro,recovery options, the dmesg is : http://pastebin.com/UUmdWGgE

   This looks like btrfs-zero-log may help you, as it's having trouble
recovering the log tree.

   Hugo.

> Thank you,
> George Pochiscan
> Support Engineer
> Mobile: +40731831489
> Phone: +40213225757
> Fax: +40213222522
> george.pochis...@sphs.ro
> www.spearheadsystems.ro
> 64 I.P. Pavlov Street, 1st District
> Bucharest, Romania
> IT innovation at its finest.
>
> From: Chris Murphy li...@colorremedies.com
> Sent: Friday, May 2, 2014 22:41
> To: George Pochiscan
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: Unable to boot
>
> On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:
>
> > Hello,
> >
> > I have a problem with a server with Fedora 20 and BTRFS. This server had
> > frequent hard restarts before the filesystem got corrupt and we are
> > unable to boot it.
> >
> > We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
> > It had Debian installed (i don't know the version) and right now i'm
> > using fedora 20 live to try to rescue the system.
>
> Fedora 20 Live has kernel 3.11.10 and btrfs-progs
> 0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb without
> knowing exactly what the problem and solution is, is to try a much newer
> kernel and btrfs-progs, like a Fedora Rawhide live media.
RE: Unable to boot
Hello Hugo,

Running btrfs-zero-log /dev/md127 I have the following error:

checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
Csum didn't match
btrfs-zero-log: extent-tree.c:2717: alloc_reserved_tree_block: Assertion `!(ret)' failed.
Aborted (core dumped)

Full output : http://pastebin.com/3h5zVuWg
Full dmesg : http://pastebin.com/r9Fk8J8F

Thank you,
George Pochiscan
Support Engineer
Mobile: +40731831489
Phone: +40213225757
Fax: +40213222522
george.pochis...@sphs.ro
www.spearheadsystems.ro
64 I.P. Pavlov Street, 1st District
Bucharest, Romania
IT innovation at its finest.

From: Hugo Mills h...@carfax.org.uk
Sent: Monday, May 5, 2014 18:07
To: George Pochiscan
Cc: Chris Murphy; linux-btrfs@vger.kernel.org
Subject: Re: Unable to boot

On Mon, May 05, 2014 at 03:04:05PM +, George Pochiscan wrote:
> Hello Chris,
>
> Thanks for your response.
> I tried the steps you gave me, but still no luck.
> Each time i try to mount ( normally, -o recovery, -o ro,recovery) i have
> the following error:
>
> [root@localhost liveuser]# mount /dev/md127 /tmp/hdd
> mount: wrong fs type, bad option, bad superblock on /dev/md127,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
>
> For the simple mount command the dmesg is : http://pastebin.com/TiPR7U2j
> For mount -o recovery option, the dmesg is : http://pastebin.com/NURDTeYf
> For mount -o ro,recovery options, the dmesg is : http://pastebin.com/UUmdWGgE

   This looks like btrfs-zero-log may help you, as it's having trouble
recovering the log tree.

   Hugo.

> Thank you,
> George Pochiscan
> Support Engineer
> Mobile: +40731831489
> Phone: +40213225757
> Fax: +40213222522
> george.pochis...@sphs.ro
> www.spearheadsystems.ro
> 64 I.P. Pavlov Street, 1st District
> Bucharest, Romania
> IT innovation at its finest.
Re: Unable to boot
On May 5, 2014, at 9:11 AM, George Pochiscan george.pochis...@sphs.ro wrote:

> Hello Hugo,
>
> Running btrfs-zero-log /dev/md127 i have the following error:
>
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> Csum didn't match
> btrfs-zero-log: extent-tree.c:2717: alloc_reserved_tree_block: Assertion `!(ret)' failed.
> Aborted (core dumped)
>
> Full output : http://pastebin.com/3h5zVuWg
> Full dmesg : http://pastebin.com/r9Fk8J8F

OK. Well I'm out of ideas at this point. I'm not a developer, and don't know
what the problem is or how to fix it. So my advice at this point will be like
throwing spaghetti at a wall. (There is a lot of spaghetti available to throw
at the wall when it comes to fixing btrfs if the normal mount code doesn't
fix it automatically.)

Barring better advice from pretty much anyone else:

- First, btrfs-image -c9 -t4 /dev/md127 /path/for/large/file
  The resulting file will be somewhere between 50% and 100% of the size
  reported for metadata by btrfs filesystem df. Put this somewhere in case a
  developer wants to look at it in the current state.

- Then, btrfs check --init-csum-tree /dev/md127 will remove all checksums
  from the file system and should remove the csum errors preventing mount.
  The problem with this is it removes all checksums, so every read is
  reported as a mismatch but still permits the reads to proceed. As a result
  it's just a way to mount the file system, make a backup, and then create a
  new file system to restore to. So it's a recovery operation rather than a
  repair operation.

Chris Murphy
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the
body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Unable to boot
Hello,

I have a problem with a server with Fedora 20 and BTRFS. This server had frequent hard restarts before the filesystem got corrupt and we are unable to boot it.

We have a HP Proliant server with 4 disks @1TB each and Software RAID 5. It had Debian installed (i don't know the version) and right now i'm using fedora 20 live to try to rescue the system.

When we try btrfsck /dev/md127 i have a lot of checksum errors, and the output is:

Checking filesystem on /dev/md127
UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
checking extents
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
Csum didn't match
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
Csum didn't match
- extent buffer leak: start 1006686208 len 4096
found 32039247396 bytes used err is -22
total csum bytes: 41608612
total tree bytes: 388857856
total fs tree bytes: 310124544
total extent tree bytes: 22016000
btree space waste bytes: 126431234
file data blocks allocated: 47227326464
 referenced 42595635200
Btrfs v3.12

When i attempt to repair i have the following error:

- Backref 1005817856 parent 5 root 5 not found in extent tree
backpointer mismatch on [1005817856 4096]
owner ref check failed [1006686208 4096]
repaired damaged extent references
Failed to find [1000525824, 168, 4096]
btrfs unable to find ref byte nr 1000525824 parent 0 root 1 owner 1 offset 0
btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
Aborted

I have installed btrfs version 3.12.

Linux localhost 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost liveuser]# btrfs fi show
Label: none uuid: e068faf0-2c16-4566-9093-e6d1e21a5e3c
Total devices 1 FS bytes used 40.04GiB
devid 1 size 1.82TiB used 43.04GiB path /dev/md127

Btrfs v3.12

Please advise.

Thank you,
George Pochiscan
Re: Unable to boot
On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:

> Hello,
>
> I have a problem with a server with Fedora 20 and BTRFS. This server had frequent hard restarts before the filesystem got corrupt and we are unable to boot it.
>
> We have a HP Proliant server with 4 disks @1TB each and Software RAID 5. It had Debian installed (i don't know the version) and right now i'm using fedora 20 live to try to rescue the system.

Fedora 20 Live has kernel 3.11.10 and btrfs-progs 0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb, without knowing exactly what the problem and solution are, is to try a much newer kernel and btrfs-progs, like a Fedora Rawhide live media. These are built daily, but the builds don't always succeed, so you can go here to find the latest of everything:

https://apps.fedoraproject.org/releng-dash/

Find Fedora Live Desktop or Live KDE and click on details. Click the green link under descendants livecd. Then under Output listing you'll see an ISO you can download; the one there right now is Fedora-Live-Desktop-x86_64-rawhide-20140502.iso, but of course this changes daily.

You might want to boot with kernel parameter slub_debug=- (that's a minus symbol) because all but Monday-built Rawhide kernels have a bunch of kernel debug options enabled, which makes them quite slow.

> When we try btrfsck /dev/md127 i have a lot of checksum errors, and the output is:
>
> Checking filesystem on /dev/md127
> UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
> checking extents
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
> Csum didn't match
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
> Csum didn't match
> - extent buffer leak: start 1006686208 len 4096
> found 32039247396 bytes used err is -22
> total csum bytes: 41608612
> total tree bytes: 388857856
> total fs tree bytes: 310124544
> total extent tree bytes: 22016000
> btree space waste bytes: 126431234
> file data blocks allocated: 47227326464
>  referenced 42595635200
> Btrfs v3.12

I suggest a recent Rawhide build. And I suggest just trying to mount the file system normally first, and post anything that appears in dmesg. If the mount fails, then try mount option -o recovery, post any dmesg messages from that too, and note whether or not it mounts. Finally, if that doesn't work either, see if -o ro,recovery works and what kernel messages you get.

> When i attempt to repair i have the following error:
>
> - Backref 1005817856 parent 5 root 5 not found in extent tree
> backpointer mismatch on [1005817856 4096]
> owner ref check failed [1006686208 4096]
> repaired damaged extent references
> Failed to find [1000525824, 168, 4096]
> btrfs unable to find ref byte nr 1000525824 parent 0 root 1 owner 1 offset 0
> btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
> Aborted

You really shouldn't use --repair right off the bat; it's not a recommended early step. You should try normal mounting with newer kernels first, then the recovery mount options. Sometimes the repair option makes things worse. I'm not sure what its safety status is as of v3.14.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Fedora includes btrfs-zero-log already, so depending on the kernel messages you might try that before a btrfsck --repair.

Chris Murphy
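The escalation suggested in the message above (plain mount, then -o recovery, then -o ro,recovery, collecting kernel messages after each attempt) can be written down as an ordered checklist. The function below is only a sketch: mount_attempts is a hypothetical name, /mnt as the mount point and the 50-line dmesg tail are assumptions, and nothing is executed. It prints the commands so they can be run by hand as root, one at a time.

```shell
#!/bin/sh
# Sketch only: print the mount attempts, least invasive first.
# Nothing is mounted here; run each printed line manually and keep
# the dmesg output from every attempt for the list.
mount_attempts() {
    dev="$1"; mnt="$2"
    for opts in "" recovery ro,recovery; do
        if [ -z "$opts" ]; then
            echo "mount $dev $mnt"            # normal mount first
        else
            echo "mount -o $opts $dev $mnt"   # then the recovery options
        fi
        echo "dmesg | tail -n 50"             # capture kernel messages
    done
}

mount_attempts /dev/md127 /mnt
```

Stop at the first attempt that mounts; the later options are only worth trying if the earlier ones fail.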
Unable to boot btrfs filesystem, and btrfsck aborts
My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:

https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:

https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.

This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.

Thanks,

Matt
Re: Unable to boot btrfs filesystem, and btrfsck aborts
On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:

> My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
>
> https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
>
> To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
>
> https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
>
> As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
>
> This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.
>
> Thanks,
>
> Matt

If you can, make a complete image backup of the drive before trying anything to bring it back.

Try mounting with -o nospace_cache, also try -o ro and -o recovery as well as -o recovery,ro.

If you can bring it back in ro mode you can at least copy your data out of it if all else fails...

I'm not a dev, just a random guy with an interest in btrfs, so if you don't have a backup and aren't able to create a dd copy of it right now you might wanna wait for a reply from someone who actually knows the code...

Good luck
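As a rough illustration of the advice above (copy the whole drive first, then work through the mount options), the following sketch prints a dd command followed by the four mount attempts named in the message. The function name backup_then_mount_plan, the device, image and mount point paths, and the dd settings bs=1M and conv=noerror,sync are all assumptions for illustration; the thread only says "a dd copy". Nothing is executed.

```shell
#!/bin/sh
# Sketch only: image the device, then list the mount options from the
# message in the order they were mentioned. Commands are printed, not run.
backup_then_mount_plan() {
    dev="$1"; img="$2"; mnt="$3"
    # Raw copy of the whole device. conv=noerror,sync keeps going past
    # read errors and pads short reads (an assumption, not from the thread).
    echo "dd if=$dev of=$img bs=1M conv=noerror,sync"
    for opts in nospace_cache ro recovery recovery,ro; do
        echo "mount -o $opts $dev $mnt"
    done
}

backup_then_mount_plan /dev/sdX /backup/sdX.img /mnt
```

The image needs a destination at least as large as the source device, which is the practical reason the message treats it as optional.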
Re: Unable to boot btrfs filesystem, and btrfsck aborts
On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:

> On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
> > My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
> >
> > https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
> >
> > To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
> >
> > https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
> >
> > As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
> >
> > This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.
>
> If you can make a complete image backup of the drive before trying any things to bring it back.
>
> Try mounting with -o nospace_cache, also try -o ro and -o recovery as well as -o recovery,ro.

I think the bug happens during log recovery, so btrfs-zero-log might get it mountable again, with the caveat of losing the most recently fsynced changes.
Re: Unable to boot btrfs filesystem, and btrfsck aborts
If you are going to use btrfs-zero-log please create a btrfs-image first that you can then upload to a bug report so that this can be fixed.

# btrfs-image -c 9 -t 8 /dev/yourbtrfs /tmp/fs_image

On Mon, Mar 11, 2013 at 11:53 PM, Jan Steffens jan.steff...@gmail.com wrote:

> On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
> > On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
> > > My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
> > >
> > > https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
> > >
> > > To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
> > >
> > > https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
> > >
> > > As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
> > >
> > > This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.
> >
> > If you can make a complete image backup of the drive before trying any things to bring it back.
> >
> > Try mounting with -o nospace_cache, also try -o ro and -o recovery as well as -o recovery,ro.
>
> I think the bug happens during log recovery, so btrfs-zero-log might get it mountable again, with the caveat of losing the most recently fsynced changes.
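One way to make the "image first" request above hard to forget is a small guard that refuses to move on to btrfs-zero-log until a non-empty image file exists. This is a sketch under assumptions: zero_log_guarded is a hypothetical helper, not part of btrfs-progs, and it only prints the next step instead of running it. The btrfs-image arguments in the hint are the ones from the message.

```shell
#!/bin/sh
# Sketch only: gate btrfs-zero-log on the existence of a metadata image.
# Prints the next step; does not touch the device.
zero_log_guarded() {
    dev="$1"; img="$2"
    # -s is true only for a file that exists and is not empty.
    if [ ! -s "$img" ]; then
        echo "no image at $img; first run: btrfs-image -c 9 -t 8 $dev $img" >&2
        return 1
    fi
    echo "image present; next step: btrfs-zero-log $dev"
}
```

Called as zero_log_guarded /dev/yourbtrfs /tmp/fs_image, it returns a non-zero status and prints the hint on stderr until the image has been written.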
Re: Unable to boot btrfs filesystem, and btrfsck aborts
On Mon, Mar 11, 2013 at 04:44:58PM -0600, Matthew Booth wrote:

> My laptop crashed hard earlier today. It reset immediately to a black screen followed by the BIOS. I have no idea why. However, it now fails to boot. I took a picture of the kernel panic that results from trying to mount the root filesystem:
>
> https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
>
> To make things worse, btrfsck aborts with a double free, without fixing it. I took a picture of that, too:
>
> https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
>
> As the kernel panic mentions btrfs_remove_free_space, I also tried mounting with clear_cache. Unfortunately it didn't dislodge anything.
>
> This is on a fully updated Fedora 18 system. I would really like to get this data back. If anybody could offer a suggestion I'd be very grateful.

This is fixed in 3.9, I'll send those patches back to -stable, sorry I should have done that before now. If you can't get a 3.9 kernel to boot then just use btrfs-zero-log and you'll be good to go.

Thanks,

Josef
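To decide which branch of the advice above applies, it may help to check whether the running kernel is at least 3.9. The check below is an illustration, not something posted in the thread: kernel_at_least is a hypothetical helper that compares only the major and minor numbers of a release string such as the output of uname -r. A version number alone cannot show whether a distribution kernel has picked up the fix through -stable backports.

```shell
#!/bin/sh
# Sketch only: succeed if a kernel release string is at least MAJOR.MINOR.
# Usage: kernel_at_least MAJOR MINOR [RELEASE]   (RELEASE defaults to uname -r)
kernel_at_least() {
    want_major="$1"; want_minor="$2"
    rel="${3:-$(uname -r)}"
    major="${rel%%.*}"          # text before the first dot
    rest="${rel#*.}"            # text after the first dot
    minor="${rest%%[!0-9]*}"    # leading digits of the remainder
    if [ "$major" -gt "$want_major" ]; then
        return 0
    fi
    [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]
}

if kernel_at_least 3 9; then
    echo "running kernel is 3.9 or newer"
else
    echo "running kernel is older than 3.9"
fi
```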