Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs

2018-10-23 Thread Tony Prokott
The trouble is yet unresolved, symptoms are as they were, but I've diagnosed a 
step further. Maybe you can help me advance the diagnosis or better pose my 
question among debian experts, related to adjusting the building of initrd.

  On Thu, 18 Oct 2018 00:08:08 -0700 Qu Wenruo wrote:
 > >  > Still looks like a initramfs problem [rather] than btrfs problem.  
Yes, but the linux-btrfs list still knows better than I how best to proceed, 
mainly how to distill the trouble description using proper terms, and can lend 
a broader understanding of which named modules serve which device-activation or 
storage-access purpose.

 > >  > In the busybox environment, have you tried listing /dev to see if that 
 > > external device is found?  
External usb attached drives are definitely not found by a newly launched 
kernel, and particulars of why are still not self evident. Boot loader grub2 
all along still has no trouble accessing -- presumably it's not able to 
leverage raid1 redundancy in btrfs but does have access to the ext mirror 
device and takes notice in passing of matching UUID's.

 > By default, btrfs must see *all* devices to mount RAID1/10/5/6/0. Unless 
 > you're using "degraded" mount option. 
 > You could argue it's a bad decision, but still you have the choice. 
Yes, I did manage to mount it degraded, just to see that the demirrored content 
is also unclobbered; however, I can't finish the job by such means. Hopefully 
no metadata or other junk was written unintentionally in the process (I forgot 
to mount it read-only as well as degraded).

The task now seems to be working out which modules bring in the rest of the 
critical infrastructure needed to access the drives, which took no special 
effort to bring online before the rootfs raid1 conversion. A recently found 
item of great interest is module "autofs4", which has userland friends such as 
systemd; it's present in cindy (LMDE3), which boots fine in spite of deriving 
from stretch, and absent in stretch & buster, which no longer boot.

 > > When manually trying to mount in busybox, it gives a similar error about 
 > > missing the same external device, by its UUID_SUB 
 > Then it's still something wrong about the initramfs. From your description, 
 > it looks pretty like the lack of external disk driver is the root cause. 
Agreed *something* missing in initrd and-or module deployment is the cause of 
failing access to ext usb3 drive enclosure devices, but am still tracking down 
which other missing blobs may be of concern; since busybox won't allow "lsblk" 
"lshw" or "lsusb" -- only "blkid" or "ls /dev" work to detect devices -- my 
confidence and expediency in tracking is reduced; haven't yet happened upon the 
debian feature that lets more nonkernel programs and libraries--not just 
modules-- be built into initrd.

As an aside, probably not germane: the grub.cfg "linux" line that loads the 
kernel has the "ro" read-only option in some cases and not in others; what does 
the difference signify? How do I make an informed choice whether to use ro? Why 
mount a real-disk rootfs without write access before any corruption is 
detected? Would that not potentially cripple some vital features such as 
logging?

So far it's clear that uas, usb_storage, & autofs4 may be built into initrd and 
then load ok, but they're not enough to restore normal systemd launch. Setting 
"MODULES=dep" in /etc/initramfs-tools/conf.d/driver-policy seems not smart 
enough for building in all necessary objects. Ideas welcome.
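
One way to narrow down the module question is to compare a list of wanted 
modules against the listing of the built image. The following is only a rough 
sketch: the function name missing_modules is made up for illustration, and it 
assumes the listing comes from lsinitramfs (part of initramfs-tools) with one 
path per line.

```shell
# Print each wanted module that has no .ko file in an initramfs listing.
# The listing (one path per line, e.g. from lsinitramfs) is read from stdin.
# missing_modules is a hypothetical helper name, not an existing tool.
missing_modules() {
    # Module file names may use '-' where the module name uses '_',
    # so normalise both sides to '-' before comparing.
    listing=$(tr '_' '-')
    for mod in "$@"; do
        name=$(printf '%s' "$mod" | tr '_' '-')
        printf '%s\n' "$listing" | grep -q "/${name}\.ko" || printf '%s\n' "$mod"
    done
}
```

It could be run as: lsinitramfs /boot/initrd.img-"$(uname -r)" | 
missing_modules uas usb_storage autofs4 xhci_pci xhci_hcd. The last two names 
are only a guess at what a usb3 port might need and are not taken from this 
thread. Running it against both a working and a failing image and comparing the 
output may show what differs.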

regards-
TP




Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs

2018-10-18 Thread Qu Wenruo


On 2018/10/18 2:16 PM, Tony Prokott wrote:
>   On Wed, 17 Oct 2018 17:57:25 -0700 Qu Wenruo  
> wrote  
> ...
>  > > But after chrooting to update-initramfs and cataloging resulting image 
> content, usb_storage and uas were present under /lib/modules/xxx already, and 
> failing systems still just busybox without a real rootfs rather than launch 
> systemd; even tried kernel option "rootwait" which had no effect on access to 
> ext storage; udev still seems not to have noticed the ext drives once busybox 
> had control. 
>  >  
>  > Still looks like a initramfs problem other than btrfs problem. 
>  >  
>  > In the busybox environment, have you tried listing /dev to see if that 
>  > external device is found? 
> 
> agreed that initramfs smells bad, but it hadn't been a problem until btrfs 
> mounts (external-raid) had to rely on the usb channel;

By default, btrfs must see *all* devices to mount RAID1/10/5/6/0.
Unless you're using "degraded" mount option.

You could argue it's a bad decision, but still you have the choice.

> in busybox, ext drives/partitions are all missing from /dev; can't tell why 
> so, ahci and usb modules are loaded afaict
> 
>  > Since you have a busybox environment, have you checked if "btrfs" command 
> lives in the initramfs? 
> 
> yes btrfs command works from busybox
> 
>  > IIRC at least you need the following things/abilities to boot: 
>  >  
>  > 1) usb and sata drivers 
>  >Means you could see both devices in the busybox environment under /dev 
>  >  
>  > 2) "Btrfs" command 
>  >Mostly for scan 
>  > Then you could try the following commands under busybox environment: 
>  > # btrfs device scan 
>  > # mount   
> 
> "btrfs dev scan" runs but doesn't indicate recognizing any; since raid1 
> conversion, ext drives are required for any btrfs mounts to be seen whole.
> When manually trying to mount in busybox, it gives a similar error about 
> missing external device by UUID_SUB

Then it's still something wrong about the initramfs.

From your description, it looks pretty like the lack of external disk
driver is the root cause.

Could you try "btrfs fi show" to see which device is missing?
I strongly suspect it's the external device; if so, then it is the
driver selection done by mkinitramfs that is causing the problem.
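
As a rough illustration of that check, the output can be filtered for the 
marker that btrfs-progs prints for an incomplete filesystem. This sketch 
assumes the tool emits a line containing "Some devices missing" (the exact 
wording may differ between versions), and the function name is made up.

```shell
# Read `btrfs filesystem show` output on stdin and print the Label line of
# every filesystem that reports missing devices.
# Assumption: an incomplete filesystem is flagged by a line containing
# "Some devices missing"; fs_with_missing_devices is a hypothetical name.
fs_with_missing_devices() {
    awk '
        /^Label:/              { label = $0 }
        /Some devices missing/ { print label }
    '
}
```

For example: btrfs fi show | fs_with_missing_devices. No output would mean 
every listed filesystem sees all of its devices.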

> 
>  > If it works, it may mean you're missing "btrfs device scan" during boot 
>  > so kernel can't see all RAID1 disks for btrfs and failed to boot. 
>  >  
>  > Please refer to your distribution initramfs creation tool to see how to 
>  > add that scan. (Some distro has special hook for btrfs to handle such 
> case). 
> 
> may have to tweak the /etc/initramfs-tools/initramfs.conf or modules list; 
> MODULES=dep setting might act better than MODULES=most
> will look into this further to see about contrasting block device modules 
> between cindy and the others

Sorry I can't help much in this case, as I'm not a Debian user.

But in Archlinux, it's pretty easy by just adding a 'block' hook.
And in that case, Archlinux will install all kernel modules under
'drivers/usb/storage' directory.

For details, you could refer to this file:
https://git.archlinux.org/mkinitcpio.git/tree/install/block

Thanks,
Qu

> 
> appreciate the timely response-
> TP
> 





Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs

2018-10-18 Thread Tony Prokott
  On Wed, 17 Oct 2018 17:57:25 -0700 Qu Wenruo  
wrote  
...
 > > But after chrooting to update-initramfs and cataloging resulting image 
 > > content, usb_storage and uas were present under /lib/modules/xxx already, 
 > > and failing systems still just busybox without a real rootfs rather than 
 > > launch systemd; even tried kernel option "rootwait" which had no effect on 
 > > access to ext storage; udev still seems not to have noticed the ext drives 
 > > once busybox had control. 
 >  
 > Still looks like a initramfs problem other than btrfs problem. 
 >  
 > In the busybox environment, have you tried listing /dev to see if that 
 > external device is found? 

agreed that initramfs smells bad, but it hadn't been a problem until btrfs 
mounts (external-raid) had to rely on the usb channel;
in busybox, ext drives/partitions are all missing from /dev; can't tell why so, 
ahci and usb modules are loaded afaict

 > Since you have a busybox environment, have you checked if "btrfs" command 
 > lives in the initramfs? 

yes btrfs command works from busybox

 > IIRC at least you need the following things/abilities to boot: 
 >  
 > 1) usb and sata drivers 
 >Means you could see both devices in the busybox environment under /dev 
 >  
 > 2) "Btrfs" command 
 >Mostly for scan 
 > Then you could try the following commands under busybox environment: 
 > # btrfs device scan 
 > # mount   

"btrfs dev scan" runs but doesn't indicate recognizing any; since raid1 
conversion, ext drives are required for any btrfs mounts to be seen whole.
When manually trying to mount in busybox, it gives a similar error about 
missing external device by UUID_SUB

 > If it works, it may mean you're missing "btrfs device scan" during boot 
 > so kernel can't see all RAID1 disks for btrfs and failed to boot. 
 >  
 > Please refer to your distribution initramfs creation tool to see how to 
 > add that scan. (Some distro has special hook for btrfs to handle such case). 

may have to tweak the /etc/initramfs-tools/initramfs.conf or modules list; 
MODULES=dep setting might act better than MODULES=most
will look into this further to see about contrasting block device modules 
between cindy and the others

appreciate the timely response-
TP



Re: Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs

2018-10-17 Thread Qu Wenruo


On 2018/10/18 12:38 AM, Tony Prokott wrote:
> Good day. My technical trouble seems to be beyond the scope of active helpers 
> on debian's irc support channel. Reasonable supposition that it's quite 
> particular to the development stage of btrfs infrastructure on 4.17.xxx 
> backport kernels and userland tools available on debian 9.5 stretch as well 
> as buster, the testing suite to be released in the next several months as 
> 10.0 stable. 
> 
>  > / # uname -a; lsb_release -a
>  > Linux localhost 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 
> (2018-08-27) x86_64 GNU/Linux
>  > Distributor ID: LinuxMint
>  > Description: LMDE 3 Cindy
>  > Release: 3
>  > Codename: cindy
>  > 
>  > / # btrfs --version
>  > btrfs-progs v4.7.3
>  > 
>  > / # btrfs fi sh
>  > Label: 'sys'  uuid: [snip]
>  > Total devices 2 FS bytes used 24.07GiB
>  > devid    1 size 401.59GiB used 26.03GiB path /dev/sda2
>  > devid    2 size 401.76GiB used 26.03GiB path /dev/sdc1
>  > 
>  > / # btrfs fi df /
>  > Data, RAID1: total=24.00GiB, used=23.27GiB
>  > System, RAID1: total=32.00MiB, used=16.00KiB
>  > Metadata, RAID1: total=2.00GiB, used=820.00MiB
>  > GlobalReserve, single: total=69.17MiB, used=0.00B
>  > 
>  > / # btrfs su li -ta /
>  > ID   gen     top level   path
>  > --   ---     ---------   ----
>  > 260  115103  5           /d9
>  > 261  115103  5           /d10
>  > 262  123876  5           /home
>  > 263  115148  261         /d10/@
>  > 264  115136  261         /d10/@home
>  > 443  123874  447         /md3/@
>  > 444  123876  447         /md3/@home
>  > 447  115103  5           /md3
>  > 451  115144  260         /d9/@
>  > 452  115136  260         /d9/@home
> 
> Providing no dmesg content so far, as it doesn't bear on the kind of 
> difficulty in question. My system requires expert help now to restore 
> bootability to 2 of its OS installations; it has a btrfs root file system in 
> subvolumes for stretch, buster, and LMDE3(cindy) which derives directly from 
> stretch and so has most core elements if not cfg defaults in common; even 
> kernel versions are alike, besides buster. subvolid=262 is a  /home fs shared 
> among  linux distros; 451, 263, and 443 are rootfs for stretch, buster and 
> cindy respectively.
> 
> All 3 installations had been booting and running fine when data block group 
> profile was "single" on an internal sata HDD /dev/sda2; then an external usb3 
> drive enclosure's sata HDD partition /dev/sdc1, also of size ~0.4TiB, was 
> added and balanced as btrfs "raid1"; raid conversion did not damage subvolume 
> content or filesystem integrity afaict, but rather rendered stretch and 
> buster unbootable (more to follow), whereas cindy carried on without hiccup.
> 
> At first it seemed as though the initrd's might be missing a module or so, to 
> allow access to external drives -- i.e. grub starts the unbootable 
> kernel/initrd but drops to busybox prompt right away without starting 
> external drives, referring to allegedly "missing" btrfs device's UUID_SUB.
> 
> But after chrooting to update-initramfs and cataloging resulting image 
> content, usb_storage and uas were present under /lib/modules/xxx already, and 
> failing systems still just busybox without a real rootfs rather than launch 
> systemd; even tried kernel option "rootwait" which had no effect on access to 
> ext storage; udev still seems not to have noticed the ext drives once busybox 
> had control.

Still looks like an initramfs problem rather than a btrfs problem.

In the busybox environment, have you tried listing /dev to see if that
external device is found?

> 
> I could list all initrd modules present in cindy & absent for others, but 
> need better knowledge than my reasonable guesses of what's required to make 
> btrfs volume companion devices cooperate at boot time, as initrd transitions 
> to steady state rootfs.

Since you have a busybox environment, have you checked if "btrfs"
command lives in the initramfs?

IIRC at least you need the following things/abilities to boot:

1) usb and sata drivers
   Means you could see both devices in the busybox environment under
   /dev

2) "Btrfs" command
   Mostly for scan

Then you could try the following commands under busybox environment:

# btrfs device scan
# mount  

If it works, it may mean you're missing "btrfs device scan" during boot,
so the kernel can't see all the btrfs RAID1 disks and fails to boot.

Please refer to your distribution's initramfs creation tool to see how to
add that scan. (Some distros have a special hook for btrfs to handle such a
case.)
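
Since a USB disk can show up later than an internal one, it may help to wait 
for the device nodes before scanning. Below is a minimal sketch in plain POSIX 
sh, so it should also run under busybox; the helper name wait_for_paths, the 
30 second timeout and the /root mount point are assumptions, while the device 
names are the ones from the report.

```shell
# Wait until every given path exists, polling once per second.
# Usage: wait_for_paths TIMEOUT_SECONDS PATH...
# Returns 0 when all paths exist, 1 if the timeout expires first.
# wait_for_paths is a hypothetical helper, not part of any initramfs tool.
wait_for_paths() {
    remaining=$1
    shift
    while :; do
        missing=0
        for p in "$@"; do
            [ -e "$p" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}

# Possible use at the busybox prompt (device names from the report above;
# the timeout and the /root mount point are assumptions):
#   wait_for_paths 30 /dev/sda2 /dev/sdc1 \
#       && btrfs device scan \
#       && mount -o ro /dev/sda2 /root
```

If the wait times out, the external disk never appeared under /dev, which 
points at missing drivers rather than at a missing scan.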

Thanks,
Qu

> 
> What would be a more practical diagnostic? Could stretch & buster initrd's 
> somehow be failing to do a btrfs device scan at the proper moment? Not so 
> interested in giving up on btrfs software raid so early in the game.
> 
> thanks in advance-
> TP [not a list subscriber]
> 
> 




Conversion to btrfs raid1 profile on added ext device renders some systems unable to boot into converted rootfs

2018-10-17 Thread Tony Prokott
Good day. My technical trouble seems to be beyond the scope of active helpers 
on debian's irc support channel. Reasonable supposition that it's quite 
particular to the development stage of btrfs infrastructure on 4.17.xxx 
backport kernels and userland tools available on debian 9.5 stretch as well as 
buster, the testing suite to be released in the next several months as 10.0 
stable. 

 > / # uname -a; lsb_release -a
 > Linux localhost 4.17.0-0.bpo.3-amd64 #1 SMP Debian 4.17.17-1~bpo9+1 
 > (2018-08-27) x86_64 GNU/Linux
 > Distributor ID: LinuxMint
 > Description: LMDE 3 Cindy
 > Release: 3
 > Codename: cindy
 > 
 > / # btrfs --version
 > btrfs-progs v4.7.3
 > 
 > / # btrfs fi sh
 > Label: 'sys'  uuid: [snip]
 > Total devices 2 FS bytes used 24.07GiB
 > devid    1 size 401.59GiB used 26.03GiB path /dev/sda2
 > devid    2 size 401.76GiB used 26.03GiB path /dev/sdc1
 > 
 > / # btrfs fi df /
 > Data, RAID1: total=24.00GiB, used=23.27GiB
 > System, RAID1: total=32.00MiB, used=16.00KiB
 > Metadata, RAID1: total=2.00GiB, used=820.00MiB
 > GlobalReserve, single: total=69.17MiB, used=0.00B
 > 
 > / # btrfs su li -ta /
 > ID   gen     top level   path
 > --   ---     ---------   ----
 > 260  115103  5           /d9
 > 261  115103  5           /d10
 > 262  123876  5           /home
 > 263  115148  261         /d10/@
 > 264  115136  261         /d10/@home
 > 443  123874  447         /md3/@
 > 444  123876  447         /md3/@home
 > 447  115103  5           /md3
 > 451  115144  260         /d9/@
 > 452  115136  260         /d9/@home

Providing no dmesg content so far, as it doesn't bear on the kind of difficulty 
in question. My system requires expert help now to restore bootability to 2 of 
its OS installations; it has a btrfs root file system in subvolumes for 
stretch, buster, and LMDE3(cindy) which derives directly from stretch and so 
has most core elements if not cfg defaults in common; even kernel versions are 
alike, besides buster. subvolid=262 is a  /home fs shared among  linux distros; 
451, 263, and 443 are rootfs for stretch, buster and cindy respectively.

All 3 installations had been booting and running fine when data block group 
profile was "single" on an internal sata HDD /dev/sda2; then an external usb3 
drive enclosure's sata HDD partition /dev/sdc1, also of size ~0.4TiB, was added 
and balanced as btrfs "raid1"; raid conversion did not damage subvolume content 
or filesystem integrity afaict, but rather rendered stretch and buster 
unbootable (more to follow), whereas cindy carried on without hiccup.

At first it seemed as though the initrd's might be missing a module or so, to 
allow access to external drives -- i.e. grub starts the unbootable 
kernel/initrd but drops to busybox prompt right away without starting external 
drives, referring to allegedly "missing" btrfs device's UUID_SUB.

But after chrooting to update-initramfs and cataloging resulting image content, 
usb_storage and uas were present under /lib/modules/xxx already, and failing 
systems still just busybox without a real rootfs rather than launch systemd; 
even tried kernel option "rootwait" which had no effect on access to ext 
storage; udev still seems not to have noticed the ext drives once busybox had 
control.

I could list all initrd modules present in cindy & absent for others, but need 
better knowledge than my reasonable guesses of what's required to make btrfs 
volume companion devices cooperate at boot time, as initrd transitions to 
steady state rootfs.

What would be a more practical diagnostic? Could stretch & buster initrd's 
somehow be failing to do a btrfs device scan at the proper moment? Not so 
interested in giving up on btrfs software raid so early in the game.

thanks in advance-
TP [not a list subscriber]




RE: Unable to boot

2014-05-05 Thread George Pochiscan
Hello Chris,

Thanks for your response. I tried the steps you gave me, but still no luck.

Each time I try to mount (normally, -o recovery, -o ro,recovery) I get the 
following error:

[root@localhost liveuser]# mount /dev/md127 /tmp/hdd
mount: wrong fs type, bad option, bad superblock on /dev/md127,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

For the simple  mount command the dmesg is : http://pastebin.com/TiPR7U2j
For mount -o recovery option, the dmesg is : http://pastebin.com/NURDTeYf
For mount -o ro,recovery options, the dmesg is : http://pastebin.com/UUmdWGgE


Thank you,
George Pochiscan
Support Engineer

Mobile: +40731831489
Phone: +40213225757
Fax: +40213222522
george.pochis...@sphs.ro
www.spearheadsystems.ro
64 I.P. Pavlov Street, 1st District
Bucharest, Romania

IT innovation at its finest.


From: Chris Murphy li...@colorremedies.com
Sent: Friday, May 2, 2014 22:41
To: George Pochiscan
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Unable to boot

On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:

 Hello,

 I have a problem with a server with Fedora 20 and BTRFS. This server had 
 frequent hard restarts before the filesystem got corrupt and we are unable to 
 boot it.

 We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
 It had Debian installed (i don't know the version) and right now i'm using 
 fedora 20 live to try to rescue the  system.

Fedora 20 Live has kernel 3.11.10 and btrfs-progs 
0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb without 
knowing exactly what the problem and solution is, is to try a much newer kernel 
and btrfs-progs, like a Fedora Rawhide live media. These are built daily, but 
don't always succeed so you can go here to find the latest of everything:

https://apps.fedoraproject.org/releng-dash/

Find Fedora Live Desktop or Live KDE and click on details. Click the green link 
under descendants   livecd. And then under Output listing you'll see an ISO you 
can download, the one there right now is 
Fedora-Live-Desktop-x86_64-rawhide-20140502.iso - but of course this changes 
daily.

You might want to boot with kernel parameter slub_debug=- (that's a minus 
symbol) because all but Monday built Rawhide kernels have a bunch of kernel 
debug options enabled which makes it quite slow.



 When we try btrfsck /dev/md127 i have a lot of checksum errors, and the 
 output is:

 Checking filesystem on /dev/md127
 UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
 checking extents
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 Csum didn't match
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 Csum didn't match
 -

 extent buffer leak: start 1006686208 len 4096
 found 32039247396 bytes used err is -22
 total csum bytes: 41608612
 total tree bytes: 388857856
 total fs tree bytes: 310124544
 total extent tree bytes: 22016000
 btree space waste bytes: 126431234
 file data blocks allocated: 47227326464
 referenced 42595635200
 Btrfs v3.12


I suggest a recent Rawhide build. And I suggest just trying to mount the file 
system normally first, and post anything that appears in dmesg. And if the 
mount fails, then try mount option -o recovery, and also post any dmesg 
messages from that too, and note whether or not it mounts. Finally if that 
doesn't work either then see if -o ro,recovery works and what kernel messages 
you get.
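
The sequence described above could be scripted roughly as follows. This is 
only a sketch: the function name try_mounts and the overridable MOUNT variable 
are made up for illustration, and the option names are the ones given in this 
message.

```shell
# Try progressively more forgiving btrfs mount options and stop at the
# first that succeeds. MOUNT can be overridden so the sequence can be
# dry-run with a stub instead of the real mount command.
# try_mounts and MOUNT are hypothetical names used only in this sketch.
MOUNT=${MOUNT:-mount}

try_mounts() {
    dev=$1
    dir=$2
    for opts in "" "recovery" "ro,recovery"; do
        if [ -z "$opts" ]; then
            "$MOUNT" "$dev" "$dir" && { echo "mounted with default options"; return 0; }
        else
            "$MOUNT" -o "$opts" "$dev" "$dir" && { echo "mounted with -o $opts"; return 0; }
        fi
        # Show the most recent kernel messages for the failed attempt (on
        # stderr, so stdout carries only the result line).
        dmesg 2>/dev/null | tail -n 20 >&2
    done
    return 1
}
```

For example: try_mounts /dev/md127 /tmp/hdd. Stopping at the first success 
keeps the attempt as gentle as possible, and the kernel messages shown after 
each failure are the ones worth posting.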





 When i attempt to repair i have the following error:
 -
 Backref 1005817856 parent 5 root 5 not found in extent tree
 backpointer mismatch on [1005817856 4096]
 owner ref check failed [1006686208 4096]
 repaired damaged extent references
 Failed to find [1000525824, 168, 4096]
 btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
 btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
 Aborted
 

You really shouldn't use --repair right off the bat; it's not a recommended 
early step. Try normal mounting with newer kernels first, then the recovery 
mount options. Sometimes the repair option makes things worse. I'm not sure 
what its safety status is as of v3.14.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Fedora includes btrfs-zero-log already so depending on the kernel messages you 
might try that before

Re: Unable to boot

2014-05-05 Thread Hugo Mills
On Mon, May 05, 2014 at 03:04:05PM +, George Pochiscan wrote:
 Hello Chris,
 
 Thanks for your response. I tried the steps you gave me, but still no luck.
 
 Each time i try to mount ( normally, -o recovery, -o ro,recovery) i have the 
 following error:
 
 [root@localhost liveuser]# mount /dev/md127 /tmp/hdd
 mount: wrong fs type, bad option, bad superblock on /dev/md127,
missing codepage or helper program, or other error
 
In some cases useful info is found in syslog - try
dmesg | tail or so.
 
 For the simple  mount command the dmesg is : http://pastebin.com/TiPR7U2j
 For mount -o recovery option, the dmesg is : http://pastebin.com/NURDTeYf
 For mount -o ro,recovery options, the dmesg is : http://pastebin.com/UUmdWGgE

   This looks like btrfs-zero-log may help you, as it's having trouble
recovering the log tree.

   Hugo.

 Thank you,
 George Pochiscan
 Support Engineer
 
 Mobile: +40731831489
 Phone: +40213225757
 Fax: +40213222522
 george.pochis...@sphs.ro
 www.spearheadsystems.ro
 64 I.P. Pavlov Street, 1st District
 Bucharest, Romania
 
 IT innovation at its finest.
 
 
 From: Chris Murphy li...@colorremedies.com
 Sent: Friday, May 2, 2014 22:41
 To: George Pochiscan
 Cc: linux-btrfs@vger.kernel.org
 Subject: Re: Unable to boot
 
 On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:
 
  Hello,
 
  I have a problem with a server with Fedora 20 and BTRFS. This server had 
  frequent hard restarts before the filesystem got corrupt and we are unable 
  to boot it.
 
  We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
  It had Debian installed (i don't know the version) and right now i'm using 
  fedora 20 live to try to rescue the  system.
 
 Fedora 20 Live has kernel 3.11.10 and btrfs-progs 
 0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb without 
 knowing exactly what the problem and solution is, is to try a much newer 
 kernel and btrfs-progs, like a Fedora Rawhide live media. These are built 
 daily, but don't always succeed so you can go here to find the latest of 
 everything:
 
 https://apps.fedoraproject.org/releng-dash/
 
 Find Fedora Live Desktop or Live KDE and click on details. Click the green 
 link under descendants   livecd. And then under Output listing you'll see an 
 ISO you can download, the one there right now is 
 Fedora-Live-Desktop-x86_64-rawhide-20140502.iso - but of course this changes 
 daily.
 
 You might want to boot with kernel parameter slub_debug=- (that's a minus 
 symbol) because all but Monday built Rawhide kernels have a bunch of kernel 
 debug options enabled which makes it quite slow.
 
 
 
  When we try btrfsck /dev/md127 i have a lot of checksum errors, and the 
  output is:
 
  Checking filesystem on /dev/md127
  UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
  checking extents
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  Csum didn't match
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  Csum didn't match
  -
 
  extent buffer leak: start 1006686208 len 4096
  found 32039247396 bytes used err is -22
  total csum bytes: 41608612
  total tree bytes: 388857856
  total fs tree bytes: 310124544
  total extent tree bytes: 22016000
  btree space waste bytes: 126431234
  file data blocks allocated: 47227326464
  referenced 42595635200
  Btrfs v3.12
 
 
 I suggest a recent Rawhide build. And I suggest just trying to mount the file 
 system normally first, and post anything that appears in dmesg. And if the 
 mount fails, then try mount option -o recovery, and also post any dmesg 
 messages from that too, and note whether or not it mounts. Finally if that 
 doesn't work either then see if -o ro,recovery works and what kernel messages 
 you get.
 
 
 
 
 
  When i attempt to repair i have the following error:
  -
  Backref 1005817856 parent 5 root 5 not found in extent tree
  backpointer mismatch on [1005817856 4096]
  owner ref check failed [1006686208 4096]
  repaired damaged extent references
  Failed to find [1000525824, 168, 4096]
  btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset   0
  btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' 
  failed.
  Aborted
  
 
 You really shouldn't use --repair right off the bat, it's not a recommended 
 early step, you should try normal mounting with newer

RE: Unable to boot

2014-05-05 Thread George Pochiscan
Hello Hugo,

Running btrfs-zero-log /dev/md127, I get the following error:

checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
Csum didn't match
btrfs-zero-log: extent-tree.c:2717: alloc_reserved_tree_block: Assertion 
`!(ret)' failed.
Aborted (core dumped)

Full output : http://pastebin.com/3h5zVuWg
Full dmesg : http://pastebin.com/r9Fk8J8F

Thank you,

George Pochiscan
Support Engineer

Mobile: +40731831489
Phone: +40213225757
Fax: +40213222522
george.pochis...@sphs.ro
www.spearheadsystems.ro
64 I.P. Pavlov Street, 1st District
Bucharest, Romania

IT innovation at its finest.


From: Hugo Mills h...@carfax.org.uk
Sent: Monday, May 5, 2014 18:07
To: George Pochiscan
Cc: Chris Murphy; linux-btrfs@vger.kernel.org
Subject: Re: Unable to boot

On Mon, May 05, 2014 at 03:04:05PM +, George Pochiscan wrote:
 Hello Chris,

 Thanks for your response. I tried the steps you gave me, but still no luck.

 Each time i try to mount ( normally, -o recovery, -o ro,recovery) i have the 
 following error:

 [root@localhost liveuser]# mount /dev/md127 /tmp/hdd
 mount: wrong fs type, bad option, bad superblock on /dev/md127,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.

 For the simple  mount command the dmesg is : http://pastebin.com/TiPR7U2j
 For mount -o recovery option, the dmesg is : http://pastebin.com/NURDTeYf
 For mount -o ro,recovery options, the dmesg is : http://pastebin.com/UUmdWGgE

   This looks like btrfs-zero-log may help you, as it's having trouble
recovering the log tree.

   Hugo.

 Thank you,
 George Pochiscan
 Support Engineer

 Mobile: +40731831489
 Phone: +40213225757
 Fax: +40213222522
 george.pochis...@sphs.ro
 www.spearheadsystems.ro
 64 I.P. Pavlov Street, 1st District
 Bucharest, Romania

 IT innovation at its finest.

 
 From: Chris Murphy li...@colorremedies.com
 Sent: Friday, May 2, 2014 22:41
 To: George Pochiscan
 Cc: linux-btrfs@vger.kernel.org
 Subject: Re: Unable to boot

 On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:

  Hello,
 
  I have a problem with a server with Fedora 20 and BTRFS. This server had 
  frequent hard restarts before the filesystem got corrupt and we are unable 
  to boot it.
 
  We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
  It had Debian installed (i don't know the version) and right now i'm using 
  fedora 20 live to try to rescue the  system.

 Fedora 20 Live has kernel 3.11.10 and btrfs-progs 
 0.20.rc1.20131114git9f0c53f-1.fc20. So the general rule of thumb without 
 knowing exactly what the problem and solution is, is to try a much newer 
 kernel and btrfs-progs, like a Fedora Rawhide live media. These are built 
 daily, but don't always succeed so you can go here to find the latest of 
 everything:

 https://apps.fedoraproject.org/releng-dash/

 Find Fedora Live Desktop or Live KDE and click on details. Click the green 
 link under descendants   livecd. And then under Output listing you'll see an 
 ISO you can download, the one there right now is 
 Fedora-Live-Desktop-x86_64-rawhide-20140502.iso - but of course this changes 
 daily.

 You might want to boot with kernel parameter slub_debug=- (that's a minus 
 symbol) because all but Monday built Rawhide kernels have a bunch of kernel 
 debug options enabled which makes it quite slow.


 
  When we try btrfsck /dev/md127 i have a lot of checksum errors, and the 
  output is:
 
  Checking filesystem on /dev/md127
  UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
  checking extents
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
  Csum didn't match
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
  Csum didn't match
  -
 
  extent buffer leak: start 1006686208 len 4096
  found 32039247396 bytes used err is -22
  total csum bytes: 41608612
  total tree bytes: 388857856
  total fs tree bytes: 310124544
  total extent tree bytes: 22016000
  btree space waste bytes: 126431234
  file data blocks allocated: 47227326464
  referenced 42595635200
  Btrfs v3.12


 I suggest a recent Rawhide build. And I suggest just trying to mount the file 
 system normally first, and post anything that appears in dmesg. And if the 
 mount fails, then try mount option -o recovery, and also post any dmesg 
 messages from that too, and note whether or not it mounts. Finally

Re: Unable to boot

2014-05-05 Thread Chris Murphy

On May 5, 2014, at 9:11 AM, George Pochiscan george.pochis...@sphs.ro wrote:

 Hello Hugo,
 
 Running btrfs-zero-log /dev/md127 i have the following error:
 
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 Csum didn't match
 btrfs-zero-log: extent-tree.c:2717: alloc_reserved_tree_block: Assertion 
 `!(ret)' failed.
 Aborted (core dumped)
 
 Full output : http://pastebin.com/3h5zVuWg
 Full dmesg : http://pastebin.com/r9Fk8J8F

OK. Well I'm out of ideas at this point. I'm not a developer, and don't know 
what the problem is or how to fix it. So my advice at this point will be like 
throwing spaghetti at a wall. (There is a lot of spaghetti available to throw 
at the wall when it comes to fixing btrfs if the normal mount code doesn't fix 
it automatically.)

Barring better advice from pretty much anyone else:

- First, btrfs-image -c9 -t4 /dev/md127 /path/for/large/file

The resulting file will be somewhere between 50% and 100% of the size reported 
for metadata by btrfs filesystem df. Put this somewhere in case a developer 
wants to look at it in the current state.

- Then, btrfs check --init-csum-tree /dev/md127 will remove all checksums from 
the file system and should remove the csum errors preventing mount. The problem 
with this is it removes all checksums, so every read is reported as a mismatch 
but still permits the reads to proceed. As a result it's just a way to mount 
the file system, make a backup, and then create a new file system to restore 
to. So it's a recovery operation rather than a repair operation.
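
The ordering of the two steps above matters, so here is a minimal sketch of a wrapper that refuses to rebuild the checksum tree until a metadata image has been saved. The helper names image_ready and capture_then_init_csum are made up for illustration; the btrfs-image and btrfs check invocations are the ones quoted above, and nothing here has been tried against the damaged array.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, the btrfs commands are the
# ones from the message above.

image_ready() {
    # True when the image file exists and is not empty.
    [ -s "$1" ]
}

capture_then_init_csum() {
    dev="$1"; img="$2"
    # Step 1: save a metadata image unless one is already there.
    if ! image_ready "$img"; then
        btrfs-image -c9 -t4 "$dev" "$img" || return 1
    fi
    # Step 2: stop unless the image really exists, then reset the csum tree.
    image_ready "$img" || { echo "no metadata image at $img; stopping" >&2; return 1; }
    btrfs check --init-csum-tree "$dev"
}
```

It would be called as capture_then_init_csum /dev/md127 /path/for/large/file, with the image path on a different, healthy file system.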

Chris Murphy

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Unable to boot

2014-05-02 Thread George Pochiscan
Hello,

I have a problem with a server with Fedora 20 and BTRFS. This server had 
frequent hard restarts before the filesystem became corrupt, and we are unable 
to boot it.

We have an HP ProLiant server with 4 disks @1TB each and software RAID 5.
It had Debian installed (I don't know the version) and right now I'm using 
the Fedora 20 live image to try to rescue the system.

When we run btrfsck /dev/md127 I get a lot of checksum errors, and the output 
is: 

Checking filesystem on /dev/md127
UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
checking extents
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
Csum didn't match
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
Csum didn't match
-

extent buffer leak: start 1006686208 len 4096
found 32039247396 bytes used err is -22
total csum bytes: 41608612
total tree bytes: 388857856
total fs tree bytes: 310124544
total extent tree bytes: 22016000
btree space waste bytes: 126431234
file data blocks allocated: 47227326464
 referenced 42595635200
Btrfs v3.12



When I attempt to repair, I get the following error:
-
Backref 1005817856 parent 5 root 5 not found in extent tree
backpointer mismatch on [1005817856 4096]
owner ref check failed [1006686208 4096]
repaired damaged extent references
Failed to find [1000525824, 168, 4096]
btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
Aborted





I have installed btrfs version 3.12

Linux localhost 3.11.10-301.fc20.x86_64 #1 SMP Thu Dec 5 14:01:17 UTC 2013 
x86_64 x86_64 x86_64 GNU/Linux

[root@localhost liveuser]# btrfs fi show
Label: none  uuid: e068faf0-2c16-4566-9093-e6d1e21a5e3c
        Total devices 1 FS bytes used 40.04GiB
        devid    1 size 1.82TiB used 43.04GiB path /dev/md127
Btrfs v3.12


Please advise.

Thank you,
George Pochiscan


Re: Unable to boot

2014-05-02 Thread Chris Murphy

On May 2, 2014, at 4:00 AM, George Pochiscan george.pochis...@sphs.ro wrote:

 Hello,
 
 I have a problem with a server with Fedora 20 and BTRFS. This server had 
 frequent hard restarts before the filesystem got corrupt and we are unable to 
 boot it.
 
 We have a HP Proliant server with 4 disks @1TB each and Software RAID 5.
 It had Debian installed (i don't know the version) and right now i'm using 
 fedora 20 live to try to rescue the  system.

Fedora 20 Live has kernel 3.11.10 and btrfs-progs 
0.20.rc1.20131114git9f0c53f-1.fc20. Without knowing exactly what the problem 
and solution are, the general rule of thumb is to try a much newer kernel and 
btrfs-progs, such as Fedora Rawhide live media. These are built daily, but the 
builds don't always succeed, so you can go here to find the latest of everything:

https://apps.fedoraproject.org/releng-dash/

Find Fedora Live Desktop or Live KDE and click on details. Click the green link 
under descendants   livecd. And then under Output listing you'll see an ISO you 
can download, the one there right now is 
Fedora-Live-Desktop-x86_64-rawhide-20140502.iso - but of course this changes 
daily.

You might want to boot with kernel parameter slub_debug=- (that's a minus 
symbol) because all Rawhide kernels except those built on Mondays have a bunch 
of kernel debug options enabled, which makes them quite slow.


 
 When we try btrfsck /dev/md127 i have a lot of checksum errors, and the 
 output is: 
 
 Checking filesystem on /dev/md127
 UUID: e068faf0-2c16-4566-9093-e6d1e21a5e3c
 checking extents
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 checksum verify failed on 1006686208 found 457560AC wanted 6B3ECE11
 Csum didn't match
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 checksum verify failed on 1001492480 found 74CC3F5D wanted C222A2C9
 Csum didn't match
 -
 
 extent buffer leak: start 1006686208 len 4096
 found 32039247396 bytes used err is -22
 total csum bytes: 41608612
 total tree bytes: 388857856
 total fs tree bytes: 310124544
 total extent tree bytes: 22016000
 btree space waste bytes: 126431234
 file data blocks allocated: 47227326464
 referenced 42595635200
 Btrfs v3.12


I suggest a recent Rawhide build. And I suggest just trying to mount the file 
system normally first, and post anything that appears in dmesg. And if the 
mount fails, then try mount option -o recovery, and also post any dmesg 
messages from that too, and note whether or not it mounts. Finally if that 
doesn't work either then see if -o ro,recovery works and what kernel messages 
you get.
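
As a rough sketch of that escalation (normal mount, then -o recovery, then -o ro,recovery), something like the following could be used. The function name try_mounts and the MOUNT_CMD override are made up for illustration, so the logic can be exercised without a real device. Note that on considerably newer kernels the recovery option was renamed usebackuproot, so the option strings may need adjusting there.

```shell
#!/bin/sh
# Sketch only: try_mounts and MOUNT_CMD are hypothetical names.
MOUNT_CMD="${MOUNT_CMD:-mount}"

try_mounts() {
    dev="$1"; mnt="$2"
    # Order follows the advice above: plain mount first, then recovery,
    # then read-only recovery. Stops at the first option that works.
    for opts in defaults recovery ro,recovery; do
        if "$MOUNT_CMD" -o "$opts" "$dev" "$mnt" 2>/dev/null; then
            echo "mounted with -o $opts"
            return 0
        fi
        echo "mount -o $opts failed; check dmesg before continuing"
    done
    return 1
}
```

For the case in this thread it would be run as try_mounts /dev/md127 /mnt, saving the dmesg output after each failed attempt so it can be posted to the list.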


 
 
 
 When i attempt to repair i have the following error:
 -
 Backref 1005817856 parent 5 root 5 not found in extent tree
 backpointer mismatch on [1005817856 4096]
 owner ref check failed [1006686208 4096]
 repaired damaged extent references
 Failed to find [1000525824, 168, 4096]
 btrfs unable to find ref byte nr 1000525824 parent 0 root 1  owner 1 offset 0
 btrfsck: extent-tree.c:1752: write_one_cache_group: Assertion `!(ret)' failed.
 Aborted
 

You really shouldn't use --repair right off the bat; it's not a recommended 
early step. Try normal mounting with a newer kernel first, then the recovery 
mount options. Sometimes the repair option makes things worse. 
I'm not sure what its safety status is as of v3.14.

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Fedora includes btrfs-zero-log already so depending on the kernel messages you 
might try that before a btrfsck --repair.



Chris Murphy



Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Matthew Booth
My laptop crashed hard earlier today. It reset immediately to a black
screen followed by the BIOS. I have no idea why.

However, it now fails to boot. I took a picture of the kernel panic
that results from trying to mount the root filesystem:
https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

To make things worse, btrfsck aborts with a double free, without
fixing it. I took a picture of that, too:
https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

As the kernel panic mentions btrfs_remove_free_space, I also tried
mounting with clear_cache. Unfortunately it didn't dislodge anything.

This is on a fully updated Fedora 18 system. I would really like to
get this data back. If anybody could offer a suggestion I'd be very
grateful.

Thanks,

Matt


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Harald Glatt
On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.

 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.

 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.

 Thanks,

 Matt

If you can, make a complete image backup of the drive before trying
anything to bring it back.
Try mounting with -o nospace_cache, also try -o ro and -o recovery as
well as -o recovery,ro.

If you can bring it back in ro mode you can at least copy your data
out of it if all else fails...

I'm not a dev, just a random guy having an interest in btrfs, so if
you don't have a backup and aren't able to create a dd copy of it
right now you might wanna wait for a reply from someone who actually
knows the code...
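
For the dd copy, a minimal sketch could look like this. The function name image_device is made up, and the source and destination paths in the usage note are placeholders, not the real device names from this report.

```shell
#!/bin/sh
# Sketch only: image_device is a hypothetical helper around plain dd.

image_device() {
    src="$1"; dst="$2"
    # Raw block-for-block copy in 1 MiB chunks; dd's own transfer
    # statistics on stderr are discarded.
    dd if="$src" of="$dst" bs=1048576 2>/dev/null
}
```

It would be run from a live system with the btrfs partition unmounted, for example image_device /dev/sdXN /mnt/external/laptop-root.img, where the destination has at least as much free space as the partition is large.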

Good luck


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Jan Steffens
On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
 On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.

 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.

 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.
 If you can make a complete image backup of the drive before trying any
 things to bring it back.
 Try mounting with -o nospace_cache, also try -o ro and -o recovery as
 well as -o recovery,ro.

I think the bug happens during log recovery, so btrfs-zero-log might
get it mountable again, with the caveat of losing the most recently
fsynced changes.


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Harald Glatt
If you are going to use btrfs-zero-log please create a btrfs-image
first that you can then upload to a bug report so that this can be
fixed.

# btrfs-image -c 9 -t 8 /dev/yourbtrfs /tmp/fs_image

On Mon, Mar 11, 2013 at 11:53 PM, Jan Steffens jan.steff...@gmail.com wrote:
 On Mon, Mar 11, 2013 at 11:49 PM, Harald Glatt m...@hachre.de wrote:
 On Mon, Mar 11, 2013 at 11:44 PM, Matthew Booth matt...@heisenbug.com 
 wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.

 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi

 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT

 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.

 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.
 If you can make a complete image backup of the drive before trying any
 things to bring it back.
 Try mounting with -o nospace_cache, also try -o ro and -o recovery as
 well as -o recovery,ro.

 I think the bug happens during log recovery, so btrfs-zero-log might
 get it mountable again, with the caveat of losing the most recently
 fsynced changes.


Re: Unable to boot btrfs filesystem, and btrfsck aborts

2013-03-11 Thread Josef Bacik
On Mon, Mar 11, 2013 at 04:44:58PM -0600, Matthew Booth wrote:
 My laptop crashed hard earlier today. It reset immediately to a black
 screen followed by the BIOS. I have no idea why.
 
 However, it now fails to boot. I took a picture of the kernel panic
 that results from trying to mount the root filesystem:
 https://plus.google.com/107763699965053810188/posts/QZZt7GYzBZi
 
 To make things worse, btrfsck aborts with a double free, without
 fixing it. I took a picture of that, too:
 https://plus.google.com/107763699965053810188/posts/gKYqGgFhWyT
 
 As the kernel panic mentions btrfs_remove_free_space, I also tried
 mounting with clear_cache. Unfortunately it didn't dislodge anything.
 
 This is on a fully updated Fedora 18 system. I would really like to
 get this data back. If anybody could offer a suggestion I'd be very
 grateful.


This is fixed in 3.9, I'll send those patches back to -stable, sorry I should
have done that before now.  If you can't get a 3.9 kernel to boot then just use
btrfs-zero-log and you'll be good to go.  Thanks,

Josef 