I forgot to include one more idea that could well be relevant:

Gábor Boskovits <[email protected]> wrote (Wed, Sep 4, 2019, 22:49):
> Hello Giovanni,
>
> Giovanni Biscuolo <[email protected]> wrote (Wed, Sep 4, 2019, 16:36):
>
>> Hi Guix!
>>
>> Yesterday I had to physically replace a failed disk on milano-guix-1
>> (one of the Guix build machines); that disk was part of a BTRFS RAID10
>> multi-disk array and now the machine is unbootable
>>
>
> Sorry to hear that.
>
>
>> The BTRFS RAID10 array was made of 6 disks and was running well. Some
>> days ago Christopher Baines found that the 5th disk (/dev/sde) of that
>> array had failed, and he was able to remount it in degraded mode in
>> order to re-balance the array and go on working without data loss
>>
>> Unfortunately I was not able to perform a "btrfs replace..." since
>> the new disk I added (we have spare slots) was not detected by the
>> kernel... HP ProLiant Smart Array is not so smart after all (aka bye bye
>> hot swapping of disks) :-S...
>>
>> So I had to reboot the server and enter the config tool, where I added
>> the new drive as a new Smart Array logical volume (RAID0 with 1 drive)
>> [1] and removed the failed logical volume
>>
>> The problem now is that the boot process stops when trying to mount the
>> BTRFS filesystem, the error is:
>>
>> --8<---------------cut here---------------start------------->8---
>> BTRFS error (device sda3): devid 5 uuid [omissis] is missing
>> --8<---------------cut here---------------end--------------->8---
>>
>> ([omissis] means I'm not copying the exact uuid, sda3 is the first block
>> device in the BTRFS pool)
>>
>> All I get now is the guix rescue environment prompt, which I do not know
>> how to use: I'm not able to boot with BTRFS in degraded mode :-S
>>
>> Christopher suggested I might be able to at least mount the filesystem
>> with the degraded option in the guix rescue environment, which might be
>> something like:
>>
>> --8<---------------cut here---------------start------------->8---
>> (mkdir "/mnt/broken-root")
>> (mount "/dev/sda3" "/mnt/broken-root" "btrfs" 0 "degraded")
>> --8<---------------cut here---------------end--------------->8---
>>
>> but we do not know how to proceed from there.
>>
>
> I don't know what would work from here, but here are a few ideas:
> 1. somehow hack the degraded root option into the bootloader config, like
> here:
> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1229456
> 2. try to switch_root, using /bin/sh as init, and try to fix the
> bootloader config from there.
> 3. see what the original script is doing, by having a look at how
> it is composed:
> see for example: gnu/system.scm:
> operating-system-default-essential-services,
> gnu/services.scm: %boot-service and most prominently:
> gnu/services/shepherd.scm: shepherd-boot-gexp
>

Also take a look at this: gnu/build/linux-boot.scm.

> Wdyt?
>
>
>> Obviously I have no way now to reconfigure guix. The only idea I have is
>> to boot from a USB rescue disk (e.g. grml) and try to do a "btrfs
>> replace..." from there: that should fix the BTRFS array and should allow
>> a mount in non-degraded mode, so the next Guix boot should succeed
>>
>> That machine is physically far away from me and I should collect as much
>> info as possible before I go there to test for a solution (no remote
>> serial console unfortunately)
>>
>> I'm searching the web for a solution, any hint will be greatly
>> appreciated :-)
>>
>> Meanwhile milano-guix-1 build machine is offline... :-(
>>
>> Thank you for your attention, Gio'
>>
>>
>> [1] AFAIU that is the only way to present a single disk to the OS and
>> let the OS manage it as part of a **software** RAID pool (hardware RAID
>> is not an option)
>>
>> --
>> Giovanni Biscuolo
>>
>> Xelera IT Infrastructures
>>
>
> Best regards,
> g_bor
>
> --
> OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21
>

Best regards,
g_bor
--
OpenPGP Key Fingerprint: 7988:3B9F:7D6A:4DBF:3719:0367:2506:A96C:CF63:0B21
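
For reference, the "btrfs replace" route from a rescue system (e.g. grml) that Giovanni mentions could look roughly like the sketch below. This is untested on the machine in question. /dev/sda3, devid 5 and /mnt/broken-root come from the thread; NEW_DEV (/dev/sdX) is a placeholder for whatever name the new logical volume gets, and the DRY_RUN switch and the run helper are made up for illustration. By default the script only prints the commands, so they can be reviewed before anything touches the disks.

```shell
#!/bin/sh
# Sketch only: replace a missing BTRFS member from a rescue system.
# DRY_RUN=1 (the default) prints each command instead of running it.
set -eu

POOL_DEV="${POOL_DEV:-/dev/sda3}"   # surviving member named in the boot error
MISSING_ID="${MISSING_ID:-5}"       # devid reported as missing
NEW_DEV="${NEW_DEV:-/dev/sdX}"      # ASSUMPTION: placeholder for the new disk
MNT="${MNT:-/mnt/broken-root}"      # scratch mount point
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

run mkdir -p "$MNT"
# Mount with the degraded option so the pool comes up without devid 5.
run mount -t btrfs -o degraded "$POOL_DEV" "$MNT"
# Check which devids are present and which one is reported missing.
run btrfs filesystem show "$MNT"
# Rebuild the missing member onto the new disk; -B waits for completion.
run btrfs replace start -B "$MISSING_ID" "$NEW_DEV" "$MNT"
run btrfs replace status "$MNT"
run umount "$MNT"
```

To actually execute the steps, the script would be run as root with DRY_RUN=0 and NEW_DEV set to the real device, after checking the "btrfs filesystem show" output to confirm that devid 5 is still the missing member.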
