bug#43132: [GUIX SYSTEM]: Malfunction
Hi Mark and Maxim!

>> Ext4 won't detect bitrot (silent corruption of your drive's data).
>> You'll probably wake one day with a fsck that won't be able to recover
>> some files, or worse, a completely dead drive.
>>
>> Your backups would also contain corrupted data (garbage in, garbage
>> out!).
>
> For what it's worth, I wholeheartedly agree with Maxim. Btrfs did you a
> great service by calling attention to this problem with your drive, and
> it would be a shame to ignore it and switch back to ext4 where your data
> may instead be silently corrupted.
>
> I've been using btrfs for several years now on my x86_64 Guix system,
> and it has served me well. Previously, I used ext4, which would
> silently leave some of my files empty after crashes. I've never seen
> that happen with btrfs.

Yeah, makes sense. I have placed an order for WDS100T2B0A.

Thanks folks!

Regards,
RG.
bug#43132: [GUIX SYSTEM]: Malfunction
Maxim Cournoyer writes:

> Raghav Gururajan writes:
>
>>> Maxim Cournoyer writes:
>>>
>>>> I took Raghav to #btrfs last week, where with the help of gentle
>>>> folks a failing drive was established as the most likely culprit.
>>>>
>>>> In other words, Btrfs checksumming capabilities helped quickly
>>>> discover a hardware problem which might otherwise have silently
>>>> caused non-recoverable damage to Raghav's data.
>>>
>>> Good, thanks for following up!
>>>
>>> Ludo’.
>>
>> Thank you!
>>
>> Yeah, seems like my disk is shot, but I am not sure. I have reinstalled
>> guix with ext4, instead of btrfs, as these issues started to arise after
>> migration to btrfs from ext4. So far, my system is doing well. Let's see
>> how it goes. :-)
>
> Sounds like playing with fire to me :-).
>
> Ext4 won't detect bitrot (silent corruption of your drive's data).
> You'll probably wake one day with a fsck that won't be able to recover
> some files, or worse, a completely dead drive.
>
> Your backups would also contain corrupted data (garbage in, garbage
> out!).

For what it's worth, I wholeheartedly agree with Maxim. Btrfs did you a
great service by calling attention to this problem with your drive, and
it would be a shame to ignore it and switch back to ext4 where your data
may instead be silently corrupted.

I've been using btrfs for several years now on my x86_64 Guix system,
and it has served me well. Previously, I used ext4, which would
silently leave some of my files empty after crashes. I've never seen
that happen with btrfs.

Mark
bug#43132: [GUIX SYSTEM]: Malfunction
Hello Raghav,

Raghav Gururajan writes:

> Hi!
>
>> Maxim Cournoyer writes:
>>
>>> I took Raghav to #btrfs last week, where with the help of gentle folks a
>>> failing drive was established as the most likely culprit.
>>>
>>> In other words, Btrfs checksumming capabilities helped quickly
>>> discover a hardware problem which might otherwise have silently
>>> caused non-recoverable damage to Raghav's data.
>>
>> Good, thanks for following up!
>>
>> Ludo’.
>
> Thank you!
>
> Yeah, seems like my disk is shot, but I am not sure. I have reinstalled
> guix with ext4, instead of btrfs, as these issues started to arise after
> migration to btrfs from ext4. So far, my system is doing well. Let's see
> how it goes. :-)

Sounds like playing with fire to me :-).

Ext4 won't detect bitrot (silent corruption of your drive's data).
You'll probably wake one day with a fsck that won't be able to recover
some files, or worse, a completely dead drive.

Your backups would also contain corrupted data (garbage in, garbage
out!).

Maxim
bug#43132: [GUIX SYSTEM]: Malfunction
Hi!

> Maxim Cournoyer writes:
>
>> I took Raghav to #btrfs last week, where with the help of gentle folks a
>> failing drive was established as the most likely culprit.
>>
>> In other words, Btrfs checksumming capabilities helped quickly
>> discover a hardware problem which might otherwise have silently
>> caused non-recoverable damage to Raghav's data.
>
> Good, thanks for following up!
>
> Ludo’.

Thank you!

Yeah, seems like my disk is shot, but I am not sure. I have reinstalled
guix with ext4, instead of btrfs, as these issues started to arise after
migration to btrfs from ext4. So far, my system is doing well. Let's see
how it goes. :-)

Regards,
RG.
bug#43132: [GUIX SYSTEM]: Malfunction
Hi,

Maxim Cournoyer writes:

> I took Raghav to #btrfs last week, where with the help of gentle folks a
> failing drive was established as the most likely culprit.
>
> In other words, Btrfs checksumming capabilities helped quickly
> discover a hardware problem which might otherwise have silently
> caused non-recoverable damage to Raghav's data.

Good, thanks for following up!

Ludo’.
bug#43132: [GUIX SYSTEM]: Malfunction
Hello,

Ludovic Courtès writes:

> Hey Raghav,
>
> Did you eventually find what went wrong? Should we close this bug or at
> least retitle it?
>
> Thanks,
> Ludo’.

I took Raghav to #btrfs last week, where with the help of gentle folks a
failing drive was established as the most likely culprit.

In other words, Btrfs checksumming capabilities helped quickly
discover a hardware problem which might otherwise have silently
caused non-recoverable damage to Raghav's data.

I'm closing this bug now.

Thanks!

Maxim
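[Editorial sketch] The checksum-based detection described above can be
inspected from the command line. `btrfs device stats` prints per-device
error counters, and `btrfs scrub start -B` re-reads the data and verifies
checksums in the foreground. The helper below is only an illustration:
the function name is hypothetical, and it assumes the usual two-column
"[device].counter value" layout of `btrfs device stats`.

```shell
#!/bin/sh
# Print only the non-zero error counters from `btrfs device stats`
# output read on stdin. Each input line is assumed to look like
# "[/dev/xyz].corruption_errs  0" (two whitespace-separated fields).
nonzero_btrfs_counters() {
    awk 'NF == 2 && $2 + 0 > 0 { print $1, $2 }'
}

# Possible use on a live system (needs root, so not run here):
#   sudo btrfs scrub start -B /
#   sudo btrfs device stats / | nonzero_btrfs_counters
```

Empty output means every counter is zero. A non-zero corruption_errs
count is the kind of signal that points at failing hardware. Note that on
a single device without redundant copies, btrfs can report damaged data
but cannot repair it.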
bug#43132: [GUIX SYSTEM]: Malfunction
Hey Raghav,

Did you eventually find what went wrong? Should we close this bug or at
least retitle it?

Thanks,
Ludo’.
bug#43132: [GUIX SYSTEM]: Malfunction
Hi,

On Mon, 31 Aug 2020 23:04:25 -0400
Raghav Gururajan wrote:

> Hi Danny!
>
> > Usually that means file-system corruption, which very likely was caused by a
> > hardware (disk) problem. I've had it before, and shortly after the disk
> > died.
>
> Oh no! My disk is an SSD, which is only about 2 years old. Isn't that too
> soon?
>
> Btw, is there a tool to check the health of the disk?

Yes--usually it's a program in the disk firmware. You can steer it and
look at what it did using smartctl (in package smartmontools).

But I'd advise checking dmesg first, because it could also be a RAM
problem, or a number of other things.

(UNIX also has fsck to check the filesystem, but that already runs
automatically on reboot when problems arise. So there is little need to
fiddle with it manually.)
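[Editorial sketch] The two checks suggested above could look roughly like
this. The pattern list in the log filter is an assumption and far from
exhaustive, both function names are hypothetical, and /dev/sda is only a
placeholder device name.

```shell
#!/bin/sh
# Keep kernel log lines that commonly accompany storage trouble.
# Reads log text on stdin; the patterns are illustrative guesses.
filter_storage_errors() {
    grep -E -i 'I/O error|BTRFS (error|warning|critical)|EXT4-fs error|remount(ing)? .*read-only|ata[0-9]+.*(error|failed)'
}

# Hypothetical wrapper around smartctl (package smartmontools).
# Defined but not called here: it needs root and a real disk.
check_disk_health() {
    dev="${1:-/dev/sda}"      # placeholder device name
    sudo smartctl -H "$dev"   # overall health verdict
    sudo smartctl -a "$dev"   # attributes and the drive's error log
}

# Possible use on a live system:
#   sudo dmesg | filter_storage_errors
#   check_disk_health /dev/sda
```

A short self-test can also be requested with `smartctl -t short` and read
back later with `smartctl -l selftest`. A clean SMART report does not
rule out a failing drive, which is why the kernel log is worth reading as
well.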
bug#43132: [GUIX SYSTEM]: Malfunction
Hi Danny!

> Usually that means file-system corruption, which very likely was caused by a
> hardware (disk) problem. I've had it before, and shortly after the disk died.

Oh no! My disk is an SSD, which is only about 2 years old. Isn't that too
soon?

Btw, is there a tool to check the health of the disk?

> What does "sudo dmesg" show around the time it made it read-only?

Ah, I will have to wait until it happens again.

Regards,
RG.
bug#43132: [GUIX SYSTEM]: Malfunction
Hi Raghav,

On Mon, 31 Aug 2020 05:48:30 -0400
Raghav Gururajan wrote:

> Hello Guix!
>
> [1] Out of nowhere, when I did `guix environment foo`, I got:
>
> \note: build failure may have been caused by lack of free disk space
> builder for `/gnu/store/2ajnpcblwpgzjdhx3050qapy3li31pr5-profile.drv'
> failed with exit code 1
>
> [2] When I redid the command a 2nd time, I got:
>
> error (ignored): cannot unlink `/tmp/guix-build-profile.drv-0':
> Read-only file system
> error (ignored): cannot unlink
> `/gnu/store/2ajnpcblwpgzjdhx3050qapy3li31pr5-profile.drv.chroot/tmp/guix-build-profile.drv-0':
> Read-only file system
> guix environment: error: cannot link
> `/gnu/store/.links/1jd7y4xvj853m4aygnyixci5h2y7a1py6iavp9kwzvcinyniqwbd' to
> `/gnu/store/3klrs2bkcmypwnmx61q24rc7csgk19f8-profile/share/icons/Adwaita/64x64/emotes/face-smile-big
> symbolic.symbolic.png': Read-only file system

Usually that means file-system corruption, which very likely was caused
by a hardware (disk) problem. I've had it before, and shortly after the
disk died.

What does "sudo dmesg" show around the time it made it read-only?
bug#43132: [GUIX SYSTEM]: Malfunction
Julien Lepiller writes:

> No, it's supposed to be like that. /gnu/store is mounted read-only (on
> the guix system) to prevent you from writing to it.

Very sorry for the confusion, I forgot that! (and I did not check
before) :-(

>>> /dev/mapper/secondary on / type btrfs
>>> (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)
>>
>>[...]
>>
>>> /dev/mapper/secondary on /gnu/store type btrfs
>>> (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)
                                                  ^ Same subvolume as /

This is the output from a running Guix system of mine:

--8<---cut here---start->8---
/dev/sda5 on / type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)
[...]
/dev/sda5 on /gnu/store type btrfs (ro,relatime,space_cache,subvolid=5,subvol=/gnu/store)
--8<---cut here---end--->8---

Thanks! Gio'

[...]

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
bug#43132: [GUIX SYSTEM]: Malfunction
No, it's supposed to be like that. /gnu/store is mounted read-only (on
the guix system) to prevent you from writing to it. The guix daemon has
write access to the store when it wants to add a new item, or garbage
collect.

On 31 August 2020 07:11:13 GMT-04:00, Giovanni Biscuolo wrote:
>Hello Raghav
>
>when forwarding the output of commands next time, plz beware your MUA
>does not reformat the relevant output :-)
>
>This seems to be a system issue on your side, not a Guix bug
>
>Raghav Gururajan writes:
>
>>> It seems connected to a filesystem issue: can you also tell us
>what's
>>> the output of "mount"?
>
>[...]
>
>> /dev/mapper/secondary on / type btrfs
>> (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)
>
>[...]
>
>> /dev/mapper/secondary on /gnu/store type btrfs
>> (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)
>
>I see two problems here:
>
>1. the btrfs volume /dev/mapper/secondary seems mounted twice, and with
>the same subvolume; I never tried to mount the same btrfs volume on two
>different mountpoints: is this the reason your /gnu/store is read-only?
>
>2. /gnu/store is mounted read-only, that's why you get the errors
>
>Please can you try removing the mounting of /gnu/store from your
>filesystem configuration (or fstab if on a foreign distro)?
>
>[...]
>
>HTH! Gio'
>
>--
>Giovanni Biscuolo
>
>Xelera IT Infrastructures
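[Editorial sketch] Since a read-only /gnu/store is expected on Guix
System, the mount point worth checking is / itself. The helper below
reads lines in the /proc/mounts layout (device, mount point, type,
options, ...) and prints whether a given mount point is currently rw or
ro. The function name is hypothetical.

```shell
#!/bin/sh
# Print "rw" or "ro" for the mount point given as $1, reading
# /proc/mounts-style lines on stdin. If the mount point is listed
# more than once, the last entry wins.
mount_state() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw")
                    state = opts[i]
        }
        END { if (state != "") print state }
    '
}

# Possible use on a live system:
#   mount_state / < /proc/mounts
#   mount_state /gnu/store < /proc/mounts
```

If / reports ro on a system that booted normally, the kernel most likely
remounted it after hitting an error, and the kernel log should say why.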
bug#43132: [GUIX SYSTEM]: Malfunction
Hello Raghav

when forwarding the output of commands next time, plz beware your MUA
does not reformat the relevant output :-)

This seems to be a system issue on your side, not a Guix bug

Raghav Gururajan writes:

>> It seems connected to a filesystem issue: can you also tell us what's
>> the output of "mount"?

[...]

> /dev/mapper/secondary on / type btrfs
> (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)

[...]

> /dev/mapper/secondary on /gnu/store type btrfs
> (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)

I see two problems here:

1. the btrfs volume /dev/mapper/secondary seems mounted twice, and with
the same subvolume; I never tried to mount the same btrfs volume on two
different mountpoints: is this the reason your /gnu/store is read-only?

2. /gnu/store is mounted read-only, that's why you get the errors

Please can you try removing the mounting of /gnu/store from your
filesystem configuration (or fstab if on a foreign distro)?

[...]

HTH! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
bug#43132: [GUIX SYSTEM]: Malfunction
Hi Gio!

> It seems connected to a filesystem issue: can you also tell us what's
> the output of "mount"?

none on /proc type proc (rw,relatime)
none on /dev type devtmpfs (rw,relatime,size=3934712k,nr_inodes=983678,mode=755)
none on /sys type sysfs (rw,relatime)
/dev/mapper/secondary on / type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)
none on /dev/pts type devpts (rw,relatime,gid=996,mode=620,ptmxmode=000)
none on /sys/kernel/debug type debugfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,relatime)
/dev/mapper/secondary on /gnu/store type btrfs (ro,relatime,ssd,space_cache,subvolid=5,subvol=/)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
none on /run/systemd type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
none on /run/user type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime)
cgroup on /sys/fs/cgroup/elogind type cgroup (rw,relatime,name=elogind)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,relatime,pids)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=789160k,mode=700,uid=1000,gid=998)
/dev/sdb1 on /media/rg/CARD type vfat (rw,nosuid,nodev,relatime,uid=1000,gid=998,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,showexec,utf8,flush,errors=remount-ro,uhelper=udisks2)

Regards,
RG.
bug#43132: [GUIX SYSTEM]: Malfunction
Hi Raghav

Raghav Gururajan writes:

[...]

> [2] When I redid the command 2nd time, I got:
>
> error (ignored): cannot unlink `/tmp/guix-build-profile.drv-0':
> Read-only file system

It seems connected to a filesystem issue: can you also tell us what's
the output of "mount"?

[...]

Thanks, Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
bug#43132: [GUIX SYSTEM]: Malfunction
Hi Efraim!

> What's the output of 'df -h' and 'df -i'? There's not much change in
> error message if you're out of space or just out of inodes.

rg@secondary ~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
none            3.8G     0  3.8G   0% /dev
/dev/dm-0       120G   95G   24G  81% /
tmpfs           3.8G  8.6M  3.8G   1% /dev/shm
none            3.8G   20K  3.8G   1% /run/systemd
none            3.8G     0  3.8G   0% /run/user
cgroup          3.8G     0  3.8G   0% /sys/fs/cgroup
tmpfs           771M  4.0K  771M   1% /run/user/1000
/dev/sdb1        60G   59G  638M  99% /media/rg/CARD

rg@secondary ~$ df -i
Filesystem     Inodes IUsed  IFree IUse% Mounted on
none           983678   592 983086    1% /dev
/dev/dm-0           0     0      0     - /
tmpfs          986453    71 986382    1% /dev/shm
none           986453    21 986432    1% /run/systemd
none           986453     2 986451    1% /run/user
cgroup         986453    12 986441    1% /sys/fs/cgroup
tmpfs          986453    14 986439    1% /run/user/1000
/dev/sdb1           0     0      0     - /media/rg/CARD

Regards,
RG.
bug#43132: [GUIX SYSTEM]: Malfunction
On Mon, Aug 31, 2020 at 05:48:30AM -0400, Raghav Gururajan wrote:
> Hello Guix!
>
> [1] Out of nowhere, when I did `guix environment foo`, I got:
>
> \note: build failure may have been caused by lack of free disk space
> builder for `/gnu/store/2ajnpcblwpgzjdhx3050qapy3li31pr5-profile.drv'
> failed with exit code 1
>
> [2] When I redid the command a 2nd time, I got:
>
> error (ignored): cannot unlink `/tmp/guix-build-profile.drv-0':
> Read-only file system
> error (ignored): cannot unlink
> `/gnu/store/2ajnpcblwpgzjdhx3050qapy3li31pr5-profile.drv.chroot/tmp/guix-build-profile.drv-0':
> Read-only file system
> guix environment: error: cannot link
> `/gnu/store/.links/1jd7y4xvj853m4aygnyixci5h2y7a1py6iavp9kwzvcinyniqwbd' to
> `/gnu/store/3klrs2bkcmypwnmx61q24rc7csgk19f8-profile/share/icons/Adwaita/64x64/emotes/face-smile-big
> symbolic.symbolic.png': Read-only file system
>
> [3] When I redid the command a 3rd time, I got:
>
> guix environment: error: fport_read: Connection reset by peer
>
> [4] When I redid the command a 4th time, I got:
>
> guix environment: error: failed to connect to
> `/var/guix/daemon-socket/socket': Connection refused
>
> [5] So I tried to restart guix-daemon and got a weird output:
>
> sudo: unable to open /var/run/sudo/ts/rg: Read-only file system
> Password:
> Service guix-daemon is not running.
> Service guix-daemon is currently disabled.
>
> [6] Then I tried to enable the daemon:
>
> sudo: unable to open /var/run/sudo/ts/rg: Read-only file system
> Password:
> Enabled service guix-daemon.
>
> [7] Then I tried to start the daemon:
>
> sudo: unable to open /var/run/sudo/ts/rg: Read-only file system
> Password:
> Service guix-daemon has been started.
>
> [8] Now, I retried the `guix environment foo` and got the same error
> as in 4.
>
> [9] At this point, all the other running applications started to throw
> errors regarding read-only file-system. I could not even save the above
> errors in a text editor. Glad that I had the IceCat running and I was
> able to email it to myself. IceCat wasn't affected, as I think the
> web-process was containerized. Everything was back to normal after restart.
>
> [10] I am experiencing this situation for the 3rd time this month. It
> never happened before this month.
>
> INFO:
>
> `guix describe`
>
> guix dad963a
> repository URL: https://git.savannah.gnu.org/git/guix.git
> commit: dad963a4393ea51409baa63817b26b449ed58338
>
> Both my user profile and root profile are on the same commit.
>
> Regards,
> RG.
>

What's the output of 'df -h' and 'df -i'? There's not much change in
error message if you're out of space or just out of inodes.

-- 
Efraim Flashner אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
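[Editorial sketch] The space-versus-inodes question above can be answered
mechanically by filtering df output for entries above a threshold. The
helper below expects the POSIX layout printed by `df -P`, where field 5
is the percentage and field 6 the mount point. The function name and the
default threshold of 90 are assumptions, and mount points containing
spaces are not handled.

```shell
#!/bin/sh
# Print "mountpoint percent" for every filesystem at or above the
# threshold given as $1 (default 90). Reads `df -P` style output on
# stdin; rows whose fifth field is not a percentage are skipped.
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 && $5 ~ /%$/ {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= t) print $6, $5
        }
    '
}

# Possible use on a live system:
#   df -P    | full_filesystems 90   # block usage
#   df -P -i | full_filesystems 90   # inode usage
```

On btrfs the inode columns of `df -i` are not meaningful, because inodes
are allocated dynamically, so such rows show "-" and are skipped. For
space accounting on btrfs, `btrfs filesystem usage /` is usually more
informative than df.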
bug#43132: [GUIX SYSTEM]: Malfunction
Hello Guix!

[1] Out of nowhere, when I did `guix environment foo`, I got:

\note: build failure may have been caused by lack of free disk space
builder for `/gnu/store/2ajnpcblwpgzjdhx3050qapy3li31pr5-profile.drv'
failed with exit code 1

[2] When I redid the command a 2nd time, I got:

error (ignored): cannot unlink `/tmp/guix-build-profile.drv-0':
Read-only file system
error (ignored): cannot unlink
`/gnu/store/2ajnpcblwpgzjdhx3050qapy3li31pr5-profile.drv.chroot/tmp/guix-build-profile.drv-0':
Read-only file system
guix environment: error: cannot link
`/gnu/store/.links/1jd7y4xvj853m4aygnyixci5h2y7a1py6iavp9kwzvcinyniqwbd' to
`/gnu/store/3klrs2bkcmypwnmx61q24rc7csgk19f8-profile/share/icons/Adwaita/64x64/emotes/face-smile-big
symbolic.symbolic.png': Read-only file system

[3] When I redid the command a 3rd time, I got:

guix environment: error: fport_read: Connection reset by peer

[4] When I redid the command a 4th time, I got:

guix environment: error: failed to connect to
`/var/guix/daemon-socket/socket': Connection refused

[5] So I tried to restart guix-daemon and got a weird output:

sudo: unable to open /var/run/sudo/ts/rg: Read-only file system
Password:
Service guix-daemon is not running.
Service guix-daemon is currently disabled.

[6] Then I tried to enable the daemon:

sudo: unable to open /var/run/sudo/ts/rg: Read-only file system
Password:
Enabled service guix-daemon.

[7] Then I tried to start the daemon:

sudo: unable to open /var/run/sudo/ts/rg: Read-only file system
Password:
Service guix-daemon has been started.

[8] Now, I retried the `guix environment foo` and got the same error
as in 4.

[9] At this point, all the other running applications started to throw
errors regarding read-only file-system. I could not even save the above
errors in a text editor. Glad that I had the IceCat running and I was
able to email it to myself. IceCat wasn't affected, as I think the
web-process was containerized. Everything was back to normal after restart.

[10] I am experiencing this situation for the 3rd time this month. It
never happened before this month.

INFO:

`guix describe`

guix dad963a
repository URL: https://git.savannah.gnu.org/git/guix.git
commit: dad963a4393ea51409baa63817b26b449ed58338

Both my user profile and root profile are on the same commit.

Regards,
RG.