[Bug 1973167] Re: linux-image-4.15.0-177-generic freezes on the welcome screen
Thank you for your analysis and test kernel. A lot of our machines (Supermicro X11 / Xeon W-2133 based) also suffer from the problem introduced by 4.15.0-177. I can confirm that the kernel 4.15.0-182-generic #191+lp1973167 provided by Kai-Heng Feng fixes the issue on my HW. Could you please proceed with the roll-out of this patch? -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1973167 Title: linux-image-4.15.0-177-generic freezes on the welcome screen To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1973167/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1915130] Re: libnfsidmap-regex package broken in current LTS release (focal)
Unfortunately, my glibc on Ubuntu 20.04 is too old:
rpc.idmapd: libnfsidmap: Unable to load plugin: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /lib/x86_64-linux-gnu/libnfsidmap/regex.so)
mememe@deatcsXXXfcYYY:/# dpkg-query --showformat=\${Version} --show libc6
2.31-0ubuntu9.42.31-0ubuntu9.4
A dry run analysis LGTM:
mememe@deatcsXXXfcYYY:/# readelf -s /tmp/nfsidmap-test/libnfsidmap.so.1.0.0 | egrep 'nfsidmap_config_get|conf_get_str'
78: 41f010 FUNC GLOBAL DEFAULT 12 nfsidmap_config_get
mememe@deatcsXXXfcYYY:/# readelf -s /tmp/nfsidmap-test/libnfsidmap/regex.so | egrep 'nfsidmap_config_get|conf_get_str'
12: 0 NOTYPE GLOBAL DEFAULT UND nfsidmap_config_get
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew, sorry for the late reply. Today I triggered another fstrim with the linux-image-5.4.0-75-generic kernel and made a final check on the RAID - no trouble has occurred so far. Thank you for pursuing this topic so persistently and for finally getting the patches into the Ubuntu kernel. Best regards, Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew, thanks for your effort to add this feature to the Ubuntu kernels. I installed linux-image-5.4.0-75-generic on 2021-06-08. So far there have been no problems, neither during normal work nor during manual fstrim. Best regards, Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew, thank you for your continuous effort. I have been testing your 5.4.0-72-generic #80+TEST1896578v20210504b1-Ubuntu kernel and have had no trouble so far. I also started fstrim manually on a machine that had not run it for some time because the fstrim service was disabled. Regards, Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew, thank you for providing the test-kernel and instructions. I will give it a try. Regards, Thimo
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew, are these tests still relevant for you? BR, Thimo
[Bug 1915130] [NEW] libnfsidmap-regex package broken in current LTS release (focal)
Public bug reported:
Dear professionals,
When using the regex translation method for idmapping, idmapd complains about an unresolved symbol (nfsidmap_config_get) in the regex.so shared object:
systemctl status -l nfs-idmapd.service
● nfs-idmapd.service - NFSv4 ID-name mapping service
  Loaded: loaded (/lib/systemd/system/nfs-idmapd.service; static; vendor preset: enabled)
  Active: failed (Result: exit-code) since Thu 2021-02-04 13:51:52 CET; 22min ago
  Process: 43954 ExecStart=/usr/sbin/rpc.idmapd $RPCIDMAPDARGS (code=exited, status=1/FAILURE)
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: sss_nfs_init: use memcache: 1
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/sss.so for method sss
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: libnfsidmap: Unable to load plugin: /lib/x86_64-linux-gnu/libnfsidmap/regex.so: undefined symbol: nfsidmap_config_get
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: libnfsidmap: requested translation method, 'regex', is not available
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: rpc.idmapd: libnfsidmap: Unable to load plugin: /lib/x86_64-linux-gnu/libnfsidmap/regex.so: undefined symbol: nfsidmap_config_get
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: rpc.idmapd: libnfsidmap: requested translation method, 'regex', is not available
Feb 04 13:51:52 defil37 rpc.idmapd[43954]: rpc.idmapd: Unable to create name to user id mappings.
Feb 04 13:51:52 defil37 systemd[1]: nfs-idmapd.service: Control process exited, code=exited, status=1/FAILURE
Feb 04 13:51:52 defil37 systemd[1]: nfs-idmapd.service: Failed with result 'exit-code'.
Feb 04 13:51:52 defil37 systemd[1]: Failed to start NFSv4 ID-name mapping service.
This symbol obviously does not exist in the libnfsidmap.so.0.3.0 shared object:
readelf -s /usr/lib/x86_64-linux-gnu/libnfsidmap.so.0.3.0 | grep nfsidmap_config_get
When checking the sources at https://github.com/isginf/libnfsidmap-regex/blob/1179b2ec3392c91a40da228afada46fd210113a2/regex.c#L57 it seems this library is accidentally using the wrong interface, since the symbol "conf_get_str" exists in libnfsidmap.so.0.3.0.
Since the groovy version of the package "libnfsidmap-regex" did not raise the dependencies, I also checked this one and could verify that it:
a) uses the "conf_get_str" function
b) loads cleanly
I would like to ask you to either re-compile this package correctly or publish the groovy package version also for focal.
** Affects: libnfsidmap-regex (Ubuntu)
   Importance: Undecided
   Status: New
** Tags: idmapd nfsd regex
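The symbol check described in this report can be scripted. The following is only a minimal sketch, assuming readelf from binutils is installed; the helper name defines_symbol is hypothetical and not part of libnfsidmap or any Ubuntu package. It reads readelf symbol-table output on stdin and succeeds only if the named symbol is actually defined there (its Ndx column is not UND).

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not an existing tool).
# Reads `readelf -Ws` output on stdin; exit status 0 means the symbol
# named in $1 is defined, 1 means it is missing or only referenced (UND).
defines_symbol() {
    awk -v sym="$1" '
        NF >= 8 {
            name = $8
            sub(/@.*/, "", name)   # drop a symbol-version suffix, if present
            if (name == sym && $7 != "UND") found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage with the library path from the report:
#   lib=/usr/lib/x86_64-linux-gnu/libnfsidmap.so.0.3.0
#   for s in nfsidmap_config_get conf_get_str; do
#       if readelf -Ws "$lib" | defines_symbol "$s"; then
#           echo "$s: defined"
#       else
#           echo "$s: missing"
#       fi
#   done
```

Running the same loop against the plugin (regex.so) shows which of the two interfaces it references without defining, which is the mismatch this report is about.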
[Bug 1907262] Re: raid10: discard leads to corrupted file system
This is just the procedure with the least damage I found. Data loss may still happen (and actually did happen on some of our systems). Re-adding the second component to the RAID first (after zeroing it) and then running fsck probably leads to the exact same result, but I wanted to keep the second component as a fall-back until I could see the results of fsck.
[Bug 1907262] Re: raid10: discard leads to corrupted file system
Hi Matthew and all,
thank you for taking action immediately. I really appreciate your effort. After investigating the issue further I have to add that the mount option discard seems to trigger the issue, too.
@Trent The general problem here is that RAID10 can balance single read streams to all disks (which is probably the major advantage over RAID1, effectively giving you RAID0 read speed; RAID1 needs parallel reads to achieve this). That said, it is no big surprise that several machines at our site went to read-only mode after *some time* (probably after reading some filesystem-relevant data from the "bad disk").
Unfortunately, the "clean first disk" only happens if you act immediately; otherwise you might have some data corruption. I verified this on one system where the root partition was affected, using the debsums tool (just run debsums -xa) after fixing FS errors.
My procedure to recover was:
Assembly of the RAID:
  mdadm --assemble /dev/md127 /dev/nvme0n1p2
  mdadm --run /dev/md127
Filesystem check on all partitions (note the -f parameter, some FS "think" they are clean):
  fsck.ext4 -f /dev/VolGroup/...
Re-add the second component:
  mdadm --zero-superblock /dev/nvme1n1p2
  mdadm --add /dev/md127 /dev/nvme1n1p2
Best regards
[Bug 1907262] [NEW] raid10: discard leads to corrupted file system
Public bug reported:
Seems to be closely related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578
After updating the Ubuntu 18.04 kernel from 4.15.0-124 to 4.15.0-126 the fstrim command triggered by fstrim.timer causes a large number of mismatches between two RAID10 component devices.
This bug affects several machines in our company with different HW configurations (all using ECC RAM). Both NVMe and SATA SSDs are affected.
How to reproduce:
- Create a RAID10 LVM and filesystem on two SSDs
  mdadm -C -v -l10 -n2 -N "lv-raid" -R /dev/md0 /dev/nvme0n1p2 /dev/nvme1n1p2
  pvcreate -ff -y /dev/md0
  vgcreate -f -y VolGroup /dev/md0
  lvcreate -n root -L 100G -ay -y VolGroup
  mkfs.ext4 /dev/VolGroup/root
  mount /dev/VolGroup/root /mnt
- Write some data, sync and delete it
  dd if=/dev/zero of=/mnt/data.raw bs=4K count=1M
  sync
  rm /mnt/data.raw
- Check the RAID device
  echo check >/sys/block/md0/md/sync_action
- After finishing (see /proc/mdstat), check the mismatch_cnt (should be 0):
  cat /sys/block/md0/md/mismatch_cnt
- Trigger the bug
  fstrim /mnt
- Re-check the RAID device
  echo check >/sys/block/md0/md/sync_action
- After finishing (see /proc/mdstat), check the mismatch_cnt (probably in the range of N*1):
  cat /sys/block/md0/md/mismatch_cnt
After investigating this issue on several machines it *seems* that the first drive does the trim correctly while the second one goes wild. At least the number and severity of errors found by fsck.ext4 in a USB stick live session suggest this.
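The two "wait for the check to finish, then read mismatch_cnt" steps above can be wrapped in small helpers. This is only a sketch under stated assumptions: the function names md_wait_idle and md_mismatch_cnt and the MD_POLL_SECONDS variable are hypothetical, the sysfs directory defaults to the md0 path used in the steps above, and the script assumes sync_action reads "idle" once the check has finished.

```shell
#!/bin/sh
# Hypothetical helpers (assumptions for illustration, not existing tools).
# The optional argument is the md sysfs directory; it defaults to md0.

# Poll sync_action until the array reports "idle".
# Returns 1 if the sysfs file cannot be read.
md_wait_idle() {
    md_dir="${1:-/sys/block/md0/md}"
    while :; do
        state=$(cat "$md_dir/sync_action") || return 1
        [ "$state" = "idle" ] && return 0
        sleep "${MD_POLL_SECONDS:-10}"
    done
}

# Wait for the array to become idle, then print its mismatch count.
md_mismatch_cnt() {
    md_dir="${1:-/sys/block/md0/md}"
    md_wait_idle "$md_dir" && cat "$md_dir/mismatch_cnt"
}

# Usage (as root), mirroring the reproduction steps:
#   echo check > /sys/block/md0/md/sync_action; md_mismatch_cnt   # before fstrim
#   fstrim /mnt
#   echo check > /sys/block/md0/md/sync_action; md_mismatch_cnt   # after fstrim
```

Comparing the two printed values replaces the manual look at /proc/mdstat; the helpers only read from sysfs and do not change the array themselves.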
To perform the single drive evaluation the RAID10 was started with one drive at a time:
  mdadm --assemble /dev/md127 /dev/nvme0n1p2
  mdadm --run /dev/md127
  fsck.ext4 -n -f /dev/VolGroup/root
  vgchange -a n /dev/VolGroup
  mdadm --stop /dev/md127
  mdadm --assemble /dev/md127 /dev/nvme1n1p2
  mdadm --run /dev/md127
  fsck.ext4 -n -f /dev/VolGroup/root
When starting these fscks without -n, on the first device it seems the directory structure is OK, while on the second device there is only the lost+found folder left.
Side-note: Another machine using HWE kernel 5.4.0-56 (after using -53 before) seems to have a quite similar issue.
Unfortunately the risk/regression assessment in the aforementioned bug is not complete: the workaround only mitigates the issues during FS creation. This bug, on the other hand, is triggered by a weekly service (fstrim) and causes severe file system corruption.
** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: Confirmed
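The single-drive evaluation from this report repeats the same command sequence per component, so it can be generated rather than typed. The sketch below is a dry run only: the function name single_drive_plan is hypothetical, it merely prints the commands and executes nothing, and, unlike the sequence in the report, it also prints the deactivate and stop steps after the last device.

```shell
#!/bin/sh
# Hypothetical dry-run helper (an assumption for illustration).
# Prints the read-only evaluation commands and does not execute them.
# Arguments: md device, logical volume path, then one or more components.
single_drive_plan() {
    md="$1"
    lv="$2"
    shift 2
    vg=$(dirname "$lv")          # e.g. /dev/VolGroup/root -> /dev/VolGroup
    for dev in "$@"; do
        echo "mdadm --assemble $md $dev"
        echo "mdadm --run $md"
        echo "fsck.ext4 -n -f $lv"   # -n: report only, change nothing
        echo "vgchange -a n $vg"
        echo "mdadm --stop $md"
    done
}

# Device names as used in the report:
single_drive_plan /dev/md127 /dev/VolGroup/root /dev/nvme0n1p2 /dev/nvme1n1p2
```

Printing the plan first makes it easy to review the device names before anything is run against a degraded array.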
[Bug 1752251] Re: Missing mcelog userspace package in bionic - or maybe linux kernel config should disable mcelog_legacy
The removal of the package is quite unfortunate since rasdaemon is still missing the email notification feature.