Bug#982459: closing 982459
close 982459 5.18.2-1 thanks
Processed: Re: Bug#982459: mdadm examine corrupts host ext4
Processing control commands:

> retitle -1 mdadm --examine in chroot without /dev mounted corrupts host's filesystem
Bug #982459 [src:linux] mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Changed Bug title to 'mdadm --examine in chroot without /dev mounted corrupts host's filesystem' from 'mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem'.
> found -1 5.10.127-2
Bug #982459 [src:linux] mdadm --examine in chroot without /dev mounted corrupts host's filesystem
Marked as found in versions linux/5.10.127-2.
> fixed -1 5.18.2-1~bpo11+1
Bug #982459 [src:linux] mdadm --examine in chroot without /dev mounted corrupts host's filesystem
Marked as fixed in versions linux/5.18.2-1~bpo11+1.
-- 
982459: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982459
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#982459: mdadm examine corrupts host ext4
Control: retitle -1 mdadm --examine in chroot without /dev mounted corrupts host's filesystem
Control: found -1 5.10.127-2
Control: fixed -1 5.18.2-1~bpo11+1

On Tuesday, 2 August 2022 11:03:09 CET Chris Hofstaedtler wrote:
> Control: reassign -1 src:linux

On 10 Feb 2021 14:29:52 +0100 Patrick Cernko wrote:
> $MDADM --examine --scan --config=partitions
>
> If I run this command in a chroot on a machine with md0 as host's root
> filesystem WITHOUT mounting /proc, /sys and /dev in the chroot, mdadm
> CORRUPTS the host's root filesystem (/dev/md0 with ext4 filesystem
> format). I can reproduce this problem every time I do this.
>
> Kernel: Linux 5.4.78.1.amd64-smp (SMP w/4 CPU cores)
> Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_USER,
> TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE

Patrick: AFAICT, that is not a Debian (provided) kernel. Are or were you able to reproduce this issue with a Debian kernel? If so, which (exact) version?

> * Håkan T Johansson [220801 19:31]:
> > On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:
> > > I can't see a difference that should matter from userspace.
> > >
> > > I have stared a bit at the kernel code... there have been quite some
> > > changes and fixes in this area. Which kernel version were you
> > > running when testing this?
> > >
> > > Could you retry on something >= 5.9? I.e. some version with patch
> > > 08fc1ab6d748ab1a690fd483f41e2938984ce353.
> >
> > I believe that I was running 5.10 (bullseye).

Håkan: IIUC, the bug occurs with the 5.10.127-2 kernel. If you try it with the most recent 5.10 kernel, does the issue still occur? If we have a 'good' and a 'bad' 5.10 kernel, that would make it easier to narrow down in which commit it was fixed.

> > It looks like 5.18 (from backports) does not show the issue! (i.e. works)
> >
> > host:
> > linux-image-5.18.0-0.bpo.1-amd64 5.18.2-1~bpo11+1
> >
> > [bug still occurs with]
> > host:
> > linux-image-5.10.0-16-amd64 5.10.127-2

Updated the bug accordingly.
> > This time I did get some dmesg BUG output as well (attached).

For reference [dmesg 1]:

[mån aug 1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0010
[mån aug 1 15:53:08 2022] #PF: supervisor read access in kernel mode
[mån aug 1 15:53:08 2022] #PF: error_code(0x) - not-present page
[mån aug 1 15:53:08 2022] PGD 0 P4D 0
[mån aug 1 15:53:08 2022] Oops: [#1] SMP PTI
[mån aug 1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P OE 5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug 1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug 1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug 1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug 1 15:53:08 2022] RSP: 0018:ae27c059fd60 EFLAGS: 00010246
[mån aug 1 15:53:08 2022] RAX: RBX: 9d1b94505480 RCX: 9d1bc52e5e38
[mån aug 1 15:53:08 2022] RDX: 9d1bc13782d8 RSI: 0c14 RDI: c096feb0
[mån aug 1 15:53:08 2022] RBP: 9d1bc52e5e38 R08: 9d1be04d5230 R09: 0001
[mån aug 1 15:53:08 2022] R10: 9d1bc985f000 R11: 001d R12: 9d1bc13782d8
[mån aug 1 15:53:08 2022] R13: 9d1be04d5000 R14: 0c14 R15: 9d1bc13782d8
[mån aug 1 15:53:08 2022] FS: 7fed5ecb1840() GS:9d1cd7c8() knlGS:
[mån aug 1 15:53:08 2022] CS: 0010 DS: ES: CR0: 80050033
[mån aug 1 15:53:08 2022] CR2: 0010 CR3: 0001a46d8000 CR4: 06e0
[mån aug 1 15:53:08 2022] Call Trace:
[mån aug 1 15:53:08 2022] ext4_orphan_del+0x23f/0x290 [ext4]
[mån aug 1 15:53:08 2022] ext4_evict_inode+0x31f/0x630 [ext4]
[mån aug 1 15:53:08 2022] evict+0xd1/0x1a0
[mån aug 1 15:53:08 2022] __dentry_kill+0xe4/0x180
[mån aug 1 15:53:08 2022] dput+0x149/0x2f0
[mån aug 1 15:53:08 2022] __fput+0xe4/0x240
[mån aug 1 15:53:08 2022] task_work_run+0x65/0xa0
[mån aug 1 15:53:08 2022] exit_to_user_mode_prepare+0x111/0x120
[mån aug 1 15:53:08 2022] syscall_exit_to_user_mode+0x28/0x140
[mån aug 1 15:53:08 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug 1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77

> > I also noticed that the BUG: report in dmesg does not happen directly
> > when doing 'mdadm --examine --scan --config=partitions'. It rather
> > occurs when some activity happens on the host filesystem, e.g.
> > a 'touch /root/a' command.
> >
> > I have tried with both kernels several times, and it was repeatable that
> > 5.10 got stuck while 5.18 does not show issues.

Repeatable is good :-) If you have a minimal set of steps to reproduce the issue, can you share that?

> If you have the time, maybe trying the various kernel versions between 5.10 and 5.18 would be a good start.
Processed: Re: Bug#982459: mdadm examine corrupts host ext4
Processing control commands:

> reassign -1 src:linux
Bug #982459 [mdadm] mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Bug reassigned from package 'mdadm' to 'src:linux'.
No longer marked as found in versions mdadm/4.1-1.
Ignoring request to alter fixed versions of bug #982459 to the same values previously set
-- 
982459: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982459
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#982459: mdadm examine corrupts host ext4
Control: reassign -1 src:linux

Dear Håkan,

thanks for reporting back and testing!

* Håkan T Johansson [220801 19:31]:
> On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:
> > I can't see a difference that should matter from userspace.
> >
> > I have stared a bit at the kernel code... there have been quite some
> > changes and fixes in this area. Which kernel version were you
> > running when testing this?
> >
> > Could you retry on something >= 5.9? I.e. some version with patch
> > 08fc1ab6d748ab1a690fd483f41e2938984ce353.
>
> I believe that I was running 5.10 (bullseye).
>
> It looks like 5.18 (from backports) does not show the issue! (i.e. works)

Okay, I think we are now clearly in "this is not an mdadm bug per se" territory (-> reassigning to src:linux).

[..]
> This time I did get some dmesg BUG output as well (attached).
> It does not seem to be the same backtrace on two occurrences.
>
> I also noticed that the BUG: report in dmesg does not happen directly
> when doing 'mdadm --examine --scan --config=partitions'. It rather
> occurs when some activity happens on the host filesystem, e.g.
> a 'touch /root/a' command.
>
> host:
> linux-image-5.18.0-0.bpo.1-amd64 5.18.2-1~bpo11+1
>
> (did not re-install anything else, except upgraded zfs, also from
> backports (since pure bullseye would not compile with 5.18))
>
> Does not exhibit the problem.
>
> I have tried with both kernels several times, and it was repeatable that
> 5.10 got stuck while 5.18 does not show issues.

It's good that this now works in 5.18. However, I'm not sure how we should find the commit fixing this - in 5.14 lots of block layer code was shuffled around/refactored.

If you have the time, maybe trying the various kernel versions between 5.10 and 5.18 would be a good start. If they are not in backports anymore, they should still be at http://snapshot.debian.org/package/linux/

> Reminder: to get the issue, /dev/ should not be mounted in the chroot.
> With /dev/ mounted, 5.10 also works.
I'll see if I can repro this on 5.10, but need to find a box first.

Best,
Chris

> [mån aug 1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0010
> [mån aug 1 15:53:08 2022] #PF: supervisor read access in kernel mode
> [mån aug 1 15:53:08 2022] #PF: error_code(0x) - not-present page
> [mån aug 1 15:53:08 2022] PGD 0 P4D 0
> [mån aug 1 15:53:08 2022] Oops: [#1] SMP PTI
> [mån aug 1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P OE 5.10.0-16-amd64 #1 Debian 5.10.127-2
> [mån aug 1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
> [mån aug 1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
> [mån aug 1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
> [mån aug 1 15:53:08 2022] RSP: 0018:ae27c059fd60 EFLAGS: 00010246
> [mån aug 1 15:53:08 2022] RAX: RBX: 9d1b94505480 RCX: 9d1bc52e5e38
> [mån aug 1 15:53:08 2022] RDX: 9d1bc13782d8 RSI: 0c14 RDI: c096feb0
> [mån aug 1 15:53:08 2022] RBP: 9d1bc52e5e38 R08: 9d1be04d5230 R09: 0001
> [mån aug 1 15:53:08 2022] R10: 9d1bc985f000 R11: 001d R12: 9d1bc13782d8
> [mån aug 1 15:53:08 2022] R13: 9d1be04d5000 R14: 0c14 R15: 9d1bc13782d8
> [mån aug 1 15:53:08 2022] FS: 7fed5ecb1840() GS:9d1cd7c8() knlGS:
> [mån aug 1 15:53:08 2022] CS: 0010 DS: ES: CR0: 80050033
> [mån aug 1 15:53:08 2022] CR2: 0010 CR3: 0001a46d8000 CR4: 06e0
> [mån aug 1 15:53:08 2022] Call Trace:
> [mån aug 1 15:53:08 2022] ext4_orphan_del+0x23f/0x290 [ext4]
> [mån aug 1 15:53:08 2022] ext4_evict_inode+0x31f/0x630 [ext4]
> [mån aug 1 15:53:08 2022] evict+0xd1/0x1a0
> [mån aug 1 15:53:08 2022] __dentry_kill+0xe4/0x180
> [mån aug 1 15:53:08 2022] dput+0x149/0x2f0
> [mån aug 1 15:53:08 2022] __fput+0xe4/0x240
> [mån aug 1 15:53:08 2022] task_work_run+0x65/0xa0
> [mån aug 1 15:53:08 2022] exit_to_user_mode_prepare+0x111/0x120
> [mån aug 1 15:53:08 2022] syscall_exit_to_user_mode+0x28/0x140
> [mån aug 1 15:53:08 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [mån aug 1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77
> [mån aug 1 15:53:08 2022] Code: 44 00 00 48 8b 15 19 a1 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 e9 a0 0c 00 f7 d8 64 89 02 b8
> [mån aug 1 15:53:08 2022] RSP: 002b:7ffd50452818 EFLAGS: 0202 ORIG_RAX: 0003
> [mån aug 1 15:53:08 2022] RAX: RBX: 55dab4578910 RCX: 7fed5eea2d77
> [mån
Bug#982459: mdadm examine corrupts host ext4
On Sun, 31 Jul 2022, Chris Hofstaedtler wrote:

> I can't see a difference that should matter from userspace.
>
> I have stared a bit at the kernel code... there have been quite some
> changes and fixes in this area. Which kernel version were you
> running when testing this?
>
> Could you retry on something >= 5.9? I.e. some version with patch
> 08fc1ab6d748ab1a690fd483f41e2938984ce353.

Dear Chris,

I believe that I was running 5.10 (bullseye).

It looks like 5.18 (from backports) does not show the issue! (i.e. works)

Some more details:

I have now tried again:

host:
linux-image-5.10.0-16-amd64 5.10.127-2
mdadm 4.2-1~bpo11+1

chroot:
mdadm 4.1-11

This time I did get some dmesg BUG output as well (attached). It does not seem to be the same backtrace on two occurrences.

I also noticed that the BUG: report in dmesg does not happen directly when doing 'mdadm --examine --scan --config=partitions'. It rather occurs when some activity happens on the host filesystem, e.g. a 'touch /root/a' command.

host:
linux-image-5.18.0-0.bpo.1-amd64 5.18.2-1~bpo11+1

(did not re-install anything else, except upgraded zfs, also from backports (since pure bullseye would not compile with 5.18))

Does not exhibit the problem.

I have tried with both kernels several times, and it was repeatable that 5.10 got stuck while 5.18 does not show issues.

Reminder: to get the issue, /dev/ should not be mounted in the chroot. With /dev/ mounted, 5.10 also works.
Best regards,
Håkan

[mån aug 1 15:53:08 2022] BUG: kernel NULL pointer dereference, address: 0010
[mån aug 1 15:53:08 2022] #PF: supervisor read access in kernel mode
[mån aug 1 15:53:08 2022] #PF: error_code(0x) - not-present page
[mån aug 1 15:53:08 2022] PGD 0 P4D 0
[mån aug 1 15:53:08 2022] Oops: [#1] SMP PTI
[mån aug 1 15:53:08 2022] CPU: 2 PID: 284256 Comm: cron Tainted: P OE 5.10.0-16-amd64 #1 Debian 5.10.127-2
[mån aug 1 15:53:08 2022] Hardware name: Dell Computer Corporation PowerEdge 2850/0T7971, BIOS A04 09/22/2005
[mån aug 1 15:53:08 2022] RIP: 0010:__ext4_journal_get_write_access+0x29/0x120 [ext4]
[mån aug 1 15:53:08 2022] Code: 00 0f 1f 44 00 00 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 55 48 89 cd 53 48 83 ec 10 48 89 3c 24 e8 ab d7 bb e1 48 8b 45 30 <4c> 8b 78 10 4d 85 ff 74 2f 49 8b 87 e0 00 00 00 49 8b 9f 88 03 00
[mån aug 1 15:53:08 2022] RSP: 0018:ae27c059fd60 EFLAGS: 00010246
[mån aug 1 15:53:08 2022] RAX: RBX: 9d1b94505480 RCX: 9d1bc52e5e38
[mån aug 1 15:53:08 2022] RDX: 9d1bc13782d8 RSI: 0c14 RDI: c096feb0
[mån aug 1 15:53:08 2022] RBP: 9d1bc52e5e38 R08: 9d1be04d5230 R09: 0001
[mån aug 1 15:53:08 2022] R10: 9d1bc985f000 R11: 001d R12: 9d1bc13782d8
[mån aug 1 15:53:08 2022] R13: 9d1be04d5000 R14: 0c14 R15: 9d1bc13782d8
[mån aug 1 15:53:08 2022] FS: 7fed5ecb1840() GS:9d1cd7c8() knlGS:
[mån aug 1 15:53:08 2022] CS: 0010 DS: ES: CR0: 80050033
[mån aug 1 15:53:08 2022] CR2: 0010 CR3: 0001a46d8000 CR4: 06e0
[mån aug 1 15:53:08 2022] Call Trace:
[mån aug 1 15:53:08 2022] ext4_orphan_del+0x23f/0x290 [ext4]
[mån aug 1 15:53:08 2022] ext4_evict_inode+0x31f/0x630 [ext4]
[mån aug 1 15:53:08 2022] evict+0xd1/0x1a0
[mån aug 1 15:53:08 2022] __dentry_kill+0xe4/0x180
[mån aug 1 15:53:08 2022] dput+0x149/0x2f0
[mån aug 1 15:53:08 2022] __fput+0xe4/0x240
[mån aug 1 15:53:08 2022] task_work_run+0x65/0xa0
[mån aug 1 15:53:08 2022] exit_to_user_mode_prepare+0x111/0x120
[mån aug 1 15:53:08 2022] syscall_exit_to_user_mode+0x28/0x140
[mån aug 1 15:53:08 2022] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mån aug 1 15:53:08 2022] RIP: 0033:0x7fed5eea2d77
[mån aug 1 15:53:08 2022] Code: 44 00 00 48 8b 15 19 a1 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 e9 a0 0c 00 f7 d8 64 89 02 b8
[mån aug 1 15:53:08 2022] RSP: 002b:7ffd50452818 EFLAGS: 0202 ORIG_RAX: 0003
[mån aug 1 15:53:08 2022] RAX: RBX: 55dab4578910 RCX: 7fed5eea2d77
[mån aug 1 15:53:08 2022] RDX: 7fed5ef6e8a0 RSI: RDI: 0006
[mån aug 1 15:53:08 2022] RBP: R08: R09: 7fed5ef6dbe0
[mån aug 1 15:53:08 2022] R10: 006f R11: 0202 R12: 7fed5ef6f4a0
[mån aug 1 15:53:08 2022] R13: R14: R15: 0001
[mån aug 1 15:53:08 2022] Modules linked in: msr autofs4 nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace sunrpc nfs_ssc fscache xt_mac xt_length xt_recent xt_multiport xt_tcpudp xt_state xt_conntrack
Bug#982459: mdadm examine corrupts host ext4
Hi Håkan,

* Håkan T Johansson [220730 23:43]:
> I have now tried with the mdadm 4.2~rc2-2 installed in both the chroot
> environment (tried only that first), and also the host system.
> Unfortunately, the host / fs is still affected when running
> 'update-initramfs -u', when /dev is not mounted.
[..]
> is kind of readable, though, then I'm lost.

I can't see a difference that should matter from userspace.

I have stared a bit at the kernel code... there have been quite some changes and fixes in this area. Which kernel version were you running when testing this?

Could you retry on something >= 5.9? I.e. some version with patch 08fc1ab6d748ab1a690fd483f41e2938984ce353.

Thanks,
Chris
Bug#982459:
Hi,

On Sun, Aug 15, 2021 at 2:45 AM Håkan T Johansson wrote:
>
> I believe that I have been hit by this bug too.

Thanks for the bug amendment! The 4.1 release happened nearly three years ago. With bullseye released, I just uploaded the latest release candidate 4.2~rc2-2 from upstream to Debian unstable. Feel free to try that too.

Thank you!

Kind regards
Felix Lechner
Bug#982459:
Hi,

I believe that I have been hit by this bug too.

What has happened for me is that the machine in question 'almost' locks up, with a read-only /, and such that most commands to debug further never complete due to waiting for filesystem action. It then requires a reboot. 'dmesg' has worked, and then shows ext4-related issues. However, they were not recorded to /var/log. I generally do not find any corruption on the filesystem itself when running fsck afterwards.

On the machine I have a number of chroot debian installations of different releases. By pure chance I found that 'update-initramfs' was the trigger for the system hangs. I could then repeatably trigger the issue again. (Before this, it would happen as part of system maintenance (unattended upgrades in the chroots), so just spuriously hang the machine.)

In my case, the chroot installations live on a ZFS filesystem. But the host system itself is on (multiple; /, /usr/, /var/) MD raid1.

I have had /proc mounted in the chroots. But had forgotten /dev. After mounting /dev (and /dev/pts) in the chroots, the issue has not happened again.

The issue was when the host system ran Buster. I then upgraded to Bullseye ~2 weeks ago, hoping it would be resolved, but the issue was still present after the upgrade. Only after that upgrade I found the update-initramfs trigger.

I am running with sysvinit, both on host and chroots.

Currently, I do not have hands-on access to the system, so cannot inspect or reboot it reliably. Should be able to do some further tests in a few weeks.

Best regards,
Håkan
Bug#982459: mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Hi,

On Tue, Jul 13, 2021 at 12:42 AM Judit Foglszinger wrote:
>
> tried again but still fail to reproduce

Thanks for trying to reproduce this bug! I am not sure it makes any difference either way, but I recently uploaded upstream's new release candidate 4.2~rc1 to experimental:

https://packages.debian.org/source/experimental/mdadm

Kind regards
Felix Lechner
Bug#982459: mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Hi,

> I could reproduce the bug with /dev *NOT* mounted in chroot. It seems
> independent of /sys being mounted in chroot.

Tried again, but still fail to reproduce (same configuration as last time, just with /proc mounted to chroot/proc, rest not mounted).

Additionally tried it with a RAID0 and also to install a kernel with initrd to the chroot, though again didn't manage to get the host file system corrupted. (System used for that second try was RC2 of bullseye on virtualbox; raid was configured using the Debian installer.)

I think I need to give up on this. Maybe someone else has an idea.

signature.asc
Description: This is a digitally signed message part.
Bug#982459: mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Hi,

On 18.06.21 12:48, Patrick Cernko wrote:
> I will try to reproduce the bug now with one of /dev or /sys mounted and
> check if it still occurs or not. I will send my report about this later
> as this will take some time again.

I could reproduce the bug with /dev *NOT* mounted in chroot. It seems independent of /sys being mounted in chroot.

Best Regards,
-- 
Patrick Cernko
Joint Administration: Information Services and Technology
Max-Planck-Institute fuer Informatik & Softwaresysteme

smime.p7s
Description: S/MIME Cryptographic Signature
Bug#982459: mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Hi,

On 25.04.21 00:36, Judit Foglszinger wrote:
> can you reproduce this bug on bullseye? (4.1-11)
> If so, what is your configuration (VM used, type of RAID)?
> Are all three conditions (/proc, /dev and /sys not mounted) required or
> does this also happen, if eg /dev and /sys are there but not /proc?
>
> If it still occurs until there would be a proper fix by upstream,
> a workaround like "are we in a chroot, if so, are the required things
> mounted, if not, fail", could be used to avoid the file system corruption.
>
> My own observations:
> Could not reproduce in virtualbox (both chroot and host system using
> recent bullseye), using RAID1,
> /dev/md0 on / type ext4 (rw,relatime,errors=remount-ro)
>
> # chroot chroot
> / # mdadm --examine --scan --config=partitions
> mdadm: cannot open /proc/partitions
> mdadm: No devices listed in partitions
>
> (in background on host running the mentioned find / command)
>
> No filesystem corruption after over 15 minutes, running the mdadm
> command in chroot several times didn't make a difference on that.

I'm really sorry: somehow I missed this mail when it came in my inbox 6 weeks ago. I only recognized the answer when I checked bugs.debian.org last week.

I tried to reproduce the bug again and discovered that my description contained a serious error: in fact, /proc MUST be mounted in the chroot to observe the bug!

I also could reproduce the bug with mdadm 4.1-11 (from bullseye) installed in the buster chroot (all other packages still from buster).

I will try to reproduce the bug now with one of /dev or /sys mounted and check if it still occurs or not. I will send my report about this later, as this will take some time again.

Sorry for the delayed answer and the error in my initial bug report.

Best Regards,
-- 
Patrick Cernko
Joint Administration: Information Services and Technology
Max-Planck-Institute fuer Informatik & Softwaresysteme

smime.p7s
Description: S/MIME Cryptographic Signature
Bug#982459: mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
tags 982459 +moreinfo
user debian-rele...@lists.debian.org
usertags -1 + bsp-2021-04-AT-Salzburg
thank you

Hi,

can you reproduce this bug on bullseye? (4.1-11)
If so, what is your configuration (VM used, type of RAID)?
Are all three conditions (/proc, /dev and /sys not mounted) required or does this also happen, if eg /dev and /sys are there but not /proc?

If it still occurs until there would be a proper fix by upstream, a workaround like "are we in a chroot, if so, are the required things mounted, if not, fail", could be used to avoid the file system corruption.

My own observations:

Could not reproduce in virtualbox (both chroot and host system using recent bullseye), using RAID1,
/dev/md0 on / type ext4 (rw,relatime,errors=remount-ro)

# chroot chroot
/ # mdadm --examine --scan --config=partitions
mdadm: cannot open /proc/partitions
mdadm: No devices listed in partitions

(in background on host running the mentioned find / command)

No filesystem corruption after over 15 minutes, running the mdadm command in chroot several times didn't make a difference on that.

signature.asc
Description: This is a digitally signed message part.
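The workaround suggested above ("are the required things mounted, if not, fail") could be sketched roughly as follows. This is not actual mdadm or initramfs-tools code; the function name and messages are illustrative. It checks each required pseudo-filesystem against /proc/self/mounts; if /proc itself is absent, the lookup fails, which is exactly the unsafe case:

```shell
#!/bin/sh
# Hypothetical pre-flight check before running 'mdadm --examine' in a hook:
# report whether the pseudo-filesystems it needs are mounted.

is_mounted() {
    # Field 2 of /proc/self/mounts is the mount point. If /proc is not
    # mounted, awk cannot read the file and we return non-zero, which is
    # the unsafe case this guard is meant to catch.
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' \
        /proc/self/mounts 2>/dev/null
}

for m in /proc /dev /sys; do
    if is_mounted "$m"; then
        echo "$m: mounted"
    else
        echo "$m: NOT mounted -- refusing to run mdadm --examine" >&2
    fi
done
```

A hook guarded this way would fail loudly in a bare chroot instead of reaching the ioctl path that corrupts the host filesystem.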
Bug#982459: mdadm --examine in chroot without /proc,/dev,/sys mounted corrupts host's filesystem
Package: mdadm
Version: 4.1-1
Severity: critical
Tags: upstream

When installing a kernel with initrd enabled, initramfs-tools calls /usr/share/initramfs-tools/hooks/mdadm. Doing this in a chroot previously created with debootstrap causes the hook to call

$MDADM --examine --scan --config=partitions

If I run this command in a chroot on a machine with md0 as host's root filesystem WITHOUT mounting /proc, /sys and /dev in the chroot, mdadm CORRUPTS the host's root filesystem (/dev/md0 with ext4 filesystem format). I can reproduce this problem every time I do this.

To detect it, I made a background job reading all files in /:

> while sleep 1; do
>   find / -xdev -type f -exec cat {} + > /dev/null
>   echo 3 > /proc/sys/vm/drop_caches # drop caches to force re-read
> done

A few seconds to minutes after invoking the corrupting command, you can see messages like this in the kernel log:

> EXT4-fs error (device md0): ext4_validate_inode_bitmap:100: comm uptimed: Corrupt inode bitmap - block_group = 96, inode_bitmap = 3145744
> EXT4-fs error (device md0): ext4_validate_block_bitmap:384: comm uptimed: bg 97: bad block bitmap checksum
> EXT4-fs error (device md0) in ext4_free_blocks:4964: Filesystem failed CRC
> EXT4-fs error (device md0) in ext4_free_inode:357: Corrupt filesystem

We did not try to repair such filesystems but reinstalled the machine every time this occurred while investigating.

I tried to debug the problem and could bring it down to a BLKPG_DEL_PARTITION ioctl issued on a temporary device inode created by mdadm while running. This call is done in

util.c: int test_partition(int fd)

which is (somehow) called by

Examine.c: int Examine(...)

Invoking the same command in the chroot after mounting /dev, /proc and /sys in the chroot does not corrupt the host's filesystem.

Please forward this bug report to upstream in order to get a fix/workaround, or at least a huge warning implemented in mdadm, to avoid data corruption for other users.
-- Package-specific info:

-- System Information:
Debian Release: 10.7
  APT prefers proposed-updates
  APT policy: (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.4.78.1.amd64-smp (SMP w/4 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_USER, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages mdadm depends on:
ii  debconf [debconf-2.0]  1.5.71
ii  libc6  2.28-10
ii  lsb-base  10.2019051400
ii  udev  241-7~deb10u5

Versions of packages mdadm recommends:
ii  exim4-daemon-light [mail-transport-agent]  4.92-8+deb10u4
ii  kmod  26-1

Versions of packages mdadm suggests:
pn  dracut-core

-- Configuration Files:
/etc/cron.daily/mdadm [Errno 2] Datei oder Verzeichnis nicht gefunden ("no such file or directory"): '/etc/cron.daily/mdadm'

-- debconf information excluded

[Attached reproducer source; the archived copy lost the header names in its #include lines (restored here from the adjacent comments) and is truncated at the end:]

/* O_DIRECT */
#define _GNU_SOURCE
/* mknod */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
/* printf */
#include <stdio.h>
/* open */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
/* errno */
#include <errno.h>
/* ioctl */
#include <sys/ioctl.h>
/* BLKPG */
/* #include <linux/blkpg.h> */

/*
 * following taken from linux/blkpg.h because they aren't
 * anywhere else and it isn't safe to #include linux/ stuff.
 */
#define BLKPG _IO(0x12,105)

/* The argument structure */
struct blkpg_ioctl_arg {
        int op;
        int flags;
        int datalen;
        void *data;
};

/* The subfunctions (for the op field) */
#define BLKPG_ADD_PARTITION 1
#define BLKPG_DEL_PARTITION 2

/* Sizes of name fields. Unused at present. */
#define BLKPG_DEVNAMELTH 64
#define BLKPG_VOLNAMELTH 64

/* The data structure for ADD_PARTITION and DEL_PARTITION */
struct blkpg_partition {
        long long start;   /* starting offset in bytes */
        long long length;  /* length in bytes */
        int pno;           /* partition number */
        char devname[BLKPG_DEVNAMELTH]; /* partition name, like sda5 or
                                           c0d1p2, to be used in kernel
                                           messages */
        char volname[BLKPG_VOLNAMELTH]; /* volume label */
};

/* memset */
#include <string.h>
/* lseek */
#include <sys/types.h>
#include <unistd.h>
/* BLKGETSIZE64 */
#ifndef BLKGETSIZE64
#define BLKGETSIZE64 _IOR(0x12,114,size_t) /* return device size in bytes (u64 *arg) */
#endif

#define align(p, a) (((long)(p) + (a - 1)) & ~(a - 1))

int do_seek(int fd, unsigned long long offset, int whence)
{
        printf("lseek(%d, %llu, SEEK_SET)\n", fd, offset);
        printf("lseek()=%ld\n", lseek(fd, offset, whence));
        return 1;
}

int do_read(int fd, size_t size)
{
        char unalignedbuffer[4096 + 512]; /* size lost in the archived copy;
                                             must hold 4096 bytes after
                                             512-byte alignment */
        char *buffer;
        ssize_t bytes;

        if (size > 4096) {
                printf("Reading %zu bytes not supported yet!\n", size);
                return 0;
        }
        buffer = (char *)align(unalignedbuffer, 512);
        printf("read(%d, buffer, %zu)\n", fd, size);
        bytes = read(fd, buffer, size);
        if (bytes < 0) {
                printf("read()=%d,