[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
I'd like to follow up because the issue seems to have cleared up for us after installing Linux 5.0.1 about 40 days ago. It's hard to say whether everyone is experiencing the same bug, but give 5.x a shot and let us know how it goes!

Just to recap: every week or so we were seeing read-only file systems with the following errors, which required a reboot and fsck.

EXT4-fs error (device vda2): ext4_mb_generate_buddy:757: group 144, block bitmap and bg descriptor inconsistent: 23914 vs 23913 free clusters
Aborting journal on device vda2-8.
EXT4-fs (vda2): Remounting filesystem read-only

We never experienced any corruption on the host itself, only in KVM guests.

Host: Dell PowerEdge 2950III
Several KVM guests: Linux; distro and kernel make no difference, all randomly vulnerable during periods of high disk activity.

Not sure it matters, but in our case we were using LVM2 volumes on the host and the KVM media was configured as follows: "media=disk,if=virtio,cache=none,aio=native,format=raw".

We initially thought just one guest was affected, but over time we saw it happen with many distros and kernels. It wasn't until we had an extended period of downtime that we decided to reinstall the host with a 5.x kernel. None of the guests have experienced any issues since, fingers crossed.

At this point it's hard to recommend Ubuntu 19.04 given that it's only a few months away from EOL. However, the 5.x kernel seems promising, whereas Ubuntu 18.04 LTS runs an older kernel that is still known to exhibit the corruption. For LTS I'd look into running it under a custom setup with a newer kernel.

-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1423672 Title: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y To manage notifications about this bug go to: https://bugs.launchpad.net/linux/+bug/1423672/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
I have a "PE 2950III Intel(R) Xeon(R) CPU X5460 @ 3.16GHz" server here and I've been trying to test this out. I'm using an "rsync" copy of an original server exhibiting the problem. So far, though, I've been unable to reproduce the original error at all. It would seem that, using the exact same OS/kernel/binaries, the error doesn't happen on a fresh filesystem. I guess there must have been something about the filesystem image itself that triggered the fault.

So my dilemma is that I don't know how to reproduce this fault on a fresh install. While I can test this update, I'm not sure how valid the test will be on an installation that isn't faulting. Does anyone have a suggestion or an idea about how to reproduce the conditions?
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
Good work. We also have PE2950III systems running "Intel(R) Xeon(R) CPU X5460 @ 3.16GHz". If this is indeed the fix, I'm confused as to why it would only affect certain CPUs.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?h=7dec5603b6b8dc4c3e1c65d318bd2a5a8c62a424

I'll have to come up with a plan to replace Debian's stable/jessie kernel with an unmanaged one on the host. I'm not keen on doing that as the DRAC units on these are not very reliable...
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
Oops... the kvm command line above is correct, but it did not crash with -m 1000, which is what production is using now. It was crashing consistently with -m 512 about a minute into the synthetic FS load.
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
Chris,

Here is what you asked for, sorry for not getting it to you earlier. I don't use virsh. This is how I started KVM to trigger the problem interactively (curses interface):

kvm -drive file=/dev/raid/shared,media=disk,if=none,cache=none,aio=native,format=raw,id=hd0 \
    -device virtio-blk-pci,drive=hd0 -smp 2 -m 1000 \
    -netdev tap,ifname=vm_shared,script=no,downscript=no,id=eth0 \
    -device virtio-net-pci,netdev=eth0,mac=52:54:00:12:34:58 \
    -name shared -runas shared -curses

fdisk -l

Disk /dev/vda: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: AE4BDF3E-0B83-4C17-B104-A5139722F263

Device       Start       End   Sectors  Size Type
/dev/vda1     2048   3905535   3903488  1.9G Linux swap
/dev/vda2  3905536 209713151 205807616 98.1G Linux filesystem

It hasn't happened in this particular VM since upping the RAM so the VM doesn't swap.

My intention was to reproduce on non-production hardware, and then try different kernels, rule out LVM, virtio, etc. But I'm in the middle of a new assignment, so I probably won't have time to do this myself before December.

** Attachment added: "/boot/config-3.16.0-4-amd64 for the VM in question"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672/+attachment/4518472/+files/config.txt
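For anyone comparing their own guest layout against the fdisk listing above, here is a small sanity check of the partition arithmetic. It is only a sketch: the numbers are copied from the listing, and the helper names (sector_count, size_gib, check_layout) are made up for illustration.

```python
SECTOR_BYTES = 512  # logical sector size reported by fdisk in the listing

# (device, start, end, sectors) copied from the fdisk listing in this message
PARTITIONS = [
    ("/dev/vda1", 2048, 3905535, 3903488),
    ("/dev/vda2", 3905536, 209713151, 205807616),
]


def sector_count(start, end):
    """Number of sectors in the inclusive range [start, end]."""
    return end - start + 1


def size_gib(sectors, sector_bytes=SECTOR_BYTES):
    """Size in GiB (2**30 bytes) for a given sector count."""
    return sectors * sector_bytes / 2**30


def check_layout(partitions):
    """Verify sector counts and ordering; return (device, sectors, GiB) rows."""
    rows = []
    prev_end = -1
    for dev, start, end, sectors in partitions:
        if sector_count(start, end) != sectors:
            raise ValueError(f"{dev}: sector count does not match start/end")
        if start <= prev_end:
            raise ValueError(f"{dev}: overlaps the previous partition")
        rows.append((dev, sectors, round(size_gib(sectors), 1)))
        prev_end = end
    return rows


if __name__ == "__main__":
    for dev, sectors, gib in check_layout(PARTITIONS):
        print(dev, sectors, gib)
```

The listing is internally consistent: the two partitions do not overlap and the sizes fdisk prints match the sector counts, so the partition table itself does not look suspicious.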
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
I'm posting again to add that I conducted some more tests, and ext3 does not encounter corruption under the same conditions. I hope this information is helpful to others. If anyone needs more information, let me know and I'll see what I can do. I'll probably switch my own VMs to ext3 so I don't have to worry about these FS crashes.
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
It's happened again.

I've spent several hours on this and I've been able to recreate the failure under some synthetic conditions with a sacrificial VM. The filebench defaults do not cause an ext4 crash for me, but the following do:

load workloads/fileserver
set $dir=/tmp/
set $nfiles=20
set $meandirwidth=3
run 120

The ext4 error never happens in filebench's init phase, only 50s or so into the 50-threaded run phase. Less extreme settings won't produce a consistent crash.

Reducing the amount of free memory makes the errors much more likely.

This is before running filebench:

             total   used   free  shared  buffers  cached
Mem:          482M    99M   382M    300K      27M     20M
-/+ buffers/cache:    52M   429M
Swap:         1.9G    94M   1.8G

This is while running filebench, one second before the crash:

             total   used   free  shared  buffers  cached
Mem:          482M   476M   5.6M    284K      27M     18M
-/+ buffers/cache:   430M    51M
Swap:         1.9G   253M   1.6G
2769.63

The error is reproducible in cloned VMs. Moving swap to another disk changes nothing. As far as I can tell, the error never happens with ext4 filesystems other than the root FS where executables are running from.

I've tried bonnie, stress-ng, and simple scripts, and I have not been able to get these to crash ext4.

The sacrificial VM has not crashed after adding an extra 500MB to it. Although production was never under such heavy loads, I've added 500MB to the production VM to see if it helps anyway.
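If someone wants to log how close a guest is to the low-memory state shown in the "free" output above while a workload runs, something like the following could work. It is a rough sketch, not a tested monitoring tool: it assumes the standard /proc/meminfo field names (MemFree, Buffers, Cached), the 64 MB threshold is an arbitrary guess, and the sample values in the test are approximations of the numbers in this message.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def free_plus_cache_kb(meminfo):
    """MemFree + Buffers + Cached in kB.

    This corresponds to the 'free' column of the '-/+ buffers/cache' row
    printed by older versions of free.
    """
    return meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]


def under_pressure(meminfo, threshold_kb=64 * 1024):
    """True when free + buffers + cached drops below the (arbitrary) threshold."""
    return free_plus_cache_kb(meminfo) < threshold_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file on a Linux guest."""
    with open(path) as fh:
        return parse_meminfo(fh.read())


if __name__ == "__main__":
    info = read_meminfo()
    print(free_plus_cache_kb(info), under_pressure(info))
```

Running this once a second alongside filebench and keeping the output would show whether the remount to read-only always follows a dip like the 51M figure above.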
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
It happened here again.

8/24 ext4 corruption
9/14 ext4 corruption
9/29 update/reboot
10/16 ext4 corruption

This time the corruption was severe: 1743 files from multiple directories got moved into lost+found. It took me almost 2 hours this morning to verify and fix everything. Fortunately, every time this has happened, all the files were dated prior to the daily backup, and "diff -qr ..." shows exactly what was lost.

As far as I can tell this is not a memory issue, and the ext4 FS is using 10G out of 100G.

Every time, the corruption has been in /var/mail. However, the VM is mostly used for mail, so that may not be significant. The /var/mail branch itself is 4.3G.

3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u4 (2015-09-19) x86_64 GNU/Linux

I'm holding back the kernels on other VMs, so this is the only VM with the problem. Is anyone able to reproduce this on demand?

I really need to do something because this is causing downtime during normal business hours. I'll probably try one of the following:

1. Rebuild the FS from scratch and see if ext4 corruption continues.
2. Use ext3 or something else.
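For others who need to check a repaired filesystem against a backup, here is a rough Python equivalent of the "diff -qr" comparison mentioned above. It is a sketch only: the function name lost_or_changed is made up, it checks one direction (backup to live tree), and it compares file contents byte by byte rather than reproducing diff's output format.

```python
import filecmp
import os


def lost_or_changed(backup_root, live_root):
    """Compare a backup tree against the live tree.

    Returns (missing, changed): paths relative to backup_root that are
    absent from live_root, or present but with different contents.
    """
    missing, changed = [], []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            live_path = os.path.join(live_root, rel)
            if not os.path.isfile(live_path):
                missing.append(rel)
            # shallow=False forces a content comparison, not just stat data
            elif not filecmp.cmp(os.path.join(dirpath, name), live_path, shallow=False):
                changed.append(rel)
    return sorted(missing), sorted(changed)


if __name__ == "__main__":
    import sys

    gone, differ = lost_or_changed(sys.argv[1], sys.argv[2])
    for path in gone:
        print("missing:", path)
    for path in differ:
        print("changed:", path)
```

Files that legitimately changed after the backup will show up under "changed", so the list still needs a manual look, just as with diff.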
[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y
I'm on Debian, but it's happening to me as well. KVM with virtual disks backed by LVM volumes on the host. Both the VM and the host are running:

Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17)

First occurred on August 24th. Manual fsck required, lots of files converted to lost inodes.

[1992712.418275] EXT4-fs error (device vda2): ext4_mb_generate_buddy:757: group 96, block bitmap and bg descriptor inconsistent: 24017 vs 24015 free clusters
[1992712.513438] Aborting journal on device vda2-8.
[1992712.514007] EXT4-fs (vda2): Remounting filesystem read-only
[1992712.514205] EXT4-fs error (device vda2) in ext4_evict_inode:243: Journal has aborted

Happened again today, September 14th, in the same VM.

[1489393.753098] EXT4-fs error (device vda2): ext4_mb_generate_buddy:757: group 144, block bitmap and bg descriptor inconsistent: 23914 vs 23913 free clusters
[1489393.803865] Aborting journal on device vda2-8.
[1489393.804439] EXT4-fs (vda2): Remounting filesystem read-only

This is the first syslog activity since I rebooted in August; there are no block IO errors on the guest or host. Manual fsck required, files were lost, but everything is running again.

It has not happened in other VMs running older kernels. It also has not happened on the host, although there's very little file system activity on the host. The fact that it hasn't happened on other VMs leads me to believe the bug is inside the guest rather than with KVM - perhaps ext4 or the virtual disk driver.
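To make it easier to compare reports across guests, the error lines quoted in this thread can be pulled out of dmesg or syslog text with a short script. This is a minimal sketch: the regular expression is written against the messages quoted here only, and the field names first_count and second_count are placeholders. My assumption, from the wording of the message, is that the first number is the free count derived from the block bitmap and the second is the one recorded in the group descriptor.

```python
import re
from typing import List, NamedTuple


class BuddyError(NamedTuple):
    device: str
    source_line: int   # the number after "ext4_mb_generate_buddy:"
    group: int
    first_count: int   # assumed: free clusters counted from the block bitmap
    second_count: int  # assumed: free clusters recorded in the group descriptor

    @property
    def delta(self) -> int:
        return self.first_count - self.second_count


PATTERN = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"ext4_mb_generate_buddy:(?P<line>\d+): "
    r"group (?P<group>\d+), block bitmap and bg descriptor inconsistent: "
    r"(?P<first>\d+) vs (?P<second>\d+) free clusters"
)


def parse_log(text: str) -> List[BuddyError]:
    """Return every ext4_mb_generate_buddy inconsistency found in the text."""
    return [
        BuddyError(
            device=m.group("device"),
            source_line=int(m.group("line")),
            group=int(m.group("group")),
            first_count=int(m.group("first")),
            second_count=int(m.group("second")),
        )
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    import sys

    for err in parse_log(sys.stdin.read()):
        print(err.device, err.group, err.first_count, err.second_count, err.delta)
```

Piping "dmesg" output into it prints one line per inconsistency. In the two events quoted in this thread the counts differ by only 2 and 1 clusters.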