[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2019-07-30 Thread LouieGosselin
I'd like to follow up because the issue seems to have cleared up for us
after installing linux 5.0.1 about 40 days ago. It's hard to say whether
everyone is experiencing the same bugs, but give 5.x a shot and let us
know how it goes!


Just to recap: every week or so we were seeing read-only file systems with the
following errors, which required a reboot and fsck.

EXT4-fs error (device vda2): ext4_mb_generate_buddy:757: group 144, block bitmap and bg descriptor inconsistent: 23914 vs 23913 free clusters
Aborting journal on device vda2-8.
EXT4-fs (vda2): Remounting filesystem read-only
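
For anyone collecting these errors across several guests, here is a minimal Python sketch that pulls the device, group number and the two free-cluster counts out of log text like the lines quoted above. The function name and the field names (bitmap_free, desc_free) are my own; I am assuming the first number is the count derived from the block bitmap and the second the one from the group descriptor, following the wording of the message.

```python
import re

# Matches the ext4 error as quoted in this thread. The number after
# ext4_mb_generate_buddy (756, 757, ...) differs between kernels, so it
# is matched loosely rather than hard-coded.
BUDDY_ERROR_RE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"ext4_mb_generate_buddy:\d+: group (?P<group>\d+), "
    r"block bitmap and bg descriptor inconsistent: "
    r"(?P<bitmap_free>\d+) vs (?P<desc_free>\d+) free clusters"
)


def parse_buddy_errors(text):
    """Return one dict per ext4_mb_generate_buddy error found in text."""
    # Mail clients wrap long log lines, so collapse all whitespace runs
    # (including newlines) into single spaces before matching.
    flat = " ".join(text.split())
    results = []
    for match in BUDDY_ERROR_RE.finditer(flat):
        bitmap_free = int(match.group("bitmap_free"))
        desc_free = int(match.group("desc_free"))
        results.append({
            "device": match.group("device"),
            "group": int(match.group("group")),
            # Assumption: order follows the message wording
            # ("block bitmap" first, "bg descriptor" second).
            "bitmap_free": bitmap_free,
            "desc_free": desc_free,
            "difference": bitmap_free - desc_free,
        })
    return results
```

Feeding it the output of dmesg or a syslog file should give one entry per occurrence, which makes it easier to see whether the same block group keeps coming up.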

We never experienced any corruption on the host itself, only under KVM
guests.

Host: Dell PowerEdge 2950 III
Several KVM guests: Linux; distro and kernel don't make any difference, all
were randomly vulnerable during periods of high disk activity.

Not sure it matters, but in our case we were using LVM2 volumes on the
host and kvm media was configured as follows
"media=disk,if=virtio,cache=none,aio=native,format=raw".

We initially thought just one guest was affected, but over time we saw
it happen with many distros and kernels. It wasn't until we had an
extended period of downtime that we decided to reinstall the host with a
5.x kernel. None of the guests have experienced any issues since, fingers
crossed.


At this point, it's hard to recommend Ubuntu 19.04 given that it's only a few
months away from EOL. However, the 5.x kernel seems promising, whereas Ubuntu
18.04 LTS runs an older kernel that is still known to exhibit the corruption.
For LTS I'd look into running it under a custom setup with a newer kernel.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1423672

Title:
  ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor
  inconsistent: X vs Y

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1423672/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2016-05-25 Thread LouieGosselin
I have a "PE 2950III Intel(R) Xeon(R) CPU X5460 @ 3.16GHz" server here
and I've been trying to test this out. I'm using an "rsync" copy of an
original server exhibiting the problem. So far though I've been unable
to reproduce the original error at all.

It would seem that, using the exact same OS/kernel/binaries, the error
doesn't happen on a fresh filesystem. I guess there must have been
something about the filesystem image itself that triggered the fault. So
my dilemma is that I don't know how to reproduce this fault on a fresh
install. While I can test this update, I'm not sure how valid the
test will be on an installation that isn't faulting.

Does anyone have a suggestion or have an idea about how to reproduce the
conditions?



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2016-03-21 Thread LouieGosselin
Good work.

We also have PE2950III systems running "Intel(R) Xeon(R) CPU X5460  @
3.16GHz".

If this is indeed the fix, I'm confused about why it would only affect certain CPUs:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?h=7dec5603b6b8dc4c3e1c65d318bd2a5a8c62a424

I'll have to come up with a plan to replace debian's stable/jessie
kernel with an unmanaged one on the host. I'm not keen on doing that as
the DRAC units on these are not very reliable...



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2015-11-13 Thread LouieGosselin
Oops... the above kvm command line is correct, but it did not crash with -m 1000;
that's what production is using now.
It was crashing consistently with -m 512, about a minute into the synthetic FS
load.



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2015-11-13 Thread LouieGosselin
chris,

Here is what you asked for, sorry for not getting it earlier.

I don't use virsh. This is how I started KVM to trigger the problem
interactively (curses interface):

kvm -drive file=/dev/raid/shared,media=disk,if=none,cache=none,aio=native,format=raw,id=hd0 \
    -device virtio-blk-pci,drive=hd0 -smp 2 -m 1000 \
    -netdev tap,ifname=vm_shared,script=no,downscript=no,id=eth0 \
    -device virtio-net-pci,netdev=eth0,mac=52:54:00:12:34:58 \
    -name shared -runas shared -curses
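
Since the amount of guest RAM seems to matter, here is a rough Python sketch of how the invocation above could be parametrized to compare memory sizes. It is not what I actually ran; the function name kvm_argv and its parameters are made up for illustration, and it only reuses the options from the command line above.

```python
def kvm_argv(mem_mib, disk="/dev/raid/shared", name="shared"):
    """Build the kvm argument list from this post with a variable -m value.

    mem_mib, disk and name are hypothetical parameters; every option is
    copied from the command line quoted in this message.
    """
    drive = (
        f"file={disk},media=disk,if=none,cache=none,"
        "aio=native,format=raw,id=hd0"
    )
    return [
        "kvm",
        "-drive", drive,
        "-device", "virtio-blk-pci,drive=hd0",
        "-smp", "2",
        "-m", str(mem_mib),
        "-netdev", "tap,ifname=vm_shared,script=no,downscript=no,id=eth0",
        "-device", "virtio-net-pci,netdev=eth0,mac=52:54:00:12:34:58",
        "-name", name,
        "-runas", name,
        "-curses",
    ]
```

The resulting list could be passed to subprocess.run(), for example with kvm_argv(512) for the size that crashed and kvm_argv(1000) for the size that did not.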


fdisk -l
Disk /dev/vda: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: AE4BDF3E-0B83-4C17-B104-A5139722F263

Device       Start       End   Sectors  Size Type
/dev/vda1     2048   3905535   3903488  1.9G Linux swap
/dev/vda2  3905536 209713151 205807616 98.1G Linux filesystem


It hasn't happened in this particular VM since upping the RAM so the VM doesn't 
swap.

My intention was to reproduce on non-production hardware, and then try
different kernels, rule out LVM, virtio, etc. But I'm in the middle of
a new assignment, so I probably won't have time to do this myself before
December.


** Attachment added: "/boot/config-3.16.0-4-amd64 for the VM in question"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672/+attachment/4518472/+files/config.txt



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2015-10-28 Thread LouieGosselin
I'm posting again to add that I conducted some more tests and ext3 does
not encounter corruption under the same conditions. I hope this
information is helpful to others. If anyone needs more information, let
me know and I'll see what I can do. I'll probably switch my own VMs to
ext3 so I don't have to worry about these FS crashes.



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2015-10-28 Thread LouieGosselin
It's happened again. I've spent several hours on this and I've been able
to recreate the failure under some synthetic conditions with a
sacrificial VM.

The filebench defaults do not cause an ext4 crash for me, but the
following do:

load workloads/fileserver
set $dir=/tmp/
set $nfiles=20
set $meandirwidth=3
run 120

The ext4 error never happens in filebench's init phase, only 50s or
so into the 50-threaded run phase. Less extreme settings won't produce a
consistent crash.

Reducing the amount of free memory makes the errors much more likely.

This is before running filebench:
             total       used       free     shared    buffers     cached
Mem:          482M        99M       382M       300K        27M        20M
-/+ buffers/cache:        52M       429M
Swap:         1.9G        94M       1.8G

This is while running filebench one second before the crash:
             total       used       free     shared    buffers     cached
Mem:          482M       476M       5.6M       284K        27M        18M
-/+ buffers/cache:       430M        51M
Swap:         1.9G       253M       1.6G
2769.63
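
To catch the low-memory state right before a crash without watching free by hand, something like the following Python sketch could log the relevant numbers. It parses text in the /proc/meminfo format; the function names and the 16 MiB threshold are hypothetical and not taken from my tests.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value in kB} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Field: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def is_low_memory(values, threshold_kb=16 * 1024):
    """True when MemFree is below threshold_kb (an arbitrary example value)."""
    return values.get("MemFree", 0) < threshold_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file on a Linux guest."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling read_meminfo() once a second during the filebench run and printing MemFree and SwapFree whenever is_low_memory() returns True would give a timeline to line up against the EXT4-fs error timestamp.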

The error is reproducible in cloned VMs.

Moving swap to another disk changes nothing.

As far as I can tell, the error never happens with ext4 filesystems
other than the root FS where executables are running from.

I've tried bonnie, stress-ng, and simple scripts, but I have not been able
to get these to crash ext4.

The sacrificial VM has not crashed after adding an extra 500MB to it.

Although production was never under such heavy loads, I've added 500MB
to the production VM to see if it helps anyways.



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2015-10-16 Thread LouieGosselin
It happened here again.

8/24 ext4 corruption
9/14 ext4 corruption
9/29 update/reboot
10/16 ext4 corruption

This time the corruption was severe. 1743 files from multiple directories got 
moved into lost+found. 
It took me almost 2 hours this morning to verify & fix everything. Fortunately 
every time this has happened, all the files were dated prior to the daily 
backup and "diff -qr ..." shows exactly what was lost.
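
For reference, the same kind of comparison can be scripted. This is a minimal Python sketch using the standard filecmp module, not the exact procedure I followed; the function name changed_paths and its arguments are made up. Note that filecmp.dircmp compares files shallowly by default (it only reads contents when size or mtime differ), so it is not a byte-for-byte equivalent of "diff -qr".

```python
import filecmp
import os


def changed_paths(backup_dir, live_dir):
    """Compare two directory trees recursively.

    Returns (missing, differing): paths relative to the tree roots that
    exist only in backup_dir, and files present in both that differ.
    """
    missing, differing = [], []

    def walk(comparison, relative):
        # Entries that exist in the backup but not in the live tree.
        for name in comparison.left_only:
            missing.append(os.path.join(relative, name))
        # Files present on both sides whose comparison reports a difference.
        for name in comparison.diff_files:
            differing.append(os.path.join(relative, name))
        # Recurse into directories common to both trees.
        for name, sub in comparison.subdirs.items():
            walk(sub, os.path.join(relative, name))

    walk(filecmp.dircmp(backup_dir, live_dir), "")
    return sorted(missing), sorted(differing)
```

Running it with the restored backup as the first argument and the repaired filesystem as the second lists what fsck moved away or damaged.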

As far as I can tell this is not a memory issue and the ext4 FS is using
10G out of 100G.

Every time the corruption has been in /var/mail. However, the VM is
mostly used for mail so it may not be significant. The /var/mail branch
itself is 4.3G.

3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u4 (2015-09-19) x86_64
GNU/Linux

I'm holding back the kernels on other VMs so this is the only VM with
the problem.


Is anyone able to reproduce this on demand?

I really need to do something because this is causing downtime during normal 
business hours. I'll probably try one of the following:
1. Rebuild the FS from scratch and see if ext4 corruption continues.
2. Use ext3 or something else.



[Bug 1423672] Re: ext4_mb_generate_buddy:756: group N, block bitmap and bg descriptor inconsistent: X vs Y

2015-09-14 Thread LouieGosselin
I'm on Debian, but it's happening to me as well.
KVM with virtual disks backed by LVM volumes on the host.

Both the VM and the host are running
Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 
4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17)


First occurred on August 24th. Manual fsck required, lots of files converted to 
lost-inodes.
[1992712.418275] EXT4-fs error (device vda2): ext4_mb_generate_buddy:757: group 96, block bitmap and bg descriptor inconsistent: 24017 vs 24015 free clusters
[1992712.513438] Aborting journal on device vda2-8.
[1992712.514007] EXT4-fs (vda2): Remounting filesystem read-only
[1992712.514205] EXT4-fs error (device vda2) in ext4_evict_inode:243: Journal has aborted

Happened again today September 14th in the same VM.
[1489393.753098] EXT4-fs error (device vda2): ext4_mb_generate_buddy:757: group 144, block bitmap and bg descriptor inconsistent: 23914 vs 23913 free clusters
[1489393.803865] Aborting journal on device vda2-8.
[1489393.804439] EXT4-fs (vda2): Remounting filesystem read-only

This is the first syslog activity since I rebooted in August, no block
IO errors on the guest or host.

Manual fsck required, files were lost, but everything is running again.

It has not happened in other VMs running older kernels. It also has not
happened on the host, however there's very little file system activity
on the host. The fact that it hasn't happened on other VMs leads me to
believe the bug is inside the guest rather than with KVM - perhaps ext4
or the virtual disk driver.
