RE: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)
> I grabbed this part from the log after the machine crashed again while
> trying to transfer a bunch of files that included ones with csum
> errors; let me know if this looks like the same issue you were having:

Hard to say - you hit a soft lockup, while mine was a "kernel BUG
at...". Your stack trace diverges from mine after bio_endio.

James
Re: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)
I grabbed this part from the log after the machine crashed again while
trying to transfer a bunch of files that included ones with csum errors;
let me know if this looks like the same issue you were having:

Mar 31 00:49:42 sl-server kernel: NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [kworker/u67:5:80994]
Mar 31 00:49:42 sl-server kernel: Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter dm_mirror dm_region_hash dm_log dm_mod kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xfs aesni_intel lrw gf128mul glue_helper libcrc32c ablk_helper cryptd joydev input_leds edac_mce_amd k10temp edac_core fam15h_power sp5100_tco sg i2c_piix4 8250_fintek acpi_cpufreq shpchp nfsd auth_rpcgss nfs_acl
Mar 31 00:49:42 sl-server kernel: lockd grace sunrpc ip_tables btrfs xor ata_generic pata_acpi raid6_pq sd_mod mgag200 crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci serio_raw pata_atiixp libahci igb drm ptp pps_core mpt3sas dca raid_class libata i2c_algo_bit scsi_transport_sas fjes uas usb_storage
Mar 31 00:49:42 sl-server kernel: CPU: 21 PID: 80994 Comm: kworker/u67:5 Not tainted 4.5.0-1.el7.elrepo.x86_64 #1
Mar 31 00:49:42 sl-server kernel: Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.5 11/25/2013
Mar 31 00:49:42 sl-server kernel: Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
Mar 31 00:49:42 sl-server kernel: task: 8817f6fa8000 ti: 8800b731 task.ti: 8800b731
Mar 31 00:49:42 sl-server kernel: RIP: 0010:[] [] btrfs_decompress_buf2page+0x123/0x200 [btrfs]
Mar 31 00:49:42 sl-server kernel: RSP: 0018:8800b7313be0 EFLAGS: 0246
Mar 31 00:49:42 sl-server kernel: RAX: RBX: RCX:
Mar 31 00:49:42 sl-server kernel: RDX: RSI: c9000e3d8000 RDI: 88144c7cc000
Mar 31 00:49:42 sl-server kernel: RBP: 8800b7313c48 R08: 8810f0295000 R09: 0020
Mar 31 00:49:42 sl-server kernel: R10: 8810d2ba7869 R11: 00010008 R12: 8817f6fa8000
Mar 31 00:49:42 sl-server kernel: R13: 8800b7313ce0 R14: 0008 R15: 1000
Mar 31 00:49:42 sl-server kernel: FS: 7efce58fb740() GS:881807d4() knlGS:
Mar 31 00:49:42 sl-server kernel: CS: 0010 DS: ES: CR0: 8005003b
Mar 31 00:49:42 sl-server kernel: CR2: 7f00caf249e8 CR3: 001062121000 CR4: 000406e0
Mar 31 00:49:42 sl-server kernel: Stack:
Mar 31 00:49:42 sl-server kernel: 0020 f000 8810f0295000 8744
Mar 31 00:49:42 sl-server kernel: 00010008 c9000e3d7000 ea005131f300 0001
Mar 31 00:49:42 sl-server kernel: 0797 2869 0869 8810d2ba7000
Mar 31 00:49:42 sl-server kernel: Call Trace:
Mar 31 00:49:42 sl-server kernel: [] lzo_decompress_biovec+0x202/0x300 [btrfs]
Mar 31 00:49:42 sl-server kernel: [] end_compressed_bio_read+0x1f6/0x2f0 [btrfs]
Mar 31 00:49:42 sl-server kernel: [] bio_endio+0x40/0x60
Mar 31 00:49:42 sl-server kernel: [] end_workqueue_fn+0x3c/0x40 [btrfs]
Mar 31 00:49:42 sl-server kernel: [] normal_work_helper+0xc0/0x2c0 [btrfs]
Mar 31 00:49:42 sl-server kernel: [] btrfs_endio_helper+0x12/0x20 [btrfs]
Mar 31 00:49:42 sl-server kernel: [] process_one_work+0x14f/0x400
Mar 31 00:49:42 sl-server kernel: [] worker_thread+0x125/0x4b0
Mar 31 00:49:42 sl-server kernel: [] ? rescuer_thread+0x370/0x370
Mar 31 00:49:42 sl-server kernel: [] kthread+0xd8/0xf0
Mar 31 00:49:42 sl-server kernel: [] ? kthread_park+0x60/0x60
Mar 31 00:49:42 sl-server kernel: [] ret_from_fork+0x3f/0x70
Mar 31 00:49:42 sl-server kernel: [] ? kthread_park+0x60/0x60
Mar 31 00:49:42 sl-server kernel: Code: c7 48 8b 45 c0 49 03 7d 00 4a 8d 34 38 e8 06 18 00 e1 41 83 ac 24 28 12 00 00 01 41 8b 84 24 28 12 00 00 85 c0 0f 88 bf 00 00 00 <48> 89 d8 49 03 45 00 49 01 df 49 29 de 48 01 5d d0 48 3d 00 10
Mar 31 00:49:43 sl-server sh[1297]: abrt-dump-oops: Found oopses: 1
Mar 31 00:49:43 sl-server sh[1297]: abrt-dump-oops: Creating problem directories
Mar 31 00:49:43 sl-server sh[1297]: abrt-dump-oops: Not going to make dump directories world readable because PrivateReports is on
Re: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)
Hello,

Your experience looks similar to an issue that I've been running into
recently. I have a btrfs array in RAID0 with compression=lzo set. The
machine runs fine for a while, then crashes at (seemingly) random with an
error message in the journal about a stuck CPU and an issue with the
kworker process.

There are also a bunch of files on it that have been corrupted and that
throw csum errors when trying to access them. Combine that with some
scheduled jobs that run every night and transfer files, and it's making
more sense that this issue could be the same as the one you encountered.

This happened on Scientific Linux 7.2 with kernel-ml (which I think is
on version 4.5 now) installed from elrepo, and the latest btrfs-progs.

I also booted from an Ubuntu 15.10 USB drive, mounted the damaged array,
and ran "find /home -type f -exec cat {} > /dev/null \;" from it, and it
looks like that has failed as well.

I'll try to get the journal output posted and see if that could help
narrow down the cause of the problem. Let me know if there's anything
else you want me to take a look at or test on my machine that could
help.

Thanks,

Mitch Fossen

On Mon, Mar 28, 2016 at 9:36 AM James Johnston wrote:
>
> Hi,
>
> Thanks for the corroborating report - it does sound to me like you ran
> into the same problem I've found. (I don't suppose you ever captured
> any of the crashes? If they assert on the same thing as me then it's
> even stronger evidence.)
>
> > The failure mode of this particular ssd was premature failure of more
> > and more sectors, about 3 MiB worth over several months based on the
> > raw count of reallocated sectors in smartctl -A, but using scrub to
> > rewrite them from the good device would normally work, forcing the
> > firmware to remap that sector to one of the spares as scrub corrected
> > the problem.
>
> I wonder what the risk of a CRC collision was in your situation?
>
> Certainly my test of "dd if=/dev/zero of=/dev/sdb" was very abusive,
> and I wonder if the result after scrubbing is trustworthy, or if there
> were some collisions. But I wasn't checking to see if data coming out
> the other end was OK - I was just trying to see if the kernel crashes
> or not (e.g. a USB stick holding a bad btrfs file system should not
> crash a system).
>
> > But /home (on an entirely separate filesystem, but a filesystem still
> > on a pair of partitions, one on each of the same two ssds) would
> > often have more, and because I have a particular program that I start
> > with my X and KDE session that reads a bunch of files into cache as
> > it starts up, I had a systemd service configured to start at boot and
> > cat all the files in that particular directory to /dev/null, thus
> > caching them so when I later started X and KDE (I don't run a *DM and
> > thus login at the text CLI and startx, with a kde session, from the
> > CLI) and thus this program, all the files it reads would already be
> > in cache.
> >
> > If that service was allowed to run, it would read in all those files
> > and the resulting errors would often crash the kernel.
>
> This sounds oddly familiar to how I made it crash. :)
>
> > So I quickly learned that if I powered up and the kernel crashed at
> > that point, I could reboot with the emergency kernel parameter, which
> > would tell systemd to give me a maintenance-mode root login prompt
> > after doing its normal mounts but before starting the normal
> > post-mount services, and I could run scrub from there. That would
> > normally repair things without triggering the crash, and when I had
> > run scrub repeatedly if necessary to correct any unverified errors in
> > the first runs, I could then exit emergency mode and let systemd
> > start the normal services, including the service that read all these
> > files off the now freshly scrubbed filesystem, without further
> > issues.
>
> That is one thing I did not test. I only ever scrubbed after first
> doing the "cat all files to null" test. So in the case of compression,
> I never got that far. Probably someone should test the scrubbing more
> thoroughly (i.e. with that abusive "dd" test I did) just to be sure
> that it is stable, to confirm your observations, and that the problem
> is only limited to ordinary file I/O on the file system.
>
> > And apparently the devs don't test the somewhat less common
> > combination of both compression and high numbers of raid1 correctable
> > checksum errors, or they would have probably detected and fixed the
> > problem from that.
>
> Well, I've only tested with RAID-1. I don't know if:
>
> 1. The problem occurs with other RAID levels like RAID-10 or RAID-5/6.
>
> 2. The kernel crashes in non-duplicated levels. In these cases, data
> loss is inevitable since the data is missing, but these losses should
> be handled cleanly, and not by crashing the kernel. For example:
>
>    a. Checksum errors in RAID-0.
>    b. Checksum errors on a single hard drive (not a multiple device
>       array).
>
> I guess more testing is needed, but I don't have time to do this more
> exhaustive testing right now, especially for these other RAID levels
> I'm not planning to use (as I'm doing this in my limited free time).
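For reference, the whole "read everything and watch for csum errors"
exercise can be scripted along these lines (a sketch only; the /mnt
mount point is illustrative, and this assumes the suspect filesystem is
already mounted):

    # Drop the page cache first so the reads actually hit the disks.
    echo 3 > /proc/sys/vm/drop_caches

    # Force a read of every file; errors surface in the kernel log.
    find /mnt -type f -exec cat {} + > /dev/null

    # btrfs logs lines like "csum failed ino ... off ..." on bad blocks.
    dmesg | grep -i csum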
Re: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)
James Johnston posted on Mon, 28 Mar 2016 14:34:14 +0000 as excerpted:

> Thanks for the corroborating report - it does sound to me like you ran
> into the same problem I've found. (I don't suppose you ever captured
> any of the crashes? If they assert on the same thing as me then it's
> even stronger evidence.)

No... In fact, as I have compress=lzo on all my btrfs, until you found
out that it didn't happen in the uncompressed case, I simply considered
it part and parcel of btrfs not being fully stabilized and mature yet. I
didn't even consider it a specific bug on its own, and thus didn't
report it or trace it in any way, and simply worked around it, even tho
I certainly found it frustrating.

>> The failure mode of this particular ssd was premature failure of more
>> and more sectors, about 3 MiB worth over several months based on the
>> raw count of reallocated sectors in smartctl -A, but using scrub to
>> rewrite them from the good device would normally work, forcing the
>> firmware to remap that sector to one of the spares as scrub corrected
>> the problem.
>
> I wonder what the risk of a CRC collision was in your situation?
>
> Certainly my test of "dd if=/dev/zero of=/dev/sdb" was very abusive,
> and I wonder if the result after scrubbing is trustworthy, or if there
> were some collisions. But I wasn't checking to see if data coming out
> the other end was OK - I was just trying to see if the kernel crashes
> or not (e.g. a USB stick holding a bad btrfs file system should not
> crash a system).

I had absolutely no trouble with the scrubbed data, or at least none I
attributed to that, tho I didn't have the data cross-hashed and didn't
cross-check the post-scrub result against earlier hashes or anything, so
a few CRC collisions could certainly have snuck thru. But even were some
to have done so, or even if they didn't in practice but could have in
theory, just the standard crc checks are so far beyond what's built into
a normal filesystem like the reiserfs that's still my second (and
non-btrfs) level backup. So it's not like I'm majorly concerned. If I
were paranoid, as I mentioned, I could certainly be doing cross-checks
against multiple hashes, but I survived without any sort of routine data
integrity checking for years, and even a practical worst-case-scenario
crc-collision is already an infinite percentage better than that (just
as 1 is an infinite percentage of 0), so it's nothing I'm going to worry
about unless I actually start seeing real cases of it.

>> So I quickly learned that if I powered up and the kernel crashed at
>> that point, I could reboot with the emergency kernel parameter, which
>> would tell systemd to give me a maintenance-mode root login prompt
>> after doing its normal mounts but before starting the normal
>> post-mount services, and I could run scrub from there. That would
>> normally repair things without triggering the crash, and when I had
>> run scrub repeatedly if necessary to correct any unverified errors in
>> the first runs, I could then exit emergency mode and let systemd start
>> the normal services, including the service that read all these files
>> off the now freshly scrubbed filesystem, without further issues.
>
> That is one thing I did not test. I only ever scrubbed after first
> doing the "cat all files to null" test. So in the case of compression,
> I never got that far. Probably someone should test the scrubbing more
> thoroughly (i.e. with that abusive "dd" test I did) just to be sure
> that it is stable, to confirm your observations, and that the problem
> is only limited to ordinary file I/O on the file system.

I suspect that when the devs duplicate the bug and ultimately trace it
down, we'll know from the code-path whether scrub could have hit it or
not, without actually testing the scrub case on its own. And along with
the fix, it's a fair bet there will be an fstests patch that will verify
no regressions there once fixed, as well. Once the fstests patch is in,
it should be just a small tweak to test whether scrub is subject to the
problem if it uses a different code-path, or not. And in fact, once they
find and verify a fix for the problem here, even if scrub doesn't use
that code-path, I expect they'll be verifying scrub's own code-paths as
well.

>> And apparently the devs don't test the somewhat less common
>> combination of both compression and high numbers of raid1 correctable
>> checksum errors, or they would have probably detected and fixed the
>> problem from that.
>
> Well, I've only tested with RAID-1. I don't know if:
>
> 1. The problem occurs with other RAID levels like RAID-10 or RAID-5/6.
>
> 2. The kernel crashes in non-duplicated levels. In these cases, data
> loss is inevitable since the data is missing, but these losses should
> be handled cleanly, and not by crashing the kernel.

Good points. Again, I expect the extent of the bug, based on its
code-path and what actually uses it, should be readily apparent.
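For a rough sense of the collision numbers (a back-of-envelope estimate,
assuming corruption is effectively random with respect to the checksum):
btrfs uses crc32c, a 32-bit sum, so a given corrupted block slips
through undetected with probability about 2^-32, roughly 2.3e-10. Even
the ~3 MiB of failed sectors mentioned above (about 768 4-KiB blocks)
gives an expected number of undetected corruptions of about 768 / 2^32,
on the order of 2e-7 - so not being majorly concerned seems well
justified.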
RE: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)
Hi,

Thanks for the corroborating report - it does sound to me like you ran
into the same problem I've found. (I don't suppose you ever captured any
of the crashes? If they assert on the same thing as me then it's even
stronger evidence.)

> The failure mode of this particular ssd was premature failure of more
> and more sectors, about 3 MiB worth over several months based on the
> raw count of reallocated sectors in smartctl -A, but using scrub to
> rewrite them from the good device would normally work, forcing the
> firmware to remap that sector to one of the spares as scrub corrected
> the problem.

I wonder what the risk of a CRC collision was in your situation?

Certainly my test of "dd if=/dev/zero of=/dev/sdb" was very abusive, and
I wonder if the result after scrubbing is trustworthy, or if there were
some collisions. But I wasn't checking to see if data coming out the
other end was OK - I was just trying to see if the kernel crashes or not
(e.g. a USB stick holding a bad btrfs file system should not crash a
system).

> But /home (on an entirely separate filesystem, but a filesystem still
> on a pair of partitions, one on each of the same two ssds) would often
> have more, and because I have a particular program that I start with my
> X and KDE session that reads a bunch of files into cache as it starts
> up, I had a systemd service configured to start at boot and cat all the
> files in that particular directory to /dev/null, thus caching them so
> when I later started X and KDE (I don't run a *DM and thus login at the
> text CLI and startx, with a kde session, from the CLI) and thus this
> program, all the files it reads would already be in cache.
>
> If that service was allowed to run, it would read in all those files
> and the resulting errors would often crash the kernel.

This sounds oddly familiar to how I made it crash. :)

> So I quickly learned that if I powered up and the kernel crashed at
> that point, I could reboot with the emergency kernel parameter, which
> would tell systemd to give me a maintenance-mode root login prompt
> after doing its normal mounts but before starting the normal post-mount
> services, and I could run scrub from there. That would normally repair
> things without triggering the crash, and when I had run scrub
> repeatedly if necessary to correct any unverified errors in the first
> runs, I could then exit emergency mode and let systemd start the normal
> services, including the service that read all these files off the now
> freshly scrubbed filesystem, without further issues.

That is one thing I did not test. I only ever scrubbed after first doing
the "cat all files to null" test. So in the case of compression, I never
got that far. Probably someone should test the scrubbing more thoroughly
(i.e. with that abusive "dd" test I did) just to be sure that it is
stable, to confirm your observations, and that the problem is only
limited to ordinary file I/O on the file system.

> And apparently the devs don't test the somewhat less common combination
> of both compression and high numbers of raid1 correctable checksum
> errors, or they would have probably detected and fixed the problem from
> that.

Well, I've only tested with RAID-1. I don't know if:

1. The problem occurs with other RAID levels like RAID-10 or RAID-5/6.

2. The kernel crashes in non-duplicated levels. In these cases, data
loss is inevitable since the data is missing, but these losses should be
handled cleanly, and not by crashing the kernel. For example:

   a. Checksum errors in RAID-0.
   b. Checksum errors on a single hard drive (not a multiple device
      array).

I guess more testing is needed, but I don't have time to do this more
exhaustive testing right now, especially for these other RAID levels I'm
not planning to use (as I'm doing this in my limited free time). (For
now, I can just turn off compression & move on.)

Do any devs do regular regression testing for these sorts of edge cases
once they come up? (i.e. this problem won't come back, will it?)

> So thanks for the additional tests and narrowing it down to the
> compression on raid1 with many checksum errors case. Now that you've
> found out how the problem can be replicated, I'd guess we'll have a fix
> patch in relatively short order. =:^)

Hopefully! Like I said, it might not be limited to RAID-1 though. I only
tested RAID-1.

> That said, based on my own experience, I don't consider the problem
> dire enough to switch off compression on my btrfs raid1s here. After
> all, I both figured out how to live with the problem on my failing ssd
> before I knew all this detail, and have eliminated the symptoms for the
> time being at least, as the devices I'm using now are currently
> reliable enough that I don't have to deal with this issue.
>
> And in the event that I do encounter the problem again, in severe
> enough form that I can't even get a successful scrub in to fix it,
> possibly due to catastrophic failure of a device
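For anyone who wants to try the more exhaustive testing without
sacrificing real disks, the scenario can be scripted against loopback
devices, something like the following (a sketch of the reproduction
steps quoted elsewhere in this thread, not a polished fstests case;
paths and sizes are illustrative):

    # Two backing files standing in for the two disks.
    truncate -s 3G /tmp/img0 /tmp/img1
    DEV0=$(losetup -f --show /tmp/img0)
    DEV1=$(losetup -f --show /tmp/img1)

    # RAID-1 data and metadata, mounted with lzo compression.
    mkfs.btrfs -f -d raid1 -m raid1 "$DEV0" "$DEV1"
    btrfs device scan "$DEV0" "$DEV1"
    mount -o compress=lzo "$DEV0" /mnt
    cp -a /usr/share/doc /mnt/        # some compressible test data
    umount /mnt

    # Corrupt most of one mirror, skipping the first 4 MiB so the
    # primary superblock (at 64 KiB) survives and the fs still mounts.
    dd if=/dev/zero of="$DEV1" bs=1M seek=4 count=2900

    # Read everything back: this should produce correctable csum
    # errors (repaired from the good mirror), not a kernel crash.
    mount -o compress=lzo "$DEV0" /mnt
    find /mnt -type f -exec cat {} + > /dev/null

The same skeleton should work for the other profiles in question by
swapping the -d/-m convert targets (raid0, raid10, single) and checking
for clean EIO instead of repair where the data is genuinely gone.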
Re: Compression causes kernel crashes if there are I/O or checksum errors (was: RE: kernel BUG at fs/btrfs/volumes.c:5519 when hot-removing device in RAID-1)
James Johnston posted on Mon, 28 Mar 2016 04:41:24 +0000 as excerpted:

> After puzzling over the btrfs failure I reported here a week ago, I
> think there is a bad incompatibility between compression and RAID-1
> (maybe other RAID levels too?). I think it is unsafe for users to use
> compression, at least with multiple devices, until this is
> fixed/investigated further. That seems like a drastic claim, but I know
> I will not be using it for now. Otherwise, checksum errors scattered
> across multiple devices that *should* be recoverable will render the
> file system unusable, even to read data from. (One alternative
> hypothesis might be that defragmentation causes the issue, since I used
> defragment to compress existing files.)
>
> I finally was able to simplify this to a hopefully easy to reproduce
> test case, described in lengthier detail below. In summary, suppose we
> start with an uncompressed btrfs file system on only one disk
> containing the root file system, such as created by a clean install of
> a Linux distribution. I then: (1) enable compress=lzo in fstab, reboot,
> and then defragment the disk to compress all the existing files, (2)
> add a second drive to the array and balance for RAID-1, (3) reboot for
> good measure, (4) cause a high level of I/O errors, such as hot-removal
> of the second drive, OR simply a high level of bit rot (i.e. use dd to
> corrupt most of the disk, while either mounted or unmounted). This is
> guaranteed to cause the kernel to crash.

Described that way, my own experience confirms your tests, except that
(1) I hadn't tested the no-compression case to know it was any
different, and (2) in my case I was actually using btrfs raid1 mode and
scrub to be able to continue dealing with a failing ssd (out of a pair)
for quite some while after I would ordinarily have had to replace it,
were I not using something like btrfs raid1 with checksummed file
integrity, scrubbing errors away with replacements from the good device.

Here's how it worked for me, and why I ultimately agree with your
conclusions, at least regarding compressed raid1 mode crashes due to too
many checksum failures (since I have no reference to agree or disagree
with the uncompressed case).

As I said above, I had one ssd failing, but was taking the opportunity
while I had it to watch its behavior deeper into the failure than I
normally would, and, while I was at it, to get familiar enough with
btrfs scrub to repair errors that it became just another routine command
for me (to the point that I even scripted up a custom scrub command
complete with my normally used options, etc). The multiple btrfs on
partitions of the two devices were relatively small (the largest was 24
GiB per device, paired-device btrfs raid1), so scrub normally took under
a minute to run even when doing quite a few repairs; it wasn't as if it
took the hours to days it can take at TB scale on spinning rust.

The failure mode of this particular ssd was premature failure of more
and more sectors, about 3 MiB worth over several months based on the raw
count of reallocated sectors in smartctl -A, but using scrub to rewrite
them from the good device would normally work, forcing the firmware to
remap that sector to one of the spares as scrub corrected the problem.

One not immediately intuitive thing I found with scrub, BTW, was that if
it finished with unverified errors, I needed to rerun scrub again to do
further repairs.
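That rerun-until-clean procedure is easy to script, roughly like this (a
sketch; the mount point is illustrative, and the grep matches the
"unverified errors: N" line that btrfs scrub prints in its completion
summary):

    # Scrub, and keep scrubbing while the summary still reports
    # unverified errors (-B waits in the foreground for completion).
    while btrfs scrub start -B /mnt | grep -q 'unverified errors: [1-9]'; do
        echo 'unverified errors remain - scrubbing again'
    done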
I've since confirmed with someone who can read code (I sort of do, but
more at the admin-playing-with-patches level than the dev level) that my
guess at the reason behind this behavior was correct. When a metadata
node fails checksum verification and is repaired, the checksums that it
in turn contained cannot be verified in that pass, and they show up as
unverified errors. A repeated scrub, once those errors are fixed, can
verify (and fix if necessary) those additional nodes; occasionally up to
three or four runs were necessary to fully verify and repair all blocks,
eliminating all unverified errors, at which point further scrubs found
no further errors.

It occurred to me as I write this that the problem I saw, and that you
have confirmed with testing and now reported, may actually be related to
some interaction between these unverified errors and compressed blocks.

Anyway, as it happens, my / filesystem is normally mounted ro except
during updates, and by the end I was scrubbing after updates and even
after extended power-downs, so it generally had only a few errors. But
/home (on an entirely separate filesystem, but a filesystem still on a
pair of partitions, one on each of the same two ssds) would often have
more, and because I have a particular program that I start with my X and
KDE session that reads a bunch of files into cache as it starts up, I
had a systemd service configured to start at boot and cat all the files
in that particular directory to /dev/null, thus caching them so when I
later started X and KDE (I don't run a *DM and thus login at the text
CLI and startx, with a kde session, from the CLI) and thus this program,
all the files it reads would already be in cache.
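A pre-caching service of that sort can be done with a tiny oneshot unit,
along these lines (a sketch only; the unit name, directory, and path are
made up for illustration, not taken from this thread):

    # /etc/systemd/system/precache-files.service
    [Unit]
    Description=Pre-cache files read by the desktop session
    # Wait until local filesystems (including /home) are mounted.
    After=local-fs.target

    [Service]
    Type=oneshot
    ExecStart=/bin/sh -c 'find /home/user/.appcache -type f -exec cat {} + > /dev/null'

    [Install]
    WantedBy=multi-user.target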