Re: checksum error in metadata node - best way to move root fs to new drive?
On 2016-08-12 11:06, Duncan wrote:
> Austin S. Hemmelgarn posted on Fri, 12 Aug 2016 08:04:42 -0400 as excerpted:
>> On a file server? No, I'd ensure proper physical security is
>> established and make sure it's properly secured against network based
>> attacks and then not worry about it. Unless you have things you want to
>> hide from law enforcement or your government (which may or may not be
>> legal where you live) or can reasonably expect someone to steal the
>> system, you almost certainly don't actually need whole disk encryption.
>> There are two specific exceptions to this though:
>> 1. If your employer requires encryption on this system, that's their call.
>> 2. Encrypted swap is a good thing regardless, because it prevents
>> security credentials from accidentally being written unencrypted to
>> persistent storage.
>
> In the US, medical records are pretty well protected under penalty of
> law (HIPAA, IIRC?). Anyone storing medical records here would do well to
> have full filesystem encryption for that reason. Of course financial
> records are sensitive as well, or even just forum login information, and
> then there's the various industrial spies from various countries (China
> being the one most frequently named) that would pay good money for
> unencrypted devices from the right sources.

Medical and even financial records really fall under my first exception, but it's still no substitute for proper physical security. As far as user account information, that depends on what your legal or PR department promised, but in many cases there's minimal improvement in security when using full disk encryption in place of just encrypting the database file used to store the information. In either case though, it's still a better investment, in terms of both time and money, to properly secure the network and physical access to the hardware.
All that disk encryption protects is data at rest, and for a _server_ system, the data is almost always online, and therefore lack of protection of the system as a whole is usually more of a security issue in general than lack of protection for a single disk that's powered off. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: checksum error in metadata node - best way to move root fs to new drive?
On Fri, Aug 12, 2016 at 6:04 AM, Austin S. Hemmelgarn wrote:
> On 2016-08-11 16:23, Dave T wrote:
>> 5. Would most of you guys use btrfs + dm-crypt on a production file
>> server (with spinning disks in JBOD configuration -- i.e., no RAID)?
>> In this situation, the data is very important, of course. My past
>> experience indicated that RAID only improves uptime, which is not so
>> critical in our environment. Our main criterion is that we should never
>> ever have data loss. As far as I understand it, we do have to use
>> encryption.
>
> On a file server? No, I'd ensure proper physical security is established
> and make sure it's properly secured against network based attacks and then
> not worry about it. Unless you have things you want to hide from law
> enforcement or your government (which may or may not be legal where you
> live) or can reasonably expect someone to steal the system, you almost
> certainly don't actually need whole disk encryption.

Sure, but then you need a fairly strict handling policy for those drives when they leave the environment: e.g. for an RMA if the drive dies under warranty, or when the drive is being retired. First there's the actual physical handling (even interception) and accounting of all of the drives, which has to be rather strict. And second, the fallback to wiping the drive if it's dead must be physical destruction. For any data not worth physically destroying the drive over at disposal time, you can probably forgo full disk encryption.

-- 
Chris Murphy
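Chris's disposal point is where dm-crypt pays off even on a physically secured server: with LUKS, "wiping" a dead or RMA-bound drive reduces to destroying the key material. A dry-run sketch (the device path is a placeholder, and the `run` guard only prints the commands so nothing is touched):

```shell
# Dry-run guard: set DRY_RUN=0 to actually execute (destructive!).
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "would run: $*" || "$@"; }

# Keep a header backup somewhere safe while the drive is in service...
run cryptsetup luksHeaderBackup /dev/sdX --header-backup-file sdX-header.img
# ...and destroy all keyslots at retirement/RMA time. Without the key
# material the ciphertext is unrecoverable (destroy header backups too).
run cryptsetup luksErase /dev/sdX
```

This doesn't remove the need for drive accounting, but it downgrades "must physically destroy" to "must destroy the keys", which is much easier to audit.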
Re: checksum error in metadata node - best way to move root fs to new drive?
On 10 August 2016 at 23:21, Chris Murphy wrote:
>
> I'm using LUKS, aes xts-plain64, on six devices. One is using mixed-bg
> single device. One is dsingle mdup. And then 2x2 mraid1 draid1. I've
> had zero problems. The two computers these run on do have aesni
> support. Aging wise, they're all at least a year old. But I've been
> using Btrfs on LUKS for much longer than that.

FWIW: I've had 5 spinning disks with LUKS + Btrfs raid1 for 1.5 years. Also xts-plain64 with AES-NI acceleration. No problems so far. Not using Btrfs compression.
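For anyone reproducing this kind of stack, a rough sketch of the layering being described. The device path and mapping name are placeholders; the destructive cryptsetup/mkfs lines are shown as comments, and only the harmless CPU-flag check actually runs (on a real system, `cryptsetup benchmark` gives concrete numbers):

```shell
# Quick check for hardware AES before picking a cipher (x86 only).
if grep -qw aes /proc/cpuinfo 2>/dev/null; then
    echo "AES-NI present: aes-xts-plain64 should be nearly free"
else
    echo "no AES-NI: expect encryption to cost CPU on I/O-heavy workloads"
fi

# The stack itself (placeholders, shown but not executed here):
#   cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 /dev/sdX
#   cryptsetup open /dev/sdX cryptdisk
#   mkfs.btrfs -L data /dev/mapper/cryptdisk
```

`--key-size 512` is the usual choice for XTS, since XTS splits the key into two halves.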
Re: checksum error in metadata node - best way to move root fs to new drive?
Austin S. Hemmelgarn posted on Fri, 12 Aug 2016 08:04:42 -0400 as excerpted:

> On a file server? No, I'd ensure proper physical security is
> established and make sure it's properly secured against network based
> attacks and then not worry about it. Unless you have things you want to
> hide from law enforcement or your government (which may or may not be
> legal where you live) or can reasonably expect someone to steal the
> system, you almost certainly don't actually need whole disk encryption.
> There are two specific exceptions to this though:
> 1. If your employer requires encryption on this system, that's their
> call.
> 2. Encrypted swap is a good thing regardless, because it prevents
> security credentials from accidentally being written unencrypted to
> persistent storage.

In the US, medical records are pretty well protected under penalty of law (HIPAA, IIRC?). Anyone storing medical records here would do well to have full filesystem encryption for that reason. Of course financial records are sensitive as well, or even just forum login information, and then there's the various industrial spies from various countries (China being the one most frequently named) that would pay good money for unencrypted devices from the right sources.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: checksum error in metadata node - best way to move root fs to new drive?
On 2016-08-11 16:23, Dave T wrote:
> What I have gathered so far is the following:
>
> 1. my RAM is not faulty and I feel comfortable ruling out a memory
> error as having anything to do with the reported problem.
>
> 2. my storage device does not seem to be faulty. I have not figured out
> how to do more definitive testing, but smartctl reports it as healthy.

Is this just based on smartctl -H, or is it based on looking at all the info available from smartctl?

Based on everything you've said so far, it sounds to me like there was a group of uncorrectable errors on the disk, and the sectors in question have now been remapped by the device's firmware. Such a situation is actually more common than people think (this is part of the whole 'reinstall to speed up your system' mentality in the Windows world). I've actually had this happen before (and correlated the occurrences with spikes in readings from the data-logging Geiger counter I have next to my home server). Most disks don't start to report as failing until they're in pretty bad condition: on most hard drives it takes an insanely large count of reallocated sectors to mark the disk as failed in the drive firmware, and on SSDs you pretty much have to run the drive out of spare blocks (which takes a _long_ time on many SSDs).

> 3. this problem first happened on a normally running system in light
> use. It had not recently crashed. But the root fs went read-only for an
> unknown reason.
>
> 4. the aftermath of the initial problem may have been exacerbated by
> hard resetting the system, but that's only a guess
>
>> The compression-related problem is this: Btrfs is considerably less
>> tolerant of checksum-related errors on btrfs-compressed data
>
> I'm an unsophisticated user. The argument in support of this statement
> sounds convincing to me. Therefore, I think I should discontinue using
> compression. Anyone disagree? Is there anything else I should change?
> (Do I need to provide additional information?)
>
> What can I do to find out more about what caused the initial problem?
> I have heard memory errors mentioned, but that's apparently not the
> case here. I have heard crash recovery mentioned, but that isn't how my
> problem initially happened.
>
> I also have a few general questions:
>
> 1. Can one discontinue using the compress mount option if it has been
> used previously? What happens to existing data if the compress mount
> option is 1) added when it wasn't used before, or 2) dropped when it
> had been used.

Yes, it just affects newly written data. If you want to convert existing data to be uncompressed, you'll need to run 'btrfs filesystem defrag -r ' on the filesystem to convert things.

> 2. I understand that the compress option generally improves btrfs
> performance (via a Phoronix article I read in the past; I can't find
> the link). Since encryption has some characteristics in common with
> compression, would one expect any decrease in performance from dropping
> compression when using btrfs on dm-crypt? (For more context, with an
> i7 6700K which has AES-NI, CPU performance should not be a bottleneck
> on my computer.)

I would expect a change in performance in that case, but not necessarily a decrease. The biggest advantage of compression is that it trades time spent using the disk for time spent using the CPU. In many cases this is a favorable trade-off when your storage is slower than your memory (because memory speed is really the big limiting factor here, not processor speed). In your case the encryption is hardware accelerated but the compression isn't, so in theory you should actually get better performance by turning off compression.

> 3. How do I find out if it is appropriate to use dup metadata on a
> Samsung 950 Pro NVMe drive? I don't see deduplication mentioned in the
> drive's datasheet:
> http://www.samsung.com/semiconductor/minisite/ssd/downloads/document/Samsung_SSD_950_PRO_Data_Sheet_Rev_1_2.pdf

Whether or not it does deduplication is hard to answer. If it does, then you obviously should avoid dup metadata. If it doesn't, then it's a complex question as to whether or not to use dup metadata. The short explanation for why is that the SSD firmware maintains a somewhat arbitrary mapping between LBAs and the actual location of the data in flash, and it tends to group writes from around the same time together in the flash itself. The argument against dup on SSDs in general takes this into account, arguing that because the data is likely to be in the same erase block for both copies, it's not as well protected. Personally, I run dup on non-deduplicating SSDs anyway, because I don't trust higher layers not to mess up one of the copies, and I still get better performance than most hard disks.

> 4. Given that my drive is not reporting problems, does it seem
> reasonable to re-use this drive after the errors I reported? If so,
> how should I do that? Can I simply make a new btrfs filesystem and copy
> my data back? Should I start at a lower level and re-do the dm-crypt
> layer?
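Austin's distinction between `smartctl -H` and the full attribute dump matters here: `-H` only echoes the firmware's own pass/fail verdict, while the attribute table shows remapped and pending sectors long before that verdict flips. A sketch against a canned excerpt of `smartctl -A` output (the attribute values are made up); on a real disk you would run the smartctl commands in the comments instead:

```shell
# On a real system (placeholder device path):
#   smartctl -H /dev/sdX   # just "PASSED"/"FAILED"
#   smartctl -A /dev/sdX   # the full attribute table
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010   Pre-fail  Always  -  24
197 Current_Pending_Sector  0x0012   100   100   000   Old_age   Always  -  0
198 Offline_Uncorrectable   0x0010   100   100   000   Old_age   Offline -  0'

# The three attributes that reveal quietly remapped or failing sectors:
printf '%s\n' "$sample" | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```

A nonzero raw value in any of the three is consistent with Austin's "uncorrectable errors, now remapped" theory even while `-H` still says PASSED.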
Re: checksum error in metadata node - best way to move root fs to new drive?
On Thu, Aug 11, 2016 at 04:23:45PM -0400, Dave T wrote:
> 1. Can one discontinue using the compress mount option if it has been
> used previously?

The mount option applies only to newly written blocks, and even then only to files that don't say otherwise (via chattr +c or +C, btrfs property, etc). You can change it on the fly (mount -o remount,...), etc.

> What happens to existing data if the compress mount option is 1) added
> when it wasn't used before, or 2) dropped when it had been used.

That data stays compressed or uncompressed, as when it was written. You can defrag files to change that; balance moves extents without changing their compression.

> 2. I understand that the compress option generally improves btrfs
> performance (via Phoronix article I read in the past; I don't find the
> link). Since encryption has some characteristics in common with
> compression, would one expect any decrease in performance from
> dropping compression when using btrfs on dm-crypt? (For more context,
> with an i7 6700K which has aes-ni, CPU performance should not be a
> bottleneck on my computer.)

As said elsewhere, compression can drastically help or hurt performance; this depends on your CPU-to-IO ratio, and on whether you do small random writes inside files (compress has to rewrite a whole 128KB block). An extreme data point: on an Odroid-U2 doing Debian archive rebuilds on eMMC, compression improves overall throughput by a factor of around two! On the other hand, this same task on typical machines tends to be CPU bound.

-- 
An imaginary friend squared is a real enemy.
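The remount/defrag mechanics above can be sketched as follows. The mount point and file name are placeholders, and a dry-run guard prints the commands rather than executing them, since they rewrite data on a real filesystem:

```shell
# Dry-run guard: set DRY_RUN=0 to actually execute.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = 1 ] && echo "would run: $*" || "$@"; }

# Toggle the mount option on the fly; only newly written data is affected.
run mount -o remount,compress=lzo /mnt
# Rewrite existing files compressed (defrag changes compression; balance doesn't).
run btrfs filesystem defrag -r -clzo /mnt
# Per-file overrides, independent of the mount option:
run chattr +c /mnt/somefile                          # request compression
run btrfs property set /mnt/somefile compression lzo # same via properties
```

Going the other direction (dropping compression) is the same pattern: remount without the compress option, then defrag the files you want rewritten uncompressed.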
Re: checksum error in metadata node - best way to move root fs to new drive?
Dave T posted on Thu, 11 Aug 2016 16:23:45 -0400 as excerpted: > I also have a few general questions: > > 1. Can one discontinue using the compress mount option if it has been > used previously? What happens to existing data if the compress mount > option is 1) added when it wasn't used before, or 2) dropped when it had > been used. The compress mount option only affects newly written data. Data that was previously written is automatically decompressed into memory on read, regardless of whether the compress option is still being used or not. So you can freely switch between using the option and not, and it'll only affect newly written files. Existing files stay written the way they are, unless you do something (like run a recursive defrag with the compress option) to rewrite them. > 2. I understand that the compress option generally improves btrfs > performance (via Phoronix article I read in the past; I don't find the > link). Since encryption has some characteristics in common with > compression, would one expect any decrease in performance from dropping > compression when using btrfs on dm-crypt? (For more context, > with an i7 6700K which has aes-ni, CPU performance should not be a > bottleneck on my computer.) Compression performance works like this (this is a general rule, not btrfs specific): Compression uses more CPU cycles but results in less data to actually transfer to and from storage. If your disks are slow and your CPU is fast (or if the CPU can use hardware accelerated compression functions), performance will tend to favor compression, because the bottleneck will be the actual data transfer to and from storage and the extra overhead of the CPU cycles won't normally matter while the effect of less data to actually transfer, due to the compression, will. 
But the slower the CPU (and the more it lacks hardware accelerated compression functions) and the faster the storage IO, the less of a bottleneck the actual data transfer will be, and thus the more likely it is that the CPU will become the bottleneck, particularly as the compression gets more size-efficient, which generally translates to requiring more CPU cycles and/or memory to handle it.

Since your storage is PCIE-3.0 @ > 1 GiB/sec, extremely fast, even tho LZO compression is considered fast (as opposed to size-efficient), you may actually see /better/ performance without compression, especially when running CPU-heavy workloads where the extra CPU cycles of compression will matter as the CPU is already the bottleneck. Since you're doing encryption also, and that too tends to be CPU intensive (even if it's hardware accelerated for you), I'd actually be a bit surprised if you didn't see an increase of performance without compression, because your storage /is/ so incredibly fast compared to conventional storage.

But of course if it's really a concern, there's nothing like actually benchmarking it yourself to see. =:^) I'd be very surprised if you actually notice a slowdown turning compression off. You might not notice a performance boost either, but I'd be surprised if you notice a slowdown, tho some artificial benchmarks might show one if they aren't balancing CPU and IO like the real world.

> 3. How do I find out if it is appropriate to use dup metadata on a
> Samsung 950 Pro NVMe drive? I don't see deduplication mentioned in the
> drive's datasheet:
> http://www.samsung.com/semiconductor/minisite/ssd/downloads/document/Samsung_SSD_950_PRO_Data_Sheet_Rev_1_2.pdf

I'd google the controller. A lot of them will list either compression and dedup as features, as they enhance performance in some cases, or the stability of constant performance as a feature, as mine, targeted at the server market, did.
If the emphasis is on constant performance and what-you-see-is-what-you-get storage capacity, then they're not doing compression and dedup, as those can increase performance and storage capacity under certain conditions, but the effect is very unpredictable, since it depends on how much duplication the data has and how compressible it is. Sandforce controllers, in particular, are known to emphasize compression and dedup. OTOH, controllers targeted at enterprise or servers are likely to emphasize stability and predictability and thus not do transparent compression or dedup.

> 4. Given that my drive is not reporting problems, does it seem
> reasonable to re-use this drive after the errors I reported? If so,
> how should I do that? Can I simply make a new btrfs filesystem and copy
> my data back? Should I start at a lower level and re-do the dm-crypt
> layer?

I'd reuse it here. For hardware that supports/needs trim I'd start at the bottom layer and work up, but IIRC you said yours doesn't need it, and by the time you get to the btrfs layer on top of the crypt layer, the hardware layer should be scrambled zeros and ones in any case, so if it's true your hardware doesn't need it
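Duncan's CPU-versus-storage bottleneck argument (compression trades CPU cycles for fewer bytes pushed through the device) can be put into a toy model. Every number below is an illustrative assumption, not a measurement, and the model is pessimistically serial where real pipelines overlap CPU and I/O; but the crossover behavior it shows is the one being described:

```shell
# time(s) = size/compress_MBps + size*ratio/disk_MBps (compress_MBps=0 => none)
model() {
  awk -v d="$1" -v c="$2" -v r="$3" 'BEGIN {
    size = 1024                              # MB written, arbitrary
    t = (c > 0) ? size / c + size * r / d : size / d
    printf "%.2f\n", t
  }'
}

# Assumed: ~400 MB/s single-core "lzo-class" compressor saving 40% (ratio 0.6),
# ~100 MB/s spinning disk vs ~1200 MB/s NVMe.
hdd_plain=$(model 100 0 1);    hdd_lzo=$(model 100 400 0.6)
nvme_plain=$(model 1200 0 1);  nvme_lzo=$(model 1200 400 0.6)
echo "HDD : ${hdd_plain}s plain vs ${hdd_lzo}s compressed"
echo "NVMe: ${nvme_plain}s plain vs ${nvme_lzo}s compressed"
```

With these assumptions compression wins on the slow disk and loses on the NVMe drive, matching the "benchmark it yourself" expectation above.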
Re: checksum error in metadata node - best way to move root fs to new drive?
On Thu, Aug 11, 2016 at 9:11 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Chris Murphy posted on Thu, 11 Aug 2016 14:43:56 -0600 as excerpted:
>
>> On Thu, Aug 11, 2016 at 1:07 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>>> The compression-related problem is this: Btrfs is considerably less
>>> tolerant of checksum-related errors on btrfs-compressed data,
>>
>> Why? The data is the data. And why would it matter if it's application
>> compressed data vs Btrfs compressed data? If there's an error, Btrfs is
>> intolerant. I don't see how there's a checksum error that Btrfs
>> tolerates.
>
> Apparently, the code path for compressed data is sufficiently different,
> that when there's a burst of checksum errors, even on raid1 where it
> should (and does with scrub) get the correct second copy, it will crash
> the system.

Ahh OK, gotcha.

> This is my experience and that of others, and what I thought was
> standard btrfs behavior -- I didn't know it was a compression-specific
> bug since I use compress on all my btrfs, until someone told me.
>
> When the btrfs compression option hasn't been used on that filesystem,
> or presumably when none of that burst of checksum errors is from
> btrfs-compressed files, it will grab the second copy and use it as it
> should, and there will be no crash. This is as reported by others,
> including people who have tested both with and without btrfs-compressed
> files and found that it only crashed if the files were btrfs-compressed,
> whereas it worked as expected, fetching the valid second copy, if they
> weren't btrfs-compressed.

OK so something's broken.

> As I'm not a coder I can't actually tell you from reading the code, but
> AFAIK, both the 128 KiB compression block size and the checksum are on
> the uncompressed data. Compression takes place after checksumming.
>
> And I don't believe metadata, whether metadata itself or inline data, is
> compressed by btrfs' transparent compression.

Inline data is definitely compressed.
From ls -li:

263 -rw-r-. 1 root root 3270 Aug 11 21:29 samsung840-256g-hdparm.txt

From btrfs-debug-tree:

item 84 key (263 INODE_ITEM 0) itemoff 7618 itemsize 160
    inode generation 7 transid 7 size 3270 nbytes 3270
    block group 0 mode 100640 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 85 key (263 INODE_REF 256) itemoff 7582 itemsize 36
    inode ref index 8 namelen 26 name: samsung840-256g-hdparm.txt
item 86 key (263 XATTR_ITEM 3817753667) itemoff 7499 itemsize 83
    location key (0 UNKNOWN.0 0) type XATTR
    namelen 16 datalen 37 name: security.selinux
    data unconfined_u:object_r:unlabeled_t:s0
item 87 key (263 EXTENT_DATA 0) itemoff 5860 itemsize 1639
    inline extent data size 1618 ram 3270 compress(zlib)

Curiously though, these same small text files once above a certain size (?) are not compressed if they aren't inline extents.

278 -rw-r-. 1 root root 11767 Aug 11 21:29 WDCblack-750g-smartctlx_2.txt

item 48 key (278 INODE_ITEM 0) itemoff 7675 itemsize 160
    inode generation 7 transid 7 size 11767 nbytes 12288
    block group 0 mode 100640 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 49 key (278 INODE_REF 256) itemoff 7636 itemsize 39
    inode ref index 23 namelen 29 name: WDCblack-750g-smartctlx_2.txt
item 50 key (278 XATTR_ITEM 3817753667) itemoff 7553 itemsize 83
    location key (0 UNKNOWN.0 0) type XATTR
    namelen 16 datalen 37 name: security.selinux
    data unconfined_u:object_r:unlabeled_t:s0
item 51 key (278 EXTENT_DATA 0) itemoff 7500 itemsize 53
    extent data disk byte 12939264 nr 4096
    extent data offset 0 nr 12288 ram 12288
    extent compression(zlib)

Hrrmm.

-- 
Chris Murphy
Re: checksum error in metadata node - best way to move root fs to new drive?
Chris Murphy posted on Thu, 11 Aug 2016 14:43:56 -0600 as excerpted:

> On Thu, Aug 11, 2016 at 1:07 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> The compression-related problem is this: Btrfs is considerably less
>> tolerant of checksum-related errors on btrfs-compressed data,
>
> Why? The data is the data. And why would it matter if it's application
> compressed data vs Btrfs compressed data? If there's an error, Btrfs is
> intolerant. I don't see how there's a checksum error that Btrfs
> tolerates.

Apparently, the code path for compressed data is sufficiently different that when there's a burst of checksum errors, even on raid1 where it should (and does with scrub) get the correct second copy, it will crash the system. This is my experience and that of others, and what I thought was standard btrfs behavior -- I didn't know it was a compression-specific bug, since I use compress on all my btrfs, until someone told me.

When the btrfs compression option hasn't been used on that filesystem, or presumably when none of that burst of checksum errors is from btrfs-compressed files, it will grab the second copy and use it as it should, and there will be no crash. This is as reported by others, including people who have tested both with and without btrfs-compressed files and found that it only crashed if the files were btrfs-compressed, whereas it worked as expected, fetching the valid second copy, if they weren't btrfs-compressed.

I'd assume this is why this particular bug has remained unsquashed for so long. The devs are likely testing compression, and bad-checksum repair from the second copy, but they probably aren't testing bad-checksum repair on compressed data, so the problem isn't showing up in their tests. Between that and relatively few people running raid1 with the compression option and seeing enough bad shutdowns to be aware of the problem, it has mostly flown under the radar.
For a long time I myself thought it was just the way btrfs behaved with bursts of checksum errors, until someone pointed out that it did /not/ behave that way on btrfs that didn't have any compressed files when the checksum errors occurred.

> But also I don't know if the checksum is predicated on compressed data
> or uncompressed data - does the scrub blindly read compressed data,
> checksums it, and compares to the previously recorded csum? Or does the
> scrub read compressed data, decompresses it, checksums it, then
> compares? And does compression compress metadata? I don't think it does
> from some of the squashfs testing of the same set of binary files on
> ext4 vs btrfs uncompressed vs btrfs compressed. The difference is
> explained by inline data being compressed (which it is), so I don't
> think the fs itself gets compressed.

As I'm not a coder I can't actually tell you from reading the code, but AFAIK, both the 128 KiB compression block size and the checksum are on the uncompressed data. Compression takes place after checksumming.

And I don't believe metadata, whether metadata itself or inline data, is compressed by btrfs' transparent compression.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
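The ordering Duncan describes (checksum computed over the uncompressed data, compression applied afterwards, mirror fallback on mismatch) can be mimicked in miniature with ordinary tools. This is an illustration of the *intended* behavior, not btrfs code: gzip stands in for the compressor, cksum for the checksum, and two files for the raid1 mirrors:

```shell
cd "$(mktemp -d)"
printf 'hello btrfs\n' > plain.txt
csum=$(cksum < plain.txt)              # checksum of the *uncompressed* data

gzip -cn plain.txt > copy_a.gz         # mirror A
gzip -cn plain.txt > copy_b.gz         # mirror B
# Corrupt mirror A inside the deflate stream (gzip -n header is 10 bytes):
dd if=/dev/zero of=copy_a.gz bs=1 count=4 seek=10 conv=notrunc 2>/dev/null

read_block() {  # raid1-style read: first copy that decompresses AND verifies
  for c in "$@"; do
    if out=$(gzip -dc "$c" 2>/dev/null) &&
       [ "$(printf '%s\n' "$out" | cksum)" = "$csum" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  echo "all copies failed checksum" >&2
  return 1
}

read_block copy_a.gz copy_b.gz
```

Note the compressed path has an extra failure mode on top of a plain checksum mismatch: the decode step itself can fail, and the fallback has to survive both. That extra path is roughly where the crash behavior discussed in this thread would have to live.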
Re: checksum error in metadata node - best way to move root fs to new drive?
On Thu, Aug 11, 2016 at 1:07 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> The compression-related problem is this: Btrfs is considerably less
> tolerant of checksum-related errors on btrfs-compressed data,

Why? The data is the data. And why would it matter if it's application compressed data vs Btrfs compressed data? If there's an error, Btrfs is intolerant. I don't see how there's a checksum error that Btrfs tolerates.

But also I don't know if the checksum is predicated on compressed data or uncompressed data - does the scrub blindly read compressed data, checksums it, and compares to the previously recorded csum? Or does the scrub read compressed data, decompresses it, checksums it, then compares? And does compression compress metadata? I don't think it does from some of the squashfs testing of the same set of binary files on ext4 vs btrfs uncompressed vs btrfs compressed. The difference is explained by inline data being compressed (which it is), so I don't think the fs itself gets compressed.

-- 
Chris Murphy
Re: checksum error in metadata node - best way to move root fs to new drive?
On Thu, Aug 11, 2016 at 8:12 AM, Nicholas D Steeves wrote:
>
> Chris, do you use compress=lzo? SSDs or rotational disks?

No compression; SSD and HDD. The stuff I care about has been on dmcrypt (LUKS) for some time, stuff I sorta care about is on plain partitions, and stuff I don't care much about is either on LVM LVs (usually thinp) or qcow2.

I have used compression for periods measured in months, not years, both zlib and lzo, on both SSD and HDD, to no ill effect. But it's true that some of the more abruptly and badly damaged filesystems did use compress=lzo. Since lzo is faster and compresses only slightly less well than zlib, it may simply be that more people choose lzo, and that's why problems with compression tend to involve lzo: coincidence rather than causation. I'm not even sure there's enough information to establish correlation.

-- 
Chris Murphy
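Chris's speed-versus-ratio point is easy to demonstrate. lzo isn't a standard CLI tool everywhere, so here gzip -1 stands in for the "fast, lighter compression" choice and gzip -9 for the "slower, tighter" one; the numbers will differ from real lzo/zlib-in-btrfs, but the trade-off has the same shape:

```shell
cd "$(mktemp -d)"
# Repetitive sample text compresses well at any level.
yes "some fairly repetitive mailing-list text" | head -n 20000 > sample.txt

for lvl in 1 9; do
  size=$(gzip -cn -"$lvl" sample.txt | wc -c)
  echo "gzip -$lvl: $size bytes"   # -1 is cheaper on CPU, -9 is smaller
done
```

On this kind of input the level-1 output is larger (or at best equal), while costing noticeably less CPU time per byte, which is exactly the trade that makes lzo the popular choice.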
Re: checksum error in metadata node - best way to move root fs to new drive?
What I have gathered so far is the following:

1. my RAM is not faulty and I feel comfortable ruling out a memory error as having anything to do with the reported problem.

2. my storage device does not seem to be faulty. I have not figured out how to do more definitive testing, but smartctl reports it as healthy.

3. this problem first happened on a normally running system in light use. It had not recently crashed. But the root fs went read-only for an unknown reason.

4. the aftermath of the initial problem may have been exacerbated by hard resetting the system, but that's only a guess

> The compression-related problem is this: Btrfs is considerably less
> tolerant of checksum-related errors on btrfs-compressed data

I'm an unsophisticated user. The argument in support of this statement sounds convincing to me. Therefore, I think I should discontinue using compression. Anyone disagree? Is there anything else I should change? (Do I need to provide additional information?)

What can I do to find out more about what caused the initial problem? I have heard memory errors mentioned, but that's apparently not the case here. I have heard crash recovery mentioned, but that isn't how my problem initially happened.

I also have a few general questions:

1. Can one discontinue using the compress mount option if it has been used previously? What happens to existing data if the compress mount option is 1) added when it wasn't used before, or 2) dropped when it had been used.

2. I understand that the compress option generally improves btrfs performance (via a Phoronix article I read in the past; I can't find the link). Since encryption has some characteristics in common with compression, would one expect any decrease in performance from dropping compression when using btrfs on dm-crypt? (For more context, with an i7 6700K which has AES-NI, CPU performance should not be a bottleneck on my computer.)

3. How do I find out if it is appropriate to use dup metadata on a Samsung 950 Pro NVMe drive? I don't see deduplication mentioned in the drive's datasheet: http://www.samsung.com/semiconductor/minisite/ssd/downloads/document/Samsung_SSD_950_PRO_Data_Sheet_Rev_1_2.pdf

4. Given that my drive is not reporting problems, does it seem reasonable to re-use this drive after the errors I reported? If so, how should I do that? Can I simply make a new btrfs filesystem and copy my data back? Should I start at a lower level and re-do the dm-crypt layer?

5. Would most of you guys use btrfs + dm-crypt on a production file server (with spinning disks in JBOD configuration -- i.e., no RAID)? In this situation, the data is very important, of course. My past experience indicated that RAID only improves uptime, which is not so critical in our environment. Our main criterion is that we should never ever have data loss. As far as I understand it, we do have to use encryption.

Thanks for the discussion so far. It's very educational for me.
Re: checksum error in metadata node - best way to move root fs to new drive?
Nicholas D Steeves posted on Thu, 11 Aug 2016 10:12:04 -0400 as excerpted:

> Why is the combination of dm-crypt|luks+btrfs+compress=lzo as overlooked
> as a potential cause? Other than the "raid56 ate my data" I've noticed
> a bunch of "luks+btrfs+compress=lzo ate my data" threads.

My usage is btrfs on physical device (well, on GPT partitions on the physical device), no encryption, and it's mostly raid1 on paired devices, but there's definitely one kink that compress=lzo (and I believe compression in general, including gzip) adds, and it's possible running it on encryption compounds the issue.

The compression-related problem is this: Btrfs is considerably less tolerant of checksum-related errors on btrfs-compressed data. While uncompressed btrfs raid1 will recover from the second copy where possible and continue, on files that btrfs has compressed, if there are enough checksum errors (for example in a hard-shutdown situation where one of the raid1 devices had the updates written but it crashed while writing the other), btrfs will crash instead of simply falling back to the good copy.

This is known to be specific to compression; uncompressed btrfs recovers as intended from the second copy. And it's known to occur only when there are too many checksum errors in a burst -- the filesystem apparently deals correctly with just a few at a time.

This problem has been ongoing for years -- I thought it was just the way btrfs worked until someone mentioned that it didn't behave that way without compression -- and it reasonably regularly prevents a smooth reboot here after a crash.

In my case I have the system btrfs running read-only by default, so it's not damaged. However, /home and /var/log are of course mounted writable, and that's where the problems come in.
If I start in (I believe) rescue mode (it's that or emergency; the other won't do the mounts and won't let me do them manually either, as it thinks a dependency is missing), systemd will do the mounts but not start the (permanent) logging or the services that need to routinely write stuff that I have symlinked into /home/var/whatever so they can write with a read-only root and system partition. I can then scrub the mounted home and log partitions to fix the checksum errors due to one device having the update while the other doesn't, and continue booting normally. However, if I try directly booting normally, the system invariably crashes due to too many checksum errors, even when it /should/ simply read the other copy, which is fine as demonstrated by the fact that scrub can use it to fix the errors on the device triggering the checksum errors.

This continued to happen with 4.6. I'm on 4.7 now but am not sure I've crashed with it and thus can't say for sure whether the problem is fixed there. However, I doubt it, as the problem has been there apparently since the compression and raid1 features were introduced, and I didn't see anything mentioning a fix for the issue in the patches going by on the list.

The problem is most obvious and reproducible in btrfs raid1 mode, since there, one device /can/ be behind the other, and scrub /can/ be demonstrated to fix it, so it's obviously a checksum issue. But I'd imagine if enough checksum mismatches happen on a single device in single mode, it would crash as well, and of course then there's no second copy for scrub to fix the bad copy from, so it would simply show up as a btrfs that can mount but with significant corruption issues that will crash the system if an attempt to read the affected blocks reads too many at a time.
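[Editor's note: the boot-to-rescue-then-scrub routine described above looks roughly like the following. Mount points are examples, and the `run` helper only prints each command.]

```shell
# Sketch of the crash-recovery routine described above; mount points
# are examples.  "run" prints each command rather than executing it.
run() { echo "+ $*"; }

run systemctl rescue                # or boot with systemd.unit=rescue.target
run btrfs scrub start -Bd /home     # -B: wait for completion, -d: per-device stats
run btrfs scrub start -Bd /var/log
run btrfs scrub status /home        # confirm the checksum errors were repaired
```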
And to whatever possible extent an encryption layer between the physical device and btrfs results in possible additional corruption in the event of a crash or hard shutdown, it could easily compound an already bad situation. Meanwhile, /if/ that does turn out to be the root issue here, then finally fixing the btrfs compression related problem where a large burst of checksum failures crashes the system, even when there provably exists a second valid copy, but where this only happens with compression, should go quite far in stabilizing btrfs on encrypted underlayers. I know I certainly wouldn't object to the problem being fixed. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: checksum error in metadata node - best way to move root fs to new drive?
On 2016-08-11 10:12, Nicholas D Steeves wrote:
Why is the combination of dm-crypt|luks+btrfs+compress=lzo as overlooked as a potential cause? Other than the "raid56 ate my data" I've noticed a bunch of "luks+btrfs+compress=lzo ate my data" threads.

I haven't personally seen one of those in at least a few months. In general, BTRFS is moving fast enough that reports older than a kernel release cycle are generally out of date unless something confirms otherwise, but I do distinctly recall such issues being commonly reported in the past.

On 10 August 2016 at 15:46, Austin S. Hemmelgarn wrote:
As far as dm-crypt goes, it looks like BTRFS is stable on top in the configuration I use (aes-xts-plain64 with a long key using plain dm-crypt instead of LUKS). I have heard rumors of issues when using LUKS without hardware acceleration, but I've never seen any conclusive proof, and what little I've heard sounds more like it was just race conditions elsewhere causing the issues.

Austin, I'm very curious if they were also using compress=lzo, because my informal hypothesis is that the encryption+btrfs+compress=lzo combination precipitates these issues. Maybe the combo is more likely to trigger these race conditions? It might also be neat to mine the archive to see if these seem to be more likely to occur with fast SSDs vs slow rotational disks. Do you use compress=lzo?

In my case, I've tested on both SSDs (both cheap low-end ones and good Intel and Crucial ones) and traditional hard drives, with and without compression (both zlib and lzo), and with a couple of different encryption algorithms (AES, Blowfish, and Threefish). In my case it's only on plain dm-crypt, not LUKS, but I doubt that particular point will make much difference. The last test I did was when the merge window for 4.6 closed, run as part of the regular regression testing I do, and I'll be doing another one in the near future.
I think the last time I saw any issues with this in my testing was prior to 4.0, but I don't remember for sure (most of what I care about is comparison to the previous version, so I don't keep much in the way of records of specific things).
Re: checksum error in metadata node - best way to move root fs to new drive?
Why is the combination of dm-crypt|luks+btrfs+compress=lzo as overlooked as a potential cause? Other than the "raid56 ate my data" I've noticed a bunch of "luks+btrfs+compress=lzo ate my data" threads.

On 10 August 2016 at 15:46, Austin S. Hemmelgarn wrote:
> As far as dm-crypt goes, it looks like BTRFS is stable on top in the
> configuration I use (aes-xts-plain64 with a long key using plain dm-crypt
> instead of LUKS). I have heard rumors of issues when using LUKS without
> hardware acceleration, but I've never seen any conclusive proof, and what
> little I've heard sounds more like it was just race conditions elsewhere
> causing the issues.

Austin, I'm very curious if they were also using compress=lzo, because my informal hypothesis is that the encryption+btrfs+compress=lzo combination precipitates these issues. Maybe the combo is more likely to trigger these race conditions? It might also be neat to mine the archive to see if these seem to be more likely to occur with fast SSDs vs slow rotational disks. Do you use compress=lzo?

On 10 August 2016 at 18:52, Dave T wrote:
> On Wed, Aug 10, 2016 at 5:15 PM, Chris Murphy wrote:
>> 1. Report 'btrfs check' without --repair, let's see what it complains
>> about and if it might be able to plausibly fix this.
>
> First, a small part of the dmesg output:
>
> [ 172.772283] Btrfs loaded
> [ 172.772632] BTRFS: device label top_level devid 1 transid 103495 /dev/dm-0
> [ 274.320762] BTRFS info (device dm-0): use lzo compression

Compress=lzo confirmed. Corruption occurred on an SSD.

On 10 August 2016 at 17:21, Chris Murphy wrote:
> I'm using LUKS, aes xts-plain64, on six devices. One is using mixed-bg
> single device. One is dsingle mdup. And then 2x2 mraid1 draid1. I've
> had zero problems. The two computers these run on do have aesni
> support. Aging wise, they're all at least a year old. But I've been
> using Btrfs on LUKS for much longer than that.

Chris, do you use compress=lzo? SSDs or rotational disks?
If a bunch of people are using this combo without issue, I'll drop the informal hypothesis as "just a suspicion informed by sloppy pattern recognition" ;-) Thank you! Nicholas
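[Editor's note: when reporting a configuration like the ones being surveyed above, two non-destructive commands show whether a mounted btrfs is using compression. The mount point is an example, and the `run` helper only prints each command.]

```shell
# Sketch: check whether a mounted btrfs uses compress=lzo (or zlib).
# The mount point "/" is an example; "run" only prints the commands.
run() { echo "+ $*"; }

run findmnt -no OPTIONS /                  # look for compress=lzo in the options
run "dmesg | grep -i 'lzo compression'"    # kernel logs e.g. "use lzo compression"
```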
Re: checksum error in metadata node - best way to move root fs to new drive?
Gareth Pye posted on Thu, 11 Aug 2016 15:06:48 +1000 as excerpted:
> Is there some simple muddling of meta data that could be done to force
> dup meta data on deduping SSDs? Like a simple 'random' byte repeated
> often enough it would defeat any sane dedup? I know it would waste data
> but clearly that is considered worth it with dup metadata (what is the
> difference between 50% metadata efficiency and 45%?)

Well, the FTLs are mostly proprietary, AFAIK, so it's probably hard to prove the "force", but given the 512-byte sector standard (some are a multiple of that these days but 512 should be the minimum), in theory one random byte out of every 512 should do it... unless the compression these deduping FTLs generally run as well catches that difference and compresses it out to a different location where it can be compactly stored, allowing multiple copies of the same 512-byte sector to be stored in a single sector, so long as they only had a single byte or two different.

So it could probably be done, but given that the deduping and compression features of these ssds are listed as just that, features, and that people buy them for that, it may be that it's better to simply leave well enough alone. Folks who want dup metadata can set it, and if they haven't bought one of these ssds with dedup as a feature, they can be reasonably sure it'll be set. And people who don't care will simply get the defaults and can live with them the same way that people that don't care generally live with defaults that may or may not be the absolute best case for them, but are generally at least not horrible.

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: checksum error in metadata node - best way to move root fs to new drive?
On Thu, Aug 11, 2016 at 1:23 AM, Chris Murphy wrote:
> On Wed, Aug 10, 2016 at 4:01 PM, Dave T wrote:
>> I will be very disappointed if I cannot use btrfs + dm-crypt. As far
>> as I can see, there is no alternative given that I need to use
>> snapshots (and LVM, as good as it is, has severe performance penalties
>> for its snapshots).
>
> See LVM thin provisioning snapshots. I haven't benchmarked it, but
> it's a night and day difference from conventional (thick) snapshots.
> The gotchas are currently there's no raid support, and the snapshots
> are whole volume. So each snapshot appears as a volume with the same
> UUID as the original, and by default they're not active. So for me
> it's a bit of a head scratcher what happens when mounting a snapshot
> concurrent with another. For Btrfs this ends badly. For XFS it refuses
> unless using nouuid, but still seems capable of writing to the two
> volumes without causing problems.

XFS now allows changing UUID, as do LVM and MD. We can also change btrfs UUID using "btrfstune -u", but I wonder if there is any way to change device UUID in this case. One problem is that even before you get around to doing it, various udev rules kick in and create links to the wrong instance, overwriting previous ones; and I'm not sure either xfs_admin or btrfstune triggers a change event. So we may end up with stale, completely wrong links.
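[Editor's note: the UUID changes mentioned above look roughly like this. The volume names are made up, `btrfstune -u` must be run on an unmounted filesystem, and the `run` helper only prints the commands.]

```shell
# Sketch of regenerating snapshot UUIDs; LV names are hypothetical.
# "run" only prints the commands instead of executing them.
run() { echo "+ $*"; }

run xfs_admin -U generate /dev/vg/xfs_snap   # random new UUID for an XFS snapshot
run btrfstune -u /dev/vg/btrfs_snap          # new fs UUID; filesystem must be unmounted
run udevadm trigger /dev/vg/btrfs_snap       # nudge udev so by-uuid links get refreshed
run udevadm settle
```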
Re: checksum error in metadata node - best way to move root fs to new drive?
Is there some simple muddling of meta data that could be done to force dup meta data on deduping SSDs? Like a simple 'random' byte repeated often enough it would defeat any sane dedup? I know it would waste data but clearly that is considered worth it with dup metadata (what is the difference between 50% metadata efficiency and 45%?) On Thu, Aug 11, 2016 at 2:50 PM, Duncan <1i5t5.dun...@cox.net> wrote: > Dave T posted on Wed, 10 Aug 2016 18:01:44 -0400 as excerpted: > >> Does anyone have any thoughts about using dup mode for metadata on a >> Samsung 950 Pro (or any NVMe drive)? > > The biggest problem with dup on ssds is that some ssds (particularly the > ones with the sandforce controllers) do dedup, so you'd be having btrfs > do dup while the filesystem dedups, to no effect except more cpu and > device processing! > > (The other argument for single on ssd that I've seen is that because the > FTL ultimately places the data, and because both copies are written at > the same time, there's a good chance that the FTL will write them into > the same erase block and area, and a defect in one will likely be a > defect in the other as well. That may or may not be, I'm not qualified > to say, but as explained below, I do choose to take my chances on that > and thus do run dup on ssd.) > > So as long as the SSD doesn't have a deduping FTL, I'd suggest dup for > metadata on ssd does make sense. Data... not so sure on, but certainly > metadata, because one bad block of metadata can be many messed up files. > > On my ssds here, which I know don't do dedup, most of my btrfs are raid1 > on the pair of ssds. However, /boot is different since I can't really > point grub at two different /boots, so I have my working /boot on one > device, with the backup /boot on the other, and the grub on each one > pointed at its respective /boot, so I can select working or backup /boot > from the BIOS and it'll just work. 
> Since /boot is so small, it's mixed-mode chunks, meaning data and
> metadata are mixed together and the redundancy mode applies to both at
> once instead of each separately. And I chose dup, so it's dup for both
> data and metadata.
>
> Works fine, dup for both data and metadata on non-deduping ssds, but of
> course that means data takes double the space since there's two copies of
> it, and that gets kind of expensive on ssd, if it's more than the
> fraction of a GiB that's /boot.
>
> -- Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman

-- Gareth Pye - blog.cerberos.id.au Level 2 MTG Judge, Melbourne, Australia
Re: checksum error in metadata node - best way to move root fs to new drive?
Dave T posted on Wed, 10 Aug 2016 18:01:44 -0400 as excerpted:
> Does anyone have any thoughts about using dup mode for metadata on a
> Samsung 950 Pro (or any NVMe drive)?

The biggest problem with dup on ssds is that some ssds (particularly the ones with the sandforce controllers) do dedup, so you'd be having btrfs do dup while the drive's FTL dedups, to no effect except more cpu and device processing!

(The other argument for single on ssd that I've seen is that because the FTL ultimately places the data, and because both copies are written at the same time, there's a good chance that the FTL will write them into the same erase block and area, and a defect in one will likely be a defect in the other as well. That may or may not be, I'm not qualified to say, but as explained below, I do choose to take my chances on that and thus do run dup on ssd.)

So as long as the SSD doesn't have a deduping FTL, I'd suggest dup for metadata on ssd does make sense. Data... not so sure on, but certainly metadata, because one bad block of metadata can be many messed up files.

On my ssds here, which I know don't do dedup, most of my btrfs are raid1 on the pair of ssds. However, /boot is different since I can't really point grub at two different /boots, so I have my working /boot on one device, with the backup /boot on the other, and the grub on each one pointed at its respective /boot, so I can select working or backup /boot from the BIOS and it'll just work. Since /boot is so small, it's mixed-mode chunks, meaning data and metadata are mixed together and the redundancy mode applies to both at once instead of each separately. And I chose dup, so it's dup for both data and metadata.

Works fine, dup for both data and metadata on non-deduping ssds, but of course that means data takes double the space since there's two copies of it, and that gets kind of expensive on ssd, if it's more than the fraction of a GiB that's /boot.

-- Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: checksum error in metadata node - best way to move root fs to new drive?
Apologies. I have to make a correction to the message I just sent. Disregard that message and use this one:

On Wed, Aug 10, 2016 at 5:15 PM, Chris Murphy wrote:
> 1. Report 'btrfs check' without --repair, let's see what it complains
> about and if it might be able to plausibly fix this.

First, a small part of the dmesg output:

[ 172.772283] Btrfs loaded
[ 172.772632] BTRFS: device label top_level devid 1 transid 103495 /dev/dm-0
[ 274.320762] BTRFS info (device dm-0): use lzo compression
[ 274.320764] BTRFS info (device dm-0): disk space caching is enabled
[ 274.320764] BTRFS: has skinny extents
[ 274.322555] BTRFS info (device dm-0): bdev /dev/mapper/cryptroot errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 274.329965] BTRFS: detected SSD devices, enabling SSD mode

Now, full output of btrfs check without repair option.

checking extents
bad metadata [292414541824, 292414558208) crossing stripe boundary
bad metadata [292414607360, 292414623744) crossing stripe boundary
bad metadata [292414672896, 292414689280) crossing stripe boundary
bad metadata [292414738432, 292414754816) crossing stripe boundary
bad metadata [292415787008, 292415803392) crossing stripe boundary
bad metadata [292415918080, 292415934464) crossing stripe boundary
bad metadata [292416376832, 292416393216) crossing stripe boundary
bad metadata [292418015232, 292418031616) crossing stripe boundary
bad metadata [292419325952, 292419342336) crossing stripe boundary
bad metadata [292419588096, 292419604480) crossing stripe boundary
bad metadata [292419915776, 292419932160) crossing stripe boundary
bad metadata [292422930432, 292422946816) crossing stripe boundary
bad metadata [292423061504, 292423077888) crossing stripe boundary
ref mismatch on [292423155712 16384] extent item 1, found 0
Backref 292423155712 root 258 not referenced back 0x2280a20
Incorrect global backref count on 292423155712 found 1 wanted 0
backpointer mismatch on [292423155712 16384]
owner ref check failed [292423155712 16384]
bad metadata [292423192576, 292423208960) crossing stripe boundary
bad metadata [292423323648, 292423340032) crossing stripe boundary
bad metadata [292429549568, 292429565952) crossing stripe boundary
bad metadata [292439904256, 292439920640) crossing stripe boundary
bad metadata [292440297472, 292440313856) crossing stripe boundary
bad metadata [292442525696, 292442542080) crossing stripe boundary
bad metadata [292443770880, 292443787264) crossing stripe boundary
bad metadata [292443967488, 292443983872) crossing stripe boundary
bad metadata [292444033024, 292444049408) crossing stripe boundary
bad metadata [292444098560, 292444114944) crossing stripe boundary
bad metadata [292444164096, 292444180480) crossing stripe boundary
bad metadata [292444229632, 292444246016) crossing stripe boundary
bad metadata [292444688384, 292444704768) crossing stripe boundary
bad metadata [292444884992, 292444901376) crossing stripe boundary
bad metadata [292445081600, 292445097984) crossing stripe boundary
bad metadata [29244672, 292446736384) crossing stripe boundary
bad metadata [292448948224, 292448964608) crossing stripe boundary
Error: could not find btree root extent for root 258
Checking filesystem on /dev/mapper/cryptroot
Re: checksum error in metadata node - best way to move root fs to new drive?
see below

On Wed, Aug 10, 2016 at 5:15 PM, Chris Murphy wrote:
> 1. Report 'btrfs check' without --repair, let's see what it complains
> about and if it might be able to plausibly fix this.

First, a small part of the dmesg output:

[ 172.772283] Btrfs loaded
[ 172.772632] BTRFS: device label top_level devid 1 transid 103495 /dev/dm-0
[ 274.320762] BTRFS info (device dm-0): use lzo compression
[ 274.320764] BTRFS info (device dm-0): disk space caching is enabled
[ 274.320764] BTRFS: has skinny extents
[ 274.322555] BTRFS info (device dm-0): bdev /dev/mapper/sysluks errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 274.329965] BTRFS: detected SSD devices, enabling SSD mode

Now, full output of btrfs check without repair option.

checking extents
bad metadata [292414541824, 292414558208) crossing stripe boundary
bad metadata [292414607360, 292414623744) crossing stripe boundary
bad metadata [292414672896, 292414689280) crossing stripe boundary
bad metadata [292414738432, 292414754816) crossing stripe boundary
bad metadata [292415787008, 292415803392) crossing stripe boundary
bad metadata [292415918080, 292415934464) crossing stripe boundary
bad metadata [292416376832, 292416393216) crossing stripe boundary
bad metadata [292418015232, 292418031616) crossing stripe boundary
bad metadata [292419325952, 292419342336) crossing stripe boundary
bad metadata [292419588096, 292419604480) crossing stripe boundary
bad metadata [292419915776, 292419932160) crossing stripe boundary
bad metadata [292422930432, 292422946816) crossing stripe boundary
bad metadata [292423061504, 292423077888) crossing stripe boundary
ref mismatch on [292423155712 16384] extent item 1, found 0
Backref 292423155712 root 258 not referenced back 0x2280a20
Incorrect global backref count on 292423155712 found 1 wanted 0
backpointer mismatch on [292423155712 16384]
owner ref check failed [292423155712 16384]
bad metadata [292423192576, 292423208960) crossing stripe boundary
bad metadata [292423323648, 292423340032) crossing stripe boundary
bad metadata [292429549568, 292429565952) crossing stripe boundary
bad metadata [292439904256, 292439920640) crossing stripe boundary
bad metadata [292440297472, 292440313856) crossing stripe boundary
bad metadata [292442525696, 292442542080) crossing stripe boundary
bad metadata [292443770880, 292443787264) crossing stripe boundary
bad metadata [292443967488, 292443983872) crossing stripe boundary
bad metadata [292444033024, 292444049408) crossing stripe boundary
bad metadata [292444098560, 292444114944) crossing stripe boundary
bad metadata [292444164096, 292444180480) crossing stripe boundary
bad metadata [292444229632, 292444246016) crossing stripe boundary
bad metadata [292444688384, 292444704768) crossing stripe boundary
bad metadata [292444884992, 292444901376) crossing stripe boundary
bad metadata [292445081600, 292445097984) crossing stripe boundary
bad metadata [29244672, 292446736384) crossing stripe boundary
bad metadata [292448948224, 292448964608) crossing stripe boundary
Error: could not find btree root extent for root 258
Checking filesystem on /dev/mapper/cryptroot
UUID:
Re: checksum error in metadata node - best way to move root fs to new drive?
On Wed, Aug 10, 2016 at 4:01 PM, Dave T wrote:
> I will be very disappointed if I cannot use btrfs + dm-crypt. As far
> as I can see, there is no alternative given that I need to use
> snapshots (and LVM, as good as it is, has severe performance penalties
> for its snapshots).

See LVM thin provisioning snapshots. I haven't benchmarked it, but it's a night and day difference from conventional (thick) snapshots. The gotchas are currently there's no raid support, and the snapshots are whole volume. So each snapshot appears as a volume with the same UUID as the original, and by default they're not active. So for me it's a bit of a head scratcher what happens when mounting a snapshot concurrent with another. For Btrfs this ends badly. For XFS it refuses unless using nouuid, but still seems capable of writing to the two volumes without causing problems.

But yes, I like Btrfs snapshots and reflinks better. *shrug* If you find a Btrfs on dmcrypt problem, it's a serious bug, and I think it would get attention very quickly.

-- Chris Murphy
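[Editor's note: a minimal thin-provisioned snapshot setup, for comparison with the thick snapshots Dave mentions. VG/LV names and sizes are made up, and the `run` helper only prints the commands.]

```shell
# Sketch of LVM thin provisioning; names and sizes are hypothetical.
# "run" only prints the commands instead of executing them.
run() { echo "+ $*"; }

run lvcreate --type thin-pool -L 100G -n pool vg
run lvcreate --type thin -V 50G -n data vg/pool
run lvcreate -s -n data_snap vg/data    # thin snapshot: near-instant, space-efficient
run lvchange -ay -K vg/data_snap        # -K overrides the activation-skip flag snapshots get
```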
Re: checksum error in metadata node - best way to move root fs to new drive?
Thanks for all the responses, guys! I really appreciate it. This information is very helpful. I will be working through the suggestions (e.g., check without repair) for the next hour or so. I'll report back when I have something to report.

My drive is a Samsung 950 Pro NVMe drive, which in most respects is treated like an SSD. (The only difference I am aware of is that trim isn't needed.)

> But until recently dup mode data on single device was impossible, so I
> doubt you were using that, and while dup mode metadata was the normal
> default, on ssd that changes to single mode as well.

Your assumptions are correct: single mode for data and metadata. Does anyone have any thoughts about using dup mode for metadata on a Samsung 950 Pro (or any NVMe drive)?

I will be very disappointed if I cannot use btrfs + dm-crypt. As far as I can see, there is no alternative given that I need to use snapshots (and LVM, as good as it is, has severe performance penalties for its snapshots). I'm required to use crypto. I cannot risk doing without snapshots. Therefore, btrfs + dm-crypt seem like my only viable solution. Plus it is my preferred solution. I like both tools.

If all goes well, we are planning to implement a production file server for our office with dm-crypt + btrfs (and a lot of spinning disks). In the office we currently have another system identical to mine running the same drive with dm-crypt + btrfs, the same operating system, the same nvidia GPU and proprietary driver, and it is running fine. One difference is that it is overclocked substantially (mine isn't). I would have expected it would give a problem before mine would. But it seems to be rock solid. I just ran btrfs scrub on it and it finished in a few seconds with no errors.

On my computer I have run two extensive memory tests (8 cpu cores in parallel, all tests). The current test has been running for 14 hrs with no errors.
(I think that 8 cores in parallel make this equivalent to a much longer test with the default single cpu settings.) Therefore, I do not believe this issue is caused by RAM. I'm hoping there is no configuration error or other mistake I made in setting these systems up that would lead to the problems I'm experiencing.

BTW, I was able to copy all the files to another drive with no problem. I used "cp -a" to copy, then I ran "rsync -a" twice to make sure nothing was missed. My guess is that I'll be able to copy this right back onto the root filesystem after I resolve whatever the problem is and my operating system will be back to the same state it was in prior to this problem.

OK, I'm off to try btrfs check without --repair... thanks again!

For reference:
btrfs-progs v4.6.1
Linux 4.6.4-1-ARCH #1 SMP PREEMPT Mon Jul 11 19:12:32 CEST 2016 x86_64 GNU/Linux

On Wed, Aug 10, 2016 at 5:21 PM, Chris Murphy wrote:
> I'm using LUKS, aes xts-plain64, on six devices. One is using mixed-bg
> single device. One is dsingle mdup. And then 2x2 mraid1 draid1. I've
> had zero problems. The two computers these run on do have aesni
> support. Aging wise, they're all at least a year old. But I've been
> using Btrfs on LUKS for much longer than that.
>
> Chris Murphy
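[Editor's note: the copy-and-verify approach described above, sketched. Paths are examples, and the `run` helper only prints the commands.]

```shell
# Sketch of the cp + rsync copy-and-verify routine; paths are examples.
# "run" only prints the commands instead of executing them.
run() { echo "+ $*"; }

run cp -a /mnt/old/. /mnt/new/
run rsync -aHAX /mnt/old/ /mnt/new/     # catch anything cp missed
run rsync -aHAXc /mnt/old/ /mnt/new/    # -c compares full checksums, not just size/mtime
```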
Re: checksum error in metadata node - best way to move root fs to new drive?
On Tue, Aug 9, 2016 at 9:27 PM, Dave T wrote:
> btrfs scrub returned with uncorrectable errors. Searching in dmesg
> returns the following information:
>
> BTRFS warning (device dm-0): checksum error at logical N on
> /dev/mapper/[crypto] sector: y metadata node (level 2) in tree 250
>
> it also says:
>
> unable to fixup (regular) error at logical NN on /dev/mapper/[crypto]
>
> I assume I have a bad block device. Does that seem correct? The
> important data is backed up.

If it were persistently, blatantly bad, then the drive firmware would know about it, and would report a read error. If you're not seeing libata UNC errors, or the other way it manifests is with hard link resets due to inappropriate SCSI command timer default in the kernel, then it's probably some kind of SDC, torn or misdirected write, etc.

If metadata is profile DUP, then scrub should fix it. If it's not, there's something else going on (or really bad luck).

I'd like to believe that btrfs check can, or someday will, be able to do some kind of sanity check on a node that fails checksum, and fix it. That the node can be read but merely fails checksum isn't a really good reason for a file system to not give you access to its data, but yeah it kinda depends on what's in the node. It could contain up to a couple hundred items each of which point elsewhere. btrfs-debug-tree -b might give some hint what's going on. I'd like to believe it'll be noisy and warn that the checksum fails but still show the contents, assuming the drive hands over the data on those sectors.

> If I can copy this entire root filesystem, what is the best way to do
> it? The btrfs restore tool? cp? rsync? Some cloning tool? Other
> options?

0. Backup, that's done.

1. Report 'btrfs check' without --repair, let's see what it complains about and if it might be able to plausibly fix this. Since you can scrub, it means the file system mounts. Since the file system mounts, I would not look at restore to start out because it's tedious.
I'd say you toss a coin over using btrfs send/receive, or btrfs check --repair to see if it fixes the node. These days it should be safe with relatively recent btrfs-progs so I'd say use a 4.6.x or 4.7 progs for this. And then the send/receive should be done with -v or maybe even -vv for both send and receive, along with --max-errors 0, which will permit unlimited errors but will report them rather than failing midstream. This will get you the bulk of the OS. If you're lucky, the node contains only a handful of relatively unimportant items, especially if they're files small enough to be stored inline the node, which will substantially reduce the number of errors as a result of a single node loss. The calculus on btrfs check --repair first then send/receive, vs send/receive then if that fails fallback to btrfs check --repair, is mainly time. Maybe repair can fix it, maybe it makes things worse. Where send/receive might fail midstream without the node being fixed first, but it causes no additional problems. The 2nd is more conservative but takes more time if it turns out the send/receive fails, you then do repair, and then have to start the send/receive over from scratch again. (If it fails, you should delete or rename the bad subvolume on the receive side before starting another send). > If I use the btrfs restore tool, should I use options x, m and S? In > particular I wonder exactly what the S option does. If I leave S out, > are all symlinks ignored? I would only use restore for the files that are reported by send/receive as failed due to errors - assuming that even happens. Or since this is OS stuff, just reinstall the packages for the files affected by the bad node. -- Chris Murphy
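[Editor's note: Chris's send/receive suggestion, sketched. The snapshot and target paths are examples, and the `run` helper only prints the commands.]

```shell
# Sketch of the send/receive path described above; snapshot and target
# paths are examples.  "run" only prints the commands.
run() { echo "+ $*"; }

run btrfs subvolume snapshot -r / /snap_ro    # send requires a read-only snapshot
run "btrfs send -v /snap_ro | btrfs receive -v --max-errors 0 /mnt/new"
```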
Re: checksum error in metadata node - best way to move root fs to new drive?
I'm using LUKS, aes-xts-plain64, on six devices. One is a single device with mixed-bg. One is data single, metadata DUP. And then 2x2 with metadata raid1, data raid1. I've had zero problems. The two computers these run on do have AES-NI support. Aging wise, they're all at least a year old, but I've been using Btrfs on LUKS for much longer than that.

-- 
Chris Murphy
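For reference, a single-device setup along the lines described above might be created like this. This is a sketch: the device name is a placeholder, and the cipher spec matches the one named in the message (aes-xts-plain64 is also the modern cryptsetup default).

```shell
# LUKS container with aes-xts-plain64 (512-bit key, 256 bits per XTS half).
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 /dev/sdX
cryptsetup open /dev/sdX crypto

# Btrfs on top, with DUP metadata and single data (the "dsingle mdup"
# layout mentioned above), so scrub has a second metadata copy to
# repair from.
mkfs.btrfs -m dup -d single /dev/mapper/crypto
```

Whether AES-NI is available can be checked with `grep -m1 aes /proc/cpuinfo`.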
Re: checksum error in metadata node - best way to move root fs to new drive?
On 2016-08-10 02:27, Duncan wrote:
> Dave T posted on Tue, 09 Aug 2016 23:27:56 -0400 as excerpted:
>> btrfs scrub returned with uncorrectable errors. [...]
>
> Well, given that I don't see any other people more qualified than I, as
> a simple btrfs user and list regular, tho not a dmcrypt user and
> definitely not a btrfs dev, posting, I'll try to help, but...

I probably would have replied, if I had seen the e-mail before now. GMail apparently really hates me recently, as I keep getting things hours to days after other people and regularly out of order... As usual though, you seem to have already covered everything important pretty well; I've only got a few comments to add below.

> Meanwhile, in a different post you asked about btrfs on dmcrypt. I'm
> not aware of any direct btrfs-on-dmcrypt specific bugs [...] but
> certainly, the dmcrypt layer doesn't simplify things. [...]

As far as dm-crypt goes, it looks like BTRFS is stable on top in the configuration I use (aes-xts-plain64 with a long key, using plain dm-crypt instead of LUKS). I have heard rumors of issues when using LUKS without hardware acceleration, but I've never seen any conclusive proof, and what little I've heard sounds more like it was just race conditions elsewhere causing the issues.
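Duncan's point about ssd defaults (single-mode metadata, so scrub can only detect, not repair, metadata checksum errors) is easy to verify and, on a single device, to change. A sketch, with /mnt as a placeholder mount point:

```shell
# Show the profile per chunk type. "Metadata, single" means no second
# copy exists for scrub to repair from; "Metadata, DUP" means there is.
btrfs filesystem df /mnt

# Convert existing metadata chunks to DUP on a live, mounted filesystem.
btrfs balance start -mconvert=dup /mnt
```

The conversion rewrites every metadata chunk, so it can take a while on a large filesystem, and it only helps with corruption that happens after the conversion.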
Re: checksum error in metadata node - best way to move root fs to new drive?
Dave T posted on Tue, 09 Aug 2016 23:27:56 -0400 as excerpted:

> btrfs scrub returned with uncorrectable errors. Searching in dmesg
> returns the following information:
>
> BTRFS warning (device dm-0): checksum error at logical N on
> /dev/mapper/[crypto] sector: y metadata node (level 2) in tree 250
>
> it also says:
>
> unable to fixup (regular) error at logical NN on
> /dev/mapper/[crypto]
>
> I assume I have a bad block device. Does that seem correct? The
> important data is backed up.
>
> However, it would save me a lot of time reinstalling the operating
> system and setting up my work environment if I can copy this root
> filesystem to another storage device.
>
> Can I do that, considering the errors I have mentioned? With the
> uncorrectable error being in a metadata node, what (if anything) does
> that imply about restoring from this drive?

Well, given that I don't see any other people more qualified than I, as a simple btrfs user and list regular, tho not a dmcrypt user and definitely not a btrfs dev, posting, I'll try to help, but...

Do you know what data and metadata replication modes you were using? Scrub detects checksum errors, and for raid1 mode on multi-device (but I guess you were single device) and dup mode on single device, it will try the other copy and use it if the checksum passes there, repairing the bad copy as well. But until recently dup mode data on a single device was impossible, so I doubt you were using that, and while dup mode metadata was the normal default, on ssd that changes to single mode as well. Which means if you were using ssd defaults, you got single mode for both data and metadata, and scrub can detect but not correct checksum errors. That doesn't directly answer your question, but it does explain why you couldn't /expect/ scrub to fix checksum problems, only detect them, if both data and metadata are single mode.

Meanwhile, in a different post you asked about btrfs on dmcrypt. I'm not aware of any direct btrfs-on-dmcrypt specific bugs (tho I'm just a btrfs user and list regular, not a dev, so could have missed something), but certainly, the dmcrypt layer doesn't simplify things. There was a guy here, Marc Merlin, worked for google I believe and was on the road frequently, that was using btrfs on dmcrypt for his laptop and various btrfs on his servers as well -- he wrote some of the raid56 mode stuff on the wiki based on his own experiments with it. But I haven't seen him around recently. I'd suggest he'd be the guy to talk to about btrfs on dmcrypt if you can get in contact with him, as he seemed to have more experience with it than anyone else around here. But like I said, I haven't seen him around recently...

Put it this way. If it were my data on the line, I'd either (1) use another filesystem on top of dmcrypt, if I really wanted/needed the crypted layer, or (2) do without the crypted layer, or (3) use btrfs but be extra vigilant with backups. This since while I know of no specific bugs in the btrfs-on-dmcrypt case, I don't particularly trust it either, and Marc Merlin's posted troubles with the combo were enough to have me avoiding it if possible, and being extra careful with backups if not.

> If I can copy this entire root filesystem, what is the best way to do
> it? The btrfs restore tool? cp? rsync? Some cloning tool? Other
> options?

It depends on whether the filesystem is mountable and, if so, how much can be retrieved without error. The latter depends on the extent of that metadata damage, since damaged metadata will likely take out multiple files, and depending on what level of the tree the damage was on, it could take out only a few files, or most of the filesystem!

If you can mount and the damage appears to be limited, I'd try mounting read-only and copying what I could off, using conventional methods. That way you get checksum protection, which should help assure that anything successfully copied isn't corrupted, because btrfs will error out if there are checksum errors and the file won't copy successfully.

If it won't mount, or it will but the damage appears to be extensive, I'd suggest using restore. It's read-only in terms of the filesystem it's restoring from, so shouldn't cause further damage -- unless the device is actively decaying as you use it, in which case the first thing I'd try to do is image it to something else so the damage isn't getting worse as you work with it. But AFAIK restore doesn't give you the checksum protection, so anything restored that way /could/ be corrupt (tho it's worth noting that ordinary filesystems don't do checksum protection anyway, so it's important not to consider the file any more damaged just because it wasn't checksum protected than it would be if you simply retrieved it from, say, an ext4 filesystem and didn't have some other method to verify the file). Altho... working on dmcrypt, I suppose it's likely that anything that's c
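The two recovery paths Duncan describes might look like this sketch. Mount points are placeholders, and the restore flags are the x/m/S options the original poster asked about.

```shell
# Path 1: filesystem mounts, damage limited. Mount read-only and copy
# off with a conventional tool; btrfs verifies checksums on read, so
# anything that copies cleanly is intact.
mount -o ro /dev/mapper/crypto /mnt/old
rsync -aHAX /mnt/old/ /mnt/new/

# Path 2: filesystem won't mount, or damage is extensive. restore reads
# the raw device without mounting (and without checksum verification):
#   -x  restore extended attributes
#   -m  restore metadata (owner, mode, times)
#   -S  restore symbolic links (without it, symlinks are skipped)
btrfs restore -x -m -S /dev/mapper/crypto /mnt/new
```

If the device appears to be actively decaying, imaging it first (e.g. with ddrescue to a file or spare disk) and running restore against the image is the safer order of operations.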