Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
> "Alex" == Alex Lieflander writes:

>> On May 7, 2022, at 4:41 PM, Stuart D Gathman wrote:
>>
>>> On Fri, 6 May 2022, Alex Lieflander wrote:
>>>
>>> Thanks. I really don’t want to give up the DM-Integrity management. Less
>>> complexity is just a bonus.
>>
>> What are you trying to get out of RAID6? If redundancy and integrity
>> are already managed at another layer, then just use RAID0 for striping.
>>
>> I like to use RAID10 for mirror + striping, but I understand parity disks
>> give redundancy without halving capacity. Parity means RMW cycles of
>> largish blocks, whereas straight mirroring (RAID1, RAID10) can write
>> single sectors without a RMW cycle.

Alex> I don’t trust the hardware I’m running on very much, but it’s
Alex> all I have to work with at the moment; it’s important that the
Alex> array is resilient to *any* (and multiple) single chunk
Alex> corruptions because such corruptions are likely to happen in the
Alex> future.

Ouch! I hope you have good backups somewhere, because I suspect
you're going to suffer a complete failure at some point.

Alex> For the last several months I’ve periodically been seeing
Alex> (DM-Integrity) checksum mismatch warnings at various locations
Alex> on all of my disks. I stopped using a few SATA ports that were
Alex> explicitly throwing SATA errors, but I suspect that the
Alex> remaining connections are unpredictably (albeit infrequently)
Alex> corrupting data in ways that are more difficult to detect.

This is interesting. And worrisome, because I would not expect moving
from one SATA port to another to cure problems, unless it was A)
moving to a different controller, or B) you changed/reseated the SATA
cable.

But I also wonder about your power supply and what it's rated for.
You might just be hitting the ragged edge of what it can supply, and
so you're running into problems with voltage dropping just enough to
make things slightly flaky.

Alex> I’ve tried to “check” and “repair” my array on multiple kernel
Alex> versions and live recovery USB sticks, but the “check” always
Alex> seems to freeze and all subsequent IO to the array hangs until
Alex> reboot; at the moment, a chunk is only ever made consistent when
Alex> its data is overwritten, so it needs to survive periodic, random
Alex> corruption for as long as possible.

This is also a warning to me that maybe you have power supply issues.
Can you give a summary of your hardware configuration and model
numbers? If you're running a smallish power supply, maybe look for a
replacement which can get you more power. Go from a 430W one to 600W,
or 500W to 750W and see if that makes a difference.

Looking at your data from before, I see you have 12 disks on the
system, 11 spinning disks and one NVMe device. So I *really* suspect
you have an overloaded power supply.

Are you also using a disk controller? And which version of Linux?

Alex> I also have a disk that infrequently fails to read from a
Alex> particular area, but the rest of the disk is fine. I wouldn’t
Alex> trust that disk with valuable data, but it seems like a perfect
Alex> candidate to hold additional parity (raid6_ls_6) that I
Alex> hopefully never need.

This is not how RAID6 parity works. The entire disk (or partition) is
used to write data and/or parity. It's RAID4 which dedicates a single
disk to parity duties. So thinking that a known flaky disk will be ok
for just parity use isn't really a good idea.

I'd also look at the output of 'smartctl --all /dev/sd' for all your
disks and see what the numbers say.

But honestly, it sounds like you have some serious hardware issues
which you're trying to paper over with DM-Integrity and RAID5. And I
suspect it will all end in tears sooner or later.

You do have backups of your data, right? Even onto a single new 10TB
disk that's now connected to the system all the time?

Good luck,
John

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
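The SMART check suggested in this message can be run over every disk with a small loop. This is only a sketch: the helper name smart_report is hypothetical, and the devices to check are whatever you pass on the command line. It relies on nothing beyond the `smartctl --all` invocation already quoted above.

```shell
#!/bin/sh
# Sketch: print a full SMART report for each device given as an argument.
# ASSUMPTION: smart_report is an illustrative helper name, not part of
# smartmontools. Devices that are missing or not block devices are skipped.

smart_report() {
    for dev in "$@"; do
        echo "=== $dev ==="
        if command -v smartctl >/dev/null 2>&1 && [ -b "$dev" ]; then
            # --all prints identity, health status, attributes and error logs.
            # smartctl's exit status is a bitmask, so non-zero is not fatal here.
            smartctl --all "$dev" || true
        else
            echo "skipped: smartctl not installed or $dev is not a block device"
        fi
    done
}

smart_report "$@"
```

Saved as a script and run as root with the disks as arguments (for example the expansion of /dev/sd?), it produces one report per disk. Reallocated, pending and CRC error counters are the usual places to look when cabling or power is suspected.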
Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
On Sat, 7 May 2022, Alex Lieflander wrote:

> I don’t trust the hardware I’m running on very much, but it’s all I
> have to work with at the moment; it’s important that the array is
> resilient to *any* (and multiple) single chunk corruptions because
> such corruptions are likely to happen in the future.
>
> For the last several months I’ve periodically been seeing
> (DM-Integrity) checksum mismatch warnings at various locations on all
> of my disks. I stopped using a few SATA ports that were explicitly
> throwing SATA errors, but I suspect that the remaining connections
> are unpredictably (albeit infrequently) corrupting data in ways that
> are more difficult to detect.

Sounds like a *great* test bed for software data integrity tools.
Don't throw that system away when you get a more reliable one!

That sounds like a situation that btrfs with multiple copies could
handle. Use a beefier checksum than the default crc-32 also.
Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
On Fri, 6 May 2022, Alex Lieflander wrote:

> Thanks. I really don’t want to give up the DM-Integrity management.
> Less complexity is just a bonus.

What are you trying to get out of RAID6? If redundancy and integrity
are already managed at another layer, then just use RAID0 for striping.

I like to use RAID10 for mirror + striping, but I understand parity disks
give redundancy without halving capacity. Parity means RMW cycles of
largish blocks, whereas straight mirroring (RAID1, RAID10) can write
single sectors without a RMW cycle.
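To make the RMW point concrete, here is a small sketch of a single-parity (RAID5-style) update on one stripe. Everything in it is an assumption for illustration: chunks are modelled as single byte values, and the helper name rmw_parity is made up. Real arrays do the same XOR over whole chunks, and RAID6 has a second, non-XOR syndrome to update as well.

```shell
#!/bin/sh
# Sketch of a read-modify-write (RMW) parity update on one 3+1 stripe.
# ASSUMPTION: each chunk is a single byte value; rmw_parity is a
# hypothetical helper name used only for this illustration.

d0=$((0x3C)); d1=$((0xA5)); d2=$((0x0F))
parity=$(( d0 ^ d1 ^ d2 ))          # parity as written with the full stripe

rmw_parity() {
    # args: old_parity old_data new_data
    # new parity = old parity XOR old data XOR new data
    echo $(( $1 ^ $2 ^ $3 ))
}

# Overwrite only d1. The array must first READ old d1 and old parity,
# then WRITE new d1 and new parity: two reads plus two writes.
new_d1=$((0x5A))
new_parity=$(rmw_parity "$parity" "$d1" "$new_d1")

# Cross-check against recomputing parity from the whole stripe.
full_parity=$(( d0 ^ new_d1 ^ d2 ))
if [ "$new_parity" -eq "$full_parity" ]; then
    echo "RMW parity matches full-stripe parity"
else
    echo "parity mismatch"
fi

# A two-way mirror would simply write new_d1 to both copies: no reads.
```

The shortcut works because XOR is its own inverse: XORing the old data out of the parity and the new data in gives the same result as recomputing over the full stripe, at the cost of the extra reads.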
Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
> "Alex" == Alex Lieflander writes:

Alex> I actually used to use MDADM with LVM on top. I switched to pure
Alex> LVM for simplicity and per-disk host-managed
Alex> integrity-checking. I don’t know if MDADM has since gained the
Alex> ability to correct single-disk inconsistencies, but without
Alex> per-disk integrity-checking it would be technically impossible
Alex> to do this if 1 disk had already failed.

Can you tell me what you mean by "per-disk host-managed
integrity-checking"? Are you running dm-integrity in your setup? Can
you post the output of:

    lsblk

so we understand better what you have?

But in any case, good luck with getting RAID6 solid and working as a
pure LVM setup. I don't think you're going to make it happen any time
soon, and I honestly prefer to have separate layers for my setup so
that each layer does its own thing and does it well.

John
Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
Alex> I have 4 disks that I’d really like to put into a RAID6. I know
Alex> about RAID10, but it wouldn’t work well for me for several
Alex> reasons.

Can you explain those reasons? In general, RAID10 gives you only 50%
capacity, but much better read/write performance than RAID5/6. But if
you want to be able to handle the failure of any two disks in your
RAID6, then I can understand your decision.

Alex> Buying another disk would also be a waste of money because I
Alex> don’t need 3-disks-worth of usable capacity.

That's fair.

Alex> I know there was a question regarding this a few years ago, and
Alex> the consensus was to not natively support that configuration. I
Alex> can respect that (although I would urge you to reconsider), but
Alex> I’d still like to do it on my machine.

I would instead build your RAID6 using MD, and then layer LVM on top
of it. It works, it's solid and it runs really well.

Alex> So far I’ve tried removing the restrictions from the source code
Alex> and recompiling (I don’t know C, but I’m familiar with general
Alex> code syntax). I’ve slowly gotten further in the lvconvert
Alex> process, but there seems to be many similar checks throughout
Alex> the code.

If you don't know the code, then you're not going to get working RAID6
up and running any time soon.

Alex> I’m hoping someone could point me in the right direction towards
Alex> achieving this goal. If I successfully bypass the user-space
Alex> tool restrictions, will the rest of LVM likely work with my
Alex> desired config? Would you be willing to allow the --force option
Alex> to bypass the restrictions that are not strictly necessary, even
Alex> at the expense of expected stability? Is there anything else you
Alex> could suggest?

I really can only suggest you set up RAID6 using the MD raid tools
(mdadm) and then create your LVM PVs, VGs and LVs on top of that. It
really works well.
Yes, you now need to have another tool to manage another layer, but
since the MD system is well tested, reliable and just works, I would
go with it as the base.

If you can copy all your data onto a single disk, then you could simply
create a RAID5 on your other three disks, create a PV on /dev/md0, add
that PV to your VG, and do a pvmove to migrate your data onto the
RAID5, all without downtime. Once it's moved over, you remove the final
disk from the VG, add it to the MD array, and then grow the array from
RAID5 to RAID6, again online.

The details aren't hard, but you do have to be careful.

John
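The migration outlined above could be sketched as the dry run below. Treat it as an illustration under stated assumptions, not a tested procedure: the device names, the VG name vg0 and the backup-file path are all placeholders, and nothing is executed unless DRY_RUN=0 is set. The move between PVs is done with pvmove, the LVM tool for relocating extents online.

```shell
#!/bin/sh
# Dry-run sketch of the RAID5 -> RAID6 migration described above.
# ASSUMPTIONS (not from the thread): device names, the VG name "vg0"
# and the backup-file path are placeholders. Commands are only printed
# unless DRY_RUN=0 is set explicitly.
set -eu

DRY_RUN="${DRY_RUN:-1}"
VG="vg0"                                   # hypothetical volume group
OLD_PV="/dev/sda1"                         # hypothetical disk holding all data
NEW_DISKS="/dev/sdb1 /dev/sdc1 /dev/sdd1"  # hypothetical three free disks
MD="/dev/md0"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

migrate_plan() {
    # 1. Build a 3-disk RAID5 (NEW_DISKS is split on spaces on purpose).
    run mdadm --create "$MD" --level=5 --raid-devices=3 $NEW_DISKS
    # 2. Turn the array into a PV and add it to the existing VG.
    run pvcreate "$MD"
    run vgextend "$VG" "$MD"
    # 3. Move all extents off the old disk while the LVs stay online.
    run pvmove "$OLD_PV" "$MD"
    # 4. Drop the old disk from the VG and wipe its PV label.
    run vgreduce "$VG" "$OLD_PV"
    run pvremove "$OLD_PV"
    # 5. Add the freed disk to the array and reshape RAID5 -> RAID6.
    run mdadm --add "$MD" "$OLD_PV"
    run mdadm --grow "$MD" --level=6 --raid-devices=4 \
        --backup-file=/root/md0-reshape.bak
}

migrate_plan
```

Reading the printed plan before setting DRY_RUN=0 is the point of the wrapper. In practice you would also wait for the initial RAID5 sync and for the reshape to finish (progress shows up in /proc/mdstat), and confirm the backup is good before step 3.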