Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?
On Wed, 27 Sep 2023, Jean-Marc Saffroy wrote:

> So I prefer to manage available raw (un-encrypted) space with LVM. Now, I also need to do backups of /home, and that's why I want snapshots. But that first layer of LVM would only show a snapshot of an encrypted volume, and the backup job shouldn't have the passphrase to decrypt the volume. Which is why I'm trying to find a way of doing snapshots of an "opened" LUKS volume: this way, the backup job can do its job without requiring a passphrase.

Besides LVM on LUKS on LVM, which you already tried, consider using a filesystem that supports snapshots. I use btrfs, and snapshots work beautifully; if you use "btrfs send" you can even do differential backups. Btrfs is COW, so snapshots share all blocks not touched. Pipe the output of btrfs send directly to your backup process/server running "btrfs receive". Note, this requires the backup server to have btrfs. If it doesn't, then just use rsync from the snapshot directory to the backup server like a typical unix backup solution. (E.g. my VM host uses XFS on the backup drives, so it uses rsync.)

> In simple tests, I could make it work, with dmsetup on LUKS on LVM, and also (after I sent my original email) with LVM on LUKS on LVM.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] Swapping LVM drive
On Mon, 28 Aug 2023, Roska Postit wrote:

> After reading your answer more carefully I got the following idea: what do you think if I boot the system (this is a desktop computer and the old and the new drive are both NVMe SSDs) from a USB Linux and then just do a 'dd' of the entire drive (at block level, bit-by-bit)? Then I remove the old disk from the system. Shouldn't it boot normally now?

That would work, yes, but you don't expand /boot - which you really should do. Also, copying the entire filesystem is not only less efficient, but involves needless writes to an SSD (where they are a more limited resource than on magnetic drives).

> Then I will create a new partition for all the unused space (1.5TB) on the new disk, which I will then add to the LVM as a new Physical Volume (PV)

That is pointless when you can just expand the partition (which is trivial when it is the last one). You don't want more PVs unless they are actually separate physical volumes - or there is some special circumstance that prevents just expanding the partition.
Re: [linux-lvm] Swapping LVM drive
On Mon, 28 Aug 2023, Phillip Susi wrote:

> Why would you use dd/partclone instead of just having LVM move everything to the new drive on the fly? Partition the new drive, use pvcreate to initialize the partition as a PV, vgextend to add the PV to the existing VG, pvmove to evacuate the logical volumes from the old disk, then vgreduce to remove it from the VG.

1. Unnecessary copying.

2. You lose your free backup of the system on the old drive, which should be carefully labeled and kept handy for a year. (After that, SSDs start to run the risk of data retention issues.)

> Don't forget you'll need to reinstall grub on the new drive for it to boot.

And that is the most important reason. "Just reinstall grub" is a much larger learning curve than "dd", IMO.
Re: [linux-lvm] Swapping LVM drive
On Sun, 27 Aug 2023, Roska Postit wrote:

> What is the most proper way to swap my 500GB SSD drive to the bigger 2TB SSD drive in the following LVM configuration?
>
> nvme0n1            259:0    0 465,8G  0 disk
> ├─nvme0n1p1        259:1    0   512M  0 part /boot/efi
> ├─nvme0n1p2        259:2    0   488M  0 part /boot
> └─nvme0n1p3        259:3    0 464,8G  0 part
>   ├─pc3--vg-root   254:0    0 463,8G  0 lvm  /
>   └─pc3--vg-swap_1 254:1    0   980M  0 lvm  [SWAP]

Since you are not mirroring, just add the new drive. If this is a laptop, and you can only have one drive, then I suggest you mount the new drive via USB (note there are at least 2 kinds of NVMe interface, and you have to get a matching USB enclosure).

Use dd to copy the partition table (this also often contains boot code) to the new disk on USB. Then use dd to copy the smaller partitions (efi, boot). Now use cfdisk to delete the 3rd partition. Expand the boot partition to 1G (you'll thank me later). Allocate the entire rest of the disk to p3. Create a new VG with a different name. Allocate root and swap on the new VG with the same sizes. Take a snapshot of the current root (delete swap on the old drive, since you didn't leave yourself any room), and use partclone to efficiently copy the filesystem over to the new root.

Either a) edit grub and fstab on the new drive to use the new VG name, or b) boot from live media to rename the old and new VGs, or c) rename the VG just before shutting down to remove the drive - I think LVM can operate with a duplicate VG name, but I've never navigated the details. Swap the drives after powering down.

A modern filesystem like ext4, xfs, btrfs, etc. can expand as you expand the root LV. Leave yourself some working room in the VG.
Re: [linux-lvm] bug? shrink lv by specifying pv extent to be removed does not behave as expected
I use a utility that maps bad sectors to files, then move/rename the files into a bad-blocks folder. (Yes, this doesn't work when critical areas are affected.) If you simply remove the files, then modern disks will internally remap the sectors when they are written again - but the quality of remapping implementations varies.

It is more time-efficient to just buy a new disk, but with wars and rumors of wars threatening to disrupt supply chains, including tech, it's nice to have the skills to get more use from failing hardware. Plus, it is a challenging problem, which can be fun to work on at leisure.

On Sun, 9 Apr 2023, Roland wrote:

> > What is your use case that you believe removing a block in the middle of an LV needs to work?
>
> my use case is creating some badblocks script with lvm which intelligently handles and skips broken sectors on disks which can't be used otherwise...
Re: [linux-lvm] How to implement live migration of VMs in thinlv after using lvmlockd
On Tue, Nov 01, 2022 at 01:36:17PM +0800, Zhiyong Ye wrote:

> I want to implement live migration of VMs in the lvm + lvmlockd + sanlock environment. There are multiple hosts in the cluster using the same iSCSI connection, and the VMs are running in this environment using thinlv volumes. But if I want to live-migrate a VM, it will be difficult, since thinlvs from the same thin pool can only be exclusively active on one host.

I just expose the LV (thin or not - I prefer not) as an iSCSI target that the VM boots from. There is only one host that manages a thin pool, and that is a single point of failure, but there are no locking issues. You issue the LVM commands on the iSCSI server (which I guess they call a NAS these days).

If you need a way for a VM to request enlarging an LV it accesses, or similar interaction, I would make a simple API where each VM gets a token that determines what LVs it has access to and how much total storage it can consume. Maybe someone has already done that. I just issue the commands on the LVM/NAS/iSCSI host.

I haven't done this, but there can be more than one thin pool, each on its own NAS/iSCSI server. So if one storage server crashes, then only the VMs attached to it crash. You can only (simply) migrate a VM to another VM host on the same storage server. BUT, you can migrate a VM to another host less instantly using DRBD or another remote mirroring driver. I have done this. You get the remote LV mirror mostly synced, suspend the VM (to a file, if you need to rsync that to the remote), finish the sync of the LV(s), and resume the VM on the new server - in another city. Handy when you have a few hours' notice of a natural disaster (hurricane/flood).
Re: [linux-lvm] Where are the data blocks of an LV?
Check out https://github.com/sdgathman/lbatofile

It was written to identify the file affected by a bad block (so it goes the opposite direction), but the getpvmap() function obtains pe_start and pe_size plus the list of segments. findlv() goes through the segments to find the one an absolute sector is in. That should tell you what you want to know.

On Fri, 22 Jul 2022, Marcin Owsiany wrote:

> I know that it is possible to find where a given logical volume's extents start on a PV thanks to the information printed by lvdisplay --maps. I was also able to experimentally establish that the actual data of an LV starts one megabyte after that location. Is that offset documented anywhere, or can I somehow discover it at runtime for a given LV?
Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
On Sat, 7 May 2022, Alex Lieflander wrote:

> I don't trust the hardware I'm running on very much, but it's all I have to work with at the moment; it's important that the array is resilient to *any* (and multiple) single-chunk corruptions, because such corruptions are likely to happen in the future. For the last several months I've periodically been seeing (dm-integrity) checksum mismatch warnings at various locations on all of my disks. I stopped using a few SATA ports that were explicitly throwing SATA errors, but I suspect that the remaining connections are unpredictably (albeit infrequently) corrupting data in ways that are more difficult to detect.

Sounds like a *great* test bed for software data integrity tools. Don't throw that system away when you get a more reliable one! That sounds like a situation that btrfs with multiple copies could handle. Use a beefier checksum than the default CRC-32 as well.
Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
On Fri, 6 May 2022, Alex Lieflander wrote:

> Thanks. I really don't want to give up the DM-Integrity management. Less complexity is just a bonus.

What are you trying to get out of RAID6? If redundancy and integrity are already managed at another layer, then just use RAID0 for striping. I like to use RAID10 for mirroring + striping, but I understand parity disks give redundancy without halving capacity. Parity means RMW (read-modify-write) cycles of largish blocks, whereas straight mirroring (RAID1, RAID10) can write single sectors without an RMW cycle.
Re: [linux-lvm] LVM performance vs direct dm-thin
On Sun, 2022-01-30 at 11:45 -0500, Demi Marie Obenour wrote:

> On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > Since you mentioned ZFS - you might want to focus on using a 'ZFS-only' solution. Combining ZFS or Btrfs with lvm2 is always going to be a painful way, as those filesystems have their own volume management.
>
> Absolutely! That said, I do wonder what your thoughts on using loop devices for VM storage are. I know they are slower than thin volumes, but they are also much easier to manage, since they are just ordinary disk files. Any filesystem with reflink can provide the needed copy-on-write support.

I use loop devices for test cases - especially with simulated IO errors. Devs really appreciate having an easy reproducer for database/filesystem bugs (which often involve handling of IO errors). But not for production VMs.

I use LVM as flexible partitions (i.e. only classic LVs, no thin pool). Classic LVs perform like partitions, literally using the same driver (device mapper) with a small number of extents, and are if anything more recoverable than partition tables. We used to put LVM on bare drives (like AIX did) - who needs a partition table? But on Wintel, you need a partition table for EFI, and so that alien operating systems know there is something already on a disk.

Your VM usage is different from ours - you seem to need to clone and activate a VM quickly (like a VPS provider might need to do). We generally have to buy more RAM to add a new VM :-), so performance of creating a new LV is the least of our worries.

Since we use LVs like partitions, mixing with btrfs is not an issue. Just use the LVs like partitions. I haven't tried ZFS on linux - it may have LVM-like features that could fight with LVM. ZFS would be my first choice on a BSD box. We do not use LVM raid - we either run mdraid underneath, or let btrfs do its data duplication thing with LVs on different spindles.
Re: [linux-lvm] convert logical sector -> physical sector + pv/vg extent number
On Mon, 3 Jan 2022, Roland wrote:

> any chance to retrieve this information for automated/script-based processing?

You might find this script enlightening: https://github.com/sdgathman/lbatofile

It maps bad sectors to partition, LV, file, etc. The relevant function for your question is findlv(). Some of the commands run are:

pvdisplay --units k -m '/dev/lvm_pv'
pvs --units k -o+pe_start '/dev/lvm_pv'

On 03.01.22 at 00:12, Andy Smith wrote:

> On Sun, Jan 02, 2022 at 08:00:30PM +0100, Roland wrote:
> > if i have a logical sector/block "x" on a lvm logical volume, is there a way to easily calculate/determine (optimally by script/cli) the corresponding physical sector of the physical device it belongs to, and the extent number of the appropriate pv/vg where it resides?
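The arithmetic findlv() performs can be sketched as follows. This is a simplified model for linear segments only; the segment-tuple shape is my own, not lbatofile's actual data structure, and the inputs would come from parsing the pvdisplay/pvs output above:

```python
def lv_to_pv_sector(lv_sector, pe_start_sectors, pe_size_sectors, segments):
    """Map a sector offset within an LV to (pv_name, physical sector).

    segments: list of (lv_start_extent, extent_count, pv_name,
    pv_start_extent) tuples - the information reported by
    `pvdisplay -m` / `lvdisplay --maps`, in sectors.
    """
    le = lv_sector // pe_size_sectors   # logical extent number
    off = lv_sector % pe_size_sectors   # sector offset within the extent
    for lv_start, count, pv, pv_start in segments:
        if lv_start <= le < lv_start + count:
            pe = pv_start + (le - lv_start)   # physical extent on the PV
            return pv, pe_start_sectors + pe * pe_size_sectors + off
    raise ValueError("sector beyond end of LV")
```

Here pe_start (commonly 2048 sectors, i.e. 1 MiB) accounts for the data offset the LVM metadata area imposes at the front of the PV.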
Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?
On Tue, 28 Dec 2021, Tomas Dalebjörk wrote:

> Yes, it is an incremental backup based on the cow device

I've used such a COW-based backup (can't remember the name just now; currently using DRBD and rsync for incremental mirrors). The way it worked was to read and interpret the raw COW device itself and send blocks over the wire - writing directly to a volume on the remote end. It did not try to patch up metadata and use LVM to merge.

You need an intimate knowledge of COW internals for either approach - BUT the read-only approach (with plain writes at the other end) is MUCH safer (not going to trash metadata at either end) and just as efficient on the wire.

I've also used a block-device rsync that read every block on both sides and compared hashes - but that is obviously a lot more disk IO than using the COW, where LVM is already tracking changed blocks.
Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?
> If you want to give it a try: just create a snapshot on a specific device and change all the blocks on the origin; there you are, you now have a cow device containing all data needed. How to move this snapshot device to another server and reattach it to an empty lv volume as a snapshot? The lvconvert -s command requires an argument of an existing snapshot volume name. But there is no snapshot on the new server, so it can't re-attach the volume. So what procedures should be invoked to create just the detached references in LVM, so that the lvconvert -s command can work?

Just copy the snapshot to another server, by whatever method you would use to copy the COW and data volumes (I prefer partclone for supported filesystems). No need for lvconvert. You are trying WAY WAY too hard. Are you by any chance trying to create an incremental backup system based on lvm snapshot COW? If so, say so.
Re: [linux-lvm] Mapping lvol off to pvol off
On Tue, 17 Aug 2021, Chethan Seshadri wrote:

> Can someone help to convert an offset within an lvol to the corresponding pvol offset using...
>
> 1. lvm commands
> 2. lvmdbusd APIs

This utility does that: https://github.com/sdgathman/lbatofile

See getlv() and getpvmap() in https://github.com/sdgathman/lbatofile/blob/master/lbatofile.py

Caveat: only tested with classic LVs. Dunno about thin stuff.
Re: [linux-lvm] Does LVM have any plan/schedule to support btrfs in fsadm
On Mon, 28 Jun 2021, heming.z...@suse.com wrote:

> In my opinion, the usage style of btrfs by many users is the same as ext4/xfs.

Yes. I like the checksums-in-metadata feature for enhanced integrity checking. It seems too complicated to have anytime soon - but when a filesystem detects corruption, and is on an LVM (or md) RAID1 layer, an ioctl to read alternate mirror branches to see which (if any) has the correct data would allow recovery. Btrfs does this if it is doing the mirroring, but then you lose all the other features of LVM or md raid10, including running other filesystems and efficient virtual disks for virtual machines.

We eventually got DISCARD operations to pass to lower layers. Dealing with mirror branches should really be a thing too.
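The recovery logic such an ioctl would enable is easy to sketch. Everything here is hypothetical - no such kernel API exists today; the point is only that the filesystem already knows the expected checksum and just needs a way to ask for each mirror leg's copy of the block:

```python
import hashlib

def pick_good_leg(legs, expected_digest):
    """Given the same block as read from each mirror leg, return
    (leg_index, data) for the first copy whose checksum matches the
    filesystem's expected value, or None if every leg is corrupt."""
    for i, data in enumerate(legs):
        if hashlib.sha256(data).hexdigest() == expected_digest:
            return i, data
    return None
```

The recovered copy would then be written back, letting the RAID layer repair the bad leg.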
Re: [linux-lvm] lvm limitations
On Tue, 15 Sep 2020, Tomas Dalebjörk wrote:

> ok, let's say that I have 10 LVs on a server, and want to create a thin lv snapshot every hour and keep that for 30 days. that would be 24h * 30days * 10lv = 7200 lv. if I want to keep snapshot copies from more nodes, to serve a single repository of snapshot copies, then these would easily become several hundred thousands of lv. not sure if this is a good idea, but I guess it can be very useful in some sense, as block-level incremental-forever and instant recovery can be implemented for open-source based applications. what reflections do you have on this idea?

My feeling is that btrfs is a better solution for the hourly snapshots. (Unless you are testing a filesystem :-) I find "classic" LVs a robust replacement for partitions that are easily resized without moving data around. I would be more likely to try RAID features on classic LVs than thin LVs.
Re: [linux-lvm] What to do about new lvm messages
On Sat, 22 Aug 2020, L A Walsh wrote:

> I am trying to create a new pv/vg/+lvs setup but am getting some weird messages
>
> pvcreate -M2 --pvmetadatacopies 2 /dev/sda1
>   Failed to clear hint file.
>   WARNING: PV /dev/sdd1 in VG Backup is using an old PV header, modify the VG to update.
>   Physical volume "/dev/sda1" successfully created.
>
> So why am I getting a message about not clearing a hint file (running as root)

Because there is an old PV header on sdd1.

> From an online man page, I should have been able to use -ff to recreate a pv over the top of a preexisting one, but that didn't seem to work. I got:
>
> pvcreate -ff -M2 --pvmetadatacopies 2 /dev/sda1
>   Failed to clear hint file.
>   WARNING: PV /dev/sdd1 in VG Backup is using an old PV header, modify the VG to update.

You wrote a new PV header on sda1 - but that didn't do diddly squat about the old one on sdd1.

>   Cannot access VG Space with system ID Ishtar with unknown local system ID.
>   Device /dev/sda1 excluded by a filter.

The PV filter is excluding sda1. Are you confused about what is on which sdX?

> dd if=/dev/zero of=/dev/sda1 bs=4096 count=1
> 1+0 records in
> 1+0 records out
> 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000175409 s, 23.4 MB/s

I hope sda1 was really what you think it was. What is on sdd1? None of your listings examine it.
Re: [linux-lvm] [1019133-4yqc8ex4] LVM hangs after volume change
On Wed, 15 Apr 2020, Shock Media B.V. support wrote:

> We use an mdadm raid config consisting of 4 or more SSDs/disks, where we use part of the disks for a raid1, raid10 or raid5. We create volumes on 2 nodes and use DRBD to keep these 2 volumes in sync, and we run a virtual machine (using KVM) on this volume.

I've used DRBD for an almost identical setup, except only raid1 - we were even cheaper than you. :->

I would first suspect DRBD hanging - and you should check the queue stats to see if there is a backlog. If you are using the paid version with a userland smart-buffering process, check whether that has died.
Re: [linux-lvm] faster snapshot creation?
On Sat, 22 Feb 2020, Eric Toombs wrote:

> Snapshot creation is already pretty fast:
>
> $ time sudo lvcreate --size 512M --snapshot --name snap /dev/testdbs/template
>   Logical volume "snap" created.
> 0.03user 0.05system 0:00.46elapsed 18%CPU (0avgtext+0avgdata 28916maxresident)k
> 768inputs+9828outputs (0major+6315minor)pagefaults 0swaps
>
> That's about half a second in real time. But I have a scenario that would benefit from it being even faster. I'm doing many small unit tests. So, is there a sort of "dumber" way of making these snapshots, maybe by changing the allocation algorithm or something?

How about using a filesystem that supports snapshots, e.g. nilfs or (I think) btrfs? That would be much faster than doing it at the LVM level, which has to sync metadata and stuff.

a) load your template into a work directory
b) tag a snapshot
c) run a test (possibly in a container)
d) restore the tagged snapshot
e) goto c
Re: [linux-lvm] pvmove --abort
On Mon, 27 Jan 2020, Matthias Leopold wrote:

> I consciously used "pvmove --abort" for the first time now, and I'm astonished it doesn't behave as described in the man page. No matter whether I used "--atomic" for the original command, when I interrupt the process with "pvmove --abort", lvm always completely rolls back my copy operation. I would expect that if I don't use "--atomic", then "--abort" will result in "segments that have been moved will remain on the destination PV, while unmoved segments will remain on the source PV" (from the man page). Am I missing something?

I'm not an LVM guru, but I think I got this one! pvmove effectively creates a mirror on the destination and begins syncing it. Any writes to the LV go to *both* the source and destination. When you abort, it simply discards the partially synced mirror. When the sync is complete, it discards the source leg of the mirror instead.
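A toy model of that mirror behavior (class and method names are mine, purely illustrative of the semantics described, not of LVM internals):

```python
class PvMove:
    """Model pvmove-as-mirror: writes hit both legs while syncing;
    abort discards the destination, completion discards the source."""

    def __init__(self, source):
        self.src = bytearray(source)          # old PV contents
        self.dst = bytearray(len(source))     # mirror being built
        self.synced = 0                       # sectors copied so far

    def sync_step(self):
        """Background copy of one more sector to the destination."""
        self.dst[self.synced] = self.src[self.synced]
        self.synced += 1

    def write(self, i, b):
        """A live write to the LV lands on *both* legs."""
        self.src[i] = b
        self.dst[i] = b

    def abort(self):
        """Discard the partial mirror; the source keeps all writes."""
        return bytes(self.src)

    def finish(self):
        """Complete the sync, then drop the source leg."""
        while self.synced < len(self.src):
            self.sync_step()
        return bytes(self.dst)
```

This is why abort always "rolls back": there is no per-segment commit to preserve, just an unfinished mirror leg to throw away.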
Re: [linux-lvm] Failed merge, still running?
On Wed, 22 Jan 2020, Mauricio Tavares wrote:

> lvconvert --merge vmhost_vg0/desktop_snap_20200121
>
> and instead of seeing the usual percentage of how far it has completed, I got nothing. lvs -a -o +devices shows
>
> LV                      VG         Attr       LSize  Pool Origin  Data%  Devices
> desktop                 vmhost_vg0 Owi-a-s--- 20.00g                     /dev/sdb3(14848)
> [desktop_snap_20200121] vmhost_vg0 Swi-a-s--- 10.00g      desktop 100.00

While not a guru, I think I can tell you the issue. It looks like the snapshot was full (says 100.00). The snapshot is unusable at that point. Maybe it wasn't full before you started the merge, and the merge sets it to 100.00 when it starts; I haven't noticed.

> Before I blow that lv up, what else should I be checking?

What was the snapshot percent used before you started the merge?

--
Stuart D. Gathman "Confutatis maledictis, flammis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
Re: [linux-lvm] Best way to run LVM over multiple SW RAIDs?
On Sat, 7 Dec 2019, John Stoffel wrote:

> The biggest harm to performance here is really the RAID5, and if you can instead move to RAID 10 (mirror then stripe across mirrors) then you should see a performance boost.

Yeah, that's what I do. RAID10, and use LVM to join them together as JBOD.

> I forgot about the raid 5 bottleneck part, sorry. As Daniel says, he's got lots of disk load, but plenty of CPU, so the single thread for RAID5 is a big bottleneck. I assume he wants to use LVM so he can create volume(s) larger than individual RAID5 volumes, so in that case, I'd probably just build a regular non-striped LVM VG holding all your RAID5 disks. Hopefully

Wait, that's what I suggested!

> If you can, I'd get more SSDs and move to RAID1+0 (RAID10) instead, though you do have the problem where a double disk failure could kill your data if it happens to both halves of a mirror.

No worse than raid5. In fact, better: the 2nd fault always kills the raid5, but only has a 33% or less chance of killing the raid10. (And in either case, it is usually just specific sectors, not the entire drive, and other manual recovery techniques can come into play.)
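The "33% or less" figure is easy to check with a quick model (mine, assuming the second failure hits a uniformly random surviving disk):

```python
def p_second_failure_fatal(pairs):
    """On a RAID10 of `pairs` two-disk mirrors, after one disk has died,
    a second random failure is fatal only if it hits the dead disk's
    partner: 1 chance out of the (2*pairs - 1) surviving disks."""
    return 1 / (2 * pairs - 1)
```

With 4 drives (2 pairs) that's 1/3, matching the 33%; with more pairs the odds only improve, while on RAID5 any second whole-disk failure is fatal.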
Re: [linux-lvm] Best way to run LVM over multiple SW RAIDs?
On Tue, Oct 29, 2019 at 12:14 PM Daniel Janzon wrote:

> I have a server with very high load using four NVMe SSDs and therefore no HW RAID. Instead I used SW RAID with the mdadm tool. Using one RAID5 volume does not work well, since the driver can only utilize one CPU core, which spikes at 100% and harms performance. Therefore I created 8 partitions on each disk, and 8 RAID5s across the four disks. Now I want to bring them together with LVM. If I do not use a striped volume I get high performance (of the expected magnitude according to disk specs). But when I use a striped volume, performance drops to a magnitude below. The reason I am looking for a striped setup is to

The mdadm layer already does the striping. Doing it again in the LVM layer completely screws it up. You want plain JBOD (Just a Bunch Of Disks).
Re: [linux-lvm] exposing snapshot block device
On Tue, 22 Oct 2019, Gionatan Danti wrote:

> The main thing that somewhat scares me is that (if things have not changed) thinvol uses a single root btree node: losing it means losing *all* thin volumes of a specific thin pool. Coupled with the fact that metadata dumps are not as handy as with the old LVM code (no vgcfgrestore), it worries me.

If you can find all the leaf nodes belonging to the root (in my btree database they are marked with the root id and can be found by a sequential scan of the volume), then reconstructing the btree data is straightforward - even in place. I remember realizing this was the only way to recover a major customer's data - and had the utility written, tested, and applied in a 36-hour programming marathon (which I hope never to repeat).

If this hasn't occurred to the thin pool programmers, I am happy to flesh out the procedure. Having such a utility available as a last resort would ratchet up the reliability of thin pools.
Re: [linux-lvm] exposing snapshot block device
On Tue, 22 Oct 2019, Zdenek Kabelac wrote:

> On 22. 10. 19 at 17:29, Dalebjörk, Tomas wrote:
> > But it would be better if the cow device could be recreated in a faster way, mentioning that all blocks are present on an external device, so that the LV volume can be restored much quicker using the "lvconvert --merge" command.
>
> I do not want to break your imagination here, but that is exactly the thing you can do with thin provisioning and the thin_delta tool.

lvconvert --merge does a "rollback" to the point at which the snapshot was taken. The master LV already has current data. What Tomas wants is to be able to do a "rollforward" from the point at which the snapshot was taken.

He also wants to be able to put the cow volume on an external/remote medium, and add a snapshot using an already existing cow. This way, restoring means copying the full volume from backup, creating a snapshot using the existing external cow, then lvconvert --merge instantly logically applies the cow changes while updating the master LV.

Pros: "Old" snapshots are exactly as efficient as thin when there is exactly one. They only get inefficient with multiple snapshots. On the other hand, thin volumes are as inefficient as an old LV with one snapshot. An old LV is as efficient, and as anti-fragile, as a partition. Thin volumes are much more flexible, but depend on much more fragile, database-like metadata. For this reason, I always prefer "old" LVs when the functionality of thin LVs is not actually needed. I can even manually recover from trashed metadata by editing it, as it is human-readable text. Updates to the external cow can be pipelined (but then properly handling reads becomes non-trivial - there are mature remote block device implementations for linux that will do the job).

Cons: For the external cow to be useful, updates to it must be *strictly* serialized. This is doable, but not as obvious or trivial as it might seem at first glance. (Remote block device software will take care of this as well.)

The "rollforward" must be applied to the backup image of the snapshot. If the admin gets it paired with the wrong backup, massive corruption ensues. This could be automated: e.g. the full image backup and external cow would have unique matching names, or the full image backup could compute an md5 in parallel, which would be stored with the cow. But none of those tools currently exist.
Re: [linux-lvm] repair pool with bad checksum in superblock
On Fri, 23 Aug 2019, Gionatan Danti wrote:

> On 23-08-2019 14:47, Zdenek Kabelac wrote:
> > Ok - a serious disk error might eventually lead to irreparable metadata content - since if you lose some root b-tree node sequence, it might be really hard to get something sensible back (it's the reason why the metadata should be located on some 'mirrored' device - while there is a lot of effort put into protection against software errors, it's hard to do something about hardware errors...)
>
> Would it be possible to have a backup superblock, maybe located at the device end? XFS, EXT4 and ZFS already do something similar...

In my btree file system, I can recover from arbitrary hardware corruption by storing the root id of the file (table) in each node. Leaf nodes (with full data records) are also flagged. Thus, even if the root node of a file is lost/corrupted, the raw file/device can be scanned for corresponding leaf nodes to rebuild the file (table) with all remaining records.

Drawbacks: deleting individual leaf nodes requires changing the root id of the node, requiring an extra write. (Otherwise records could be included in some future recovery.) Deleting entire files (tables) just requires marking the root node deleted - no need to write all the leaf nodes.
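The scan-and-rebuild idea can be sketched in a few lines. The node layout here is invented for illustration (dicts standing in for on-disk nodes), not the actual format of any btree implementation:

```python
def recover_table(nodes, root_id):
    """Rebuild a table's records by scanning every node on the device
    for leaves carrying the wanted root id - usable even when the
    root node itself is lost or corrupted."""
    records = []
    for node in nodes:
        if node["root_id"] == root_id and node["leaf"]:
            records.extend(node["records"])
    return sorted(records)
```

The same trick is why deleting a leaf needs an extra write (its root id must change), while deleting a whole table only needs the root marked dead.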
Re: [linux-lvm] Power loss consistency for RAID
On Sun, 17 Mar 2019, Zheng Lv wrote: I'm recently considering using software RAID instead of hardware controllers for my home server. AFAIK, a write operation on a RAID array is not atomic across disks. I'm concerned about what happens to RAID1/5/6/10 LVs after power loss. Is manual recovery required, or is it automatically checked and repaired on LV activation? Also I'm curious about how such recovery works internally.

I use md raid1 and raid10. I recommend md instead of LVM RAID, which is newer. Create your RAID volumes with md, and add them as PVs:

  PV        VG      Fmt  Attr PSize   PFree
  /dev/md1  vg_span lvm2 a--u 214.81g      0
  /dev/md2  vg_span lvm2 a--u 214.81g  26.72g
  /dev/md3  vg_span lvm2 a--u 249.00g 148.00g
  /dev/md4  vg_span lvm2 a--u 252.47g 242.47g

Note that you do not need matching drives as with hardware RAID; you can add disks and mix and match partitions of the same size on drives of differing sizes. LVM assigns extents automatically; with md you have to manually assign partitions to block devices. There are very few (large) partitions to assign, so it is a pleasant, human-sized exercise. While striping and mirror schemes like raid0, raid1, raid10 are actually faster with software RAID, I avoid RAID schemes with RMW cycles like raid5 - you really need the hardware for those. I use raid1 when the filesystem needs to be readable without the md driver - as with /boot. Raid10 provides striping as well as mirroring, with however many drives you have (I usually have 3 or 4). Here is a brief overview of md recovery and diagnostics; someone else will have to fill in the mechanics of LVM raid. Md keeps a version in the superblock of each device in a logical md drive, and marks the older leg as failed and replaced (and begins to sync it). In newer superblock formats, it also keeps a bitmap so that it can sync only possibly modified areas. Once a week (configurable), check_raid compares the legs (on most distros).
If it encounters a read error on either drive, it immediately syncs that block from the good drive. This reassigns the sector on modern drives. (On ancient drives, a write error on resync marks the drive as failed.) If for some reason (there are legitimate ones involving write optimizations for swap volumes and such) the two legs do not match, it arbitrarily copies one leg to the other, keeping a count. (IMO it should also log the block offset so that I can occasionally check that the mismatch occurred in an expected volume.) -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
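The md-on-the-bottom layout described above can be sketched as follows. The device names are hypothetical, and since mdadm/pvcreate need root and real disks, this script only prints the commands rather than running them:

```shell
# Sketch: build an md raid10 array (md raid10 works with any number of
# drives >= 2, even odd counts) and hand it to LVM as a PV.
cmds='mdadm --create /dev/md1 --level=10 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2
pvcreate /dev/md1
vgcreate vg_span /dev/md1'
printf '%s\n' "$cmds"    # printed, not executed: these need root and real devices
```

Further arrays (`/dev/md2`, …) can later be added to the same VG with `vgextend`.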
Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
On Tue, 5 Mar 2019, David Teigland wrote: On Tue, Mar 05, 2019 at 06:29:31PM +0200, Nir Soffer wrote: Maybe LVM should let you mix PVs with different logical block size, but it should require --force. LVM needs to fix this; your solution sounds like the right one. Also, since nearly every modern device has a physical block size of 4k or more, and even when the logical block size is (emulated) 512, performance degradation occurs with smaller filesystem blocks, the savvy admin should ensure that all filesystems have a minimum 4k block size - except in special circumstances. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
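For ext4, ensuring a 4k filesystem block size at creation time looks something like this (a sketch; the device name is hypothetical, and the command is printed rather than executed since it needs a real block device):

```shell
fs_block=4096    # minimum block size the paragraph above recommends
echo "mkfs.ext4 -b $fs_block /dev/vg0/mylv"
```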
Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
On Mon, 4 Mar 2019, Cesare Leonardi wrote: Today I repeated all the tests and indeed in one case the mount failed: after pvmoving from the 512/4096 disk to the 4096/4096 disk, with the LV ext4 using 1024 block size. ... The error happened where you guys expected. And also for me fsck showed no errors. But it doesn't look like filesystem corruption: if you pvmove the data back, it becomes readable again: ... THAT is a crucial observation. It's not an LVM bug, but the filesystem trying to read 1024 bytes on a 4096 device. I suspect it could also happen with an unaligned filesystem on a 4096 device. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
On Fri, 1 Mar 2019, Cesare Leonardi wrote: I've done the test suggested by Stuart and it seems to contradict this. I have pvmoved data from a 512/512 (logical/physical) disk to a newly added 512/4096 disk, but I had no data corruption. Unfortunately I don't have any native 4k disk to repeat the same test. Use a loopback device with the logical block size set to 4096 to confirm that your test does detect corruption (using the same LV, filesystem, and data). I believe that by "physical sector", the original reporter means logical, as he was using an encrypted block device that was virtual - there was no "physical" sector size. It was "physical" as far as the file system was concerned - where "physical" means "the next layer down". Indeed, even rotating disk drives make the physical sector size invisible except to performance tests. SSD drives have a "sector" size of 128k or 256k - the erase block - and performance improves when aligned to that. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
On Thu, 28 Feb 2019, Cesare Leonardi wrote: Not to be pedantic, but what do you mean by physical block? Because with modern disks the term is not always clear. Let's take a mechanical disk with 512e sectors, that is, with 4k sectors but exposed as 512-byte sectors. Fdisk will refer to it with these terms: Sector size (logical/physical): 512 bytes / 4096 bytes What you are referring to as physical size is actually the logical size reported by fdisk, right? And if that's correct, I guess it should be safe to add the above disk with 512e sectors to an LVM storage composed only of disks with real 512-byte sectors. I expect that from the LVM point of view this should not even be considered a mixed sector size setup, even if the real physical sector size of the added disk is 4096 bytes. Do you agree, or do you think it would be better to test this specific setup? I would definitely test it, using the same test script that reproduces the problem with loopback devices. That said, I believe you are right - it should definitely work. Most of my drives are 512/4096 logical/physical. If you actually write a single 512-byte sector, however, the disk firmware will have to do a read/modify/write cycle - which can tank performance. hdparm will report logical and physical sector size - but there doesn't seem to be an option to set the logical sector size. There really is no need once you already support a smaller logical sector size, as the performance hit can be avoided by aligned filesystems with 4k+ block size (most modern filesystems). Once I encountered a bug in drive firmware where the R/M/W did not work correctly with certain read/write patterns (involving unaligned multi-sector writes). I do not wish that on anyone. (Don't worry, that drive model is long gone...) -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
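As a quick illustration of when the firmware must fall back to read-modify-write, here is a small predicate (a sketch; real drives also care about the partition's starting offset, which is folded into the byte offset here):

```shell
# A write triggers R/M/W on a 4096-byte physical-sector drive when its
# byte offset or its length is not a multiple of 4096.
phys=4096
is_rmw() { [ $(( $1 % phys )) -ne 0 ] || [ $(( $2 % phys )) -ne 0 ]; }
is_rmw 512 512   && echo "512-byte write at offset 512: R/M/W"
is_rmw 4096 8192 || echo "8k write at offset 4k: no R/M/W"
```

This is why an aligned filesystem with a 4k+ block size sidesteps the penalty: every I/O it issues satisfies both conditions.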
Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size
On Wed, 27 Feb 2019, Ingo Franzki wrote: The good thing about the example with encrypted volumes on loopback devices is that you can reproduce the problem on any platform, without having certain hardware requirements. The losetup command has a --sector-size option that sets the logical sector size. I wonder if that is sufficient to reproduce the problem. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
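A minimal sketch of that reproduction setup (the backing-file path is arbitrary; the losetup step needs root, so it is printed rather than executed):

```shell
img=/tmp/sector4k.img
truncate -s 1G "$img"    # sparse 1 GiB backing file, safe to create anywhere
# With root, this would create a loop device whose *logical* sector
# size is 4096, ready for pvcreate/pvmove experiments:
echo "losetup --sector-size 4096 --show -f $img"
```

The resulting loop device behaves like a native 4k disk as far as LVM and filesystems are concerned.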
Re: [linux-lvm] lvcreate from a setuid-root binary
It's not very elegant, but the quick and dirty solution is to use sudo to allow certain users to run specific commands with a real uid of root. You can say exactly what arguments the user has to use - the sudoers file is where this is configured. Or you can make a script - which is probably better. But said script should have no arguments, or as few as possible - because any complexity allows that user to attempt to exploit it to achieve root. Such a script could trivially bring a specific LV online, writable by a specific user. More complex requirements would need a more complex script. If LVM has more elegant features for this kind of thing, I'm all ears. On Fri, Nov 16, 2018 at 8:43 AM, Christoph Pleger wrote: Go back to the beginning and describe the original problem you are trying to solve and the constraints you have and ask for advice about ways to achieve it. The beginning is that I want to create a user-specific logical volume when a user logs in to a service that authenticates its users through pam and that does not run as root. Regards Christoph
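A sketch of the sudo approach, with entirely hypothetical names (the wrapper path, VG/LV, and user are invented for illustration):

```shell
# Fixed-purpose wrapper, e.g. /usr/local/sbin/activate-home-lv: it
# takes no arguments, so the caller cannot steer it anywhere else.
wrapper='#!/bin/sh
lvchange -ay vg0/home_alice          # bring the specific LV online
chown alice /dev/vg0/home_alice      # make it writable by that one user'
# Matching sudoers rule (added via visudo):
rule='alice ALL=(root) NOPASSWD: /usr/local/sbin/activate-home-lv'
printf '%s\n%s\n' "$wrapper" "$rule"
```

The user (or a pam session hook) then runs `sudo /usr/local/sbin/activate-home-lv` with no arguments to attack.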
Re: [linux-lvm] raid10 to raid what? - convert
On Wed, 18 Oct 2017, Tanstaafl wrote: and is not the same as raid1+0 (raid1 on top of raid0). Not according to everything I've ever read about it... for example: https://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 But this is not certain as raid10 works perfectly well with 2 or 3 disks, including the redundancy. You must be talking about something else... RAID10 requires at least 4 disks, and always an even number, although most RAID controllers support the designation of at least one hot spare (so it will auto-rebuild using the hot spare in the event of a failure). Been using this configuration in my 5-drive QNAP NAS's for a long time. Yep. Not talking about raid1+0. Linux raid10 really ought to be a "standard" - and effectively is. I use it whenever I can (with only 2 disks I use raid1 so I can alias the legs as non-raid). -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
Re: [linux-lvm] raid10 to raid what? - convert
On Tue, 17 Oct 2017, Stuart D. Gathman wrote: thanks!), and is not the same as raid1+0 (raid1 on top of raid0). Sorry, that is raid0 on top of raid1. With raid1 on top, after the first disk failure the second failure has a 66% chance of destroying the data. With raid0 on top, the second failure has only a 33% chance of destroying the data. Raid10 is a way of striping the data with redundancy in a single layer. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
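The quoted odds can be checked with a little integer arithmetic, assuming 4 disks with one already failed (3 survivors):

```shell
# raid0 on top of raid1 (raid1+0): only the dead disk's mirror partner
#   is fatal -> 1 of the 3 survivors.
# raid1 on top of raid0 (raid0+1): the first failure killed a whole
#   stripe, so either disk of the surviving stripe is fatal -> 2 of 3.
fatal_raid10=$((1 * 100 / 3))   # ~33%
fatal_raid01=$((2 * 100 / 3))   # ~66%
echo "raid1+0 second-failure risk: ${fatal_raid10}%  raid0+1: ${fatal_raid01}%"
```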
Re: [linux-lvm] raid10 to raid what? - convert
You still haven't said what you are trying to accomplish. I wouldn't have bothered responding to such a vague question until you provided some tantalizing clues. Until your latest clues, I would have advised using rsync or dd to copy your data to a new volume. But now it sounds like you ran out of budget for another disk, or need to minimize downtime, and want to reconfigure in place. First off, raid10 is a linux specialty (I didn't know LVM supported it, thanks!), and is not the same as raid1+0 (raid1 on top of raid0). However, your previous clue tells us that you probably have at least 4 disks in your raid10. So you should be able to remove 2 legs from the raid10 and still have the equivalent of a raid0. But this is not certain, as raid10 works perfectly well with 2 or 3 disks, including the redundancy. In any case, you can remove at least 1 leg from your raid10. Be sure to back up first. Now that you've cut the disk used in half (and your data is precariously dependent on the health of *both* underlying physical disks), what did you want to do next? Maybe you just want to create a different kind of raid with the released space and do that rsync or dd? On Tue, 17 Oct 2017, lejeczek wrote: no other raids but LVM's own. not much in configuration, unless I misunderstand, the question I'm posing is simple - is it possible to convert/split an LVM raid10 LV into two raid0 LVs? (here one thing comes to mind: does the data stay intact?) Or a raid10 LV to any other raid? As I googled I saw there is "mirror splitting", but the term raid10 does not appear in that context, or I failed to find it. But the man pages also mention, with regards to raid takeover: ".. · between striped/raid0 and raid10. ..." So it sounds like it's possible (no other raid levels but these two), only it might be an intricate process/procedure which is not documented, or again, I failed to find it. -- Stuart D. 
Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
Re: [linux-lvm] convert LV to physical device _in_place_?
On Thu, 13 Jul 2017, Zdenek Kabelac wrote: PV has a 'header', so the real 'data' is shifted by the PV header + lvm2 metadata, and an LV does not need to be sequential. However, if you have a single-'segment' LV and you calculate the proper skipping offset (typically 1MB), you can try to use such a device directly without lvm2 with a loop device mapping - see losetup --offset The AIX system used a single-segment boot volume LV so that bootstrap code needed only an offset and did not need to understand the LVM. But the boot volume was a normal LV in all other respects. I think there was a flag on the LV to ensure it remained a single segment. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
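Concretely, for a single-segment LV the trick looks something like this. The PV name is hypothetical, the 1 MiB figure is the common default (verify with `pvs -o +pe_start` on the real system), and it is only the full data offset if the LV starts at the PV's first extent, so the command is printed rather than run:

```shell
pe_start_bytes=$((1024 * 1024))   # typical data offset: 1 MiB past the PV header
echo "losetup --offset $pe_start_bytes --show -f /dev/sdX1"
```

The resulting loop device then exposes the LV's contents (e.g. a filesystem) without lvm2 in the picture.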
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
On Tue, 18 Apr 2017, Gionatan Danti wrote: Any thoughts on the original question? For snapshots with a relatively big CoW table, from a stability standpoint, how do you feel about classical vs thin-pool snapshots? Classic snapshots are rock solid. There is no risk to the origin volume. If the snapshot CoW fills up, all reads and all writes to the *snapshot* return IOError. The origin is unaffected. If a classic snapshot exists across a reboot, then the entire CoW table (but not the data chunks) must be loaded into memory when the snapshot (or origin) is activated. This can greatly delay boot for a large CoW. For the common purpose of temporary snapshots for consistent backups, this is not an issue. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
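The temporary-snapshot backup pattern mentioned here looks roughly like this. The VG/LV names, CoW size, and destination are hypothetical, and the commands need root, so the sketch just prints them:

```shell
cmds='lvcreate -s -n home_snap -L 10G vg0/home   # CoW sized for writes during the backup window
mount -o ro /dev/vg0/home_snap /mnt/snap
rsync -a /mnt/snap/ backup:/backups/home/
umount /mnt/snap
lvremove -f vg0/home_snap'
printf '%s\n' "$cmds"
```

Because the snapshot is removed after the backup, its CoW table never survives a reboot, so the boot-delay concern above does not apply.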
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
On Thu, 13 Apr 2017, Xen wrote: Stuart Gathman schreef op 13-04-2017 17:29: IMO, the friendliest thing to do is to freeze the pool in read-only mode just before running out of metadata. It's not about metadata but about physical extents. In the thin pool. Ok. My understanding is that *all* the volumes in the same thin-pool would have to be frozen when running out of extents, as writes all pull from the same pool of physical extents. -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
On Thu, 13 Apr 2017, Xen wrote: Stuart Gathman schreef op 13-04-2017 17:29: understand and recover. A sysadmin could have a plain LV for the system volume, so that logs and stuff would still be kept, and admin logins work normally. There is no panic, as the data is there read-only. Yeah a system panic in terms of some volume becoming read-only is perfectly acceptable. However the kernel going entirely mayhem, is not. Heh. I was actually referring to *sysadmin* panic, not kernel panic. :-) But yeah, sysadmin panic can result in massive data loss... -- Stuart D. Gathman "Confutatis maledictis, flamis acribus addictis" - background song for a Microsoft sponsored "Where do you want to go from here?" commercial.