Re: [linux-lvm] Can I combine LUKS and LVM to achieve encryption and snapshots?

2023-09-26 Thread Stuart D Gathman

On Wed, 27 Sep 2023, Jean-Marc Saffroy wrote:


So I prefer to manage available raw (un-encrypted) space with LVM.

Now, I also need to do backups of /home, and that's why I want
snapshots. But that first layer of LVM would only show a snapshot of
an encrypted volume, and the backup job shouldn't have the passphrase
to decrypt the volume.

Which is why I'm trying to find a way of doing snapshots of an "opened"
LUKS volume: this way, the backup job can do its job without requiring
a passphrase.


Besides LVM on LUKS on LVM which you already tried, consider using
a filesystem that supports snapshots.  I use btrfs, and snapshots work
beautifully, and if you use "btrfs send" you can even do differential
backups.  Btrfs is COW, so snaps share all blocks not touched.

Pipe the output of btrfs send directly to your backup process/server
running "btrfs receive".  Note, this requires the backup server to have
btrfs.  If it doesn't, then just use rsync from the snapshot directory
to the backup server like a typical unix backup solution.  (E.g. my vm
host uses XFS on the backup drives, so it uses rsync.)
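
For example, a minimal sketch of that pipeline, assuming /home is itself
a btrfs subvolume (snapshot and host names here are made up):

  # send requires a read-only snapshot
  btrfs subvolume snapshot -r /home /home/.snap-2023-09-27
  btrfs send /home/.snap-2023-09-27 | ssh backuphost btrfs receive /backup/home
  # later: incremental send relative to the previous snapshot
  btrfs send -p /home/.snap-2023-09-27 /home/.snap-2023-09-28 \
      | ssh backuphost btrfs receive /backup/home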


In simple tests, I could make it work, with dmsetup on LUKS on LVM,
and also (after I sent my original email) with LVM on LUKS on LVM.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Swapping LLVM drive

2023-08-29 Thread Stuart D Gathman

On Mon, 28 Aug 2023, Roska Postit wrote:


After reading your answer more carefully I got the following idea:



How do you see if I boot the system (this is a desktop computer and the old
and the new drive are both NVMe SSD) from USB Linux and then just do a 'dd'
for the entire drive (in block level, bit-by-bit). Then I remove the old
disk out of the system. Shouldn't it boot normally now ?


That would work, yes, but you don't expand /boot - which you really
should do.  Also, copying the entire filesystem is not only less
efficient, but involves needless writes to an SSD (where they are 
a more limited resource than on magnetic drives).



Then I will create a new partition for the all unused space (1.5GB) on new
disk which I then will add to the LVM as a new Physical Volume (PV) in


That is pointless when you can just expand the partition (which is
trivial when it is the last one).  You don't want more PVs unless they
are actually physical volumes - or there is some special circumstance
that prevents just expanding the partition.
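
For the record, that path is roughly (using the layout from your
listing; names are illustrative, and growpart is from cloud-utils -
cfdisk or parted work just as well):

  growpart /dev/nvme0n1 3                  # grow the last partition to the disk end
  pvresize /dev/nvme0n1p3                  # let LVM see the larger PV
  lvextend -r -l +100%FREE pc3-vg/root     # -r also grows the filesystem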

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Swapping LLVM drive

2023-08-29 Thread Stuart D Gathman

On Mon, 28 Aug 2023, Phillip Susi wrote:


Why would you use dd/partclone instead of just having LVM move
everything to the new drive on the fly?



Partition the new drive, use pvcreate to initialize the partition as a
pv, vgextend to add the pv to the existing vg, pvmove to evacuate the
logical volumes from the old disk, then vgreduce to remove it from the
vg.


1. Unnecessary copying.
2. You lose your free backup of the system on the old drive,
   which should be carefully labeled and kept handy for a year.
   (After that, SSDs start to run the risk of data retention issues.)


Don't forget you'll need to reinstall grub on the new drive for it to
boot.


And that is the most important reason.  "Just reinstall grub" has a
much steeper learning curve than "dd", IMO.
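
For reference, "reinstall grub" boils down to something like the
following once the new drive's root and /boot are mounted under /mnt
from a live system (device name is hypothetical; some distros spell the
command grub2-install):

  grub-install --boot-directory=/mnt/boot /dev/nvme1n1                    # BIOS/MBR boot
  grub-install --boot-directory=/mnt/boot --efi-directory=/mnt/boot/efi   # UEFI boot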

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Swapping LLVM drive

2023-08-28 Thread Stuart D Gathman

On Sun, 27 Aug 2023, Roska Postit wrote:


What is the most proper way to swap my 500GB SSD drive to the bigger 2TB SSD
drive in the following LLVM configuration ?

nvme0n1            259:0    0 465,8G  0 disk  
├─nvme0n1p1        259:1    0   512M  0 part  /boot/efi
├─nvme0n1p2        259:2    0   488M  0 part  /boot
└─nvme0n1p3        259:3    0 464,8G  0 part  
  ├─pc3--vg-root   254:0    0 463,8G  0 lvm   /
  └─pc3--vg-swap_1 254:1    0   980M  0 lvm   [SWAP]


Since you are not mirroring, just add the new drive.

If this is a laptop, and you can only have one drive, then I suggest
you mount the new drive via USB (note there are at least 2 kinds of
nvme interface and you have to get a matching USB enclosure).

Use dd to copy the partition table (this also often contains boot code)
to the new disk on USB.
Then use dd to copy the smaller partitions (efi,boot). 
Now use cfdisk to delete the 3rd partition. 
Expand the boot partition to 1G (you'll thank me later).

Allocate the entire rest of the disk to p3.
Create a new vg with a different name.  Allocate root and swap on
new VG the same sizes.
Take a snapshot of current root (delete swap on old drive since you
didn't leave yourself any room), and use partclone to efficiently
copy the filesystem over to new root.

Either a) edit grub and fstab on the new drive to use the new VG name, or
   b) boot from live media to rename the old and new VGs, or
   c) rename the VG just before shutting down to remove the drive -
  I think LVM can operate with duplicate VG names, but I've never
  navigated the details.

Swap drives after powerdown.
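
A rough sketch of the whole sequence, assuming the old drive is
/dev/nvme0n1 and the new one shows up as /dev/sda in the USB enclosure
(GPT assumed; all names and sizes are illustrative):

  sgdisk -R /dev/sda /dev/nvme0n1      # replicate the partition table
  sgdisk -G /dev/sda                   # give the copy new GUIDs
  dd if=/dev/nvme0n1p1 of=/dev/sda1 bs=1M      # EFI
  dd if=/dev/nvme0n1p2 of=/dev/sda2 bs=1M      # /boot
  cfdisk /dev/sda                      # delete p3, grow p2 to 1G, recreate p3
  resize2fs /dev/sda2                  # let /boot use the extra space
  pvcreate /dev/sda3
  vgcreate pc3-vg-new /dev/sda3
  lvcreate -L 464G -n root pc3-vg-new
  lvcreate -L 980M -n swap_1 pc3-vg-new
  lvremove pc3-vg/swap_1               # make room for the snapshot on the old VG
  lvcreate -s -L 900M -n rootsnap pc3-vg/root
  partclone.ext4 -b -s /dev/pc3-vg/rootsnap -o /dev/pc3-vg-new/root
  # then fix fstab/grub for the new VG name (or rename the VG) before swapping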

A modern filesystem (ext4, xfs, btrfs, etc.) can expand as you expand
the root LV.  Leave yourself some working room in the VG.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] bug? shrink lv by specifying pv extent to be removed does not behave as expected

2023-04-09 Thread Stuart D Gathman

I use a utility that maps bad sectors to files, then move/rename the
files into a bad blocks folder.  (Yes, this doesn't work when critical
areas are affected.)  If you simply remove the files, then
modern disks will internally remap the sectors when they are written
again  - but the quality of remapping implementations varies.
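
(On ext4 the mapping step can be done by hand with debugfs once the bad
LBA has been translated into a filesystem block number - the block and
device names below are made up; my lbatofile script, linked in other
threads here, automates the translation.)

  # fs_block = (bad_sector - LV/partition start in sectors) * 512 / fs_block_size
  debugfs -R "icheck 1234567" /dev/vg0/home    # which inode owns that block?
  debugfs -R "ncheck 98765" /dev/vg0/home      # which path is that inode?
  # then mv the affected file into a badblocks/ folder so the block stays pinned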

It is more time efficient to just buy a new disk, but with wars and
rumors of wars threatening to disrupt supply chains, including tech,
it's nice to have the skills to get more use from failing hardware.

Plus, it is a challenging problem, which can be fun to work on at leisure.

On Sun, 9 Apr 2023, Roland wrote:


 What is your use case that you believe removing a block in the middle
 of an LV needs to work?


my use case is creating some badblocks script with lvm which intelligently
handles and skips broken sectors on disks which can't be used otherwise...


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] How to implement live migration of VMs in thinlv after using lvmlockd

2022-11-01 Thread Stuart D Gathman

On Tue, Nov 01, 2022 at 01:36:17PM +0800, Zhiyong Ye wrote:

I want to implement live migration of VMs in the lvm + lvmlockd + sanlock
environment. There are multiple hosts in the cluster using the same iscsi
connection, and the VMs are running on this environment using thinlv
volumes. But if I want to live migrate a VM, it will be difficult since
thin LVs from the same thin pool can only be exclusively active on one
host.


I just expose the LV (thin or not - I prefer not) as an iSCSI target
that the VM boots from.  There is only one host that manages a thin
pool, and that is a single point of failure, but no locking issues.
You issue the LVM commands on the iSCSI server (which I guess they call
NAS these days).
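
With targetcli that exposure looks roughly like this (all names made up):

  targetcli /backstores/block create name=vm1disk dev=/dev/vg0/vm1
  targetcli /iscsi create iqn.2022-11.com.example:vm1
  targetcli /iscsi/iqn.2022-11.com.example:vm1/tpg1/luns create /backstores/block/vm1disk
  targetcli /iscsi/iqn.2022-11.com.example:vm1/tpg1/acls create iqn.2022-11.com.example:vmhost1
  targetcli saveconfig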

If you need a way for a VM to request enlarging an LV it accesses, or
similar interaction, I would make a simple API where each VM gets a
token that determines what LVs it has access to and how much total
storage it can consume.  Maybe someone has already done that.
I just issue the commands on the LVM/NAS/iSCSI host.

I haven't done this, but there can be more than one thin pool, each on
its own NAS/iSCSI server.  So if one storage server crashes, then
only the VMs attached to it crash.  You can only (simply) migrate a VM
to another VM host on the same storage server.


BUT, you can migrate a VM to another host less instantly using DRBD
or other remote mirroring driver.  I have done this.  You get the
remote LV mirror mostly synced, suspend the VM (to a file if you need
to rsync that to the remote), finish the sync of the LV(s), resume the
VM on the new server - in another city.  Handy when you have a few hours
notice of a natural disaster (hurricane/flood).

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Where are the data blocks of an LV?

2022-07-26 Thread Stuart D Gathman

Check out https://github.com/sdgathman/lbatofile
It was written to identify the file affected by a bad block (so it goes
in the opposite direction), but the getpvmap() function obtains pe_start
and pe_size plus the list of segments.  findlv() goes through the
segments to find the one an absolute sector is in.  That should tell you
what you want to know.
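
(That 1 MiB you measured is pe_start - the data offset past the PV
header and metadata area - and it can be queried rather than assumed;
device names below are illustrative.)

  pvs --units s -o pv_name,pe_start /dev/sdb1    # data area offset, in 512-byte sectors
  lvdisplay --maps /dev/vg0/lv0                  # LE-to-PE mapping per segment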


On Fri, 22 Jul 2022, Marcin Owsiany wrote:


I know that it is possible to find where a given logical volume's extents
start on a PVs thanks to the information printed by lvdisplay --maps.

I was also able to experimentally establish that the actual data of an LV
starts one megabyte after that location. Is that offset documented anywhere,
or can I somehow discover it at runtime for a given LV?


___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks

2022-05-08 Thread Stuart D Gathman

On Sat, 7 May 2022, Alex Lieflander wrote:


I don’t trust the hardware I’m running on very much, but it’s all I have to 
work with at the moment; it’s important that the array is resilient to *any* 
(and multiple) single chunk corruptions because such corruptions are likely to 
happen in the future.

For the last several months I’ve periodically been seeing (DM-Integrity) 
checksum mismatch warnings at various locations on all of my disks. I stopped 
using a few SATA ports that were explicitly throwing SATA errors, but I suspect 
that the remaining connections are unpredictably (albeit infrequently) 
corrupting data in ways that are more difficult to detect.


Sounds like a *great* test bed for software data integrity tools.  Don't
throw that system away when you get a more reliable one!

That sounds like a situation that btrfs with multiple copies could
handle.  Use a beefier checksum than the default CRC-32C as well.
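
For example (a sketch only - device names are made up, and --csum needs
a reasonably recent btrfs-progs and kernel):

  mkfs.btrfs -d raid1 -m raid1 --csum xxhash /dev/vg/lva /dev/vg/lvb   # two devices
  mkfs.btrfs -d dup -m dup --csum blake2 /dev/vg/lva                   # two copies, one device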
___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks

2022-05-07 Thread Stuart D Gathman

On Fri, 6 May 2022, Alex Lieflander wrote:


Thanks. I really don’t want to give up the DM-Integrity management. Less 
complexity is just a bonus.


What are you trying to get out of RAID6?  If redundancy and integrity
are already managed at another layer, then just use RAID0 for striping.

I like to use RAID10 for mirror + striping, but I understand parity
disks give redundancy without halving capacity.  Parity means RMW
(read-modify-write) cycles of largish blocks, whereas straight mirroring
(RAID1, RAID10) can write single sectors without an RMW cycle.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM performance vs direct dm-thin

2022-01-30 Thread Stuart D. Gathman
On Sun, 2022-01-30 at 11:45 -0500, Demi Marie Obenour wrote:
> On Sun, Jan 30, 2022 at 11:52:52AM +0100, Zdenek Kabelac wrote:
> > 
> 
> > Since you mentioned ZFS - you might want focus on using 'ZFS-only'
> > solution.
> > Combining  ZFS or Btrfs with lvm2 is always going to be a painful
> > way as
> > those filesystems have their own volume management.
> 
> Absolutely!  That said, I do wonder what your thoughts on using loop
> devices for VM storage are.  I know they are slower than thin
> volumes,
> but they are also much easier to manage, since they are just ordinary
> disk files.  Any filesystem with reflink can provide the needed
> copy-on-write support.

I use loop devices for test cases - especially with simulated IO
errors.  Devs really appreciate having an easy reproducer for
database/filesystem bugs (which often involve handling of IO errors). 
But not for production VMs.
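
A typical reproducer device looks something like this (file name and
table values are made up; dm-flakey fails I/O during each "down"
interval):

  truncate -s 1G test.img
  LOOP=$(losetup -f --show test.img)
  dmsetup create flaky0 --table "0 $(blockdev --getsz $LOOP) flakey $LOOP 0 50 5"
  # use /dev/mapper/flaky0 as the PV or filesystem device in the test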

I use LVM as flexible partitions (i.e. only classic LVs, no thin pool).
Classic LVs perform like partitions, literally using the same driver
(device mapper) with a small number of extents, and are if anything
more recoverable than partition tables.  We used to put LVM on bare
drives (like AIX did) - who needs a partition table?  But on Wintel,
you need a partition table for EFI and so that alien operating systems
know there is something already on a disk.

Your VM usage is different from ours - you seem to need to clone and
activate a VM quickly (like a vps provider might need to do).  We
generally have to buy more RAM to add a new VM :-), so performance of
creating a new LV is the least of our worries.

Since we use LVs like partitions - mixing with btrfs is not an issue. 
Just use the LVs like partitions.  I haven't tried ZFS on Linux - it
may have LVM-like features that could fight with LVM.  ZFS would be my
first choice on a BSD box.

We do not use LVM raid - but either run mdraid underneath, or let btrfs
do its data duplication thing with LVs on different spindles.





___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] convert logical sector -> physical sector + pv/vg extent number

2022-01-06 Thread Stuart D Gathman

On Mon, 3 Jan 2022, Roland wrote:


any chance to retrieve this information for automated/script-based
processing?


You might find this script enlightening:

https://github.com/sdgathman/lbatofile

It maps bad sectors to partition, LV, file, etc.

The relevant function for your question is findlv()
Some of the commands run are:

pvdisplay --units k -m '/dev/lvm_pv'
pvs --units k -o+pe_start '/dev/lvm_pv'
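
For a single linear segment the arithmetic is then (sector units of 512
bytes; device and VG names are illustrative):

  pvs --noheadings --units s -o pe_start '/dev/lvm_pv'
  vgs --noheadings --units s -o vg_extent_size myvg
  # pv_sector   = pe_start + PE_number * extent_size + (lv_sector mod extent_size)
  # disk_sector = partition start (from fdisk -l) + pv_sector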


On 03.01.22 at 00:12, Andy Smith wrote:

 On Sun, Jan 02, 2022 at 08:00:30PM +0100, Roland wrote:

 if i have a logical sector/block "x" on a lvm logical volume ,  is there
 a way to easily calculate/determine (optimally by script/cli) the
 corresponding physical sector of the physical device it belongs to and
 the extent number of the appropriate pv/vg where it resides?

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?

2021-12-29 Thread Stuart D Gathman

On Tue, 28 Dec 2021, Tomas Dalebjörk wrote:


Yes, it is an incremental backup based on the cow device


I've used such a COW based backup (can't remember the name just now, currently
using DRBD and rsync for incremental mirrors).  The way it worked was to
read and interpret the raw COW device itself and send blocks over the
wire - writing directly to a volume on the remote end.  It did not try
to patch up metadata and use LVM to merge.  You need an intimate
knowledge of COW internals for either approach - BUT the read-only
approach (with plain writes at the other end) is MUCH safer (not going
to trash metadata at either end) and just as efficient on the wire.

I've also used a block device rsync that read every block on both
sides and compared hashes - but that is obviously a lot more disk I/O
than using the COW, where LVM is already tracking changed blocks.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] how to convert a disk containing a snapshot to a snapshot lv?

2021-12-27 Thread Stuart D Gathman

If you want to give it a try, just create a snapshot on a specific device
And change all the blocks on the origin, there you are, you now have a cow
device containing all data needed.
How to move this snapshot device to another server, reattach it to an empty
lv volume as a snapshot.
lvconvert -s, command requires an argument of an existing snapshot volume
name.
But there is no snapshot on the new server, so it can't re-attach the
volume.
So what procedures should be invoked to create just the detached references
in LVM, so that the lvconvert -s command can work?


Just copy the snapshot to another server, by whatever method you would
use to copy the COW and Data volumes (I prefer partclone for supported
filesystems).  No need for lvconvert.  You are trying WAY WAY too hard.
Are you by any chance trying to create an incremental backup system
based on lvm snapshot COW?  If so, say so.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Mapping lvol off to pvol off

2021-08-18 Thread Stuart D Gathman

On Tue, 17 Aug 2021, Chethan Seshadri wrote:


Can someone help to convert an offset within an lvol to the corresponding
pvol offset using...

1. lvm commands
2. lvmdbusd APIs


This utility does that:
https://github.com/sdgathman/lbatofile

See getlv() and getpvmap()
https://github.com/sdgathman/lbatofile/blob/master/lbatofile.py

Caveat, only tested with classic LVs.  Dunno about thin stuff.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Does LVM have any plan/schedule to support btrfs in fsadm

2021-06-27 Thread Stuart D Gathman

On Mon, 28 Jun 2021, heming.z...@suse.com wrote:


In my opinion, the way many users use btrfs is the same as ext4/xfs.


Yes.  I like the checksums in metadata feature for enhanced integrity
checking.

It seems too complicated to have anytime soon - but when a filesystem
detects corruption, and is on an LVM (or md) RAID1 layer, an ioctl to
read alternate mirror branches to see which (if any) has the correct
data would allow recovery.  Btrfs does this if it is doing the
mirroring, but then you lose all the other features from LVM or md raid10, 
including running other filesystems and efficient virtual disks for

virtual machines.

We eventually got DISCARD operations to pass to lower layers.
Dealing with mirror branches should really be a thing too.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] lvm limitations

2020-09-15 Thread Stuart D Gathman

On Tue, 15 Sep 2020, Tomas Dalebjörk wrote:


ok, lets say that I have 10 LVs on a server, and want to create a thin lv
snapshot every hour and keep them for 30 days; that would be 24h *
30 days * 10 LVs = 7200 LVs



if I want to keep snapshot copies from more nodes, to serve a single
repository of snapshot copies, then these would easily become several
hundred thousand LVs



not sure if this is a good idea, but I guess it can be very useful in
some sense as block level incremental forever and instant recovery can
be implemented for open sourced based applications

what reflections do you have on this idea?


My feeling is that btrfs is a better solution for the hourly snapshots.
(Unless you are testing a filesystem :-)

I find "classic" LVs a robust replacement for partitions that are easily
resized without moving data around.  I would be more likely to try
RAID features on classic LVs than thin LVs.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] What to do about new lvm messages

2020-08-22 Thread Stuart D Gathman

On Sat, 22 Aug 2020, L A Walsh wrote:


I am trying to create a new pv/vg/+lvs setup but am getting some weird messages

pvcreate  -M2 --pvmetadatacopies 2 /dev/sda1
 Failed to clear hint file.
 WARNING: PV /dev/sdd1 in VG Backup is using an old PV header, modify
the VG to update.
 Physical volume "/dev/sda1" successfully created.

So why am I getting a message about not clearing a hint file (running as root)?


Because there is an old PV header on sdd1.
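
Recent lvm2 releases can rewrite it in place (assuming the VG is
otherwise healthy):

  vgck --updatemetadata Backup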


From an online man page, I should have been able to use -ff to
recreate a pv over the top of a preexisting one, but that didn't seem
to work.  I got:
pvcreate -ff -M2 --pvmetadatacopies 2 /dev/sda1
 Failed to clear hint file.
 WARNING: PV /dev/sdd1 in VG Backup is using an old PV header, modify


You wrote a new PV header on sda1 - but that didn't do diddly squat 
about the old one on sdd1.



the VG to update.
 Cannot access VG Space with system ID Ishtar with unknown local system ID.
 Device /dev/sda1 excluded by a filter.


The PV filter is excluding sda1.  Are you confused about what is on 
which sdx?



dd if=/dev/zero of=/dev/sda1 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000175409 s, 23.4 MB/s


I hope sda1 was really what you think it was.

What is on sdd1?  None of your listings examine it.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] [1019133-4yqc8ex4] LVM hangs after volume change

2020-04-16 Thread Stuart D Gathman

On Wed, 15 Apr 2020, Shock Media B.V. support wrote:


We use an mdadm raid-config consisting of 4 or more SSD's/Disks where
we use part of the disks for a raid1,raid10 or raid5. We create
volumes on 2 nodes and use DRBD to keep these 2 volumes in sync and we
run a virtual machine (using KVM) on this volume.


I've used DRBD for an almost identical setup, except only raid1 - we
were even cheaper than you. :->

I would first suspect DRBD hanging - and you should check the queue
stats to see if there is a backlog.  If you are using the paid version
with a userland smart buffering process, check whether that has died.
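
Quick things to look at (output differs between DRBD 8.x and 9.x):

  cat /proc/drbd        # 8.x: watch the oos (out-of-sync) counter and ds: states
  drbdadm status all    # 9.x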


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] faster snapshot creation?

2020-02-25 Thread Stuart D. Gathman

On Sat, 22 Feb 2020, Eric Toombs wrote:


Snapshot creation is already pretty fast:


$ time sudo lvcreate --size 512M --snapshot --name snap /dev/testdbs/template
  Logical volume "snap" created.
0.03user 0.05system 0:00.46elapsed 18%CPU (0avgtext+0avgdata 28916maxresident)k
768inputs+9828outputs (0major+6315minor)pagefaults 0swaps


That's about half a second in real time. But I have a scenario that
would benefit from it being even faster. I'm doing many small unit tests

So, is there a sort of "dumber" way of making these snapshots, maybe by
changing the allocation algorithm or something?


How about using a filesystem that supports snapshots, e.g. nilfs, or
(I think) btrfs?  That would be much faster than doing it at the LVM
level, which has to sync metadata and stuff.

a) load your template into work directory
b) tag snapshot
c) run test (possibly in container)
d) restore tagged snapshot
e) goto c
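
A sketch of that loop with btrfs subvolumes (paths and the test command
are hypothetical):

  btrfs subvolume snapshot /srv/testdbs/template /srv/testdbs/work   # (a)+(b), instant COW copy
  run-one-test --datadir /srv/testdbs/work                           # (c)
  btrfs subvolume delete /srv/testdbs/work                           # (d) discard changes
  # (e) repeat from the snapshot step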



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] pvmove --abort

2020-01-27 Thread Stuart D Gathman

On Mon, 27 Jan 2020, Matthias Leopold wrote:

I consciously used "pvmove --abort" for the first time now and I'm astonished 
it doesn't behave like described in the man page. No matter if I've used 
"--atomic" for the original command, when I interrupt the process with 
"pvmove --abort" lvm always completely rolls back my copy operation. I would 
expect that if I don't use "--atomic" then "--abort" will result in "segments 
that have been moved will remain on the destination PV, while unmoved 
segments will remain on the source PV" (from man page). Am I missing 
something?


I'm not an LVM guru, but I think I got this one!  pvmove effectively
creates a mirror on the destination, and begins syncing the mirror.
Any writes to the LV go to *both* the source and destination.
When you abort, it simply discards the partially synced mirror.
When the sync is complete, it discards the source leg of the mirror
instead.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Failed merge, still running?

2020-01-22 Thread Stuart D. Gathman

On Wed, 22 Jan 2020, Mauricio Tavares wrote:


lvconvert --merge vmhost_vg0/desktop_snap_20200121

and instead of seeing the usual percentage of how far it has
completed, I got nothing. lvs -a -o +devices shows

 LV                      VG         Attr       LSize  Pool Origin  Data%  Meta%  Move Log Cpy%Sync Convert Devices
 desktop                 vmhost_vg0 Owi-a-s--- 20.00g                                                        /dev/sdb3(14848)
 [desktop_snap_20200121] vmhost_vg0 Swi-a-s--- 10.00g      desktop 100.00


While not a guru, I think I can tell you the issue.  It looks like
the snapshot was full (says 100.00).  The snapshot is unusable at
that point.  Maybe it wasn't before you started the merge, and the
merge sets it to 100.00 when it starts, I haven't noticed.


Before I blow that lv up, what else should I be checking?


What was the snapshot percent used before you started the merge?

--
  Stuart D. Gathman 
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Best way to run LVM over multiple SW RAIDs?

2019-12-07 Thread Stuart D. Gathman

On Sat, 7 Dec 2019, John Stoffel wrote:


The biggest harm to performance here is really the RAID5, and if you
can instead move to RAID 10 (mirror then stripe across mirrors) then
you should see a performance boost.


Yeah, That's what I do.  RAID10, and use LVM to join together as JBOD.
I forgot about the raid 5 bottleneck part, sorry.


As Daniel says, he's got lots of disk load, but plenty of CPU, so the
single thread for RAID5 is a big bottleneck.



I assume he wants to use LVM so he can create volume(s) larger than
individual RAID5 volumes, so in that case, I'd probably just build a
regular non-striped LVM VG holding all your RAID5 disks.  Hopefully


Wait, that's what I suggested!


If you can, I'd get more SSDs and move to RAID1+0 (RAID10) instead,
though you do have the problem where a double disk failure could kill
your data if it happens to both halves of a mirror.


No worse than raid5.  In fact, better because the 2nd fault always
kills the raid5, but only has a 33% or less chance of killing the
raid10.  (And in either case, it is usually just specific sectors,
not the entire drive, and other manual recovery techniques can come into
play.)

--
  Stuart D. Gathman 
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] Best way to run LVM over multiple SW RAIDs?

2019-12-07 Thread Stuart D. Gathman

On Tue, Oct 29, 2019 at 12:14 PM Daniel Janzon wrote:

I have a server with very high load using four NVMe SSDs and
therefore no HW RAID. Instead I used SW RAID with the mdadm tool.
Using one RAID5 volume does not work well since the driver can only
utilize one CPU core which spikes at 100% and harms performance.
Therefore I created 8 partitions on each disk, and 8 RAID5s across
the four disks.



Now I want to bring them together with LVM. If I do not use a striped
volume I get high performance (in expected magnitude according to disk
specs). But when I use a striped volume, performance drops to a
magnitude below. The reason I am looking for a striped setup is to


The mdadm layer already does the striping.  So doing it again in the LVM
layer completely screws it up.  You want plain JBOD (Just a Bunch
Of Disks).
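
In other words, something like (md device names assumed from your
description of eight RAID5 arrays):

  pvcreate /dev/md{1..8}
  vgcreate vgfast /dev/md{1..8}
  lvcreate -L 10T -n data vgfast    # linear by default: no -i/--stripes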

--
  Stuart D. Gathman 
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] exposing snapshot block device

2019-10-22 Thread Stuart D. Gathman

On Tue, 22 Oct 2019, Gionatan Danti wrote:

The main thing that somewhat scares me is that (if things had not changed) 
thinvol uses a single root btree node: losing it means losing *all* thin 
volumes of a specific thin pool. Coupled with the fact that metadata dump are 
not as handy as with the old LVM code (no vgcfgrestore), it worries me.


If you can find all the leaf nodes belonging to the root (in my btree
database they are marked with the root id and can be found by sequential
scan of the volume), then reconstructing the btree data is
straightforward - even in place.

I remember realizing this was the only way to recover a major customer's
data - and had the utility written, tested, and applied in a 36 hour
programming marathon (which I hope to never repeat).  If this hasn't
occurred to thin pool programmers, I am happy to flesh out the procedure.
Having such a utility available as a last resort would ratchet up the
reliability of thin pools.

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



Re: [linux-lvm] exposing snapshot block device

2019-10-22 Thread Stuart D. Gathman

On Tue, 22 Oct 2019, Zdenek Kabelac wrote:


Dne 22. 10. 19 v 17:29 Dalebjörk, Tomas napsal(a):
But, it would be better if the cow device could be recreated in a faster 
way, mentioning that all blocks are present on an external device, so that 
the LV volume can be restored much quicker using "lvconvert --merge" 
command.


I do not want to break your imagination here, but that is exactly the thing 
you can do with thin provisioning and thin_delta tool.


lvconvert --merge does a "rollback" to the point at which the snapshot
was taken.  The master LV already has current data.  What Tomas wants is
to be able to do a "rollforward" from the point at which the snapshot
was taken.  He also wants to be able to put the cow volume on an
external/remote medium, and add a snapshot using an already existing cow.

This way, restoring means copying the full volume from backup, creating
a snapshot using the existing external cow, then lvconvert --merge
instantly (logically) applies the cow changes while updating the master
LV.

Pros:

"Old" snapshots are exactly as efficient as thin when there is exactly
one.  They only get inefficient with multiple snapshots.  On the other
hand, thin volumes are as inefficient as an old LV with one snapshot.
An old LV is as efficient, and as anti-fragile, as a partition.  Thin
volumes are much more flexible, but depend on much more fragile,
database-like metadata.

For this reason, I always prefer "old" LVs when the functionality of
thin LVs is not actually needed.  I can even manually recover from
trashed metadata by editing it, as it is human-readable text.

Updates to the external cow can be pipelined (but then properly
handling reads becomes non-trivial - there are mature remote block
device implementations for Linux that will do the job).

Cons:

For the external cow to be useful, updates to it must be *strictly*
serialized.  This is doable, but not as obvious or trivial as it might
seem at first glance.  (Remote block device software will take care
of this as well.)

The "rollforward" must be applied to the backup image of the snapshot.
If the admin gets it paired with the wrong backup, massive corruption
ensues.  This could be automated.  E.g. the full image backup and
external cow would have unique matching names.  Or the full image backup
could compute an md5 in parallel, which would be stored with the cow.
But none of those tools currently exist.

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] repair pool with bad checksum in superblock

2019-08-23 Thread Stuart D. Gathman

On Fri, 23 Aug 2019, Gionatan Danti wrote:


Il 23-08-2019 14:47 Zdenek Kabelac ha scritto:

Ok - serious disk error might lead to eventually irreparable metadata
content - since if you lose some root b-tree node sequence it might be
really hard to get something sensible (it's the reason why the metadata
should be located on some 'mirrored' device - since while there is a lot
of effort put into protection against software errors - it's hard to do
something about hardware errors...)


Would be possible to have a backup superblock, maybe located on device end?
XFS, EXT4 and ZFS already do something similar...


On my btree file system, I can recover from arbitrary hardware
corruption by storing the root id of the file (table) in each node. 
Leaf nodes (with full data records) are also indicated.  Thus, even if
the root node of a file is lost/corrupted, the raw file/device can be
scanned for corresponding leaf nodes to rebuild the file (table) with
all remaining records.

Drawbacks: deleting individual leaf nodes requires changing the root id
of the node, requiring an extra write.  (Otherwise records could be
included in some future recovery.)  Deleting entire files (tables)
just requires marking the root node deleted - no need to write all the
leaf nodes.

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Power loss consistency for RAID

2019-03-18 Thread Stuart D. Gathman

On Sun, 17 Mar 2019, Zheng Lv wrote:

I'm recently considering using software RAID instead of hardware controllers 
for my home server.


AFAIK, write operation on a RAID array is not atomic across disks. I'm 
concerned that what happens to RAID1/5/6/10 LVs after power loss.


Is manual recovery required, or is it automatically checked and repaired on 
LV activation?


Also I'm curious about how such recovery works internally.


I use md raid1 and raid10.  I recommend that instead of the LVM RAID,
which is newer.  Create your RAID volumes with md, and add them as PVs:

  PV VG  Fmt  Attr PSize   PFree
  /dev/md1   vg_span lvm2 a--u 214.81g  0
  /dev/md2   vg_span lvm2 a--u 214.81g  26.72g
  /dev/md3   vg_span lvm2 a--u 249.00g 148.00g
  /dev/md4   vg_span lvm2 a--u 252.47g 242.47g

Note that you do not need matching drives as with hardware RAID; you
can add disks and mix and match partitions of the same size on drives
of differing sizes.  LVM does this automatically, whereas you have to
manually assign partitions to block devices with md.  There are very few
(large) partitions to assign, so it is a pleasant, human-sized exercise.
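
For example, adding one more mirror set built from three same-sized
partitions (names are illustrative):

  mdadm --create /dev/md5 --level=10 --raid-devices=3 /dev/sda5 /dev/sdb5 /dev/sdc5
  pvcreate /dev/md5
  vgextend vg_span /dev/md5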

While striping and mirror schemes like raid0, raid1, raid10 are actually
faster with software RAID, I avoid RAID schemes with RMW cycles like
raid5 - you really need the hardware for those.

I use raid1 when the filesystem needs to be readable without the md
driver - as with /boot.  Raid10 provides striping as well as mirroring,
with however many drives you have (I usually have 3 or 4).

Here is a brief overview of MD recovery and diagnostics.  Someone else
will have to fill in with the mechanics of LVM raid.

Md keeps a version in the superblock of each device in a logical md
drive - and marks the older leg as failed and replaced (and begins to
sync it).  In newer superblock formats, it also keeps a bitmap so that
it can sync only possibly modified areas.


Once a week (configurable), check_raid compares the legs (on most
distros).  If it encounters a read error on either drive, it immediately
syncs that block from the good drive.  This reassigns the sector on
modern drives.  (On ancient drives, a write error on resync marks the
drive as failed.) If for some reason (there are legitimate ones
involving write optimizations for SWAP volumes and such) the two legs do
not match, it arbitrarily copies one leg to the other, keeping a count.
(IMO it should also log the block offset so that I can occasionally check
that the out of sync occurred in an expected volume.)

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-03-05 Thread Stuart D. Gathman

On Tue, 5 Mar 2019, David Teigland wrote:


On Tue, Mar 05, 2019 at 06:29:31PM +0200, Nir Soffer wrote:

Maybe LVM should let you mix PVs with different logical block size, but
it should require --force.


LVM needs to fix this, your solution sounds like the right one.


Also, since nearly every modern device has a physical block size of
4k or more, and performance degradation occurs with smaller filesystem
blocks even when the logical block size is (emulated) 512, the savvy
admin should ensure that all filesystems have a minimum 4k block size -
except in special circumstances.
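
Both sizes are easy to check before creating the filesystem (device and
LV names are illustrative):

  blockdev --getss --getpbsz /dev/sda     # logical, then physical sector size
  mkfs.ext4 -b 4096 /dev/vg0/lv0          # force 4k filesystem blocks explicitly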


--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-03-04 Thread Stuart D. Gathman

On Mon, 4 Mar 2019, Cesare Leonardi wrote:

Today I repeated all the tests and indeed in one case the mount failed: after 
pvmoving from the 512/4096 disk to the 4096/4096 disk, with the LV ext4 using 
1024 block size.

 ...
The error happened where you guys expected. And also for me fsck showed no 
errors.


But doesn't look like a filesystem corruption: if you pvmove back the data, 
it will become readable again:

 ...

THAT is a crucial observation.  It's not an LVM bug, but the filesystem
trying to read 1024 bytes on a 4096 device.  I suspect it could also
happen with an unaligned filesystem on a 4096 device.

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Filesystem corruption with LVM's pvmove onto a PV with a larger physical block size

2019-02-28 Thread Stuart D. Gathman

On Fri, 1 Mar 2019, Cesare Leonardi wrote:


I've done the test suggested by Stuart and it seems to contradict this.
I have pvmoved data from a 512/512 (logical/physical) disk to a newly added 
512/4096 disk but I had no data corruption. Unfortunately I haven't any 
native 4k disk to repeat the same test.


Use a loopback device with logical block size set to 4096 to confirm
that your test does detect corruption (using the same LV, filesystem,
data).
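
With a recent util-linux that is a one-liner (file name made up):

  truncate -s 2G native4k.img
  losetup -f --show --sector-size 4096 native4k.img   # presents 4096-byte logical blocks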

I believe by "physical sector", the original reporter means logical,
as he was using an encrypted block device that was virtual - there
was no "physical" sector size.  It was "physical" as far as the
file system was concerned - where "physical" means "the next layer
down".

Indeed, even the rotating disk drives make the physical sector size
invisible except to performance tests.  SSD drives have a "sector" size
of 128k or 256k - the erase block, and performance improves when aligned
to that.

--
  Stuart D. Gathman 
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvcreate from a setuid-root binary

2018-11-21 Thread Stuart D. Gathman
It's not very elegant, but the quick and dirty solution is to use sudo 
to allow certain users to run specific commands with a real uid of 
root.  You can say exactly what arguments the user has to use - the 
sudoers file is where this is configured.  Or you can make a script - 
which is probably better.  But said script should have no arguments, or 
as few as possible - because any complexity allows that user to attempt
to exploit it to achieve root.  Such a script could trivially bring a
specific LV online, writable by a specific user.  A more complex
requirement would be - more complex.
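
A bare-bones sketch (every name here is hypothetical): a no-argument
wrapper script plus a sudoers entry restricted to it.

  #!/bin/sh
  # /usr/local/sbin/activate-home-lv - root-owned, takes no arguments
  set -eu
  lv="home_${SUDO_USER:?}"
  lvs vg_home/"$lv" >/dev/null 2>&1 || lvcreate -L 10G -n "$lv" vg_home
  lvchange -ay vg_home/"$lv"

  # /etc/sudoers.d/lv-home
  someuser ALL=(root) NOPASSWD: /usr/local/sbin/activate-home-lv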


If LVM has more elegant features for this kind of thing, I'm all ears.

On Fri, Nov 16, 2018 at 8:43 AM, Christoph Pleger wrote:

Go back to the beginning and describe the original problem you are
trying to solve and the constraints you have and ask for advice about
ways to achieve it.


The beginning is that I want to create a user-specific logical volume 
when a user logs in to a service that authenticates its users through 
pam and that does not run as root.


Regards
  Christoph

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] raid10 to raid what? - convert

2017-10-18 Thread Stuart D. Gathman

On Wed, 18 Oct 2017, Tanstaafl wrote:




and is not the same as raid1+0 (raid1 on top of raid0).


Not according to everything I've ever read about it... for example:


https://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10


But this is not certain as raid10 works perfectly well with 2 or 3
disks, including the redundancy.

You must be talking about something else... RAID10 requires at least 4
disks, and always an even number, although most RAID controllers support
the designation of at least one hot spare (so it will auto-rebuild using
the hot spare in the event of a failure). Been using this configuration
in my 5 drive QNAP NAS's for along time.


Yep.  Not talking about raid1+0

Linux raid10 really ought to be a "standard" - and effectively is.
I use it whenever I can (with only 2 disks I use raid1 so I can alias
the legs as non-raid).

--
      Stuart D. Gathman <stu...@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] convert LV to physical device _in_place_?

2017-07-13 Thread Stuart D. Gathman

On Thu, 13 Jul 2017, Zdenek Kabelac wrote:


PV has 'header' so the real 'data' are shifted by  PV header+lvm2 metadata.
and also LV does not need to be sequential.

However if you have a single 'segment' LV and you calculate the
proper skipping offset (typically 1MB) you can try to use such a device
directly without lvm2 with a loop device mapping - see losetup --offset


The AIX system used a single segment boot volume LV so that bootstrap
code needed only an offset and did not need to understand the LVM. 
But the boot volume was a normal LV in all other respects.  I think
there was a flag on the LV to ensure it remained a single segment.

--
  Stuart D. Gathman <stu...@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-18 Thread Stuart D. Gathman

On Tue, 18 Apr 2017, Gionatan Danti wrote:

Any thoughts on the original question? For snapshot with relatively big CoW 
table, from a stability standpoint, how do you feel about classical vs 
thin-pool snapshot?


Classic snapshots are rock solid.  There is no risk to the origin
volume.  If the snapshot CoW fills up, all reads and all writes to the
*snapshot* return IOError.  The origin is unaffected.

If a classic snapshot exists across a reboot, then the entire CoW table
(but not the data chunks) must be loaded into memory when the snapshot 
(or origin) is activated.  This can greatly delay boot for a large CoW.


For the common purpose of temporary snapshots for consistent backups,
this is not an issue.

--
  Stuart D. Gathman <stu...@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-13 Thread Stuart D. Gathman

On Thu, 13 Apr 2017, Xen wrote:


Stuart Gathman schreef op 13-04-2017 17:29:


 IMO, the friendliest thing to do is to freeze the pool in read-only mode
 just before running out of metadata.


It's not about metadata but about physical extents.

In the thin pool.


Ok.  My understanding is that *all* the volumes in the same thin-pool
would have to be frozen when running out of extents, as writes all pull
from the same pool of physical extents.

--
  Stuart D. Gathman <stu...@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-13 Thread Stuart D. Gathman

On Thu, 13 Apr 2017, Xen wrote:


Stuart Gathman schreef op 13-04-2017 17:29:


 understand and recover.   A sysadmin could have a plain LV for the
 system volume, so that logs and stuff would still be kept, and admin
 logins work normally.  There is no panic, as the data is there read-only.


Yeah a system panic in terms of some volume becoming read-only is perfectly 
acceptable.


However the kernel going entirely mayhem, is not.


Heh.  I was actually referring to *sysadmin* panic, not kernel panic.
:-)

But yeah, sysadmin panic can result in massive data loss...

--
  Stuart D. Gathman <stu...@gathman.org>
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/