Re: [linux-lvm] Cannot activate LVs in VG xxx while PVs appear on duplicate devices.

2018-06-09 Thread Xen

On Sat, 9 Jun 2018, Wolfgang Denk wrote:


Any help how to fix/avoid this problem would be highly appreciated.


You can try to create a global_filter in /etc/lvm/lvm.conf, I believe, and 
make sure it also ends up in your initramfs.


After you filter out your /dev/sdX devices, they should no longer even be 
seen by pvscan, which should also prevent this perhaps illogical refusal 
to activate.
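
For illustration, a minimal sketch of what such a filter could look like, 
assuming the duplicates show up as /dev/sdb and /dev/sdc (adjust the 
patterns to your own devices; see lvm.conf(5) for the exact syntax on your 
version):

  # /etc/lvm/lvm.conf
  devices {
      # reject the duplicate paths, accept everything else
      global_filter = [ "r|^/dev/sdb$|", "r|^/dev/sdc$|", "a|.*|" ]
  }

  # rebuild the initramfs so the filter is also used during early boot
  update-initramfs -u     # Debian/Ubuntu
  dracut -f               # RHEL/CentOS/Fedora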


Regards.



Re: [linux-lvm] Can't work normally after attaching disk volumes originally in a VG on another machine

2018-03-28 Thread Xen

Zdenek Kabelac schreef op 28-03-2018 0:17:


Hi

This is why users do open BZ if they would like to see some 
enhancement.


Normally the cache is an integral part of a volume - so if it's partially
missing, the whole volume is considered to be garbage.

But in 'writethrough' mode better recovery is likely possible.


Of course this case needs usability without --force.

So please open an RFE BZ for this case.


It goes into the mess I usually get myself into; if you "dd copy" the 
disk containing the origin volume before uncaching it, and then go to 
some live session where you only have the new backup copy, but you want 
to clean up its LVM,


then you now must fix the VGs in isolation from the cache; I suppose this 
is just the wrong order of doing things, but as part of a backup you 
don't really want to uncache first, as that requires more work to get it 
back to normal afterwards.


So you end up in a situation where the new origin copy has a reference 
to the cache disk --- all of this assumes writethrough mode --- and you 
need to clear that reference.


However, you cannot, or should not, attach the cache disk again; it 
might get affected, and you don't want that, you want it to remain in 
its pristine state.


Therefore, you are now left with the task of removing the cache from the 
VG, because you cannot actually run vgimportclone while the cache disk 
is missing.


The obvious solution is to *also* clone the cache disk and then run 
operations on the combined set, but this might not be possible.


Therefore, all that was left was:

  vgreduce --removemissing --force
  cd /etc/lvm/archive
  cp  /etc/lvm/backup/
  cd /etc/lvm/backup
  vi 
  " remove cache PV, and change origin to regular linear volume, and add
  " the visible tag

  vgcfgrestore 

  # presto, origin is restored as regular volume without the cache

  vgimportclone -i  

  # now have distinct volume group, VG UUID and PV UUID
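
Spelled out with purely hypothetical names (a VG called "backupvg" whose 
clone sits on /dev/sdb1; the archive file name follows lvm's usual 
<vgname>_NNNNN-*.vg pattern), the sequence above would look roughly like:

  vgreduce --removemissing --force backupvg
  cp /etc/lvm/archive/backupvg_00042-*.vg /etc/lvm/backup/backupvg
  vi /etc/lvm/backup/backupvg
      # remove the cache PV, turn the cached origin back into a plain
      # linear segment, and add the VISIBLE flag to it
  vgcfgrestore -f /etc/lvm/backup/backupvg backupvg
  # origin is restored as a regular volume without the cache
  vgimportclone -i /dev/sdb1
  # the clone now has a distinct VG name, VG UUID and PV UUID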

So the problem is making dd backups of the origin; perhaps dd backups should 
be avoided, but for some purposes (such as system migration) file copies 
are just more work in general, and can complicate things as well, for 
instance if there are NTFS partitions or whatnot.


And disk images can be nice to have, in any case.

This was the use case basically.



Re: [linux-lvm] Can't work normally after attaching disk volumes originally in a VG on another machine

2018-03-27 Thread Xen

Gang He schreef op 27-03-2018 7:55:


I just reproduced a problem from the customer, since they did virtual
disk migration from one virtual machine to another one.
According to your comments, this does not look like an LVM code problem;
can the problem be considered an LVM administrator misoperation?


Counterintuitively, you must remove the PV from the VG before you remove 
the (physical) disk from the system.


Yes that is something you can often forget doing, but as it stands 
resolving the situation often becomes a lot harder when you do it in 
reverse.


Ie. removing the disk first and then removing the PV from the VG is a 
lot harder.
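
As a hedged sketch of the "right order", assuming the disk to be pulled is 
/dev/vdc, the VG is vg1 (both names made up here), and the remaining PVs 
have enough free space:

  pvmove /dev/vdc           # migrate any extents still in use off the PV
  vgreduce vg1 /dev/vdc     # drop the PV from the volume group
  pvremove /dev/vdc         # wipe the PV label
  # only now detach the (virtual) disk from the machine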




Re: [linux-lvm] Can't work normally after attaching disk volumes originally in a VG on another machine

2018-03-23 Thread Xen

Gang He schreef op 23-03-2018 9:30:


6) attach disk2 to VM2(tb0307-nd2), the vg on VM2 looks abnormal.
tb0307-nd2:~ # pvs
  WARNING: Device for PV JJOL4H-kc0j-jyTD-LDwl-71FZ-dHKM-YoFtNV not
found or rejected by a filter.
  PV VG  Fmt  Attr PSize  PFree
  /dev/vdc   vg2 lvm2 a--  20.00g 20.00g
  /dev/vdd   vg1 lvm2 a--  20.00g 20.00g
  [unknown]  vg1 lvm2 a-m  20.00g 20.00g


This is normal because /dev/vdd contains metadata for vg1 which still 
includes the now-missing disk /dev/vdc, as the PV is no longer the same.






tb0307-nd2:~ # vgs
  WARNING: Device for PV JJOL4H-kc0j-jyTD-LDwl-71FZ-dHKM-YoFtNV not
found or rejected by a filter.
  VG  #PV #LV #SN Attr   VSize  VFree
  vg1   2   0   0 wz-pn- 39.99g 39.99g
  vg2   1   0   0 wz--n- 20.00g 20.00g


This is normal because you haven't removed /dev/vdc from vg1 on 
/dev/vdd, since it was detached while you operated on its vg.




7) reboot VM2, the result looks worse (vdc disk belongs to two vg).
tb0307-nd2:/mnt/shared # pvs
  PV VG  Fmt  Attr PSize  PFree
  /dev/vdc   vg1 lvm2 a--  20.00g 0
  /dev/vdc   vg2 lvm2 a--  20.00g 10.00g
  /dev/vdd   vg1 lvm2 a--  20.00g  9.99g


When you removed vdd when it was not attached, the VG1 metadata on vdd 
was not altered. The metadata resides on both disks, so you had 
inconsistent metadata between both disks because you operated on the 
shared volume group while one device was missing.


You also did not recreate the PV on /dev/vdc, so it has the same PV UUID as 
when it was part of vg1. This is why vg1, when vdd is booted, will still 
try to include /dev/vdc: it was never removed from the volume group 
metadata on vdd.


So the state of affairs is:

/dev/vdc contains volume group info for VG2 and includes only /dev/vdc.

/dev/vdd contains volume group info for VG1, and includes both /dev/vdc 
and /dev/vdd by their PV UUIDs. However, it is arguably a bug that it 
should still include /dev/vdc even though that PV's VG UUID is now 
different (and the name as well).


Regardless, from vdd's perspective /dev/vdc is still part of VG1.



Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2018-03-03 Thread Xen

Zdenek Kabelac schreef op 28-02-2018 22:43:


It still depends - there is always some sort of 'race' - unless you
are willing to 'give-up' too early to be always sure, considering
there are technologies that may write many GB/s...


That's why I think it is only possible for snapshots.

You can use rootfs with thinp - it's very fast for testing i.e. 
upgrades

and quickly revert back - just there should be enough free space.


That's also possible with non-thin.

Snapshots are using space - with the hope that if you 'really' need 
that space

you either add this space to your system - or you drop snapshots.


And I was saying back then that it would be quite easy to have a script 
that would drop bigger snapshots first (those of larger volumes), given 
that those are most likely less important, dropping them is more likely to 
prevent thin pool fillup, and you can save more of the smaller snapshots 
this way.


So basically I mean this gives your snapshots the "quota" that I was 
asking about.


Lol now I remember.

You could easily give (by script) every snapshot a quota of 20% of the full 
volume size, and then when the thin pool reaches 90%, you start dropping 
the volumes with the largest quota first, or something.


Idk, something more meaningful than that, but you get the idea.

You can calculate the "own" blocks of the snapshot, and when the pool is 
full you check for snapshots that have surpassed their quota, and you 
drop first the ones that are past their quotas by the largest amount.
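
A rough sketch of that idea as a shell script; the VG/pool names and the 
90% trigger are made up, and it simply drops the largest snapshot (the 
thin LVs that report an origin) whenever the pool crosses the threshold:

  #!/bin/sh
  VG=vg; POOL=pool; THRESHOLD=90    # hypothetical names/values

  used=$(lvs --noheadings -o data_percent "$VG/$POOL" | tr -d ' ' | cut -d. -f1)
  [ "$used" -lt "$THRESHOLD" ] && exit 0

  # snapshots are the thin LVs that have an origin; pick the largest one
  victim=$(lvs --noheadings --units m --nosuffix -o lv_name,lv_size,origin "$VG" \
           | awk 'NF == 3' | sort -k2 -rn | awk 'NR==1 {print $1}')

  [ -n "$victim" ] && lvremove -y "$VG/$victim"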



But as said - with today's 'rush' of development and load of updates -
users do want to try a 'new distro upgrade' - if it works - all is fine -
if it doesn't, let's have a quick road back - so using a thin volume for
rootfs is a pretty wanted case.


But again, regular snapshot of sufficient size does the same thing, you 
just have to allocate for it in advance, but for root this is not really 
a problem.


Then no more issue with thin-full problem.

I agree, less convenient, and a slight bit slower, but not by much for 
this special use case.



There are also some ongoing ideas/projects - one of them was to have
thin LVs with a priority to be always fully provisioned - so such a thinLV
could never be the one to have unprovisioned chunks


That's what ZFS does... ;-).

Other was a better integration of filesystem with 'provisioned' 
volumes.


That's what I was talking about back then...



Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2018-03-03 Thread Xen

Gionatan Danti schreef op 28-02-2018 20:07:


To recap (Zdenek, correct me if I am wrong): the main problem is
that, on a full pool, async writes will more-or-less silently fail
(with errors shown in dmesg, but nothing more).


Yes I know you were writing about that in the later emails.


Another possible cause
of problem is that, even on a full pool, *some* writes will complete
correctly (the one on already allocated chunks).


Idem.


In the past was argued that putting the entire pool in read-only mode
(where *all* writes fail, but read are permitted to complete) would be
a better fail-safe mechanism; however, it was stated that no current
dmtarget permit that.


Right. Don't forget my main problem was system hangs due to older 
kernels, not the stuff you write about now.



Two (good) solutions were given, both relying on scripting (see the
"thin_command" option in lvm.conf):
- fsfreeze on a nearly full pool (ie: >=98%);
- replace the dmthinp target with the error target (using dmsetup).

I really think that with the good scripting infrastructure currently
built in lvm this is a more-or-less solved problem.


I agree in practical terms. Doesn't make for good target design, but 
it's good enough, I guess.
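
As a hedged illustration of the first option: an lvm.conf hook plus a tiny 
script. The VG/pool names, the mount point and the 98% figure are 
assumptions, and the script simply re-reads pool usage with lvs instead of 
relying on dmeventd's environment (see lvm.conf(5) and dmeventd(8) for the 
exact hook semantics on your version):

  # /etc/lvm/lvm.conf -- dmeventd runs this at roughly every 5% step above 50%
  dmeventd {
      thin_command = "/usr/local/sbin/thin-freeze.sh"
  }

  # /usr/local/sbin/thin-freeze.sh (sketch)
  #!/bin/sh
  VG=vg; POOL=pool; MNT=/srv/data     # hypothetical
  used=$(lvs --noheadings -o data_percent "$VG/$POOL" | tr -d ' ' | cut -d. -f1)
  if [ "$used" -ge 98 ]; then
      fsfreeze --freeze "$MNT"        # block further writes until space is added
  fi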



Do NOT take thin snapshot of your root filesystem so you will avoid
thin-pool overprovisioning problem.


But is someone *really* pushing thinp for the root filesystem? I always
used it for data partitions only... Sure, rollback capability on root
is nice, but it is the data that is *really* important.


No, Zdenek thought my system hangs resulted from something else and then 
in order to defend against that (being the fault of current DM design) 
he tried to raise the ante by claiming that root-on-thin would cause 
system failure anyway with a full pool.


I never suggested root on thin.


In stress testing, I never saw a system crash on a full thin pool


That's good to know, I was just using Jessie and Xenial.


We discussed that in the past also, but as snapshot volumes really are
*regular*, writable volumes (with a 'k' flag to skip activation by
default), the LVM team takes the "safe" stance to not automatically
drop any volume.


Sure I guess any application logic would have to be programmed outside 
of any (device mapper module) anyway.


The solution is to use scripting/thin_command with lvm tags. For 
example:

- tag all snapshots with a "snap" tag;
- when usage is dangerously high, drop all volumes with "snap" tag.


Yes, now I remember.

I was envisioning some other tag that would allow a quota to be set for 
every volume (for example as a %), and the script would then drop the 
volumes with the larger quotas first (thus the larger snapshots), so as 
to protect smaller volumes, which are probably more important and of which 
you can then save more. I am ashamed to admit I had forgotten about that 
completely ;-).
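
A small sketch of how that tagging could look in practice (the volume name 
is made up, and since tag characters are restricted, a per-volume "quota" 
is encoded with underscores here as a purely hypothetical convention):

  # mark a snapshot so a cleanup script can find it
  lvchange --addtag snap vg/home_snap_20180303

  # attach a reserved-space/quota hint to the volume itself
  lvchange --addtag quota_20pct vg/home_snap_20180303

  # a script can read the tags back with:
  lvs --noheadings -o lv_name,lv_tags vg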


Back to rule #1 - thin-p is about 'delaying' the delivery of real 
space.

If you already have a plan to never deliver the promised space - you need to
live with the consequences


I am not sure to 100% agree on that.


When Zdenek says "thin-p" he might mean "thin-pool" but not generally 
"thin-provisioning".


I mean to say that the very special use case of an always auto-expanding 
system is a special use case of thin provisioning in general.


And I would agree, of course, that the other uses are also legit.


Thinp is not only about
"delaying" space provisioning; it clearly is also (mostly?) about
fast, modern, usable snapshots. Docker, snapper, stratis, etc. all use
thinp mainly for its fast, efficient snapshot capability.


Thank you for bringing that in.


Denying that
is not so useful and led to "overwarning" (ie: when snapshotting a
volume on a virtually-fillable thin pool).


Aye.


!SNAPSHOTS ARE NOT BACKUPS!


Snapshots are not backups, as they do not protect from hardware
problems (and denying that would be lame)


I was really saying that I was using them to run backups off of.


however, they are an
invaluable *part* of a successful backup strategy. Having multiple
rollback targets, even on the same machine, is a very useful tool.


Even more you can backup running systems, but I thought that would be 
obvious.



Again, I don't understand why we are speaking about system crashes. On
root *not* using thinp, I never saw a system crash due to a full data
pool.


I had it on 3.18 and 4.4, that's all.


Oh, and I use thinp on RHEL/CentOS only (Debian/Ubuntu backports are
way too limited).


That could be it too.



Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2018-03-03 Thread Xen
I did not rewrite this entire message, please excuse the parts where I 
am a little more "on the attack".




Zdenek Kabelac schreef op 28-02-2018 10:26:


I'll probably repeat my self again, but thin provision can't be
responsible for all kernel failures. There is no way DM team can fix
all the related paths on this road.


Are you saying there are kernel bugs presently?


If you don't plan to help resolving those issues - there is no point
in complaining over and over again - we are already well aware of these
issues...


I'm not aware of any issues, what are they?

I was responding here to an earlier thread I couldn't respond to back 
then; the topic was whether it was possible to limit thin snapshot 
sizes, you said it wasn't, and I was just recapping that thread.



If the admin can't stand failing system, he can't use thin-p.


That just sounds like a blanket excuse for any kind of failure.


Overprovisioning at the DEVICE level simply IS NOT equivalent to a full
filesystem like you would like to see all the time here, and it has
already been explained to you many times that filesystems are simply not
ready for it - fixes are ongoing but it will take time and it's
really pointless to exercise this on 2-3 year old kernels...


Pardon me, but your position has typically been that it is fundamentally 
impossible, not that "we're not there yet".


My questions have always been about fundamental possibilities, to which 
you always answer in the negative.


If something is fundamentally impossible, don't be surprised if you then 
don't get any help in getting there: you always close off all paths 
leading towards it.


You shut off any interest, any discussion, and any development interest 
in paths about which, a long time later, you then say "we're working on it", 
whereas before you always said "it's impossible".


This happened before where first you say "It's not a problem, it's admin 
error" and then a year later you say "Oh yeah, it's fixed now".


Which is it?

My interest has always been, at least philosophically, or concerning 
principle abilities, in development and design, but you shut it off 
saying it's impossible.


Now you complain you are not getting any help.


Thin provisioning has its use case and it expects the admin to be well aware
of possible problems.


That's a blanket statement once more that says nothing about actual 
possibilities or impossibilities.



If you are aiming for a magic box working always right - stay away
from thin-p - the best advice


Another blanket statement excusing any and all mistakes or errors or 
failures the system could ever have.



Do NOT take thin snapshot of your root filesystem so you will avoid
thin-pool overprovisioning problem.


Zdenek, could you please make up your mind?

You brought up thin snapshotting as a reason for putting root on thin, 
as a way of saying that thin failure would lead to system failure and 
not just application failure,


whereas I maintained that application failure was acceptable.

I tried to make the distinction between application level failure (due 
to filesystem errors) and system instability caused by thin.


You then tried to make those equivalent by saying that you can also put 
root on thin, in which case application failure becomes system failure.


I never wanted root on thin, so don't tell me not to snapshot it, that 
was your idea.




Rule #1:

Thin-pool was never targeted for 'regular' usage of full thin-pool.


All you are asked is to design for error conditions.

You want only to take care of the special use case where nothing bad 
happens.


Why not just take care of the general use case where bad things can 
happen?


You know, real life?

In any development process you first don't take care of all error 
conditions, you just can't be bothered with them yet. Eventually, you 
do.


It seems you are trying to avoid having to deal with the glaring error 
conditions that have always existed, but you are trying to avoid having 
to take any responsibility for it by saying that it was not part of the 
design.


To make this more clear Zdenek, your implementation does not cater to 
the general use case of thin provisioning, but only to the special use 
case where full thin pools never happen.


That's a glaring omission in any design. You can go on and on about how 
thin-p was not "targeted" at that "use case", but that's like saying 
you built a car engine that was not "targeted" at "running out of 
fuel".


Then when the engine breaks down you say it's the user's fault.

Maybe retarget your design?

Running out of fuel is not a use case.

It's a failure condition that you have to design for.

Full thin-pool is serious ERROR condition with bad/ill effects on 
systems.


Yes and your job as a systems designer is to design for those error 
conditions and make sure they are handled gracefully.


You just default on your responsibility there.

The reason you brought up root on thin was to elevate application 
failure to the level of 

Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2018-02-27 Thread Xen

Zdenek Kabelac schreef op 24-04-2017 23:59:


I'm just curious - what do you think will happen when you have
root_LV as a thin LV and the thin pool runs out of space - so 'root_LV'
is replaced with the 'error' target.


Why do you suppose Root LV is on thin?

Why not just stick to the common scenario when thin is used for extra 
volumes or data?


I mean to say that you are raising an exceptional situation as an 
argument against something that I would consider quite common, which 
doesn't quite work that way: you can't prove that most people would 
not want something by raising something most people wouldn't use.


I mean to say let's just look at the most common denominator here.

Root LV on thin is not that.


Well then you might be surprised - there are users using exactly this.


I am sorry, this is a long time ago.

I was concerned with thin full behaviour and I guess I was concerned 
with being able to limit thin snapshot sizes.


I said that application failure was acceptable, but system failure not.

Then you brought up root on thin as a way of "upping the ante".

I contended that this is a bigger problem to tackle, but it shouldn't 
mean you shouldn't tackle the smaller problems.


(The smaller problem being data volumes).

Even if root is on thin and you are using it for snapshotting, it would 
be extremely unwise to overprovision such a thing or to depend on 
"additional space" being added by the admin; root filesystems are not 
meant to be expandable.


If on the other hand you do count on overprovisioning (due to snapshots) 
then being able to limit snapshot size becomes even more important.



When you have rootLV on thinLV - you could easily snapshot it before
doing any upgrade and revert back in case something fails on upgrade.
See also projects like snapper...


True enough, but if you risk filling your pool because you don't have 
full room for a full snapshot, that would be extremely unwise. I'm also 
not sure write performance for a single snapshot is very much different 
between thin and non-thin?


They are both CoW. E.g. if you write to an existing block it has to be 
duplicated; only for non-allocated writes is thin faster, right?


I simply cannot reconcile an attitude that thin-full-risk is acceptable 
and the admin's job while at the same time advocating it for root 
filesystems.


Now most of this thread I was under the impression that "SYSTEM HANGS" 
where the norm because that's the only thing I ever experienced (kernel 
3.x and kernel 4.4 back then), however you said that this was fixed in 
later kernels.


So given that, some of the disagreement here was void as apparently no 
one advocated that these hangs were acceptable ;-).


:).


I have tried it, yes. It gives trouble with Grub, requires the thin tools 
package to be installed on all systems, and makes it harder to install 
a system too.


lvm2 is cooking some better boot support atm


Grub-probe couldn't find the root volume so I had to maintain my own 
grub.cfg.


Regardless if I ever used this again I would take care to never 
overprovision or to only overprovision at low risk with respect to 
snapshots.


Ie. you could thin provision root + var or something similar but I would 
always put data volumes (home etc) elsewhere.


Ie. not share the same pool.

Currently I was using a regular snapshot but I allocated it too small 
and it always got dropped much faster than I anticipated.


(A 1GB snapshot constantly filling up with even minor upgrade 
operations).
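
(For what it's worth, classic snapshots can also be grown automatically by 
dmeventd; a sketch of the relevant lvm.conf knobs, with arbitrarily picked 
values:)

  # /etc/lvm/lvm.conf
  activation {
      snapshot_autoextend_threshold = 70    # extend once the snapshot is 70% full
      snapshot_autoextend_percent   = 30    # grow it by 30% of its size each time
  }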





Thin root LV is not the idea for most people.

So again, don't you think having data volumes produce errors is not 
preferable to having the entire system hang?


Not sure why you insist system hangs.

If the system hangs - and you have a recent kernel & lvm2 - you should file 
a bug.


If you set  '--errorwhenfull y'  - it should instantly fail.

There should not be any hanging..


Right well Debian Jessie and Ubuntu Xenial just experienced that.
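
(For reference, the setting Zdenek mentions can be applied to an existing 
pool roughly like this; "vg/pool" is a placeholder:)

  lvchange --errorwhenfull y vg/pool    # fail writes immediately instead of queueing
  lvs -o lv_name,lv_when_full vg        # verify the setting, on versions that report it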



That's irrelevant; if the thin pool is full you need to mitigate it, 
rebooting won't help with that.


well it's really the admin's task to solve the problem after the panic call
(adding new space).


That's a lot easier if your root filesystem doesn't lock up.

;-).

Good luck booting to some rescue environment on a VPS or with some boot 
stick on a PC; the Ubuntu rescue environment for instance has been 
abysmal since SystemD.


You can't actually use the rescue environment because there is some 
weird interaction with systemd spewing messages and causing weird 
behaviour on the TTY you are supposed to work on.


Initrd yes, but not the "full rescue" systemd target, doesn't work.

My point with this thread was.




When my root snapshot fills up and gets dropped, I lose my undo history, 
but at least my root filesystem won't lock up.


I just calculated the size too small and I am sure I can also put a 
snapshot IN a thin pool for a non-thin root volume?


Haven't tried.

However, I don't have the space for a full copy of every 

Re: [linux-lvm] Saying goodbye to LVM

2018-02-07 Thread Xen

Gionatan Danti schreef op 07-02-2018 22:19:


LVM just has conceptual problems.


As a CentOS user, I *never* encountered such problems. I really think
these are caused by the lack of proper integration testing from
Debian/Ubuntu.


That would only apply to udev/boot problems, not the tooling issues.

If you never make DD copies, you never run into such issues.

And if you don't use Cache you won't have those missing PV issues 
either.


Maybe I am just great at finding missing features but LVM has in the end 
cost me a lot more time than it has saved me.


I mean, if I had just stuck to regular partitions I would have been 
further ahead in life by now ;-).


Including any lack of LVM expertise I would have had by then. Which, in 
the end, I don't think is worth it.




But hey - all key LVM developers are RedHat people, so
it should be expected (for the better/worse).


The denialist nature of Linux people ensures that even if LVM upstream 
says UPGRADE, Ubuntu will say "why? everything works fine for me".


Or, "I never ran into such issues" ;-).


True. I never use it with boot device.


Even on Solaris it is limited, for example the root pool cannot have an 
external log device (that means SLOG). Then, you have no clue if this is 
also going to be the case on Linux or not ;-). And Grub supports booting 
from a root dataset but only barely; I don't think anything else (e.g. a 
ZVOL) is in any way realistic. The biggest downside is inflexibility in 
shrinking pools, and people complain about ZVOL snapshots requiring a 
lot of space.


Btrfs, on the other hand, supports removing disks from raid sets and 
just reorganizing what's left.



LVM and XFS are, on the other
hand, extremely well integrated into mainline kernel/userspace
utilities.


Except that apparently there are (or were, or can be) extreme 
initramfs/udev issues and the Ubuntu support/integration has been flimsy 
at best -- what's not flimsy is Grub support, it will even load an 
embedded LVM just fine.


I mean you can have an LV on a PV that is an LV on a PV and Grub will be 
able to read it, the Ubuntu initramfs will not.



Hence my great interest in stratis...


I don't deny you there but I wonder if I'm not better off sticking to 
ordinary partitions ;-).


But my main idea is to use compressed ZVOLs if I can.

You can just stick partition tables on those too. ZFS has a lot of 
different "models".




[linux-lvm] pvscan hung in LVM 160

2018-01-23 Thread Xen

I have a small question if I may ask.

I upgraded one of my systems to LVM 160, I know that's still rather old.

During boot, pvscan hangs, I don't know if this is because udev is 
terminated.


The pvscan --background never completes.

If I remove --background, pvscan instantly exits with code 3.

Consequently, none of my volumes are activated save what comes down to the 
root device, because it is activated explicitly; this is Ubuntu Xenial, 
basically.
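
(As a stop-gap, the volumes that were not activated can of course be 
brought up by hand after boot; a trivial example with a made-up VG name:)

  vgchange -ay vg1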


The thing only started happening when I upgraded LVM to a newer version.

Though pvscan exits with 3, at least my system doesn't take 6 minutes to 
boot now.


My only question is: was this a known bug in 160 and has this since been 
fixed, and if so, in what version?


Some people on Arch investigated this bug when 160 was current (around 
2014) and for them the bug was caused by udev exiting before pvscan could 
finish.


The basic boot delays are caused by udevadm settle calls.

Regards.





Re: [linux-lvm] Reattach cache

2017-11-22 Thread Xen

matthew patton schreef op 22-11-2017 10:16:

by definition when you detach a cache it is now entirely invalid and
will (should) be treated as empty.


Yeah but it wasn't.

Within seconds the rootfs was read-only, and because I was running some 
apt operation at the same time, a large bunch of packages became 
corrupted in the apt index (dpkg index).


A bunch are still missing md5 files but I am on limited bandwidth so not 
reinstalling.


So I assumed the above that you write.

But it bit me once more.



Re: [linux-lvm] Reattach cache

2017-11-22 Thread Xen

Zdenek Kabelac schreef op 22-11-2017 10:57:


In your case - just destroy the cache (--uncache) and do not try to
reuse cache-pool unless you really know what you are doing.


But I still don't know how to clean the metadata manually.



Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-23 Thread Xen

matthew patton schreef op 23-10-2017 21:02:

On Mon, 10/23/17, John Stoffel  wrote:


SSD pathologies aside, why are we concerned about the cache layer on a
streaming read?

By definition the cache shouldn't be involved at all.


Because whatever purpose you are using it for, it shouldn't OOM the 
system.




Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-20 Thread Xen

lejeczek schreef op 20-10-2017 16:20:


I would - if bigger part of a storage subsystem resides in the
hardware - stick to the hardware, use CacheCade, let the hardware do
the lot.


In other words -- keep it simple (smart person) ;-).

Complexity is really the biggest reason for failure everywhere.



Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-20 Thread Xen

Oleg Cherkasov schreef op 20-10-2017 10:21:

On 19. okt. 2017 20:13, Xen wrote:


The main cause was a way too slow SSD but at the same time... that 
sorta thing still shouldn't happen, locking up the entire system.


I haven't had a chance to try again with a faster SSD.


I have double checked with MegaRAID/CLI and all disks on that rig
(including SSD ones of course) are SAS 6Gb/s both devices and links.
My first thought about those SSDs was that those are slower than RAID5
however it seems not the case.

Could it be TRIMing issue because those are from 2012?


You mean that the SATA version is too low to interleave TRIMs with data 
access?


Because I think that was the case with my mSata SSD.

I don't currently remember which SATA version allowed that interleaving 
(queued TRIM), but that SSD didn't have it.


After trimming performance would go up greatly.

So I don't know about SAS but it might be similar right.



Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-20 Thread Xen

matthew patton schreef op 20-10-2017 2:12:

It is just a backup server,


Then caching is pointless.


That's irrelevant and not up to another person to decide.


Furthermore any half-wit caching solution
can detect streaming read/write and will deliberately bypass the
cache.


The problem was not performance, it was stability.


Furthermore DD has never been a useful benchmark for anything.
And if you're not using 'odirect' it's even more pointless.


Performance was not the issue, stability was.


Server has 2x SSD drives by 256Gb each


and for purposes of 'cache' should be individual VD and not waste
capacity on RAID1.


That is probably also going to be quite irrelevant to the problem at hand.


10x 3Tb drives.  In addition  there are two
MD1200 disk arrays attached with 12x 4Tb disks each.  All


Raid5 for this size footprint is NUTs. Raid6 is the bare minimum.


That's also irrelevant to the problem at hand.


Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-20 Thread Xen

John Stoffel schreef op 19-10-2017 23:14:


And RHEL7.4/CentOS 7 is all based on kernel 3.14 (I think) with lots
of RedHat specific backports.  So knowing the full details will only
help us provide help to him.


Alright I missed that, sorry.

Still, given that a Red Hat developer has stated awareness of the 
problem, it isn't likely that individual config, rather than the kernel, 
is going to play a big role.


Also it is likely that anyone in the position to really help would 
already recognise the problems.


I just mean to say that it is going to need a developer and is not very 
likely that individual config is at fault.


Although a different kernel would see different behaviour, you're right 
about that, my apologies.




Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-19 Thread Xen

John Stoffel schreef op 19-10-2017 21:09:


How did you setup your LVM config and your cache config?  Did you
mirror the two SSDs using MD


He said he used hardware RAID to mirror the devices.


I ask because I'm running lvcache at home on my main file/kvm server
and I've never seen this problem.  But!  I suspect you're running a
much older kernel, lvm config, etc.


lvm2-2.02.171-8.el7.x86_64

CentOS 7.4 was released a month ago.



Re: [linux-lvm] cache on SSD makes system unresponsive

2017-10-19 Thread Xen

Oleg Cherkasov schreef op 19-10-2017 19:54:


Any ideas what may be wrong?


All I know is that I myself have in the past tried to cache an embedded 
encrypted LVM in a regular home system.


The problem was probably caused by the SSD not clearing write caches 
fast enough but I too got some 2 minute "hanging process" outputs on the 
console.


So it was probably a queueing issue within the kernel and might not have 
been related to the cache,


but I'm still not sure if there wasn't an interplay at work.

The main cause was a way too slow SSD but at the same time... that sorta 
thing still shouldn't happen, locking up the entire system.


I haven't had a chance to try again with a faster SSD.

Regards...



Re: [linux-lvm] raid10 to raid what? - convert

2017-10-18 Thread Xen

lejeczek schreef op 18-10-2017 17:52:


I'm still looking for an answer - if it's possible then how to split
raid10 into two raid0 LVs(with perhaps having data intact?)
I've been fiddling with --splitmirrors but either I got it wrong or I
didn't and command just fails.
More than contemplating theories and general knowledge on raid, I'd
prize succinct, concrete info a lot, actual experience of "howto".


Sorry that I can't help you here, but I believe it is possible with 
mdraid.


However note that if you can spend the time you could take two disks, 
wipe them, put raid 0 on it, and then copy the still functioning RAID 10 
over. I know that's not what you're asking but personally I have no 
hands-on experience with LVM raid and metadata.


I assume that even if you were to manage to get a raid 0 going with pure 
DM commands you'd still need to change LVM's metadata and no clue how to 
do that myself.


Conceivably, copying your data over is not ideal but it should not take 
longer than a day?
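
A rough sketch of that copy-over route, with entirely made-up device, VG 
and mount point names, assuming the two freed disks become a new 
striped/raid0 LV and the data is copied at the filesystem level:

  pvcreate /dev/sdc /dev/sdd
  vgcreate newvg /dev/sdc /dev/sdd
  lvcreate --type raid0 -i 2 -l 100%FREE -n data newvg
  mkfs.ext4 /dev/newvg/data
  mount /dev/newvg/data /mnt/new
  rsync -aHAX /mnt/old/ /mnt/new/     # copy from the still-working raid10 LV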


Don't forget that RAID 10 is 2 stripes of mirrors...

But LVM typically prevents you from operating on a volume group if not all 
PVs are present...


Anyway if it can't be done by lvconvert, it probably can't be done 
(without whatever complex thing).




Re: [linux-lvm] Difference between Debian and some other distributions with thin provisioning

2017-09-30 Thread Xen

Jan Tulak schreef op 29-09-2017 18:42:


Debian:
# cat /etc/debian_version
8.9
# lvm version
  LVM version: 2.02.111(2) (2014-09-01)
  Library version: 1.02.90 (2014-09-01)
  Driver version:  4.27.0

Centos:
# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
# lvm version
  LVM version: 2.02.171(2)-RHEL7 (2017-05-03)
  Library version: 1.02.140-RHEL7 (2017-05-03)
  Driver version:  4.35.0


Versions are not actually very close.

On Debian I do know that I am often confused by thin commands, sometimes 
-T seems to be necessary and sometimes --thin-pool or I am just 
confused, I don't know.


Same with Grub2, it has been in 2.02beta for ages and was only recently 
released as 2.02 but I think years of development went into that.


Although I think I can say they are quite headstrong compared to 
LVM2...


But you can see that the versions are 3 years apart anyway...

Regards.



Re: [linux-lvm] Restoring snapshot gone bad

2017-09-22 Thread Xen

Mauricio Tavares schreef op 22-09-2017 8:03:

I have a lv, vmzone/desktop that I use as drive for a kvm guest;
nothing special here. I wanted to restore its snapshot so like I have
done many times before I shut guest down and then

lvconvert --merge vmzone/desktop_snap_20170921
  Logical volume vmzone/desktop is used by another device.
  Can't merge over open origin volume.
  Merging of snapshot vmzone/desktop_snap_20170921 will occur on next 
activation

 of vmzone/desktop.

What is it really trying to tell me? How to find out which other
device is using it?


Other people will have better answers but I think it will be hard to see 
unless it is used by the device mapper for some target.


I hope there is a better answer.

But obviously the "o" means open volume (ie. mounted or something else) 
and that means that if you can't find a way to close it next time you 
boot the machine it will get merged?


I hope there is a good way to get your usage information (apart from 
something like "lsof").
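
As for finding out what holds the origin open, a few hedged starting 
points (using the names from your output; the dm name of vmzone/desktop 
would be vmzone-desktop):

  dmsetup info -c vmzone-desktop       # the OPEN column shows the open count
  dmsetup ls --tree                    # shows which dm devices are stacked on which
  ls /sys/block/dm-*/holders/          # kernel view of who holds each dm device
  fuser -vm /dev/vmzone/desktop        # processes using it, if it is mounted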


Regards.



Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-21 Thread Xen

Hi,

thank you for your response once more.

Zdenek Kabelac schreef op 21-09-2017 11:49:


Hi

Some more 'light' into the existing state as this is really not about
what can and what cannot be done in kernel - as clearly you can do
'everything' in kernel - if you have the code for it...


Well thank you for that ;-).


In practice your 'proposal' is quite different from the existing
target - essentially major rework if not a whole new re-implementation
 - as it's not 'a few line' patch extension  which you might possibly
believe/hope into.


Well I understand that the solution I would be after would require 
modification to the DM target. I was not arguing for LVM alone; I 
assumed that since DM and LVM are both hosted in the same space there 
would be at least the idea of cooperation between the two teams.


And that it would not be too 'radical' to talk about both at the same 
time.



Of course this decision makes some tasks harder (i.e. there are surely
problems which would not even exist if it would be done in kernel)  -
but lots of other things are way easier - you really can't compare
those


I understand. But many times a lack of integration of the shared goals of 
multiple projects is also a big problem in Linux.


However if we *can* standardize on some tag or way of _reserving_ this 
space, I'm all for it.


Problems of a desktop user with 0.5TB SSD are often different with
servers using 10PB across multiple network-connected nodes.

I see you call for one standard - but it's very very difficult...


I am pretty sure that if you start out with something simple, it can 
extend into the complex.


That's of course why an elementary kernel feature would make sense.

A single number. It does not get simpler than that.

I am not saying you have to.

I was trying to find out if your statements that something was 
impossible, was actually true.


You said that you need a completely new DM target from the ground up. I 
doubt that. But hey, you're the expert, not me.


I like that you say that you could provide an alternative to the regular 
DM target and that LVM could work with that too.


Unfortunately I am incapable of doing any development myself at this 
time (sounds like fun right) and I also of course could not myself test 
20 PB.


I think a 'critical' tag in combination with the standard 
autoextend_threshold (or something similar) is too loose and 
ill-defined and not very meaningful.


We look for delivering admins rock-solid bricks.

If you make small house or you build a Southfork out of it is then
admins' choice.

We have spent a really long time thinking about whether there is some sort of
'one-ring-to-rule-them-all' solution - but we can't see it yet -
possibly because we know wider range of use-cases compared with
individual user-focused problem.


I think you have to start simple.

You can never come up with a solution if you start out with the complex.

The only thing I ever said was:
- give each volume a number of extents or a percentage of reserved space 
if needed

- for all the active volumes in the thin pool, add up these numbers
- when other volumes require allocation, check against free extents in 
the pool

- possibly deny allocation for these volumes

I am not saying here you MUST do anything like this.

But as you say, it requires features in the kernel that are not there.

I did not know or did not realize the upgrade paths of the DM module(s) 
and LVM2 itself would be so divergent.


So my apologies for that but obviously I was talking about a full-system 
solution (not partial).


And I would prefer to set individual space reservation for each volume 
even if it can only be compared to 5% threshold values.


Which needs 'different' kernel target driver (and possibly some way to
kill/split page-cache to work on 'per-device' basis)


No no, here I meant to set it by a script or to read it by a script or 
to use it by a script.



And just as an illustration of problems you need to start solving for
this design:

You have origin and 2 snaps.
You set different 'thresholds' for these volumes  -


I would not allow setting threshold for snapshots.

I understand that for dm thin target they are all the same.

But for this model it does not make sense because LVM talks of "origin" 
and "snapshots".


You then overwrite 'origin'  and you have to maintain 'data' for OTHER 
LVs.


I don't understand. Other LVs == 2 snaps?


So you get into the position - when 'WRITE' to origin will invalidate
volume that is NOT even active (without lvm2 being even aware).


I would not allow space reservation for inactive volumes.

Any space reservation is meant for safeguarding the operation of a 
machine.


Thus it is meant for active volumes.


So suddenly rather simple individual thinLV targets  will have to
maintain whole 'data set' and cooperate with all other active thins
targets in case they share some data


I don't know what data sharing has to do with it.

The entire system only works with 

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-20 Thread Xen

Gionatan Danti schreef op 18-09-2017 21:20:


Xen, I really think that the combination of hard-threshold obtained by
setting thin_pool_autoextend_threshold and thin_command hook for
user-defined script should be sufficient to prevent and/or react to
full thin pools.


I will hopefully respond to Zdenek's message later (and the one before 
that that I haven't responded to),



I'm all for the "keep it simple" on the kernel side.


But I don't mind if you focus on this,


That said, I would like to see some pre-defined scripts to easily
manage pool fullness. (...) but I would really
like the standardisation such predefined scripts imply.


And only provide scripts instead of kernel features.

Again, the reason I am also focussing on the kernel is because:

a) I am not convinced it cannot be done in the kernel
b) A kernel feature would make space reservation very 'standardized'.

Now I'm not convinced I really do want a kernel feature but saying it 
isn't possible I think is false.


The point is that kernel features make it much easier to standardize and 
to put some space reservation metric in userland code (it becomes a 
default feature) and scripts remain a little bit off to the side.


However if we *can* standardize on some tag or way of _reserving_ this 
space, I'm all for it.


I think a 'critical' tag in combination with the standard 
autoextend_threshold (or something similar) is too loose and ill-defined 
and not very meaningful.


In other words you would be abusing one feature for another purpose.

So I do propose a way to tag volumes with a space reservation (turning 
them cricical) or alternatively to configure a percentage of reserved 
space and then merely tag some volumes as critical volumes.


I just want these scripts to be such that you don't really need to 
modify them.


In other words: values configured elsewhere.

If you think that should be the thin_pool_autoextend_threshold, fine, 
but I really think it should be configured elsewhere (because you are 
not using it for autoextending in this case).


thin_command is run every 5%:

https://www.mankier.com/8/dmeventd

You will need to configure a value to check against.

This is either going to be a single, manually configured, fixed value 
(in % or extents)


Or it can be calculated based on reserved space of individual volumes.

So if you are going to have a kind of "fsfreeze" script based on 
critical volumes vs. non-critical volumes I'm just saying it would be 
preferable to set the threshold at which to take action in another way 
than by using the autoextend_threshold for that.


And I would prefer to set individual space reservation for each volume 
even if it can only be compared to 5% threshold values.


So again: if you want to focus on scripts, fine.



Re: [linux-lvm] Option to silence "WARNING: Sum of all thin volume sizes exceeds the size of thin pool"

2017-09-20 Thread Xen

Gionatan Danti schreef op 19-09-2017 10:44:


Sure, I was only describing a possible case where the warning is
"redundant" (ie: because the admin know the snapshot will be
short-lived).


I would only like to say that reducing the "warning" to a "notice" would 
also reduce the irritation.


Ie. instead "WARNING. Word word word" it could also be

"New volume xx overprovisions thin pool by xxx".

This way you don't give the user the idea that you think he/she is 
stupid.


I know this still implies some verbosity but it doesn't have to be an 
"error". It can simply be a "notice".




Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-15 Thread Xen

Brassow Jonathan schreef op 15-09-2017 4:06:


There are many solutions that could work - unique to every workload
and different user.  It is really hard for us to advocate for one of
these unique solutions that may work for a particular user, because it
may work very badly for the next well-intentioned googler.


Well, thank you.

Of course there is a split between saying "it is the administrator's job 
that everything works well" and at the same time saying that those 
administrators can be "googlers".


There's a big gap between that. I think that many who do employ thinp 
will be at least a bit more serious about it, but perhaps not as serious 
that they can devote all the resources to developing all of the 
mitigating measures that anyone could want.


So I think the common truth lies more in the middle: they are not 
googlers who implement the first random article they find without 
thinking about it, and they are not professional people in full time 
employment doing this thing.



So because of that fact that most administrators interested in thin like 
myself will have read LVM manpages a great deal already on their own 
systems...


And any common default targets for "thin_command" could also be well 
documented and explained, and pros and cons layed out.


The only thing we are talking about today is reserving space due to some 
threshold.


And performing an action when that reservation is threatened.

So this is the common need here.

This need is going to be the same for everyone that uses any scheme that 
could be offered.


Then the question becomes: are interventions also as common?

Well there are really only a few available:

a) turning into error volume as per the bug
b) fsfreezing
c) merely reporting
d) (I am not sure if "lvremove" should really be seriously considered).

At this point you have basically exhausted any default options you may 
have that are "general". No one actually needs more than that.


What becomes interesting now is the logic underpinning these decisions.

This logic needs some time to write and this is the thing that 
administrators will put off.


So they will live with not having any intelligence in automatic response 
and will just live with the risk of a volume filling up without having 
written the logic that could activate the above measures.


That's the problem.

So what I am advocating for -- I am not disregarding Mr. Zdenek's bug 
[1] ;-). In fact I think this "lverror" would be very welcome 
(paraphrasing here), even though personally I would want to employ a 
filesystem mechanic if I am doing this using a userland tool anyway!!!


But sure, why not.

I think that is complementary to and orthogonal to the issue of where 
the logic is coming from, and that the logic also requires a lot of 
resources to write.


So even though you could probably hack it together in some 15 minutes, 
and then you need testing etc...


I think it would just be a lot more pleasant if this logic framework 
already existed, was tried and tested, did the job correctly, and can 
easily be employed by anyone else.


So I mean to say that currently we are only talking about space 
reservation.



You can only do this in a number of ways:

- % of total volume size.

- fixed amount configured per volume

And that's basically it.

The former merely requires each volume to be 'flagged' as 'critical' as 
suggested.
The latter requires some number to be defined and then flagging is 
unnecessary.


The script would ensure that:

- not ALL thin volumes are 'critical'.
- as long as a single volume is non-critical, the operation can continue
- all critical volumes' required free space is aggregated
- the check is done against currently available free space
- the action on the non-critical-volumes is performed if necessary.

That's it. Anyone could use this.
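
A very rough sketch of that logic, under the assumption that critical 
volumes carry a "critical" tag and that the reservation is simply a fixed 
percentage configured in the script (all names and numbers are 
placeholders):

  #!/bin/sh
  VG=vg; POOL=pool
  RESERVE_PCT=20     # reserve per critical volume, as % of its size

  free_mb=$(lvs --noheadings --units m --nosuffix -o lv_size,data_percent "$VG/$POOL" \
            | awk '{print int($1 * (100 - $2) / 100)}')

  # aggregate the reservation over all volumes tagged "critical"
  need_mb=$(lvs --noheadings --units m --nosuffix -o lv_size @critical \
            | awk -v r=$RESERVE_PCT '{sum += $1 * r / 100} END {print int(sum)}')

  if [ "$free_mb" -lt "$need_mb" ]; then
      # act on the non-critical volumes here, e.g. freeze or drop them
      echo "thin pool $VG/$POOL: ${free_mb}MB free, below reserve of ${need_mb}MB" >&2
  fi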



The "Big vs. Small" model is a little bit more involved and requires a 
little bit more logic, and I would not mind writing it, but it follows 
along the same lines.


*I* say that in this department, *only* these two things are needed.

+ potentially the lverror thing.

So I don't really see this wildgrowth of different ideas.


So personally I would like the "set manual size" more than the "use 
percentage" in the above. I would not want to flag volumes as critical, 
I would just want to set their reserved space.


I would prefer if I could set this in the LVM volumes themselves, rather 
than in the script.


If the script used a percentage, I would want to be able to configure 
the percentage outside the script as well.


I would want the script to do the heavy lifting of knowing how to 
extract these values from the LVM volumes, and some information on how 
to put them there.


(Using tags and all of that is not all that common knowledge I think).

Basically, I want the script to know how to set and retrieve properties 
from the LVM volumes.


Then I want it to be easy to see the reserved space (potentially) 
(although this can 

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-15 Thread Xen

Zdenek Kabelac schreef op 14-09-2017 21:05:


Basically user-land tool takes a runtime snapshot of kernel metadata
(so gets you information from some frozen point in time) then it
processes the input data (up to 16GiB!) and outputs some number - like
what is the
real unique blocks allocated in thinLV.


That is immensely expensive indeed.


Typically snapshot may share
some blocks - or could have already be provisioning all blocks  in
case shared blocks were already modified.


I understand and it's good technology.

Yes I mean my own 'system' I generally of course know how much data is 
on it and there is no automatic data generation.


However lvm2 is not 'Xen oriented' tool only.
We need to provide universal tool - everyone can adapt to their needs.


I said that to indicate that prediction problems are not current 
important for me as much but they definitely would be important in other 
scenarios or for other people.


You twist my words around to imply that I am trying to make myself 
special, while I was making myself unspecial: I was just being modest 
there.



Since your needs are different from others needs.


Yes and we were talking about the problems of prediction, thank you.

But if I do create snapshots (which I do every day) when the root and 
boot snapshots fill up (they are on regular lvm) they get dropped 
which is nice,


old snapshots are a different technology for a different purpose.


Again, what I was saying was to support the notion that having snapshots 
that may grow a lot can be a problem.


I am not sure the purpose of non-thin vs. thin snapshots is all that 
different though.


They are both copy-on-write in a certain sense.

I think it is the same tool with different characteristics.


With 'plain'  lvs output is - it's just an orientational number.
Basically highest referenced chunk for a thin given volume.
This is great approximation of size for a single thinLV.
But somewhat 'misleading' for thin devices being created as 
snapshots...

(having shared blocks)


I understand. The above number for "snapshots" were just the missing 
numbers from this summing up the volumes.


So I had no way to know snapshot usage.

I just calculated all used extents per volume.

The missing extents I put in snapshots.

So I think it is a very good approximation.


So you have no precise idea how many blocks are shared or uniquely
owned by a device.


Okay. But all the numbers were attributed to the correct volume 
probably.


I did not count the usage of the snapshot volumes.

Whether they are shared or unique is irrelevant from the point of view 
of wanting to know the total consumption of the "base" volume.


In the above 6 extents were not accounted for (24 MB) so I just assumed 
that would be sitting in snapshots ;-).



Removal of a snapshot might mean you release NOTHING from your
thin-pool if all snapshot blocks were shared with some other thin
volumes


Yes, but that was not indicated in above figure either. It was just 24 
MB that would be freed ;-).


Snapshots can only become a culprit if you start overwriting a lot of 
data, I guess.


If you say that any additional allocation checks would be infeasible 
because it would take too much time per request (which still seems odd 
because the checks wouldn't be that computation intensive and even for 
100 gigabyte you'd only have 25.000 checks at default extent size) -- 
of course you asynchronously collect the data.


Processing of mapping of upto 16GiB of metadata will not happen in
miliseconds and consumes memory and CPU...


I get that. If that is the case.

That's just the sort of thing that in the past I have been keeping track 
of continuously (in unrelated stuff) such that every mutation also 
updated the metadata without having to recalculate it...


I am meaning to say that if indeed this is the case and indeed it is 
this expensive, then clearly what I want is not possible with that 
scheme.


I mean to say that I cannot argue about this design. You are the 
experts.


I would have to go in learning first to be able to say anything about it 
;-).


So I can only defer to your expertise. Of course.

But the point of what you're saying is that the number of blocks uniquely 
owned by any snapshot is not known at any one point in time.


And needs to be derived from the entire map. Okay.

Thus reducing allocation would hardly be possible, you say.

Because the information is not known anyway.


Well pardon me for digging this deeply. It just seemed so alien that 
this thing wouldn't be possible.


I mean it seems so alien that you cannot keep track of those numbers 
runtime without having to calculate them using aggregate measures.


It seems information you want the system to have at all times.

I am just still incredulous that this isn't being done...

But I am not well versed in kernel concurrency measures so I am hardly 
qualified to comment on any of that.


In any case, thank you for you

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-14 Thread Xen

Zdenek Kabelac wrote on 13-09-2017 21:35:


We are moving here in the right direction.

Yes - current thin-provisioning does not let you limit the maximum number of
blocks an individual thinLV can address (and a snapshot is an ordinary thinLV)

Every thinLV can address at most  LVsize/ChunkSize  blocks.


So basically the only options are an allocation check with asynchronously 
derived intel that might be a few seconds late, as a way to execute some 
standard, general "prioritizing" policy, and an interventionist policy 
that will (fs)freeze certain volumes depending on admin knowledge 
about what needs to happen in his/her particular setup.
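
The second kind is at least already doable with plain userland tools; a 
minimal sketch, assuming a non-critical filesystem mounted at 
/srv/noncritical (the path is made up):

  # stop the non-critical filesystem from allocating anything new from the pool
  fsfreeze --freeze /srv/noncritical
  # ... drop snapshots, extend the pool, etc. ...
  fsfreeze --unfreeze /srv/noncritical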


This is part of the problem: you cannot calculate in advance what can 
happen, because by design, mayhem should not ensue, but what if your 
predictions are off?


Great - 'prediction' - we are getting on the same page -  prediction is a
big problem


Yes, I mean that on my own 'system' I generally know how much data is 
on it and there is no automatic data generation.


Matthew Patton referenced quotas in some email; I didn't know how to set 
them up as quickly when I needed them, so I created a loopback mount from a 
fixed-size container to 'solve' that issue when I did have an 
unpredictable data source... :p.
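
For the record, that loopback 'quota' was nothing fancier than something 
like the following (the file name and size are arbitrary):

  # a fixed-size container acts as a crude cap on how much the data source can eat
  truncate -s 10G /srv/container.img
  mkfs.ext4 /srv/container.img    # mke2fs asks for confirmation on a regular file
  mkdir -p /srv/unpredictable
  mount -o loop /srv/container.img /srv/unpredictable

The container file itself is sparse, so the thin pool still only allocates 
what actually gets written into it, up to the 10G ceiling.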


But if I do create snapshots (which I do every day), when the root and 
boot snapshots fill up (they are on regular LVM) they get dropped, which 
is nice; but particularly for the big data volume, if I really were to move a 
lot of data around I might need to get rid of the snapshots first, or 
else I don't know what will happen or when.


Also, my system (yes, I am an "outdated moron") does not have the thin_ls 
tool yet, so when I was last active here and you mentioned that tool (thank 
you for that, again) I created this little script that would also give me 
that info:


$ sudo ./thin_size_report.sh
[sudo] password for xen:
Executing self on linux/thin
Individual invocation for linux/thin

name   pct   size
-
data54.34% 21.69g
sites4.60%  1.83g
home 6.05%  2.41g
- +
volumes 64.99% 25.95g
snapshots0.09% 24.00m
- +
used65.08% 25.97g
available   34.92% 13.94g
- +
pool size  100.00% 39.91g

The above "sizes" are not volume sizes but usage amounts.

And the % are % of total pool size.

So you can see I have 1/3 available on this 'overprovisioned' thin pool 
;-).



But anyway.


Being able to set a maximum snapshot size before it gets dropped could 
be very nice.


You can't do that IN KERNEL.

The only tool which is able to calculate real occupancy - is
user-space thin_ls tool.


Yes my tool just aggregated data from "lvs" invocations to calculate the 
numbers.
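
The gist of it is nothing more than something like this (a simplified 
sketch, not the actual script; the volume names are my own):

  # mapped space per thin LV = virtual size * data_percent
  lvs --noheadings --units g -o lv_name,lv_size,data_percent linux/data linux/sites linux/home |
  awk '{ printf "%-10s %8.2fg used\n", $1, $2 * $3 / 100 }'

The snapshot figure then simply falls out as the pool usage minus the sum 
of those numbers.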


If you say that any additional allocation checks would be infeasible 
because they would take too much time per request (which still seems odd, 
because the checks wouldn't be that computationally intensive, and even for 
100 gigabytes you'd only have some 25,000 checks at the default extent size) -- 
then of course you collect the data asynchronously.


So I don't know if it would be *that* slow provided you collect the data 
in the background and not while allocating.


I am also pretty confident that if you did make a policy it would turn 
out pretty good.


I mean I generally like the designs of the LVM team.

I think they are some of the most pleasant command line tools anyway...

But anyway.

On the other hand, if all you can do is intervene in userland, then all the 
LVM team can do is provide a basic skeleton for the execution of some standard 
scripts.



So all you need to do is to use the tool in user-space for this task.


So maybe we can have an assortment of some 5 interventionist policies 
like:


a) Govern max snapshot size and drop snapshots when they exceed it
b) Freeze non-critical volumes when thin space drops below aggregate 
values appropriate for the critical volumes

c) Drop snapshots when thin space drops below 5%, starting with the biggest one
d) Also freeze relevant snapshots in case (b)
e) Drop snapshots that exceed their max configured size, but only when the 
threshold is reached.


So for example you configure a max size for a snapshot. When the snapshot 
exceeds that size it gets flagged for removal. But removal only happens when 
the other condition is met (the threshold is reached).


So you would have 5 different interventions you could use that could be 
considered somewhat standard, and the admin can just pick and choose or 
customize.
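
To make something like (c) concrete, this is the sort of userland script I 
have in mind, run from cron or from a dmeventd hook; the pool name and the 
95% threshold are just assumptions:

  #!/bin/sh
  # drop the fullest thin snapshot once the pool crosses a threshold
  POOL=vg0/pool
  THRESHOLD=95

  USED=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ' | cut -d. -f1)
  if [ "$USED" -ge "$THRESHOLD" ]; then
      # snapshots are the thin LVs that have an origin set (third column)
      VICTIM=$(lvs --noheadings -o lv_full_name,data_percent,origin --sort -data_percent vg0 |
               awk '$3 != "" { print $1; exit }')
      [ -n "$VICTIM" ] && lvremove -f "$VICTIM"
  fi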




This is the main issue - these 'data' are pretty expensive to 'mine'
out of data structures.


But how expensive is it to do it say every 5 seconds?



It's the user space utility which is able to 'parse' all the structure
and take a 'global' picture. But of course it takes CPU and TIME and
it's 

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-13 Thread Xen

Zdenek Kabelac wrote on 13-09-2017 21:17:


Please if you can show the case where the current upstream thinLV
fails and you lose your data - we can finally start to fix something.


Hum, I can only say "I owe you one" on this.

I mean to say it will have to wait, but I hope to get to this at some 
point.



I'm still unsure what problem you want to get resolved from the pretty
small group of people around dm/lvm2 - do you want us to rework the
kernel page-cache ?

I'm simply still confused about what kind of action you expect...

Be specific with real world example.


I think Brassow Jonathan's idea is very good to begin with (thank you 
sir ;-)).


I get that you say that a kernel-space solution is impossible to implement 
(apart from not crashing the system, and I get that you say that this is 
no longer the case) because checking several things would prolong 
execution paths considerably.


And I realize that any such thing would need asynchronous checking and 
updating of some values, and then execution paths that need to check for 
such things, which I guess could indeed be rather expensive to actually 
execute.


I mean the only real kernel experience I have was trying to dabble with 
filename_lookup and path_lookupat or whatever it was called. I mean 
inode path lookups, which is a bit of the same thing. And indeed even a 
single extra check would have incurred a performance overhead.


I mean the code to begin with differentiated between fast lookup and 
slow lookup and all of that.


And particularly the fast lookup was not something you'd want to mess 
with, etc.


But I want to say that I absolutely have no issue with asynchronous 
'intervention', even if it is not byte-accurate, as you say 
in the other email.


And I get that you prefer user-space tools doing the thing...

And you say there that this information is hard to mine.

And that the "thin_ls" tool does that.

It's just that I don't want it to be 'random', depending on your 
particular sysadmin doing the right thing in isolation of all 
other sysadmins, who each have to do the right thing in isolation of 
each other, all writing the same code.


At the very least if you recognise your responsibility, which you are 
doing now, we can have a bit of a framework that is delivered by 
upstream LVM so the thing comes out more "fully fleshed" and sysadmins 
have less work to do, even if they still have to customize the scripts 
or anything.


Most ideal thing would definitely be something you "set up" and then the 
thing takes care of itself, ie. you only have to input some values and 
constraints.


But intervention in the form of "fsfreeze" or whatever is very personal, I 
get that.


And I get that previously auto-unmounting also did not really solve 
issues for everyone.


So a general interventionist policy that is going to work for everyone 
is hard to get.


So the only thing that could work for everyone is if there is actually a 
block on new allocations. If that is not possible, then indeed I agree 
that a "one size fits all" approach is hardly possible.


Intervention is system-specific.

Regardless, at the very least it should be easy to ensure that some constraints 
are enforced; that's all I'm asking.


Regards, (I'll respond further in the other email).

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-13 Thread Xen

Zdenek Kabelac wrote on 12-09-2017 23:57:


Users interested in thin-provisioning are really mostly interested in
performance - especially on multicore machines with lots of fast
storage with high IOPS throughput  (some of them even expect it should
be at least as good as linear)


Why don't you hold a survey?

And not phrase it in terms of "Would you like to sacrifice performance 
for more safety?"


But please.

Ask people:

1) What area does the LVM team need to focus on for thin provisioning:

a) Performance and keeping performance intact
b) Safety and providing good safeguards against human and program error
c) User interface and command line tools
d) Monitoring and reporting software and systems
e) Graphical user interfaces
f) Integration into default distributions and support for booting/grub

And then allow people to score these things with a percentage or to 
distribute some 20 points across these 6 points.


Invent more points as needed.

Give people 20 points to distribute across some 8 areas of interest.

Then ask people what areas are most interesting to them.

So topics could be:
(a) Performance (b) Robustness (c) Command line user interface (d) 
Monitoring systems (e) Graphical user interface (f) Distribution support


So ask people. Don't assume.

(NetworkManager team did this pretty well by the way. They were really 
interested in user perception some time ago).


if you will keep thinking for a while you will at some point see the 
reasoning.


Only if your reasoning is correct. Not if your reasoning is wrong.

I could also say to you, we could also say to you "If you think longer 
on this you will see we are right". That would probably be more accurate 
even.



Repeated again - whoever targets for 100% full thin-pool usage has
misunderstood purpose of thin-provisioning.


Again, no one "targets" for 100% full. It is just an eventuality we need 
to take care of.


You design for failure.

A nuclear plant that did not take into account operator drunkenness, and had 
no safety measures in place to ensure it would not lead to 
catastrophe, would be a very bad nuclear plant.


Human error can be calculated into the design. In fact, it must.

DESIGN FOR HUMAN WEAKNESS.

NOT EVERYONE IS PERFECT and human faults happen.

If I was a customer and I was paying your bills, you would never respond 
like this.


We would like some assurance that things do not immediately descend into mayhem 
the moment someone somewhere slacks off and falls asleep.


We like to design in advance so we do not have to keep a constant eye 
out.


We build "structure" so that the structure works for us, and not 
constant vigilance.


Constant vigilance can fail. Structure cannot.

Focus on "being" not "doing".

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Xen

Zdenek Kabelac wrote on 12-09-2017 16:57:


This bug has been reported (by me, even, to the libblkid maintainer) AND
has already been fixed in the past


I was the one who reported it.

This was Karel Zak's message from 30 august 2016:

"On Fri, Aug 19, 2016 at 01:14:29PM +0200, Karel Zak wrote:
On Thu, Aug 18, 2016 at 10:39:30PM +0200, Xen wrote:
Would someone be will to fix the issue that a Physical Volume from LVM2 
(PV)
when placed directly on disk (no partitions or partition tables) will 
not be


This is very unusual setup, but according to feedback from LVM guys
it's supported, so I will improve blkid to support it too.

Fixed in the git tree (for the next v2.29). Thanks.

Karel"

So yes, I knew what I was talking about.

At least slightly ;-).

:p.



But to defend the libblkid maintainer's side a bit :) - this feature was not
really well documented from the lvm2 side...


That's fine.


You can sync every second to minimize the amount of dirty pages

Lots of things... all of them will in some way or other impact
system performance


He said nobody would be hurt by such a measure except people who 
wanted to unpack and compile a kernel purely in page buffers ;-).



So clearly you need to spend resources effectively and support both 
groups...
Sometimes it is better to use large RAM (common laptops have 32G of RAM 
nowadays)


Yes, and he said those people wanting to compile the kernel purely in 
memory (without using a RAM disk for it) have issues anyway...


;-).

So no it is not that clear that you need to support both groups. 
Certainly not by default.


Or at least not in its default configuration for some dirty page file 
flag ;-).
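
For completeness, the knobs he was presumably alluding to are the usual 
writeback sysctls; the values below are only an illustration, not a 
recommendation:

  # flush dirty pages sooner and cap how much dirty data may pile up
  sysctl vm.dirty_background_ratio=5
  sysctl vm.dirty_ratio=10
  sysctl vm.dirty_expire_centisecs=1000
  sysctl vm.dirty_writeback_centisecs=100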


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Xen

Zdenek Kabelac wrote on 12-09-2017 16:37:


On block layer - there are many things  black & white

If you don't know which process 'created' a written page, nor whether you are writing
i.e. filesystem data or metadata or any other sort of 'metadata' 
information,

you can hardly do any 'smartness' logic on the thin block level side.


You can give any example to say that something is black and white 
somewhere, but I made a general point there, nothing specific.



The philosophy with a DM device is - you can replace them online with
something else - i.e. you could have a linear LV  which is turned into
'RAID' and then it could be turned into   'Cache RAID'  and then even into
a thinLV -  all in one row
on a live running system.


I know.


So what filesystem should be doing in this case ?


I believe in most of these systems you cite the default extent size is 
still 4MB, or am I mistaken?



Should it be doing complex querying of the block-layer underneath - checking
current device properties - and waiting till the IO operation is
processed  - before the next IO comes in the process - and repeating the
same  in a very synchronous,
slow logic ??    Can you imagine how slow this would become ?


You mean a synchronous way of checking available space in thin volume by 
thin pool manager?



We are targeting 'generic' usage not a specialized case - which fits 1
user out of 100 - and every other user needs something 'slightly'
different


That is a complete exaggeration.

I think you will find this issue comes up often enough to think that it 
is not one out of 100, and besides, unless performance considerations 
are at the heart of your ...reluctance ;-), no one stands to lose 
anything.


So the only question is about design limitations or architectural considerations 
(performance), not whether it is a wanted feature or not (it is).




I don't think there is anything related...
Thin chunk-size ranges from 64KiB to 1GiB


Thin allocation is not by default in extent-sizes?


The only inter-operation is that the main filesystems (like extX & XFS) are
getting fixed for better reactions to ENOSPC...
and WAY better behavior when there are 'write-errors' - surprisingly
there was numerous faulty logic and expectations encoded in them...


Well, that's good, right? But I did read here earlier about work between 
the ExtFS team and the LVM team to improve allocation characteristics to better 
align with underlying block boundaries.


If zpools are 'equally' as fast as thins  - and give you better 
protection,

and more sane logic - then why is anyone still using thins???


I don't know. I don't like ZFS. Precisely because it is a 'monolith' 
system that aims to be everything. Makes it more complex and harder to 
understand, harder to get into, etc.



Of course if you slow down the speed of the thin-pool and add way more
synchronization points and consume 10x more memory :) you can get
better behavior in those exceptional cases which are only hit by
inexperienced users who tend to intentionally use thin-pools in an
incorrect way.


I'm glad you like us ;-).

Yes, apologies here, I responded to this thing earlier (perhaps a year 
ago) and the system I was testing on had a 4.4 kernel. So I cannot 
currently confirm, and it is probably already solved (you could be right).


Back then the crash was kernel messages on TTY and then after some 
20-30


there is by default a 60sec freeze, before the unresized thin-pool starts to 
reject

all writes to unprovisioned space as 'error' and switches to the
out-of-space state.  There is though a difference whether you are
out-of-space in data
or metadata -  the latter one is more complex...


I can't say whether it was that or not. I am pretty sure the entire 
system froze for longer than 60 seconds.


In the page cache there is nothing logically separated - you have 'dirty' pages

you need to write somewhere - and if your writes lead to errors,
and the system reads errors back instead of real data - and your execution
code starts to run on a completely unpredictable data-set - well, a 'clean'
reboot is still a very nice outcome IMHO
reboot is still very nice outcome IMHO


Well, even if that means some dirty pages are lost before the application 
discovers it, any read or write errors should at some point lead the 
application to shut down, right?


I think for most applications the most sane behaviour would simply be to 
shut down.


Unless there is more sophisticated error handling.

I am not sure what we are arguing about at this point.

Application needs to go anyway.


If I had a system crashing because I wrote to some USB device that was 
malfunctioning, that would not be a good thing either.


Well, try to BOOT from USB :) and detach and then compare...
Mounting user data and running user-space tools off USB is not 
comparable...


Systems would also grind to a halt because of user data and not just system files.

I know booting from USB can be 1000x slower than user data.

But shared page cache for all devices is bad design, period.


AFAIK - this is still not a resolved issue...


That's a shame.


You can have 

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-12 Thread Xen

Zdenek Kabelac wrote on 12-09-2017 13:46:


What's wrong with BTRFS


I don't think you are a fan of it yourself.

Either you want  fs & block layer tied together - that's the btrfs/zfs 
approach


Gionatan's responses used only Block layer mechanics.


or you want

a layered approach with separate 'fs' and block layers  (the dm approach)


Of course that's what I want or I wouldn't be here.


If you are advocating here to start mixing 'dm' with 'fs' layer, just
because you do not want to use 'btrfs' you'll probably not gain main
traction here...




You know Zdenek, it often appears to me your job here is to dissuade 
people from having any wishes or wanting anything new.


But if you look a little bit further, you will see that there is a lot 
more possible within the space that you define, than you think in a 
black & white vision.


"There are more things in Heaven and Earth, Horatio, than is dreamt of 
in your philosophy" ;-).


I am pretty sure many of the impossibilities you cite spring from a 
misunderstanding of what people want: you think they want something 
extreme, but it is often much more modest than that.


Although personally I would not mind communication between layers, in 
which the providing layer (DM) communicates some stuff to the using layer (FS), 
but 90% of the time that is not even needed to implement what people 
would like.


Also we see ext4 being optimized around 4MB block sizes right? To create 
better allocation.


So that's an example of "interoperation" without mixing layers.

I think Gionatan has demonstrated that with pure block layer functionality 
it is possible to have more advanced protection that does not need 
any knowledge about filesystems.








We need to see EXACTLY which kind of crash you mean.

If you are using some older kernel - then please upgrade first and
provide a proper BZ case with a reproducer.


Yes, apologies here, I responded to this thing earlier (perhaps a year 
ago) and the system I was testing on had a 4.4 kernel. So I cannot 
currently confirm, and it is probably already solved (you could be right).


Back then the crash was kernel messages on the TTY and then, after some 20-30 
seconds, a total freeze, after I copied too much data to a (test) thin pool.


Probably irrelevant now if already fixed.


BTW you can imagine an out-of-space thin-pool with a thin volume and a
filesystem on it, where some writes end with a 'write-error'.


If you think there is an OS which keeps running uninterrupted
while a number of writes end with 'error'  - show it :)  - maybe we
should stop working on Linux and switch to that (supposedly much
better) different OS


I don't see why you seem to think that devices cannot be logically 
separated from each other in terms of their error behaviour.


If I had a system crashing because I wrote to some USB device that was 
malfunctioning, that would not be a good thing either.


I have said repeatedly that the thin volumes are data volumes. The entire 
system should not come crashing down.


I am sorry if I was basing myself on older kernels in those messages, 
but my experience dates from a year ago ;-).


The Linux kernel has had more issues with USB, for example, that are 
unacceptable, and even Linus Torvalds himself complained about it: 
queues filling up because of pending writes to a USB device and the entire 
system grinding to a halt.


Unacceptable.


You can have different pools and you can use rootfs  with thins to
easily test i.e. system upgrades


Sure but in the past GRUB2 would not work well with thin, I was basing 
myself on that...


I do not see a real issue with using a thin rootfs myself, but grub-probe 
didn't work back then and the OpenSUSE/GRUB guy attested to Grub not having 
thin support for that.



Most thin-pool users are AWARE of how to properly use it ;)  lvm2 tries
to minimize the (data-loss) impact for misused thin-pools - but we can't
spend too much effort there


Everyone would benefit from more effort being spent there, because it 
reduces the problem space and hence the burden on all those maintainers 
to provide all types of safety all the time.


EVERYONE would benefit.


But if you advocate for continued system use of an out-of-space
thin-pool - then I'd probably recommend starting to send patches...  as
an lvm2 developer I'm not seeing this as the best time investment but
anyway...


Not necessarily that the system continues in full operation; 
applications are allowed to crash or whatever. Just that the system does not 
lock up.


But you say these are old problems and now fixed...

I am fine if filesystem is told "write error".

Then filesystem tells application "write error". That's fine.

But it might be helpful if "critical volumes" can reserve space in 
advance.


That is what Gionatan was saying...?

The filesystem could also do this itself, but not knowing about the thin layer it 
would have to write random blocks to achieve it.


I.e. the filesystem may guess at the thin layout underneath and just write 1 
byte to each block it wants to allocate.
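
A crude userland approximation of that, without knowing anything about the 
thin layout, is to keep a fully written reservation file around on the 
critical volume and give it back when the pool runs low (path and size 
are made up, and it assumes discards are passed down to the pool):

  # force the pool to allocate ~1GiB for the critical volume ahead of time
  dd if=/dev/zero of=/mnt/critical/.reserve bs=1M count=1024 oflag=direct
  # later, when the pool runs low, hand the space back
  rm /mnt/critical/.reserve
  fstrim /mnt/critical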


But 

Re: [linux-lvm] Reserve space for specific thin logical volumes

2017-09-08 Thread Xen

Gionatan Danti wrote on 08-09-2017 12:35:

Hi list,
as by the subject: is it possible to reserve space for specific thin
logical volumes?

This can be useful to "protect" critical volumes from having their
space "eaten" by other, potentially misconfigured, thin volumes.

Another, somewhat more convoluted, use case is to prevent snapshot
creation when thin pool space is too low, causing the pool to fill up
completely (with all the associated dramas for the other thin
volumes).


For my 'ideals' thin space reservation (which would be like allocation 
in advance) would definitely be a welcome thing.


You can also think of it in terms of a default pre-allocation setting. 
I.e. every volume keeps a bit of space over-allocated while only doing 
so if there is actually room in the thin volume (some kind of lazy 
allocation?).


Of course I am not trying to steal your question here, and I do not know if 
any such thing is possible, but it might be, and I wouldn't mind hearing 
the answer as well.


No offense intended. Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-23 Thread Xen

Zdenek Kabelac wrote on 22-04-2017 23:17:

That is awesome; that means an errors=remount-ro mount will cause a 
remount, right?


Well, 'remount-ro' will fail, but you will not be able to read anything
from the volume either.


Well that is still preferable to anything else.

It is preferable to a system crash, I mean.

So if there is no other last resort, I think this is really the only 
last resort that exists?


Or maybe one of the other things Gionatan suggested.


Currently lvm2 can't support that much variety and complexity...


I think it's simpler but okay, sure...

I think pretty much anyone would prefer a volume-read-errors system 
rather than a kernel-hang system.


It is just not of the same magnitude of disaster :p.


The explanation here is simple - when you create a new thinLV - there
is currently a full suspend - and before the 'suspend' the pool is 'unmonitored',
after the resume it is monitored again - and you get your warning logged again.


Right, yes, that's what syslog says.

It does make it a bit annoying to be watching for messages but I guess 
it means filtering for the monitoring messages too.


You would want to filter out the recurring message, or check the current thin 
pool usage before you send anything.
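
Something as simple as this would do for the check (my pool being 
linux/thin; adjust the name):

  lvs --noheadings -o data_percent,metadata_percent linux/thin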


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-19 Thread Xen

Zdenek Kabelac wrote on 18-04-2017 12:17:


Already got lost in lots of posts.

But there is a tool, 'thin_ls', which can be used for detailed info
about the space used by every single thin volume.

It's not supported directly by the 'lvm2' command (so not yet presented in
a shiny cool way via 'lvs -a') - but the user can relatively easily run this
command
on his own on a live pool.


See for usage of


dmsetup message /dev/mapper/pool 0
[ reserve_metadata_snap | release_metadata_snap ]

and 'man thin_ls'


Just don't forget to release the snapshot of the thin-pool kernel metadata
once it's not needed...
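
For the archives, that workflow comes down to roughly the following; the 
device-mapper names depend on the VG/pool naming and are just placeholders 
here:

  # grab a consistent snapshot of the pool's kernel metadata
  dmsetup message /dev/mapper/vg0-pool-tpool 0 reserve_metadata_snap
  # report per-thin-device block usage from that metadata snapshot
  thin_ls -m /dev/mapper/vg0-pool_tmeta
  # and release it again when done
  dmsetup message /dev/mapper/vg0-pool-tpool 0 release_metadata_snap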

There are two ways: polling a number through some block device command 
or telling the filesystem through a daemon.


Remounting the filesystem read-only is one such "through a daemon" 
command.




Unmount of the thin-pool has been dropped from upstream versions >169.
It's now delegated to a user script executed on % checkpoints
(see 'man dmeventd')


So I write something useless again ;-).

Always this issue with versions...

So let's see: Debian Unstable (Sid) still has version 168, as does 
Testing (Stretch).

Ubuntu Zesty Zapus (17.04) has 167.

So for the foreseeable future both those distributions won't have that 
feature at least.


I heard you speak of those scripts yes but I did not know when or what 
yet, thanks.


I guess my script could be run directly from the script execution in the 
future then.


Thanks for responding though, much obliged.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-14 Thread Xen

Stuart D. Gathman wrote on 13-04-2017 19:32:

On Thu, 13 Apr 2017, Xen wrote:


Stuart Gathman wrote on 13-04-2017 17:29:

 IMO, the friendliest thing to do is to freeze the pool in read-only 
mode

 just before running out of metadata.


It's not about metadata but about physical extents.

In the thin pool.


Ok.  My understanding is that *all* the volumes in the same thin-pool
would have to be frozen when running out of extents, as writes all
pull from
the same pool of physical extents.


Yes, I simply tested with a small thin pool not used for anything else.

The volumes were not more than a few hundred megabytes big, so easy to 
fill up.


When I copied a file onto one of the volumes that the pool couldn't handle, 
the system quickly crashed.


Upon reboot it was neatly filled 100% and I could casually remove the 
volumes or whatever.
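
The test setup was essentially of this kind (names and sizes are arbitrary; 
on a current lvm2/kernel the expected outcome is an out-of-space pool rather 
than a crash):

  # a tiny pool with an overcommitted thin volume on top of it
  lvcreate -L 200m -T vgtest/tinypool
  lvcreate -V 500m -T vgtest/tinypool -n overfull
  mkfs.ext4 /dev/vgtest/overfull
  mount /dev/vgtest/overfull /mnt/test
  # write more data than the pool can actually back
  dd if=/dev/urandom of=/mnt/test/fill bs=1M count=400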


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-13 Thread Xen

Stuart Gathman wrote on 13-04-2017 17:29:

IMO, the friendliest thing to do is to freeze the pool in read-only 
mode

just before running out of metadata.


It's not about metadata but about physical extents.

In the thin pool.


While still involving application
level data loss (the data it was just trying to write), and still
crashing the system (the system may be up and pingable and maybe even
sshable, but is "crashed" for normal purposes)


Then it's not crashed. Only some application that may make use of the 
data volume may be crashed, but not the entire system.


The point is that getting errors, with a filesystem that has errors=remount-ro, 
is okay.


If a regular snapshot that is mounted fills up, the mount is dropped.

The system continues operating as normal.


, it is simple to
understand and recover.   A sysadmin could have a plain LV for the
system volume, so that logs and stuff would still be kept, and admin
logins work normally.  There is no panic, as the data is there 
read-only.


Yeah a system panic in terms of some volume becoming read-only is 
perfectly acceptable.


However, the kernel going into total mayhem is not.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-13 Thread Xen

Zdenek Kabelac wrote on 13-04-2017 16:33:


Hello

Just let's repeat.

Full thin-pool is NOT in any way comparable to full filesystem.

A full filesystem ALWAYS has room for its metadata - it's not pretending
it's bigger - it has 'finite' space and expects this space to just BE
there.

Now when you have a thin-pool - it causes quite a lot of trouble across a
number of layers.  These are solvable and being fixed.

But the rule #1 still applies - do not run your thin-pool out of
space - it will not always heal easily without losing data - there is
no simple straightforward way to fix it (especially when the user
cannot ADD any new space he promised to have)

So monitoring the pool and taking action ahead of time is always a superior
solution to any later postmortem system restore.


Yes that's what I said. If your thin pool runs out, your system will 
crash.


Thanks for alluding to the fact that this will also happen if a thin snapshot 
causes it (obviously).


Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM

2017-04-13 Thread Xen

Gionatan Danti wrote on 13-04-2017 12:20:


Hi,
anyone with other thoughts on the matter?


I wondered why a single thin LV does work for you in terms of not 
wasting space or being able to make more efficient use of "volumes" or 
client volumes or whatever.


But a multitude of thin volumes won't.

See, you only compared multiple non-thin with a single-thin.

So my question is:

did you consider multiple thin volumes?

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM thin pool advice

2017-02-15 Thread Xen

David Shaw wrote on 15-02-2017 1:33:


Is there some way to cap the amount of data that the snapshot can
allocate from the pool?  Also, is there some way to allocate enough
metadata space that it can't run out?  By way of analogy, using the
old snapshot system, if the COW is sufficiently large (larger than the
volume being snapshotted), it cannot overflow because even if every
block of the original volume is dirtied, the COW can handle all of it.
 Is there some similar way to size the metadata space of a thin pool
such that overflow is "impossible"?


Personally I do not know the current state of affairs, but the response 
I've often gotten here is that there is no such mechanism and it is up to 
the administrator to find out.


Maybe this is a bit ghastly to say it like this, my apologies.

I would very much like to be called wrong here.

The problem is that although the LVM monitor (I think) does respond, or can 
be configured to respond, to a "thin pool fillup", it does so as a kind of 
daemon, a watchdog; it is not an in-system guard.


Typically what I've found in the past is that a fill-up will just hang 
your system.


So I am probably very wrong about some things so I would rather let the 
developers answer.


But as you've found, the snapshot for a thin volume is always 
allocated with the same size as the origin volume. That means that unless you 
have double the space available, your system can crash.


I have personally once ventured -- but I am just some bystander, right 
-- that a proper solution would have to involve inter-layer 
communication between filesystems and block devices, but that is even 
outside of the problem here. The problem, as far as I can see, is that 
there is very unexpected behaviour when the thin pool fills up.


Zdenek once pointed out that the allocator does not have a full map of 
what is available. For efficiency reasons, it goes "in search" of the 
next block to allocate. (Next extent).


It does so in response to a filesystem read or write (a write, 
supposedly). The filesystem knows of no limits in the thin pool and 
expects sufficient behaviour. The block layer (in this case LVM) can 
respond with failure or success but I do not know how it is handled or 
what results it produces when the thin pool is full and no new blocks 
can be allocated.


However I expect your system to freeze when the snapshot allocates more 
space than is available. I think the designated behaviour is for the 
snapshot to be dropped but I doubt this happens?


After all, the snapshot might be mounted, etc.?...

It seems to me the first thing to do is to create safety margins, but 
then... I do not develop this thing right now :p.


I think what is required is advance allocation, where each (individual) 
volume allocates a pre-defined number of blocks in advance. Then, any 
out-of-space message from the thin volume manager would concern the 
pre-allocation and not the actual allocation for the filesystem.


You create a bit of a buffer. In time. Once the individual pool 
allocator knows the thin pool is having problems, but it still has 
extents available to itself that it pre-allocated, it can already start 
informing the filesystem -- ideally -- that mayhem is 
coming.


But it also means that a snapshot could recognise problems ahead of time 
and be told that it needs to start failing if a certain minimum of free 
space cannot be found.


But also, all of this requires that the central thin volume manager 
knows ahead of time, or in any case, at any single moment, how many 
extents are available. If this is concurrently done and there are many 
such allocators operating, all of them would need to operate on 
synchronized numbers of available space. Particularly when space is 
running out I feel there should be some sort of emergency mode where 
restrictions start to apply.


It is just unacceptable to me that the system will crash when space runs 
out. In case of a depleted thin pool, any snapshot should really be 
discarded by default I feel. Otherwise the entire thin pool should be 
readily frozen. But why the system should crash on this is beyond me.


My apologies for this perhaps petulant message. I just think it should 
not be understated how important it is that a system does not crash,


and I just was indicating that in the past the message has often been 
that it is _your_ job to create safety.


But this is slightly impossible. This would indicate... well whatever.

The failure case of a filled-up thin pool should not be relegated to the 
shadows.


I hope to be made wrong here and good luck with your endeavour.  I would 
suggest that a thin pool is very sexy ;-). But thus far there are no 
safeguards.



Please be advised that I do not know if such limits as you ask about 
currently exist. I have just been told here that the thin snapshot is of 
equal size to the origin volume and there is nothing you can do about it?


Regards.


Re: [linux-lvm] inner VG inside chroot not visible inside chroot

2017-01-06 Thread Xen

Xen wrote on 07-01-2017 1:48:


Why does it not show the mounted LV in the "lvs" table and why does it
not show the PV that IS visible in the "lvs" table in the output of
"vgdisplay -v" ?


I am sorry.

I had followed Zdenek's advice at some point and had edited some config 
file to set filters.


It was no help at the time for whatever reason but I had forgotten I had 
set it and it had made it to the new system. At which point it caused 
the behaviour described above.


Being new to filters I could not imagine what was causing it ;-).

So much time lost again... and my filesystem is now messed up because of 
the way the old LVM (133) was constantly "exchanging" physical volumes.


The filesystem would enter read-only state but not before causing 
gigantic corruption.


Only to files opened or in use at the time (so mostly cache files etc) 
but still.


lost+found now contains 505 files/dirs on this disk.

And now I am left with the task of returning this filesystem to a proper 
state... it doesn't really get better, this life.


Sorry. Another problem to fix and no time for any of it.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] how to change UUID of PV of duplicate partition (followup)

2017-01-06 Thread Xen

Zdenek Kabelac wrote on 06-01-2017 21:06:


You can always 'fix it' your way.

Just set up a device filter in lvm.conf so you see either diskA or 
diskB.


In your particular case:

 filter = [ "r|/dev/sdb4|" ]

or

 filter = [ "r|/dev/sdc4|" ]


And set/change things with your existing lvm2 version


Oh right, thank you man. I was looking for that but barring an option 
available on pvchange or vgimportclone itself I just decided to go the 
easy package upgrade route with David's help or suggestion.


I knew that should have been possible, so thanks. I just hadn't gotten 
around to diving into the config file yet. Much appreciated.


Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] how to change UUID of PV of duplicate partition (followup)

2017-01-06 Thread Xen

David Teigland wrote on 06-01-2017 20:19:

On Fri, Jan 06, 2017 at 08:10:20PM +0100, Xen wrote:

This is what I mean:

  Found duplicate PV 3U9ac3Ah5lcZUf03Iwm0cgMMaKxdflg0: using /dev/sdb4 
not

/dev/sdc4


The handling of duplicate PVs has been entirely redone in recent 
versions.

The problems you are having are well known and should now be fixed.


Oh right, I was going to write my LVM version, but did not manage to 
produce it yet, sorry :p.


Good to know. Yeah, I am using the version from Ubuntu 16.04.

  LVM version: 2.02.133(2) (2015-10-30)
  Library version: 1.02.110 (2015-10-30)
  Driver version:  4.34.0

Sigh... I guess I'll have to make some time to start using a recent 
version then on this system. Perhaps it was due time. Always these old 
versions...


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] duplicate pv change uuid

2017-01-06 Thread Xen

David Teigland wrote on 06-01-2017 20:04:


How does one do this, again?


vgimportclone is meant to do this.  It strategically uses filters to
isolate and modify the intended devs.


Thanks.

I thought running it in the current situation would be enough.

For some reason I am not getting my own emails though. Basically I am 
not getting list emails, only direct replies atm.


So I will CC myself I guess.


  Found duplicate PV 3U9ac3Ah5lcZUf03Iwm0cgMMaKxdflg0: using /dev/sdb4 
not /dev/sdc4

  Using duplicate PV /dev/sdb4 without holders, replacing /dev/sdc4
  Volume group containing /dev/sdb4 has active logical volumes
  Physical volume /dev/sdb4 not changed
  0 physical volumes changed / 1 physical volume not changed

This was after running vgimportclone /dev/sdc4

It immediately replaces the one I target with the one I am using (which I 
don't need to change and don't need), and then says it's active, duh.


I guess I'll just need to boot a live DVD - or actually, OpenSUSE DVDs 
are faster.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


[linux-lvm] how to change UUID of PV of duplicate partition (followup)

2017-01-06 Thread Xen

I mean,

This is what I mean:

  Found duplicate PV 3U9ac3Ah5lcZUf03Iwm0cgMMaKxdflg0: using /dev/sdb4 
not /dev/sdc4

  Using duplicate PV /dev/sdb4 without holders, replacing /dev/sdc4
  Volume group containing /dev/sdb4 has active logical volumes
  Physical volume /dev/sdb4 not changed
  0 physical volumes changed / 1 physical volume not changed

It immediately replaced the good PV with the bad PV (that I was trying 
to change), so I cannot actually get to the "bad" PV (which is the duplicate) 
to change it without booting an external system in which I can affect 
one disk in isolation.


But after running that command my root filesystem was instantly remounted 
read-only, so even just attaching the disk basically causes the 
entire system to fail instantly.


Real good, right?

Probably my entire fault, right :-/.

"Let's cause this system to crash, we'll attach a hard disk." "Job done!"

Actually I guess in this case it replaced the bad with the good but 
behind the scenes something else happened as well. This time it is 
hiding /dev/sdc4, the other time it was hiding /dev/sdb4, it seems to be 
random.


Basically any eSata system that a disk gets attached to could cause the 
operating system to fail. The same would probably be true of regular USB 
disks.


Even inserting a USB stick could crash a system like this.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] pvcreate: vfat signature detected on /dev/sda5

2016-12-09 Thread Xen

John L. Poole wrote on 10-12-2016 4:35:


   zeta jlpoole # pvcreate /dev/sda5
   WARNING: vfat signature detected on /dev/sda5 at offset 54. Wipe it? 
[y/n]: n

 Aborted wiping of vfat.
 1 existing signature left on the device.
 Aborting pvcreate on /dev/sda5.
   zeta jlpoole #

Is the warning something I need to be concerned about when
creating a physical volume on /dev/sda5?  I'm wondering if it was
looking at another
partition and bubbling up just as a precaution.


I am assuming this is a pretty standard thing; and nothing you need to 
worry about. If you wanted to be sure, you could create a dd of the 2nd 
partition, or of the first 131 MB.


Then after the PV create you could do a diff on the saved data and the real 
data still on disk, but I am going to assume -- as a layman -- that LVM 
is not messing up the messages; it is really talking about sda5, and 
nothing is going to happen. I encounter spurious signatures all 
the time, and most of the time I do not even worry about how they got there.
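
If you do want that extra assurance, it is only a couple of commands (sda2 
here is just a stand-in for whichever partition you are worried about):

  # save a copy before letting pvcreate wipe the signature on sda5
  dd if=/dev/sda2 of=/root/sda2-before.img bs=1M
  pvcreate /dev/sda5
  # verify that creating the PV did not touch the other partition
  dd if=/dev/sda2 bs=1M | cmp - /root/sda2-before.img && echo "sda2 untouched"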


Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] boot volume in activated containing PV does not activate

2016-11-07 Thread Xen

Linda A. Walsh wrote on 07-11-2016 17:58:

Xen wrote:
Now at boot (prior to running of systemd) I activate both 
coll/msata-lv and msata/root.

---
   Systemd requires that it be PID 1 (the first process to run
on your system).  I'm told that it won't run unless that is true.  So
how can you do anything before systemd has been run?


Hey lover :p.

Because I do it in the initrd. I'm not sure how much of that is "handed 
over" to systemd but it makes a big difference. For example, 
pre-activating one half of my (embedded) mirrors was the key to making 
sure the entire mirror is activated properly upon boot.


If not, I would have to run something like vgchange --refresh or 
lvchange --refresh or something of the kind.







This works fine when both mirrors are present.

As a test I removed the primary (SSD) mirror of the root volume (and 
the boot volume). Now the system still boots (off of another disk 
which has grub on it, but only the grub core and boot image, nothing 
else, so it still references my VG) and the root volume still gets 
activated but the boot volume doesn't get activated anymore.


What can cause this? It ought to get activated by udev rules.

You remember I patched the udev file to ensure a PV directly on disk 
always gets activated but this is not that disk.


Could it be that the msata/boot volume doesn't get activated because 
the PV had already been activated in the initrd but only as an LV and 
not its volumes?

   Sounds very similar to opensuse's requirement that /usr be on
the same partition as root -- if not, then you have to boot using
a ramdisk, which mounts /usr, and then does the real system boot, so
of course, booting directly from disk (which is what my machine does)
is not supported.  I also mount "/usr" as a first step in my boot
process, but that disallows a systemd boot, which some define as
systemd being pid 1.


So you skip an initrd (or initramfs), right? My Ubuntu thing just mounts 
itself as /, then pivots to the real root after activating some stuff.


On Ubuntu (and Debian) you create hooks and scripts and then they just 
get embedded into initramfs and that just gets loaded prior to systemd 
doing anything.


It was said that LVM (or actually, SystemD and UDev) just processes all 
things again; all devices are passed through the udev system once more, 
and so everything is getting activated or at least processed (a 
udev trigger is created for something to happen).


The strange thing was -- and I was just a bit confused a moment 
ago about a similar Grub question I asked somewhere -- that...


while booting from my regular msata thing that had the first half of the 
mirrors, and while pre-activating the outer-embedding PV (and its LV) 
that is used for that second half, this was enough to get the boot 
volume completely activated without explicitly doing so.


Meaning, the second PV to "msata" volume group was now activated and 
this PV contains both the second half of msata/root and the second half 
of msata/boot. As a consequence, when boot is activated by systemd, it 
finds the volume (PV) just fine.


But now for some strange reason it does not, so now that I remove the 
msata disk, the PV for it (the second half that is still there) is still 
getting activated but now that the first half is not found, and 
triggered, so to speak...


the second half suddenly does not prompt a systemd activation anymore. 
Actually systemd is no part of it at all; it is just the udev rule that 
triggers the activation.


So *apparently* there is no udev rule being run on the second-half PV 
(that contains both second-halves) and the *root* one has already been 
activated by my initrd (ramfs) so there's no problem there.


The boot one (which is just one half at this point) does not get 
activated at all. Which leads me to believe the udev rule misses it.



   I'm told systemd has changed the meaning of 'fstab' to not
be a table of disks to mount, but to be a table of disks to keep 
mounted

(even if umount'd by an administrator).  Since udev's functionality was
incorporated into systemd, might it not be doing something similar --
trying to maintain the boot-state in effect before you added the 
mirror?


I don't think so. At this point there is no mirror because I am only 
providing one half. I am pretty sure the bootstate is maintained if I 
were to preactivate the boot LV in the initrd(ramfs).


I am sorry for not having responded to your email of last January. I 
guess I'm living in a slow-time bubble. There was a reason I couldn't 
respond, I don't remember what it was.


Seems like 3 days ago for me. Nothing much happens except for terrible 
things. Everything moves as fast as my feet can take me, which is not 
very fast.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] creating DD copies of disks

2016-09-17 Thread Xen

Michael D. Setzer II wrote on 17-09-2016 17:18:


With windows systems, there seem to be more issues. On the same machine
it works fine, but moving to systems that are just a little different 
sometimes

results in various messages.


Windows 10 has fewer issues than before. You can move a system from a 
"regular" disk to a "firmware raid" disk (the same disk with a little bit of 
firmware data at the end, run from a different controller) and if 
your raid drivers are already installed, it will boot without any issue 
whatsoever.


Anyway, thanks for your response.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] creating DD copies of disks

2016-09-17 Thread Xen

Xen wrote on 17-09-2016 16:16:


I just won't be able to use the old system until I change it back.
Time to test, isn't it.


Indeed both Linux and Windows have no issues with the new disk, I think.

I was making a backup onto a bigger-sized disk, but neither Windows nor 
Linux have an issue with it. They just see the old partition table of 
the old disk and are fine with that.


Now I only need to know if the Linux system will run with the new disk, 
but I'm sure there won't be issues there now. The system is not actually 
*on* that disk.


Call it a data disk for an SSD system.

So perhaps you can see why I would want to have the two disks loaded at 
the same time:


- if they work, I can copy data even after the DD (perhaps, to make some 
rsync copy as you indicate) but now I already have all the required 
structures (partition tables...) without any work.


- I wasn't actually yet done with the old disk. This was also just 
research. Better make a backup first before you finalize things. I want 
to do more work on the "old" system before I finalize things. But having 
a backup sitting there that I can use is a plus. Not being able 
to use both disks at the same time is a huge detriment. It is also not 
that hard now to change the system back, but I still don't know how I can 
manually change the UUIDs that a VG/LV references. It is a huge plus if 
I can just exchange one PV with another at will.
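
As far as I understand it, the manual route (what vgimportclone automates) 
is roughly the following, run with only the clone visible via a filter; the 
device and VG names are assumptions, the clone's LVs must be inactive, and 
it is only this simple if the VG sits entirely on that one PV:

  # restrict every command to the cloned PV so the duplicate names cannot clash
  FILTER='devices { filter = [ "a|/dev/sdc4$|", "r|.*|" ] }'
  pvchange --uuid --config "$FILTER" /dev/sdc4
  vgchange --uuid --config "$FILTER" myvg
  vgrename --config "$FILTER" myvg myvg_copy

Note that this only changes the PV and VG UUIDs; the LV UUIDs stay as they 
are.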





___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] creating DD copies of disks

2016-09-17 Thread Xen

Lars Ellenberg wrote on 17-09-2016 15:49:

On Sat, Sep 17, 2016 at 09:29:16AM +0200, Xen wrote:

I want to ask again:

What is the proper procedure when duplicating a disk with DD?


depends on what you define as "proper",
what the desired outcome is supposed to look like.

What exactly are you trying to do?

If you intend to "clone" PVs of some LVM2 VG,
and want to be able to activate that on the same system
without first deactivating the "original",
I suggest:

1) create consistent snapshot(s) or clone(s) of all PVs
2) import them with "vgimportclone",
which is a shell script usually in /sbin/vgimportclone,
that will do all the neccessary magic for you,
creating new "uuid"s and renaming the vg(s).


Right so that would mean first duplicating partition tables etc.

I will check that out some day. At this point it is already done, 
mostly. I didn't yet know you could do that, or what a "clone" would be, 
so thank you.
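
Noted for next time - presumably something along the lines of (the device 
and the new VG name are placeholders):

  vgimportclone --basevgname backupvg /dev/sdc4
  vgchange -ay backupvg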




- my experience indicates that pvchange -ay on a PV that contains a
duplicate VG, even if it has a different UUID, but with an identical 
name,

creates problems.


You don't say...


I do say, but this is very common and you can run into it without 
realizing, e.g. as you open some loopback image of some other system and 
you hadn't realized it would contain a VG with the same name as your own 
system's.


The issue is not that the problem happens, the issue is that you can't 
recover from it.


After both VGs are activated, in my experience, you are screwed. You may 
not be able to rename the 2nd PV, or even the first. I mean the VG 
sitting in that PV.


Sometimes it means having to reboot the system and then doing it again 
while renaming your own VG prior to loading the alien one.


This "you need foresight" situation is not very good.

Perhaps you can deactivate the new VG and close the PV and clear it from 
the cache; I'm not sure, back then my "skill" was not as great.


The problem really is that LVM will activate a second VG with the same 
name *just fine* without renaming it internally or even in display.


However, once it is activated, you are at a loss. So it will happily, 
without you being able to know about it in advance, create a difficult 
to reverse situation for you.


What if you *are* doing forensics (or recovery), as the Matthew person 
indicated? Are you now to give your own VGs completely unique names? 
Just so you can prevent any conflicts? Not a good situation.


LVM should really auto-rename conflicting VGs that get loaded after 
activation of the original ones; however, it is hard to pick which one 
that should be, perhaps.


At least, maybe it should stop before activating a duplicate and then 
require manual intervention.


Or, just make it easier to recover from the situation. It is just 
extremely common if you ever open an image of another disk (particularly 
if it's your own) or if you are doing anything with default "ubuntu-vg" 
or "kubuntu-vg" systems, in that sense.


I had a habit of calling my main VGs "Linux". Not any longer.

I now try to specify the system they are from, no matter how ugly.

Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] creating DD copies of disks

2016-09-17 Thread Xen

matthew patton wrote on 17-09-2016 15:04:

What is the proper procedure when duplicating a disk with DD?


If you're literally duplicating the entire disk, what on earth are you
doing keeping it in the same system?


That's very simple, I'm surprised you wouldn't see it.


OF COURSE you remove it from the
origin box if you expect to do anything with it.


Why would you? That's like making a photocopy of something and then 
moving to another house before you can read it.


If anything, it's only technical limitations that would mandate such a 
thing. This also doesn't answer the question of what to do if you have 
VGs with identical names.



And I presume there
are no active filesystems or frankly writable LVM components on the
source disk while the DD is running?


Nope. All VGs had been deactivated (I was running from a bootable stick).



Most times it's only the filesystems that contain interesting data and
so a DD of the filesystem is understandable even though there are
other tools like RSYNC which are more logical.


Trouble is, making a backup of a complex setup is also complex if you 
don't have the required tools for it, and even "clonezilla" cannot really 
handle LVM that well. So you're down to manually writing scripts for all 
of the steps needed to back up the required data (e.g. LVM metadata and 
such), and then the steps to recreate it when you restore a backup (if 
any).


So in this case I was just making a backup of a disk because I might be 
needing to send the origin disk out for repair, so to speak. The disk 
contains various partitions and LVM structures. A clonezilla backup is 
possible, but cannot handle encryption. But because the new disk is 
meant to replace the old one (for a time) I need a literal copy anyway. 
Now of course I could clone the non-LVM partitions and then recreate 
volume groups etc. with different names, but this is arduous.


In that case I would have unique UUIDs but would still need to change my 
new volume group names so the systems can coexist while the copy is 
running.


At this point I'm not even sure well.

Let's just say I need to ensure the operation of this disk in this 
system completely prior to dumping the old one. There are only two ways: 
disconnect the source disk (and try to boot from the new system, etc.) 
or run from usb stick and disconnect the source disk, in that case. But 
if issues arise, I may need the source disk as well. Why would there not 
be an option to have it loaded at the same time? They are separate 
disks, and should ideally not directly conflict. In the days prior to 
UUID, this never happened; there were never any conflicts in that sense 
(unless you use filesystem labels and partition labels of course).


So I first want to settle into a peaceful coexistence because that is 
the most potent place to move forward from, I'm sure you understand. 
First cover the basics, then move on.


One answer well.

In any case it is clear that after changing the UUID of the PV and VG 
and changing the VG name, the duplicate disk can serve just fine for the 
activation of certain things, because LVM doesn't care what your VG is 
called, it will just find your LV by its UUID, if that makes sense.


So the duplicate LVs still have identical UUIDs and hence still perform 
in the old way (and cannot really coexist now).


However, it seems it is not possible to change the UUID of an LV.

https://www.redhat.com/archives/rhl-list/2008-March/msg00329.html

Not answered to satisfaction. Why would you need to use two different 
systems to copy data between two disks? That seems hardly possible.


I have now two VGS with different UUIDs:

VG UUID   jrDQRC-6tlI-n1xK-O7nh-xVAt-Y5SL-Ou8X7b
VG UUID   KyWokE-ddUN-8GXO-HgWA-5bqU-9HN2-57Qyho

But when I allow the 2nd one (the new one) to be activated, and activate 
something else as well, its LVs will be used just fine as PV for 
something else, based on UUID and nothing else.



Indeed, blkid will show them as having identical UUIDs.

Now I had forgotten to run pvchange -u on those LVs, so I guess Alistair 
was right in that thread. But the pvchange -u also instantly updated the 
VGs that referenced it, which is not so bad; but now the system will run 
with the new disk, and not the old disk.


But that means part of the "migration" is at least complete from this 
point of view. So thank you.


Now that Linux has no issues whatsoever I will have to see what Windows 
is going to do.


It's nice to know that when you change the UUID of an LV that is used as 
a PV for something else, that something else is updated automatically. 
That was part of my question: how do I inform the "user" of the changed 
PVs (UUIDs)?


So what I have now is basically a duplicate disk but all the UUIDs are 
different (at least for the LVM).


But generally I do not mount with UUID so for the partitions it is not 
really an issue now.


The backup was also made for safety at this point.

I just 

[linux-lvm] creating DD copies of disks

2016-09-17 Thread Xen

I want to ask again:

What is the proper procedure when duplicating a disk with DD?

- after duplication you cannot update a PV with pvchange -u, because LVM 
will prefer the original over your duplicate and not do anything


So to do that you need to boot off a different system, deactivate loaded 
VGs (if any) and then pvchange -u the duplicate PV.


- my experience indicates that vgchange -ay on a PV that contains a 
duplicate VG, even if it has a different UUID, but with an identical 
name, creates problems.


I mean that anytime you load a VG with the same name you get issues. VG 
names are often standardized between installs, so that Ubuntu might have 
"ubuntu-vg" as the name and Kubuntu might have "kubuntu-vg" as the name.


So if you then load two of those disks in the same system, you instantly 
have issues.


If the system were to auto- or temporarily rename an offending 2nd VG, it 
wouldn't be so bad. But usually you have to vgrename your current 
VG in advance of loading a second disk.


Which isn't exactly as intended, because now you are changing your local 
name to make room for a second system, when it should really be the 
other way around.


In the end I feel I have to do:

pvchange -u
vgchange -u
vgrename

To get something that will at least not bug me when both systems are 
loaded at the same time.
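Spelled out as a sketch, with made-up names, and assuming only the duplicate 
disk is attached (e.g. from a live USB) so the names are unambiguous:

vgchange -an ubuntu-vg              # make sure the copy is not active
pvchange -u /dev/sdb5               # new PV UUID on the duplicate disk
vgchange -u ubuntu-vg               # new VG UUID
vgrename ubuntu-vg ubuntu-vg-copy   # unique VG name
vgchange -ay ubuntu-vg-copy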


This then renders it impossible to use it as a backup because any other 
disk referencing the PV will not find it because the UUID has changed.


Now you would first have to reverse these operations (particularly the 
vgrename and pvchange -u) against the data of the first disk (the 
original) to be able to use the device again.


All of that is not very resilient. Now both PV UUIDs and VG names have 
to be unique.


Particularly I wonder how easy it is to point an existing VG to a disk 
that has a new (duplicate) PV and tell it: use that one from now on.


I mean: how can I add a disk that has a duplicate PV with a different 
UUID and add it to the VG in such a way that it replaces the references 
that VG has for the old PV?


But also: what ought you to do if creating a mirror copy? (duplicate 
copy).



___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvm2 raid volumes

2016-09-16 Thread Xen

Heinz Mauelshagen schreef op 16-09-2016 16:13:


Yes, looks like you don't have the 2nd PV accessible by the time when
the raid1 is being discovered and initially activated,
hence the superblock can't be retrieved.

These messages seem to be coming from initramfs, so check which driver
is missing/not loaded to access the 2nd PV.

The fact that you gain access to the raid1 completely after reboot (as
you mention further down) tells us the aforementioned missing driver is
the reason for this degraded activation,
i.e. the disk driver is loaded after the root pivot.
Please ensure it is available in the initramfs and loaded.

Heinz


Yes, thank you.

The problem was that the VG containing the LV used as the 2nd PV was 
not getting activated at initramfs time.


I solved it now by creating some hooks that would obtain a hierarchical 
PV list from a running system and then ensure all PVs in that list that 
were also LVs, would get activated prior to the root device.


The issue is really that (on Ubuntu) LV activation is very selective in 
the initramfs. Of course it is an embedded or "enclosed" setup, so maybe it 
is not recommended. Regardless, the only issue was that LVs were getting 
selectively activated (only root and swap).
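The core of my hook is not much more than the following (an initramfs-tools 
local-top sketch; "outer" is a made-up name for the VG that holds the LV 
acting as the 2nd PV):

#!/bin/sh
# /etc/initramfs-tools/scripts/local-top/activate-stacked-lvm (sketch)
PREREQ="lvm2"
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

# activate the outer VG so its LVs exist before the root VG/raid is assembled
lvm vgchange -ay outer
lvm pvscan --cache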


Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM cache/dm-cache questions.

2016-08-29 Thread Xen

lejeczek schreef op 29-08-2016 16:22:


I cannot debug now as for now I've given up the idea to encrypt this
LV, but  I would say is should be easily reproducible (maybe even
waste of time looking at my setup)


I can definitely say I have encrypted a cached volume without issue.

I have had a disk with two cached volumes, if you want to know, both 
caches originating from the same SSD.


The main origin volumes were on an HDD in a simple regular 
non-thin LVM.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


[linux-lvm] force umount (2)

2016-08-28 Thread Xen

Here is the latest version of my force umount script lol.


Liberating the free from the oppressed.

Sending TERM signal to -bash
Sending TERM signal to sudo su

Session terminated, terminating shell...Sending TERM signal to su
 ...terminated.
sagemode@perfection:~$ Sending TERM signal to bash
Liberating the oppressed from the free. There are programs remaining.

Connection to sagemode.net closed by remote host.
Connection to sagemode.net closed.



Sorry for being so late with it ;-).

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


[linux-lvm] rudimentary kill script

2016-08-27 Thread Xen
Here is a quick way to kill all processes that might have an open file 
on some volume you want to umount and close.


kill_all_dm() {
    # Translate the mapped device (e.g. /dev/linux/root) into the hex
    # device-number form that lsof prints, such as D0xfc00 for 252:0.
    hexdev=$(dmsetup info -c -o major,minor "$1" --noheadings | tr ':' ' ' \
        | { read a b; printf "D0x%02x%02x\n" "$a" "$b"; })

    # Collect the PIDs of all processes holding something open on that device.
    process_list=$(lsof -F cDn0 | tr '\0' '\t' \
        | awk -F '\t' '{ if (substr($1,1,1) == "p") P=$1; else print P "\t" $0 }' \
        | grep "$hexdev" | awk '{print $1}' | sed "s/^p//")

    [ -n "$process_list" ] && echo "$process_list" | xargs kill -9
}

It works on logical volumes and device mapper (paths) equally, but you 
must specify the path:


kill_all_dm /dev/linux/root ;-).



Alternatively this will just list all open processes:

find_processes() {
    # Same device-number lookup as in kill_all_dm above.
    hexdev=$(dmsetup info -c -o major,minor "$1" --noheadings | tr ':' ' ' \
        | { read a b; printf "D0x%02x%02x\n" "$a" "$b"; })

    process_list=$(lsof -F cDn0 | tr '\0' '\t' \
        | awk -F '\t' '{ if (substr($1,1,1) == "p") P=$1; else print P "\t" $0 }' \
        | grep "$hexdev" | awk '{print $1}' | sed "s/^p//")

    # Just show the matching processes instead of killing them.
    [ -n "$process_list" ] && echo "$process_list" | xargs ps
}

find_processes /dev/linux/root

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM cache/dm-cache questions.

2016-08-26 Thread Xen

lejeczek schreef op 26-08-2016 16:01:

whatever you might call it, it works, luks encrypting, opening &
mounting @boot - so I only wonder (which was my question) why not
cache pool LVs. Is it not supported...
would be great if a developer sees this question, I'm not sure just yet
about filing a bug report.


Ondrej has it down.

You can only encrypt the combined volume, not the individual parts, 
unless you encrypt those at the PV level.
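In other words, the kind of layering I mean, roughly (names made up; cache 
pool on the SSD PV, origin on the HDD PV, LUKS on top of the combined 
cached LV):

lvcreate -L 100G -n data hdd-vg /dev/sda2        # origin LV on the HDD PV
lvcreate -L 10G -n datacache hdd-vg /dev/sdb1    # future cache pool on the SSD PV
lvconvert --type cache-pool hdd-vg/datacache
lvconvert --type cache --cachepool hdd-vg/datacache hdd-vg/data

cryptsetup luksFormat /dev/hdd-vg/data           # encrypt the combined volume
cryptsetup luksOpen /dev/hdd-vg/data data_crypt
mkfs.ext4 /dev/mapper/data_crypt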


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] LVM cache/dm-cache questions.

2016-08-26 Thread Xen

lejeczek schreef op 26-08-2016 14:45:

well, I prefer to encrypt the LV itself, and I'm trying the same as what
always worked with my "regular" LVs, yet "cache pool LVs" fail to
encrypt with: Command failed with code 22.


If you are going to encrypt an LV it will no longer be an LV but an 
(opened) LUKS container.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] lvm2 raid volumes

2016-08-15 Thread Xen

Heinz Mauelshagen schreef op 03-08-2016 15:10:


The Cpy%Sync field tells you about the resynchronization progress,
i.e. the initial mirroring of
all data blocks in a raid1/10 or the initial calculation and storing
of parity blocks in raid4/5/6.


Heinz, can I perhaps ask you here. If I can.

I have put a root volume on raid 1. Maybe "of course" the second disk's 
LVM volumes are not available at system boot:


aug 15 14:09:19 xenpc2 kernel: device-mapper: raid: Loading target 
version 1.7.0
aug 15 14:09:19 xenpc2 kernel: device-mapper: raid: Failed to read 
superblock of device at position 1
aug 15 14:09:19 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 
mirrors

aug 15 14:09:19 xenpc2 kernel: created bitmap (15 pages) for device mdX
aug 15 14:09:19 xenpc2 kernel: mdX: bitmap initialized from disk: read 1 
pages, set 19642 of 30040 bits
aug 15 14:09:19 xenpc2 kernel: EXT4-fs (dm-6): mounted filesystem with 
ordered data mode. Opts: (null)



This could be because I am using a PV directly on disk (no partition 
table) for *some* volumes (actually the first disk, the one that is booted 
from); however, I force a start of the LVM2 service by enabling it in 
systemd:


aug 15 14:09:19 xenpc2 systemd[1]: Starting LVM2...

This is further down the log, so LVM is actually started after the RAID 
is loading.


At that point normally, from my experience, only the root LV is 
available.


Then at a certain point more devices become available:

aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/mapper/msata-boot.
aug 15 14:09:22 xenpc2 systemd[1]: Started LVM2.

aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/raid/tmp.
aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/raid/swap.
aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/raid/var.

But just before that happens, there are some more RAID1 errors:

aug 15 14:09:22 xenpc2 kernel: device-mapper: raid: Failed to read 
superblock of device at position 1
aug 15 14:09:22 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 
mirrors

aug 15 14:09:22 xenpc2 kernel: created bitmap (1 pages) for device mdX
aug 15 14:09:22 xenpc2 kernel: mdX: bitmap initialized from disk: read 1 
pages, set 320 of 480 bits
aug 15 14:09:22 xenpc2 kernel: device-mapper: raid: Failed to read 
superblock of device at position 1
aug 15 14:09:22 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 
mirrors

aug 15 14:09:22 xenpc2 kernel: created bitmap (15 pages) for device mdX
aug 15 14:09:22 xenpc2 kernel: mdX: bitmap initialized from disk: read 1 
pages, set 19642 of 30040 bits


Well small wonder if the device isn't there yet. There are no messages 
for it, but I will assume the mirror LVs came online at the same time as 
the other "raid" volume group LVs, which means the RAID errors preceded 
that.


Hence, no secondary mirror volumes available, cannot start the raid, 
right.


However after logging in, the Cpy%Sync behaviour seems normal:

  boot msata  rwi-aor--- 240,00m 100,00
  root msata  rwi-aor---  14,67g 100,00


Devices are shown as:

  boot msata rwi-aor--- 240,00m 100,00  boot_rimage_0(0),boot_rimage_1(0)
  root msata rwi-aor---  14,67g 100,00  root_rimage_0(0),root_rimage_1(0)


dmsetup table seems normal:

# dmsetup table | grep msata | sort
coll-msata--lv: 0 60620800 linear 8:36 2048
msata-boot: 0 491520 raid raid1 3 0 region_size 1024 2 252:14 252:15 - -
msata-boot_rimage_0: 0 491520 linear 8:16 4096
msata-boot_rimage_1: 0 491520 linear 252:12 10240
msata-boot_rimage_1-missing_0_0: 0 491520 error
msata-boot_rmeta_0: 0 8192 linear 8:16 495616
msata-boot_rmeta_1: 0 8192 linear 252:12 2048
msata-boot_rmeta_1-missing_0_0: 0 8192 error
msata-root: 0 30760960 raid raid1 3 0 region_size 1024 2 252:0 252:1 - -
msata-root_rimage_0: 0 30760960 linear 8:16 512000
msata-root_rimage_1: 0 30760960 linear 252:12 509952
msata-root_rimage_1-missing_0_0: 0 30760960 error
msata-root_rmeta_0: 0 8192 linear 8:16 503808
msata-root_rmeta_1: 0 8192 linear 252:12 501760
msata-root_rmeta_1-missing_0_0: 0 8192 error

But actually it is not normal, because it should reference 4 devices, not 
two. Apologies.


It only references the volumes of the first disk (image and meta).

E.g. 252:0 and 252:1 are:

lrwxrwxrwx 1 root root   7 aug 15 14:09 msata-root_rmeta_0 -> 
../dm-0
lrwxrwxrwx 1 root root   7 aug 15 14:09 msata-root_rimage_0 -> 
../dm-1


Whereas the volumes from the other disk are:

lrwxrwxrwx 1 root root   7 aug 15 14:09 msata-root_rmeta_1 -> 
../dm-3
lrwxrwxrwx 1 root root   7 aug 15 14:09 msata-root_rimage_1 -> 
../dm-5
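As an aside, a quick way to see which sub-LVs and physical devices a raid1 
LV currently maps, and its sync state:

lvs -a -o name,attr,devices,sync_percent msata
dmsetup status msata-root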


If I dismount /boot, lvchange -an msata/boot, lvchange -ay msata/boot, 
it loads correctly:


aug 15 14:56:23 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 
mirrors

aug 15 14:56:23 xenpc2 kernel: created bitmap (1 pages) for device mdX
aug 15 14:56:23 xenpc2 kernel: mdX: bitmap initialized 

[linux-lvm] sata cable disconnect + hotplug after

2016-08-09 Thread Xen
What am I supposed to do when a sata cable disconnects and reconnects as 
another device?


I had a disk at probably /dev/sda.

At a certain point the filesystem had become read-only. I realized the 
cable must have disconnected and after fixing it the same device was now 
at /dev/sdf.


Now the device had gone missing from the system, but LVM would not re-find 
it.


# pvscan
  WARNING: Device for PV fEGbBn-tbIp-rL7y-m22b-1rQh-r9i5-Qwlqz7 not 
found or rejected by a filter.
  PV unknown device   VG xenpc1   lvm2 [600,00 GiB / 158,51 GiB free]


pvck clearly found it and lvmdiskscan also found it. Nothing happened 
until I did pvscan --cache /dev/sdf:


# pvscan --cache /dev/sdf
# vgscan
  Reading all physical volumes.  This may take a while...
  Duplicate of PV fEGbBn-tbIp-rL7y-m22b-1rQh-r9i5-Qwlqz7 dev /dev/sdf 
exists on unknown device 8:0

  Found volume group "xenpc1" using metadata type lvm2

Now I was able to activate it again and it was no longer flagged as 
partial (but now it is duplicate).


The unknown device 8:0 is clearly going to be /dev/sda, which is no 
longer there.


How can I dump this reference to 8:0 or should something else be done?

Oh right pvscan --cache without a parameter

I wonder if I can run this while the thing is still activated

I was beginning to think there'd be some hidden filter rule, but it was 
"just" the cache.


Should this thing be automatically resolved? Is running pvscan --cache 
enough?
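A sequence along these lines should get things back to a clean state (a 
sketch, reusing the VG name from above):

pvscan --cache                     # rescan all devices, dropping the stale 8:0 entry
vgscan
vgchange -ay xenpc1                # reactivate; the PV should now resolve to /dev/sdf
pvs -o pv_name,pv_uuid,vg_name     # verify no duplicate/missing PV is reported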


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] export/migrate - but only a LV - how?

2016-07-22 Thread Xen

Brassow Jonathan schreef op 22-07-2016 15:12:


One advantage is that pvmove allows you to keep the device online...


Right, that is pretty advanced. Good going.



We are also working on a feature ATM called “duplicate”.  It allows
you to create a duplicate of any LV stack, create the duplicate where
you want, and even change the segment type in the process.  For
example, you could move a whole RAID5 LV, or thin-p, in one go.


I think one important "trouble spot" is that there is no operation 
yet(?) to create a duplicate of a PV that will not ruin the system 
unless you do pvchange -u and vgchange -u ;-).


It took quite a bit of time for me to realize what was going on ;-).

Is PVMOVE supposed to be a backup task? I don't think so? How are you 
supposed to back something up if you are planning to move your system to 
a new disk?


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

Re: [linux-lvm] export/migrate - but only a LV - how?

2016-07-18 Thread Xen

Brassow Jonathan schreef op 18-07-2016 17:50:

maybe pvmove the LV to a unique device and then vgsplit?

 brassow


On Jul 12, 2016, at 10:23 AM, lejeczek  wrote:

.. if possible?



Shouldn't you in general just recreate the LV with the same number of 
extents and then perform a DD?


I realize an atomic move operation for an LV could conceptually be nice 
but apart from the mental effort required to do this recreation 
manually, there is not much practically in the way of doing it yourself.


In the end, it would be nothing more than a shell script doing an 
LVCREATE, DD, and LVREMOVE?
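Roughly like this, as a sketch (names made up; the source LV must not be in 
use while the dd runs, and the lvremove only after verifying the copy):

SRC_VG=vg0; SRC_LV=data; DST_VG=vg1

# size of the source LV in bytes; lvcreate rounds up to whole extents
SIZE=$(blockdev --getsize64 "/dev/$SRC_VG/$SRC_LV")
lvcreate -L "${SIZE}b" -n "$SRC_LV" "$DST_VG"

dd if="/dev/$SRC_VG/$SRC_LV" of="/dev/$DST_VG/$SRC_LV" bs=4M conv=fsync

lvremove "$SRC_VG/$SRC_LV"   # only once the copy is verified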


Regards.

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Copying a raw disk image to LVM2

2016-07-11 Thread Xen

emmanuel segura schreef op 11-07-2016 0:02:


the lvm metadata is stored in the beginning of the physical volume; the
logical volume is a simple block device, so using dd you don't overwrite
any lvm header.


I must say I did experience something weird when copying a LUKS 
partition (encrypted).


Apparently LUKS stores information about the device it is on (or was 
on), because I couldn't get it to re-adjust to a larger volume.


cryptsetup resize is supposed to resize to the size of the underlying 
block device.


I had to recreate my LUKS container before it would recognise the new 
size.


E.g. I copied from a 2GB volume to a 3GB volume using dd.

LUKS kept thinking it was still on a 2GB volume.

So resize didn't work and I could manually resize it to 3GB but 
automatic resize would resize it back to 2GB.
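For reference, this is the sequence I would have expected to work after the 
copy (made-up names; open the LUKS mapping from the larger LV, then grow the 
mapping and the filesystem inside):

cryptsetup luksOpen /dev/vg0/secret secret_crypt
cryptsetup resize secret_crypt            # grow the mapping to the new device size
resize2fs /dev/mapper/secret_crypt        # then grow the filesystem inside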


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?

2016-05-18 Thread Xen

matthew patton schreef op 18-05-2016 6:57:


Just want to say your belligerent emails are ending up in the trash can. 
Not automatically, but after scanning, mostly.


At the same time perhaps it is worth noting that although all other 
emails from this list end up in my main email box just fine, except that 
yours (and yours alone) trigger the inbred spamfilter of my email 
provider, even though I have never trained it to spam your emails.


Basically, each and every time I will find your messages in my spam box. 
Makes you think, eh? But then, just for good measure, let me just 
concisely respond to this one:




For the FS to "know" which of it's blocks can be scribbled
on and which can't means it has to constantly poll the block layer
(the next layer down may NOT necessarily be LVM) on every write.
Goodbye performance.


Simply false. I explained already that the filesystem is already being 
optimized for alignment with (possible) "thin" blocks (Zdenek has 
mentioned this) in order to allocate (cause allocation) more efficiently 
on the underlying layer. If it already has knowledge about this alignment, 
and it has knowledge about its own block usage, meaning that it can easily 
discover which of the "alignment" blocks it has already written to itself, 
then it has all the data and all the knowledge to know which blocks 
(extents) are completely "free". 
Supposing you had a blockmap (bitmap) of 4KB blocks.


Now supposing you have 4MB extents.

Then every 1024 bits (2^10) in the blockmap correspond to one bit in the 
extent map, since a 4MB extent holds 1024 blocks of 4KB. You know this.

To condense the free blockmap into a free extent map
(bit "0" is free, bit "1" is in use):

For every extent:

blockmap_segment = blockmap & (((1 << 1024) - 1) << (extent_number * 1024));
is_an_empty_extent = (blockmap_segment == 0);

So it knows clearly which extents are empty.

Then it can simply be told not to write to those extents anymore.

If the filesystem is already using discards (mount option) then in 
practice those extents will also be unallocated by thin LVM.


So the filesystem knows which blocks (extents) will cause allocation, if 
it knows it is sitting on a thin device like that.




 However, it does mean the filesystem must know the 'hidden geometry'
 beneath its own blocks, so that it can know about stuff that won't 
work

 anymore.


I'm pretty sure this was explained to you a couple weeks ago: it's
called "integration".


You dumb faced idiot. You know full well this information is already 
there. What are you trying to do here? Send me into the woods again?


For a long time hard disks have exposed their geometry data to us.

And filesystems can be created with geometry information (of a certain 
kind) in mind. Yes, these are creation flags.


But extent alignment is also a creation flag. The extent alignment, or 
block size, does not change over time all of a sudden. Not that it 
should matter that much in principle. But this information can simply be 
had. It is no different than knowing the size of the block device to 
begin with.


If the creation tools were LVM-aware (they don't have to be), the 
administrator could easily SET these parameters without any interaction 
with the block layer itself. They can already do this for flags such as:


stride=stride-size
Configure the filesystem for a RAID array with stride-size
filesystem blocks. This is the number of blocks read or written
to disk before moving to next disk.  This mostly affects placement
of filesystem metadata like bitmaps at mke2fs(2) time to avoid
placing them on a single disk, which can hurt the performance.
It may also be used by block allocator.

stripe_width=stripe-width
Configure the filesystem for a RAID array with stripe-width
filesystem blocks per stripe. This is typically stride-size * N,
where N is the number of data disks in the RAID (e.g. RAID 5 N+1,
RAID 6 N+2).  This allows the block allocator to prevent
read-modify-write of the parity in a RAID stripe if possible when
the data is written.

And LVM extent size is not going to be any different. Zdenek explained 
earlier:


However what is being implemented is better 'allocation' logic for pool 
chunk provisioning (for XFS ATM)  - as rather 'dated' methods for 
deciding where to store incoming data do not apply with provisioned 
chunks efficiently.


i.e.  it's inefficient to  provision  1M thin-pool chunks and then 
filesystem

uses just 1/2 of this provisioned chunk and allocates next one.
The smaller the chunk is the better space efficiency gets (and need 
with snapshot), but may need lots of metadata and may cause 
fragmentation troubles.


Geometry data has always been part of block device drivers and I am 
sorry I cannot do better at this point (finding the required information 
on code interfaces is hard):


struct hd_geometry {
unsigned char heads;
unsigned char sectors;
unsigned short cylinders;
unsigned long start;
};

Block devices also register 

Re: [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin)

2016-05-10 Thread Xen

Hey sweet Linda,

this is beyond me at the moment. You go very far with this.

Linda A. Walsh schreef op 10-05-2016 23:47:


   Isn't using a thin memory pool for disk space similar to using
a virtual memory/swap space that is smaller than the combined sizes of 
all

processes?


I think there is a point to that, but for me the concordance is in the 
idea that filesystems should perhaps have different modes of requesting 
memory (space) as you detail below.


Virtual memory typically cannot be expanded (automatically) although you 
could.


Even with virtual memory there is normally a hard limit, and unless you 
include shared memory, there is not really any relation with 
overprovisioned space, unless you started talking about prior allotment, 
and promises being given to processes (programs) that a certain amount 
of (disk) space is going to be available when it is needed.


So what you are talking about here I think is expectation and 
reservation.


A process or application claims a certain amount of space in advance. 
The system agrees to it. Maybe the total amount of claimed space is 
greater than what is available.


Now processes (through the filesystem) are notified whether the space 
they have reserved is actually going be there, or whether they need to 
wait for that "robot cartridge retrieval system" and whether they want 
to wait or will quit.


They knew they needed space and they reserved it in advance. The system 
had a way of knowing whether the promises could be met and the requests 
could be met.


So the concept that keeps recurring here seems to be reservation of 
space in advance.


That seems to be the holy grail now.

Now I don't know but I assume you could develop a good model for this 
like you are trying here.


Sparse files are difficult for me, I have never used them.

I assume they could be considered sparse by nature and not likely to 
fill up.


Filling up is of the same nature as expanding.

The space they require is virtual space, their real space is the 
condensed space they actually take up.


It is a different concept. You really need two measures for reporting on 
these files: real and virtual.


So your filesystem might have 20G real space.
Your sparse file is the only file. It uses 10G actual space.
Its virtual file size is 2T.

Free space is reported as 10G.

Used space is given two measures: actual used space, and virtual used 
space.
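As an aside, ordinary sparse files already behave exactly like that on most 
filesystems; a quick illustration (file name made up):

truncate -s 2T sparse.img            # create a sparse file: 2T virtual size
ls -lh sparse.img                    # reports the virtual (apparent) size
du -h sparse.img                     # reports the real, allocated size
du -h --apparent-size sparse.img     # reports the virtual size again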


The question is how you store these. I think you should store them 
condensed.


As such only the condensed blocks are given to the underlying block 
layer / LVM.


I doubt you would want to create a virtual space from LVM such that your 
sparse files can use a huge filesystem in a non-condensed state sitting 
on that virtual space?


But you can?

Then the filesystem doesn't need to maintain blocklists or whatever, but 
keep in mind that normally a filesystem will take up a lot of space in 
inode structures and the like, when the filesystem is huge but the actual 
volume is not.


If you create one thin pool, and a bunch of filesystems (thin volumes) 
of the same size, with default parameters, your entire thin pool will 
quickly fill up with just metadata structures.


I don't know. I feel that sparse files are weird anyway, but if you use 
them, you'd want them to be condensed in the first place and existing in 
a sort of mapped state where virtual blocks are mapped to actual blocks. 
That doesn't need to be LVM and would feel odd there. That's not its 
purpose right.


So for sparse you need a mapping at some point but I wouldn't abuse LVM 
for that primarily. I would say that is 80% filesystem and 20% LVM, or 
maybe even 60% custom system, 20% filesystem and 20% LVM.


Many games pack their own filesystems, like we talked about earlier 
(when you discussed inefficiency of many small files in relation to 4k 
block sizes).


If I really wanted sparse personally, as an application data storage 
model, I would first develop this model myself. I would probably want to 
map it myself. Maybe I'd want a custom filesystem for that. Maybe a 
loopback mounted custom filesystem, provided that its actual block file 
could grow.


I would imagine allocating containers for it, and I would want the 
"real" filesystem to expand my containers or to create new instances of 
them. So instead of mapping my sectors directly, I would want to map 
them myself first, in a tiered system, and the filesystem to map the 
higher hierarchy level for me. E.g. I might have containers of 10G each 
allocated in advance, and when I need more, the filesystem allocates 
another one. So I map the virtual sectors to another virtual space, such 
that my containers


container virtual space / container size = outer container addressing
container virtual space % container size = inner container addressing

outer container addressing goes to filesystem structure telling me (or 
it) where to write my data to.


inner container addressing follows normal procedure, and 

[linux-lvm] LVM but no real partitions

2016-05-06 Thread Xen
Did people ever consider whether it was worth being able to start a PV 
on a raw physical device with a small offset to reserve room for a boot 
sector?


If you could create a PV on a bootdisk with a 2048 sector offset, you 
would have a bootable device without partition tables and only LVM.


The same applies to LUKS. If you could put the LUKS header at 2048 
sectors, you could boot a disk with nothing but a luks container, since 
grub2 understands LUKS.


I guess such a thing would be extremely easy to achieve, it just 
wouldn't be portable.


Since you may need to recompile both LUKS and Grub, and possibly even 
dm-crypt.


The total number of changes would probably not be more than 10 lines. 
Not sure.


Anyway.

Anyone ever fancied a LVM system without real partitions?

___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] thin handling of available space

2016-05-04 Thread Xen

Mark Mielke schreef op 04-05-2016 3:25:


Thanks for entertaining this discussion, Matthew and Zdenek. I realize
this is an open source project, with passionate and smart people,
whose time is precious. I don't feel I have the capability of really
contributing code changes at this time, and I'm satisfied that the
ideas are being considered even if they ultimately don't get adopted.
Even the mandatory warning about snapshots exceeding the volume group
size is something I can continue to deal with using scripting and
filtering. I mostly want to make sure that my perspective is known and
understood.


You know, you really don't need to be this apologetic even if I mess up 
my own replies ;-).


I think you have a right and a reason to say what you've said, and 
that's it.


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


Re: [linux-lvm] thin handling of available space

2016-04-27 Thread Xen

matthew patton schreef op 27-04-2016 12:26:

It is not the OS' responsibility to coddle stupid sysadmins. If you're
not watching for high-water marks in FS growth vis a vis the
underlying, you're not doing your job. If there was anything more than
the remotest chance that the FS would grow to full size it should not
have been thin in the first place.


Who says the only ones who would ever use or consider using thin would 
be sysadmins?


Monitoring Linux is troublesome enough for most people and it really is 
a "job".


You seem to be intent on making the job harder rather than easier, so 
that you can be the type of person that has this expert knowledge while 
others don't?


I remember a reason to crack down on sysadmins was that they didn't know 
how to use "vi" - if you can't use fucking vi, you're not a sysadmin. 
This actually is a bloated version of what a system administrator is or 
could at all times be expected to do, because you are ensuring that 
problems are going to surface one way or another when this sysadmin is 
suddenly no longer capable of being this perfect guy 100% of the time.


You are basically ensuring disaster by having that attitude.

That guy that can battle against all odds and still prevail ;-).

More to the point.

No one is getting cuddled because Linux is hard enough and it is usually 
the users who are getting cuddled; strangely enough the attitude exists 
that the average desktop user never needs to look under the hood. If 
something is ugly, who cares, the "average user" doesn't go there.


The average user is oblivious to all system internals.

The system administrator knows everything and can launch a space rocket 
with nothing more than matches and some gallons of rocket fuel.


;-).


The autoextend mechanism is designed to prevent calamity when the 
filesystem(s) grow to full size. By your reasoning, it should not exist 
because it cuddles admins.


A real admin would extend manually.

A real admin would specify the right size in advance.

A real admin would use thin pools of thin pools that expand beyond your 
wildest dreams :p.


But on a more serious note, if there is no chance a file system will 
grow to full size, then it doesn't need to be that big.


But there are more use cases for thin than hosting VMs for clients.

Also I believe thin pools have a use for desktop systems as well, when 
you see that the only alternative really is btrfs and some distros are 
going with it full-time. Btrfs also has thin provisioning in a sense but 
on a different layer, which is why I don't like it.


Thin pools from my perspective are the only valid snapshotting mechanism 
if you don't use btrfs or zfs or something of the kind.


Even a simple desktop monitor, some applet with configured thin pool 
data, would of course alleviate a lot of the problems for a "casual 
desktop user". If you remotely administer your system with VNC or the 
like, that's the same. So I am saying there is no single use case for 
thin.
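Such a monitor would not need to poll much more than this, next to the 
autoextend knobs that already exist in lvm.conf (pool and VG names made up):

lvs -o lv_name,data_percent,metadata_percent vg0/thinpool

# /etc/lvm/lvm.conf, activation section:
#   thin_pool_autoextend_threshold = 80
#   thin_pool_autoextend_percent = 20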


Your response, mr. Patton, falls along the lines of "I only want this to 
be used by my kind of people".


"Don't turn it into something everyone or anyone can use".

"Please let it be something special and nichie".

You can read coddle in place of cuddle.



It seems pretty clear to me that a system that *requires* manual 
intervention and monitoring at all times is not a good system, 
particularly if the feedback on its current state cannot be retrieved 
by, or is not usable by, other existing systems that guard against more 
or less the same type of things.


Besides, if your arguments here were valid, then 
https://bugzilla.redhat.com/show_bug.cgi?id=1189215 would never have 
existed.





The FS already has a notion of 'reserved'. man(1) tune2fs -r


Alright thanks. But those blocks are manually reserved for a specific 
user.


That's what they are for. It is for -u. These blocks are still available 
to the filesystem.


You could call it calamity prevention as well. There will always be a 
certain amount of space for say the root user.
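For reference, that reservation is set per filesystem, e.g. (device name 
made up):

tune2fs -m 5 /dev/vg0/root        # reserve 5% of the blocks
tune2fs -r 262144 /dev/vg0/root   # or an absolute number of blocks
tune2fs -u root /dev/vg0/root     # user allowed to use the reserved blocks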


And by the same measure you can also say the tmpfs overflow mechanism 
for /tmp is not required either, because a real admin would never see his 
rootfs run out of disk space.


Stuff happens. You ensure you are prepared when it does. Not stick your 
head in the sand and claim that real gurus never encounter those 
situations.


The real question you should be asking is whether it enhances monitoring 
if thin pool data is seen through the lens of the filesystems as well.


Or whether that is going to be a detriment.

Regards.



Erratum:

https://utcc.utoronto.ca/~cks/space/blog/tech/SocialProblemsMatter

There is a widespread attitude among computer people that it is a great 
pity that their beautiful solutions to difficult technical challenges 
are being prevented from working merely by some pesky social issues 
[read: human flaws], and that the problem is solved once the technical 
work is done. This 

[linux-lvm] 2 questions on LVM cache

2016-04-27 Thread Xen
1. Does LVM cache support discards of the underlying blocks (in the 
cache) when the filesystem discards the blocks?


I was reading https://lwn.net/Articles/293658/ which makes it clear that 
years ago kernel developers were introducing discard behaviour into 
Linux filesystems with respect to flash devices and their need to copy 
for wear leveling.


I know so little about it, but I have seen the "discard" flag mentioned 
so much with respect to SSDs, that I must assume these discards are 
there. Are LVM cache blocks discarded when the filesystem layer discards 
these blocks?


Where can I find this info?
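One crude way to at least see whether discards are advertised through the 
stack, while waiting for a definitive answer (device names made up):

lsblk -D /dev/vg0/cached_lv                   # non-zero DISC-GRAN/DISC-MAX means discards are supported
cat /sys/block/dm-6/queue/discard_max_bytes   # same information straight from sysfs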

In https://www.redhat.com/archives/linux-lvm/2016-April/msg00030.html I 
mentioned that such a discard feature would be necessary in order for a 
filesystem to communicate to a block device layer which blocks are in 
use, and for a block device layer to communicate back a set of available 
blocks if these dynamically change.





I forgot the other question lol.

I'm interested in this solution to 
https://bugzilla.redhat.com/show_bug.cgi?id=1189215 but I will respond 
in my other email (LVM Thin: Handle out of space conditions better).


___
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/