Re: [linux-lvm] Cannot activate LVs in VG xxx while PVs appear on duplicate devices.
On Sat, 9 Jun 2018, Wolfgang Denk wrote:
> Any help how to fix/avoid this problem would be highly appreciated.

You can try to create a global_filter in /etc/lvm/lvm.conf, I believe, and make sure it also ends up in your initramfs. Once you filter out your /dev/sdX devices, they should no longer even be seen by pvscan, which should also prevent this (perhaps illogical) refusal to activate. Regards.
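For illustration, a minimal sketch of such a filter, assuming the duplicate copies show up as /dev/sdb and /dev/sdc (the device names are placeholders; adjust the patterns to your actual devices):

    # /etc/lvm/lvm.conf -- reject the duplicate devices, accept everything else
    devices {
        global_filter = [ "r|^/dev/sdb$|", "r|^/dev/sdc$|", "a|.*|" ]
    }

    # then rebuild the initramfs so the filter also applies during early boot, e.g.
    update-initramfs -u      # Debian/Ubuntu
    dracut -f                # Fedora/RHEL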
Re: [linux-lvm] Can't work normally after attaching disk volumes originally in a VG on another machine
Zdenek Kabelac wrote on 28-03-2018 0:17:
> Hi
> This is why users do open BZ if they would like to see some enhancement. Normally the cache is an integral part of a volume - so if it's partially missing - the whole volume is considered to be garbage. But in 'writethrough' mode better recovery could likely be possible. Of course this case needs usability without --force. So please open an RFE BZ for this case.

It goes into the mess I usually get myself into: if you "dd copy" the disk containing the origin volume before uncaching it, and then go to some live session where you only have the new backup copy, but you want to clean up its LVM, then you now must fix the VGs in isolation from the cache. I suppose this is just the wrong order of doing things, but as part of a backup you don't really want to uncache first, as that requires more work to get things back to normal afterwards.

So you end up in a situation where the new origin copy has a reference to the cache disk --- all of this assumes writethrough mode --- and you need to clear that reference. However, you cannot, or should not, attach the cache disk again; it might get affected, and you don't want that, you want it to remain in its pristine state. Therefore, you are now left with the task of removing the cache from the VG, because you cannot actually run vgimportclone while the cache disk is missing. The obvious solution is to *also* clone the cache disk and then run operations on the combined set, but this might not be possible. Therefore, all that was left was:

    vgreduce --remove-missing --force
    cd /etc/lvm/archive
    cp  /etc/lvm/backup/
    cd /etc/lvm/backup
    vi                " remove cache PV, and change origin to regular linear volume, and add
                      " the visible tag
    vgcfgrestore      # presto, origin is restored as regular volume without the cache
    vgimportclone -i  # now have distinct volume group, VG UUID and PV UUID

So the problem is making dd backups of the origin. Perhaps dd backups should be avoided, but for some purposes (such as system migration) file copies are just more work in general, and can complicate things as well, for instance if there are NTFS partitions or whatnot. And disk images can be nice to have, in any case. This was the use case, basically.
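For readers trying to follow the sequence above (some file names were lost in the original), here is a hedged sketch with hypothetical names filled in: vg0 for the volume group and /dev/sdb1 for the cloned PV are placeholders, and the metadata edit itself depends entirely on your layout:

    vgreduce --remove-missing --force vg0       # drop the absent cache PV from the metadata
    cd /etc/lvm/archive
    cp /etc/lvm/backup/vg0 .                    # keep a safety copy of the current backup
    cd /etc/lvm/backup
    vi vg0                                      # remove the cache PV, change the origin back to a
                                                # plain linear LV, and add the "visible" flag
    vgcfgrestore vg0                            # origin restored as a regular, uncached volume
    vgimportclone -i /dev/sdb1                  # clone gets its own VG name, VG UUID and PV UUIDs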
Re: [linux-lvm] Can't work normally after attaching disk volumes originally in a VG on another machine
Gang He wrote on 27-03-2018 7:55:
> I just reproduced a problem from the customer, since they did virtual disk migration from one virtual machine to another one. According to your comments, this does not look like an LVM code problem; can the problem be considered an LVM administrator misoperation?

Counterintuitively, you must remove the PV from the VG before you remove the (physical) disk from the system. Yes, that is something you can easily forget to do, but as it stands, resolving the situation becomes a lot harder when you do it in reverse. I.e., removing the disk first and then removing the PV from the VG is a lot harder.
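To illustrate the order with the device names from this thread (a sketch, assuming the PV to be pulled is /dev/vdc in vg1):

    pvmove /dev/vdc           # migrate any allocated extents off the PV (no-op if it holds none)
    vgreduce vg1 /dev/vdc     # remove the PV from the volume group
    pvremove /dev/vdc         # wipe the PV label
    # only now detach the virtual disk and attach it to the other VM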
Re: [linux-lvm] Can't work normally after attaching disk volumes originally in a VG on another machine
Gang He wrote on 23-03-2018 9:30:
> 6) attach disk2 to VM2 (tb0307-nd2), the vg on VM2 looks abnormal.
> tb0307-nd2:~ # pvs
>   WARNING: Device for PV JJOL4H-kc0j-jyTD-LDwl-71FZ-dHKM-YoFtNV not found or rejected by a filter.
>   PV         VG   Fmt  Attr PSize  PFree
>   /dev/vdc   vg2  lvm2 a--  20.00g 20.00g
>   /dev/vdd   vg1  lvm2 a--  20.00g 20.00g
>   [unknown]  vg1  lvm2 a-m  20.00g 20.00g

This is normal, because /dev/vdd contains metadata for vg1 which includes the now-missing disk /dev/vdc, since that PV is no longer the same.

> tb0307-nd2:~ # vgs
>   WARNING: Device for PV JJOL4H-kc0j-jyTD-LDwl-71FZ-dHKM-YoFtNV not found or rejected by a filter.
>   VG  #PV #LV #SN Attr   VSize  VFree
>   vg1   2   0   0 wz-pn- 39.99g 39.99g
>   vg2   1   0   0 wz--n- 20.00g 20.00g

This is normal because you haven't removed /dev/vdc from the vg1 metadata on /dev/vdd, since it was detached while you operated on its VG.

> 7) reboot VM2, the result looks worse (vdc disk belongs to two vg).
> tb0307-nd2:/mnt/shared # pvs
>   PV        VG   Fmt  Attr PSize  PFree
>   /dev/vdc  vg1  lvm2 a--  20.00g      0
>   /dev/vdc  vg2  lvm2 a--  20.00g 10.00g
>   /dev/vdd  vg1  lvm2 a--  20.00g  9.99g

Because you operated on the shared volume group while one device was missing, the vg1 metadata on /dev/vdd was never altered, and the metadata, which resides on both disks, is now inconsistent between them. You also did not recreate the PV on /dev/vdc, so it has the same UUID as when it was part of vg1; this is why vg1, when /dev/vdd is booted, will still try to include /dev/vdc: it was never removed from the volume group metadata on /dev/vdd.

So the state of affairs is:
- /dev/vdc contains volume group info for vg2 and includes only /dev/vdc;
- /dev/vdd contains volume group info for vg1, and includes both /dev/vdc and /dev/vdd by PV UUID; arguably it is a bug that it still pulls in /dev/vdc even though that disk's VG UUID (and name) is now different.

Regardless, from vdd's perspective, /dev/vdc is still part of vg1.
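In the step-6 state (before the reboot), the stale reference could have been dropped on VM2 with something along these lines; a sketch, assuming vg1 should keep only /dev/vdd:

    vgreduce --removemissing vg1    # forget the missing PV in vg1's metadata on /dev/vdd
    pvs                             # vg1 should now list only /dev/vdd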
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Zdenek Kabelac wrote on 28-02-2018 22:43:
> It still depends - there is always some sort of 'race' - unless you are willing to 'give up' too early to be always sure, considering there are technologies that may write many GB/s...

That's why I think it is only possible for snapshots.

> You can use rootfs with thinp - it's very fast for testing i.e. upgrades and quickly reverting back - there just should be enough free space.

That's also possible with non-thin.

> Snapshots are using space - with the hope that if you will 'really' need that space you either add this space to your system - or you drop snapshots.

And I was saying back then that it would be quite easy to have a script that drops bigger snapshots first (those of larger volumes), given that those are most likely less important and more likely to cause a thin pool fill-up, and you can save more of the smaller snapshots this way. So basically this gives your snapshots the "quota" that I was asking about. Lol, now I remember. You could easily give (by script) every snapshot a quota of 20% of the full volume size, then when the 90% pool threshold is reached, you start dropping volumes with the largest quota first, or something. Idk, something more meaningful than that, but you get the idea. You can calculate the "own" blocks of each snapshot, and when the pool is full you check for snapshots that have surpassed their quota, and the ones that are past their quotas by the largest amounts you drop first.

> But as said - with today's 'rush' of development and load of updates - users do want to try a 'new distro upgrade' - if it works - all is fine - if it doesn't, let's have a quick road back - so using a thin volume for rootfs is a pretty wanted case.

But again, a regular snapshot of sufficient size does the same thing; you just have to allocate for it in advance, but for root this is not really a problem. Then there is no more issue with the thin-full problem. I agree it is less convenient, and a slight bit slower, but not by much for this special use case.

> There are also some ongoing ideas/projects - one of them was to have thinLVs with priority to be always fully provisioned - so such a thinLV could never be the one to have unprovisioned chunks

That's what ZFS does... ;-).

> Other was a better integration of the filesystem with 'provisioned' volumes.

That's what I was talking about back then...
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Gionatan Danti wrote on 28-02-2018 20:07:
> To recap (Zdenek, correct me if I am wrong): the main problem is that, on a full pool, async writes will more-or-less silently fail (with errors shown in dmesg, but nothing more).

Yes, I know you were writing about that in the later emails.

> Another possible cause of problems is that, even on a full pool, *some* writes will complete correctly (the ones on already allocated chunks).

Idem.

> In the past it was argued that putting the entire pool in read-only mode (where *all* writes fail, but reads are permitted to complete) would be a better fail-safe mechanism; however, it was stated that no current dm target permits that.

Right. Don't forget my main problem was system hangs due to older kernels, not the stuff you write about now.

> Two (good) solutions were given, both relying on scripting (see the "thin_command" option in lvm.conf):
> - fsfreeze on a nearly full pool (ie: >=98%);
> - replace the dm-thinp target with the error target (using dmsetup).
> I really think that with the good scripting infrastructure currently built into lvm this is a more-or-less solved problem.

I agree in practical terms. It doesn't make for good target design, but it's good enough, I guess.

>> Do NOT take thin snapshot of your root filesystem so you will avoid thin-pool overprovisioning problem.
> But is someone *really* pushing thinp for the root filesystem? I always used it for data partitions only... Sure, rollback capability on root is nice, but it is the data which is *really* important.

No, Zdenek thought my system hangs resulted from something else, and then in order to defend against that (being the fault of current DM design) he tried to raise the ante by claiming that root-on-thin would cause system failure anyway with a full pool. I never suggested root on thin.

> In stress testing, I never saw a system crash on a full thin pool

That's good to know; I was just using Jessie and Xenial.

> We discussed that in the past also, but as snapshot volumes really are *regular*, writable volumes (with a 'k' flag to skip activation by default), the LVM team take the "safe" stance of not automatically dropping any volume.

Sure, I guess any application logic would have to be programmed outside of the device mapper module anyway.

> The solution is to use scripting/thin_command with lvm tags. For example:
> - tag all snapshots with a "snap" tag;
> - when usage is dangerously high, drop all volumes with the "snap" tag.

Yes, now I remember. I was envisioning some other tag that would allow a quota to be set for every volume (for example as a %), and the script would then drop the volumes with the larger quotas first (thus the larger snapshots) so as to protect smaller volumes, which are probably more important and of which you can save more. I am ashamed to admit I had forgotten about that completely ;-).

>> Back to rule #1 - thin-p is about 'delaying' deliverance of real space. If you already have a plan to never deliver the promised space - you need to live with the consequences
> I am not sure I 100% agree on that.

When Zdenek says "thin-p" he might mean "thin-pool" but not generally "thin-provisioning". I mean to say that the very special use case of an always auto-expanding system is a special use case of thin provisioning in general. And I would agree, of course, that the other uses are also legit.

> Thinp is not only about "delaying" space provisioning; it clearly is also (mostly?) about fast, modern, usable snapshots. Docker, snapper, stratis, etc. all use thinp mainly for its fast, efficient snapshot capability.

Thank you for bringing that in.

> Denying that is not so useful and led to "overwarning" (ie: when snapshotting a volume on a virtually-fillable thin pool).

Aye.

> !SNAPSHOTS ARE NOT BACKUPS! Snapshots are not backups, as they do not protect from hardware problems (and denying that would be lame)

I was really saying that I was using them to run backups off of.

> however, they are an invaluable *part* of a successful backup strategy. Having multiple rollback targets, even on the same machine, is a very useful tool.

Even more, you can back up running systems, but I thought that would be obvious.

> Again, I don't understand why we are speaking about system crashes. On root *not* using thinp, I never saw a system crash due to a full data pool.

I had it on 3.18 and 4.4, that's all.

> Oh, and I use thinp on RHEL/CentOS only (Debian/Ubuntu backports are way too limited).

That could be it too.
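A minimal sketch of such a thin_command hook, assuming snapshots carry a "snap" tag, the pool is called vg/pool, and 95% is the chosen trigger point (script name, VG/pool names, tag and threshold are all made up for illustration; how dmeventd invokes thin_command and what arguments it passes should be checked against lvmthin(7) for your version):

    #!/bin/sh
    # thin-guard.sh -- drop tagged snapshots when the thin pool gets too full
    POOL="vg/pool"                                      # hypothetical pool name
    USED=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ' | cut -d. -f1)
    if [ "$USED" -ge 95 ]; then
        # remove every thin LV in the VG whose tag list contains "snap"
        lvs --noheadings -o lv_full_name,lv_tags vg \
          | awk '$2 ~ /(^|,)snap(,|$)/ {print $1}' \
          | while read -r lv; do
                lvremove -y "$lv"
            done
    fi

Sorting the candidates by size first, as discussed above, would only take an extra lvs field and a sort step.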
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
I did not rewrite this entire message, so please excuse the parts where I am a little more "on the attack".

Zdenek Kabelac wrote on 28-02-2018 10:26:
> I'll probably repeat myself again, but thin provisioning can't be responsible for all kernel failures. There is no way the DM team can fix all the related paths on this road.

Are you saying there are kernel bugs presently?

> If you don't plan to help resolving those issues - there is no point in complaining over and over again - we are already well aware of these issues...

I'm not aware of any issues; what are they? I was responding here to an earlier thread I couldn't respond to back then. The topic was whether it was possible to limit thin snapshot sizes; you said it wasn't, and I was just recapping that thread.

> If the admin can't stand a failing system, he can't use thin-p.

That just sounds like a blanket excuse for any kind of failure.

> Overprovisioning on the DEVICE level simply IS NOT equivalent to a full filesystem, like you would like to see all the time here, and it has already been explained to you many times that filesystems are simply not there yet - fixes are ongoing but it will take time, and it's really pointless to exercise this on 2-3 year old kernels...

Pardon me, but your position has typically been that it is fundamentally impossible, not that "we're not there yet". My questions have always been about fundamental possibilities, to which you always answer in the negative. If something is fundamentally impossible, don't be surprised if you then don't get any help in getting there: you close off all paths leading towards it. You shut off any interest, any discussion, and any development interest in paths where, a long time later, you then say "we're working on it", whereas before you always said "it's impossible". This happened before, where first you say "It's not a problem, it's admin error" and then a year later you say "Oh yeah, it's fixed now". Which is it? My interest has always been, at least philosophically, or concerning what is possible in principle, in development and design, but you shut it off saying it's impossible. Now you complain you are not getting any help.

> Thin provisioning has its use case and it expects the admin to be well aware of the possible problems.

That's a blanket statement once more that says nothing about actual possibilities or impossibilities.

> If you are aiming for a magic box working always right - stay away from thin-p - the best advice

Another blanket statement excusing any and all mistakes or errors or failures the system could ever have.

> Do NOT take thin snapshot of your root filesystem so you will avoid thin-pool overprovisioning problem.

Zdenek, could you please make up your mind? You brought up thin snapshotting as a reason for putting root on thin, as a way of saying that thin failure would lead to system failure and not just application failure, whereas I maintained that application failure was acceptable. I tried to make the distinction between application-level failure (due to filesystem errors) and system instability caused by thin. You then tried to make those equivalent by saying that you can also put root on thin, in which case application failure becomes system failure. I never wanted root on thin, so don't tell me not to snapshot it; that was your idea.

> Rule #1: Thin-pool was never targeted for 'regular' usage of a full thin-pool.

All you are asked is to design for error conditions. You want only to take care of the special use case where nothing bad happens. Why not just take care of the general use case where bad things can happen? You know, real life? In any development process you at first don't take care of all error conditions, you just can't be bothered with them yet. Eventually, you do. It seems you are trying to avoid having to deal with the glaring error conditions that have always existed, while avoiding any responsibility for them by saying that they were not part of the design. To make this more clear, Zdenek: your implementation does not cater to the general use case of thin provisioning, but only to the special use case where full thin pools never happen. That's a glaring omission in any design. You can go on and on about how thin-p was not "targeted" at that "use case", but that's like saying you built a car engine that was not "targeted" at "running out of fuel". Then when the engine breaks down you say it's the user's fault. Maybe retarget your design? Running out of fuel is not a use case. It's a failure condition that you have to design for.

> A full thin-pool is a serious ERROR condition with bad/ill effects on systems.

Yes, and your job as a systems designer is to design for those error conditions and make sure they are handled gracefully. You just default on your responsibility there. The reason you brought up root on thin was to elevate application failure to the level of
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Zdenek Kabelac wrote on 24-04-2017 23:59:
> I'm just curious - what do you think will happen when you have root_LV as a thin LV and the thin pool runs out of space - so 'root_LV' is replaced with the 'error' target.

Why do you suppose the root LV is on thin? Why not just stick to the common scenario where thin is used for extra volumes or data? I mean to say that you are raising an exceptional situation as an argument against something that I would consider quite common, which doesn't quite work that way: you can't prove that most people would not want something by raising something most people wouldn't use. I mean to say, let's just look at the most common denominator here. Root LV on thin is not that.

> Well then you might be surprised - there are users using exactly this.

I am sorry, this was a long time ago. I was concerned with thin-full behaviour, and I guess I was concerned with being able to limit thin snapshot sizes. I said that application failure was acceptable, but system failure not. Then you brought up root on thin as a way of "upping the ante". I contended that this is a bigger problem to tackle, but it shouldn't mean you shouldn't tackle the smaller problems (the smaller problem being data volumes). Even if root is on thin and you are using it for snapshotting, it would be extremely unwise to overprovision such a thing or to depend on "additional space" being added by the admin; root filesystems are not meant to be expandable. If on the other hand you do count on overprovisioning (due to snapshots), then being able to limit snapshot size becomes even more important.

> When you have rootLV on thinLV - you could easily snapshot it before doing any upgrade and revert back in case something fails on upgrade. See also projects like snapper...

True enough, but if you risk filling your pool because you don't have room for a full snapshot, that would be extremely unwise. I'm also not sure write performance for a single snapshot is very different between thin and non-thin; they are both CoW. E.g. when you write to an existing block it has to be duplicated; only for non-allocated writes is thin faster, right? I simply cannot reconcile an attitude that the thin-full risk is acceptable and the admin's job while at the same time advocating it for root filesystems. Now, for most of this thread I was under the impression that system hangs were the norm, because that's the only thing I ever experienced (kernel 3.x and kernel 4.4 back then); however, you said that this was fixed in later kernels. So given that, some of the disagreement here was void, as apparently no one advocated that those hangs were acceptable ;-). :).

I have tried it, yes. It gives trouble with Grub, requires the thin package to be installed on all systems, and makes it harder to install a system too.

> lvm2 is cooking some better boot support atm

Grub-probe couldn't find the root volume, so I had to maintain my own grub.cfg. Regardless, if I ever used this again I would take care to never overprovision, or to only overprovision at low risk with respect to snapshots. I.e., you could thin-provision root + var or something similar, but I would always put data volumes (home etc.) elsewhere, i.e. not share the same pool. Currently I was using a regular snapshot, but I allocated it too small and it always got dropped much faster than I anticipated (a 1 GB snapshot constantly filling up with even minor upgrade operations). A thin root LV is not the idea for most people.

So again, don't you think having data volumes produce errors is preferable to having the entire system hang?

> Not sure why you insist on system hangs. If the system hangs - and you have a recent kernel & lvm2 - you should file a bug. If you set '--errorwhenfull y' - it should instantly fail. There should not be any hanging..

Right, well, Debian Jessie and Ubuntu Xenial just experienced that. That's irrelevant; if the thin pool is full you need to mitigate it, and rebooting won't help with that.

> well it's really the admin's task to solve the problem after the panic call. (adding new space).

That's a lot easier if your root filesystem doesn't lock up ;-). Good luck booting to some rescue environment on a VPS or with some boot stick on a PC; the Ubuntu rescue environment, for instance, has been abysmal since systemd. You can't actually use the rescue environment because there is some weird interaction with systemd spewing messages and causing weird behaviour on the TTY you are supposed to work on. Initrd yes, but not the "full rescue" systemd target; that doesn't work. My point with this thread was: when my root snapshot fills up and gets dropped, I lose my undo history, but at least my root filesystem won't lock up. I just calculated the size too small, and I am sure I can also put a snapshot IN a thin pool for a non-thin root volume? Haven't tried. However, I don't have the space for a full copy of every
Re: [linux-lvm] Saying goodbye to LVM
Gionatan Danti wrote on 07-02-2018 22:19:
>> LVM just has conceptual problems.
> As a CentOS user, I *never* encountered such problems. I really think these are caused by the lack of proper integration testing from Debian/Ubuntu.

That would only apply to udev/boot problems, not the tooling issues. If you never make dd copies, you never run into such issues. And if you don't use cache you won't have those missing-PV issues either. Maybe I am just great at finding missing features, but LVM has in the end cost me a lot more time than it has saved me. I mean, if I had just stuck to regular partitions I would have been further ahead in life by now ;-). Including any lack of LVM expertise I would have had by then. Which, in the end, I don't think is worth it.

> But hey - all key LVM developers are RedHat people, so it should be expected (for the better/worse).

The denialist nature of Linux people ensures that even if LVM upstream says UPGRADE, Ubuntu will say "why? everything works fine for me". Or, "I never ran into such issues" ;-).

> True. I never use it with a boot device. Even on Solaris it is limited, for example the root pool cannot have an external log device (that means SLOG).

Then you have no clue whether this is also going to be the case on Linux or not ;-). And Grub supports booting from a root dataset, but only barely; I don't think anything else (e.g. a ZVOL) is any kind of realism. The biggest downside is inflexibility in shrinking pools, and people complain about ZVOL snapshots requiring a lot of space. Btrfs, on the other hand, supports removing disks from raid sets and just reorganizing what's left.

> LVM and XFS are, on the other hand, extremely well integrated into mainline kernel/userspace utilities.

Except that apparently there are (or were, or can be) extreme initramfs/udev issues, and the Ubuntu support/integration has been flimsy at best -- what's not flimsy is Grub support; it will even load an embedded LVM just fine. I mean, you can have an LV on a PV that is an LV on a PV, and Grub will be able to read it; the Ubuntu initramfs will not.

> Hence my great interest in stratis...

I don't deny you there, but I wonder if I'm not better off sticking to ordinary partitions ;-). But my main idea is to use compressed ZVOLs if I can. You can just stick partition tables on those too. ZFS has a lot of different "models".
[linux-lvm] pvscan hung in LVM 160
I have a small question, if I may ask. I upgraded one of my systems to LVM 160; I know that's still rather old. During boot, pvscan hangs; I don't know if this is because udev is terminated. The pvscan --background never completes. If I remove --background, pvscan instantly exits with code 3. Consequently, none of my volumes are activated except what amounts to the root device, because that is activated explicitly; this is Ubuntu Xenial, basically. The problem only started happening when I upgraded LVM to a newer version. Though pvscan exits with 3, at least my system doesn't take 6 minutes to boot now. My only question is: was this a known bug in 160, and has it since been fixed, and if so, in what version? Some people on Arch investigated this bug when 160 was current (around 2014), and for them it was caused by udev exiting before pvscan could finish. The basic boot delays are caused by udevadm settle calls. Regards.
Re: [linux-lvm] Reattach cache
matthew patton wrote on 22-11-2017 10:16:
> by definition when you detach a cache it is now entirely invalid and will (should) be treated as empty.

Yeah, but it wasn't. Within seconds the rootfs was read-only, and because I was running some apt operation at the same time, a large bunch of packages became corrupted in the apt (dpkg) index. A bunch are still missing md5 files, but I am on limited bandwidth, so I am not reinstalling. So I had assumed exactly what you write above. But it bit me once more.
Re: [linux-lvm] Reattach cache
Zdenek Kabelac wrote on 22-11-2017 10:57:
> In your case - just destroy the cache (--uncache) and do not try to reuse the cache-pool unless you really know what you are doing.

But I still don't know how to clean the metadata manually.
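For completeness, the operation referred to above, as a sketch (vg/root is a placeholder for the cached LV): --uncache detaches and deletes the cache pool together with its metadata LV, which avoids having to scrub that metadata by hand.

    lvconvert --uncache vg/root      # drop the cache pool, keeping the origin LV intact
    lvs -a vg                        # the hidden cache-pool and cache-metadata LVs should be gone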
Re: [linux-lvm] cache on SSD makes system unresponsive
matthew patton wrote on 23-10-2017 21:02:
> On Mon, 10/23/17, John Stoffel wrote: SSD pathologies aside, why are we concerned about the cache layer on a streaming read? By definition the cache shouldn't be involved at all.

Because whatever purpose you are using it for, it shouldn't OOM the system.
Re: [linux-lvm] cache on SSD makes system unresponsive
lejeczek wrote on 20-10-2017 16:20:
> I would - if the bigger part of a storage subsystem resides in the hardware - stick to the hardware, use CacheCade, let the hardware do the lot.

In other words -- keep it simple (smart person) ;-). Complexity is really the biggest reason for failure everywhere.
Re: [linux-lvm] cache on SSD makes system unresponsive
Oleg Cherkasov wrote on 20-10-2017 10:21:
> On 19. okt. 2017 20:13, Xen wrote:
>> The main cause was a way too slow SSD but at the same time... that sort of thing still shouldn't happen, locking up the entire system. I haven't had a chance to try again with a faster SSD.
> I have double checked with MegaRAID/CLI and all disks on that rig (including the SSD ones, of course) are SAS 6Gb/s, both devices and links. My first thought about those SSDs was that they are slower than the RAID5, however it seems that is not the case. Could it be a TRIM issue, because those are from 2012?

You mean that the SATA version is too low to interleave TRIMs with data access? Because I think that was the case with my mSATA SSD. I don't currently remember which SATA version allowed interleaving, but that SSD didn't reach or have it. After trimming, performance would go up greatly. So I don't know about SAS, but it might be similar, right?
Re: [linux-lvm] cache on SSD makes system unresponsive
matthew patton wrote on 20-10-2017 2:12:
>> It is just a backup server,
> Then caching is pointless.

That's irrelevant, and not up to another person to decide.

> Furthermore any half-wit caching solution can detect streaming read/write and will deliberately bypass the cache.

The problem was not performance, it was stability.

> Furthermore DD has never been a useful benchmark for anything. And if you're not using 'odirect' it's even more pointless.

Performance was not the issue, stability was.

>> Server has 2x SSD drives of 256Gb each and for purposes of 'cache' should be individual VD and not waste capacity on RAID1.

That is probably also going to be quite irrelevant to the problem at hand.

>> 10x 3Tb drives. In addition there are two MD1200 disk arrays attached with 12x 4Tb disks each.
> All Raid5 for this size footprint is NUTs. Raid6 is the bare minimum.

That's also irrelevant to the problem at hand.
Re: [linux-lvm] cache on SSD makes system unresponsive
John Stoffel wrote on 19-10-2017 23:14:
> And RHEL7.4/CentOS 7 is all based on kernel 3.14 (I think) with lots of RedHat specific backports. So knowing the full details will only help us provide help to him.

Alright, I missed that, sorry. Still, given that a Red Hat developer has stated awareness of the problem, it isn't likely that, beyond the kernel, individual config is going to play a big role. Also, anyone in a position to really help would probably already recognise the problem. I just mean to say that it is going to need a developer, and it is not very likely that individual config is at fault. Although a different kernel would see different behaviour, you're right about that; my apologies.
Re: [linux-lvm] cache on SSD makes system unresponsive
John Stoffel wrote on 19-10-2017 21:09:
> How did you setup your LVM config and your cache config? Did you mirror the two SSDs using MD

He said he used hardware RAID to mirror the devices.

> I ask because I'm running lvcache at home on my main file/kvm server and I've never seen this problem. But! I suspect you're running a much older kernel, lvm config, etc.

lvm2-2.02.171-8.el7.x86_64; CentOS 7.4 was released a month ago.
Re: [linux-lvm] cache on SSD makes system unresponsive
Oleg Cherkasov wrote on 19-10-2017 19:54:
> Any ideas what may be wrong?

All I know is that I myself have in the past tried to cache an embedded encrypted LVM on a regular home system. The problem was probably caused by the SSD not clearing write caches fast enough, but I too got some 2-minute "hanging process" outputs on the console. So it was probably a queueing issue within the kernel and might not have been related to the cache, but I'm still not sure there wasn't some interplay at work. The main cause was a way too slow SSD, but at the same time... that sort of thing still shouldn't happen, locking up the entire system. I haven't had a chance to try again with a faster SSD. Regards...
Re: [linux-lvm] raid10 to raid what? - convert
lejeczek wrote on 18-10-2017 17:52:
> I'm still looking for an answer - whether it's possible, and then how, to split raid10 into two raid0 LVs (with perhaps keeping the data intact?). I've been fiddling with --splitmirrors but either I got it wrong or I didn't and the command just fails. More than contemplating theories and general knowledge on raid I'd prize succinct, concrete info, actual experience of "how to".

Sorry that I can't help you here, but I believe it is possible with mdraid. However, note that if you can spend the time, you could take two disks, wipe them, put raid 0 on them, and then copy the still-functioning RAID 10 over. I know that's not what you're asking, but personally I have no hands-on experience with LVM raid and its metadata. I assume that even if you were to manage to get a raid 0 going with pure DM commands, you'd still need to change LVM's metadata, and I have no clue how to do that myself. Admittedly, copying your data over is not ideal, but it should not take longer than a day? Don't forget that RAID 10 is two stripes of mirrors... But LVM typically prevents you from operating on a volume group if not all PVs are present... Anyway, if it can't be done by lvconvert, it probably can't be done (without something complicated).
Re: [linux-lvm] Difference between Debian and some other distributions with thin provisioning
Jan Tulak wrote on 29-09-2017 18:42:
> Debian:
> # cat /etc/debian_version
> 8.9
> # lvm version
>   LVM version:     2.02.111(2) (2014-09-01)
>   Library version: 1.02.90 (2014-09-01)
>   Driver version:  4.27.0
>
> CentOS:
> # cat /etc/redhat-release
> CentOS Linux release 7.4.1708 (Core)
> # lvm version
>   LVM version:     2.02.171(2)-RHEL7 (2017-05-03)
>   Library version: 1.02.140-RHEL7 (2017-05-03)
>   Driver version:  4.35.0

The versions are not actually very close. On Debian I do know that I am often confused by the thin commands; sometimes -T seems to be necessary and sometimes --thin-pool, or I am just confused, I don't know. It is the same with Grub2: it sat at 2.02beta for ages and was only recently released as 2.02, but I think years of development went into that, although I think I can say they are quite headstrong compared to LVM2... But you can see that the versions are 3 years apart anyway... Regards.
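As a small illustration of the two spellings behind that confusion, a sketch with made-up names and sizes (the long option is spelled --thinpool in lvcreate; both forms should be equivalent on reasonably recent lvm2):

    # create a 10G thin pool called "pool" in VG vg
    lvcreate -L 10G -T vg/pool
    # or, spelled out:
    lvcreate -L 10G --thinpool pool vg

    # create a 20G thin volume backed by that pool
    lvcreate -V 20G -T vg/pool -n thinvol
    # or:
    lvcreate -V 20G --thinpool pool -n thinvol vg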
Re: [linux-lvm] Restoring snapshot gone bad
Mauricio Tavares wrote on 22-09-2017 8:03:
> I have an LV, vmzone/desktop, that I use as the drive for a KVM guest; nothing special here. I wanted to restore its snapshot, so like I have done many times before I shut the guest down and then ran
> lvconvert --merge vmzone/desktop_snap_20170921
>   Logical volume vmzone/desktop is used by another device.
>   Can't merge over open origin volume.
>   Merging of snapshot vmzone/desktop_snap_20170921 will occur on next activation of vmzone/desktop.
> What is it really trying to tell me? How do I find out which other device is using it?

Other people will have better answers, but I think it will be hard to see unless it is used by the device mapper for some target. I hope there is a better answer. But obviously the "o" attribute means an open volume (i.e. mounted or something else), and that means that if you can't find a way to close it, the next time you boot the machine it will get merged? I hope there is a good way to get your usage information (apart from something like "lsof"). Regards.
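A few ways to see what is holding the origin open, as a sketch (vmzone-desktop is just the dm name guessed from vmzone/desktop; adjust to what dmsetup actually reports on your system):

    dmsetup info vmzone-desktop        # "Open count" shows whether something still holds it open
    ls /sys/block/$(basename "$(readlink -f /dev/vmzone/desktop)")/holders
                                       # entries here are dm devices stacked on top (crypt, another table)
    fuser -vm /dev/vmzone/desktop      # processes or mounts still using the device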
Re: [linux-lvm] Reserve space for specific thin logical volumes
Hi, thank you for your response once more.

Zdenek Kabelac wrote on 21-09-2017 11:49:
> Hi
> Some more 'light' into the existing state, as this is really not about what can and what cannot be done in kernel - as clearly you can do 'everything' in kernel - if you have the code for it...

Well, thank you for that ;-).

> In practice your 'proposal' is quite different from the existing target - essentially a major rework if not a whole new re-implementation - as it's not the 'few line' patch extension which you might possibly believe/hope it to be.

Well, I understand that the solution I would be after would require modification to the DM target. I was not arguing for LVM alone; I assumed that since DM and LVM are both hosted in the same space, there would be at least the idea of cooperation between the two teams, and that it would not be too 'radical' to talk about both at the same time.

> Of course this decision makes some tasks harder (i.e. there are surely problems which would not even exist if it were done in kernel) - but lots of other things are way easier - you really can't compare those

I understand. But many times, the lack of integration towards a shared goal across multiple projects is also a big problem in Linux.

>> However if we *can* standardize on some tag or way of _reserving_ this space, I'm all for it.
> Problems of a desktop user with a 0.5TB SSD are often different from servers using 10PB across multiple network-connected nodes. I see you call for one standard - but it's very very difficult...

I am pretty sure that if you start out with something simple, it can extend into the complex. That's of course why an elementary kernel feature would make sense. A single number. It does not get simpler than that. I am not saying you have to. I was trying to find out whether your statements that something was impossible were actually true. You said that you need a completely new DM target from the ground up. I doubt that. But hey, you're the expert, not me. I like that you say that you could provide an alternative to the regular DM target and that LVM could work with that too. Unfortunately I am incapable of doing any development myself at this time (sounds like fun, right), and I also of course could not test 20 PB myself.

>> I think a 'critical' tag in combination with the standard autoextend_threshold (or something similar) is too loose and ill-defined and not very meaningful.
> We look to deliver admins rock-solid bricks. Whether you make a small house or build a Southfork out of them is then the admin's choice. We have spent a really long time thinking whether there is some sort of 'one-ring-to-rule-them-all' solution - but we can't see it yet - possibly because we know a wider range of use cases compared with an individual user-focused problem.

I think you have to start simple. You can never come up with a solution if you start out with the complex. The only thing I ever said was:
- give each volume a number of extents or a percentage of reserved space, if needed;
- for all the active volumes in the thin pool, add up these numbers;
- when other volumes require allocation, check against the free extents in the pool;
- possibly deny allocation for these volumes.

I am not saying here you MUST do anything like this. But as you say, it requires features in the kernel that are not there. I did not know, or did not realize, that the upgrade paths of the DM module(s) and LVM2 itself would be so divergent. So my apologies for that, but obviously I was talking about a full-system solution (not a partial one).

>> And I would prefer to set individual space reservation for each volume even if it can only be compared to 5% threshold values.
> Which needs a 'different' kernel target driver (and possibly some way to kill/split the page-cache to work on a 'per-device' basis)

No no, here I meant to set it by a script, or to read it by a script, or to use it by a script.

> And just as an illustration of the problems you need to start solving for this design: You have an origin and 2 snaps. You set different 'thresholds' for these volumes -

I would not allow setting a threshold for snapshots. I understand that for the dm thin target they are all the same, but for this model it does not make sense, because LVM talks of "origin" and "snapshots".

> You then overwrite 'origin' and you have to maintain 'data' for OTHER LVs.

I don't understand. Other LVs == the 2 snaps?

> So you get into the position where a 'WRITE' to the origin will invalidate a volume that is NOT even active (without lvm2 even being aware).

I would not allow space reservation for inactive volumes. Any space reservation is meant for safeguarding the operation of a machine; thus it is meant for active volumes.

> So suddenly rather simple individual thinLV targets will have to maintain the whole 'data set' and cooperate with all other active thin targets in case they share some data

I don't know what data sharing has to do with it. The entire system only works with
Re: [linux-lvm] Reserve space for specific thin logical volumes
Gionatan Danti wrote on 18-09-2017 21:20:
> Xen, I really think that the combination of a hard threshold obtained by setting thin_pool_autoextend_threshold and the thin_command hook for a user-defined script should be sufficient to prevent and/or react to full thin pools.

I will hopefully respond to Zdenek's message later (and the one before that, which I haven't responded to either); I'm all for "keep it simple" on the kernel side. But I don't mind if you focus on this,

> That said, I would like to see some pre-defined scripts to easily manage pool fullness. (...)

but I would really like the standardisation such predefined scripts imply, and to only provide scripts instead of kernel features. Again, the reason I am also focussing on the kernel is because:

a) I am not convinced it cannot be done in the kernel;
b) a kernel feature would make space reservation very 'standardized'.

Now, I'm not convinced I really do want a kernel feature, but saying it isn't possible is, I think, false. The point is that kernel features make it much easier to standardize and to put some space reservation metric in userland code (it becomes a default feature), while scripts remain a little bit off to the side. However, if we *can* standardize on some tag or way of _reserving_ this space, I'm all for it. I think a 'critical' tag in combination with the standard autoextend_threshold (or something similar) is too loose and ill-defined and not very meaningful. In other words, you would be abusing one feature for another purpose. So I do propose a way to tag volumes with a space reservation (turning them critical), or alternatively to configure a percentage of reserved space and then merely tag some volumes as critical volumes. I just want these scripts to be such that you don't really need to modify them. In other words: values configured elsewhere. If you think that should be the thin_pool_autoextend_threshold, fine, but I really think it should be configured elsewhere (because you are not using it for autoextending in this case).

> thin_command is run every 5%: https://www.mankier.com/8/dmeventd

You will need to configure a value to check against. This is either going to be a single, manually configured, fixed value (in % or extents), or it can be calculated from the reserved space of the individual volumes. So if you are going to have a kind of "fsfreeze" script based on critical vs. non-critical volumes, I'm just saying it would be preferable to set the threshold at which to take action in another way than by reusing the autoextend_threshold for it. And I would prefer to set an individual space reservation for each volume, even if it can only be compared against 5% threshold steps. So again: if you want to focus on scripts, fine.
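To make the "values configured elsewhere" idea concrete, a speculative sketch of how such a script could read per-volume reservations from LV tags instead of hard-coding them. The reserve_20 tag format (percent of pool size), the critical tag, the vg/pool names and the freeze action are all assumptions for illustration, not existing LVM behaviour:

    #!/bin/sh
    # Sum per-LV reservations stored as tags like "reserve_20", then freeze
    # non-critical LVs once free pool space drops below that sum.  (Sketch only:
    # no error handling for pools or hidden LVs.)
    POOL="vg/pool"
    USED=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ' | cut -d. -f1)
    RESERVED=0
    for tag in $(lvs --noheadings -o lv_tags vg | tr ',' ' '); do
        case "$tag" in
            reserve_*) RESERVED=$((RESERVED + ${tag#reserve_})) ;;
        esac
    done
    if [ $((100 - USED)) -le "$RESERVED" ]; then
        # freeze every mounted LV not tagged "critical" (fsfreeze wants the mountpoint)
        lvs --noheadings -o lv_path,lv_tags vg | awk '$2 !~ /critical/ {print $1}' \
          | while read -r dev; do
                mnt=$(findmnt -n -o TARGET "$dev") && [ -n "$mnt" ] && fsfreeze -f "$mnt"
            done
    fi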
Re: [linux-lvm] Option to silence "WARNING: Sum of all thin volume sizes exceeds the size of thin pool"
Gionatan Danti wrote on 19-09-2017 10:44:
> Sure, I was only describing a possible case where the warning is "redundant" (ie: because the admin knows the snapshot will be short-lived).

I would only like to say that reducing the "warning" to a "notice" would also reduce the irritation. I.e., instead of "WARNING: word word word" it could also be "New volume xx overprovisions thin pool by xxx". This way you don't give users the idea that you think they are stupid. I know this still implies some verbosity, but it doesn't have to be an "error"; it can simply be a "notice".
Re: [linux-lvm] Reserve space for specific thin logical volumes
Brassow Jonathan wrote on 15-09-2017 4:06:
> There are many solutions that could work - unique to every workload and different user. It is really hard for us to advocate for one of these unique solutions that may work for a particular user, because it may work very badly for the next well-intentioned googler.

Well, thank you. Of course, there is a split between saying "it is the administrator's job that everything works well" and at the same time saying that those administrators can be "googlers". There's a big gap between those two. I think that many who do employ thinp will be at least a bit more serious about it, but perhaps not so serious that they can devote all the resources to developing all of the mitigating measures that anyone could want. So I think the common truth lies more in the middle: they are not googlers who implement the first random article they find without thinking about it, and they are not professionals in full-time employment doing this thing. Because of that, most administrators interested in thin, like myself, will already have read the LVM manpages a great deal on their own systems... And any common default targets for "thin_command" could also be well documented and explained, with pros and cons laid out.

The only thing we are talking about today is reserving space according to some threshold, and performing an action when that reservation is threatened. So this is the common need here. This need is going to be the same for everyone using any scheme that could be offered. Then the question becomes: are the interventions also as common? Well, there are really only a few available:

a) turning the volume into an error volume, as per the bug
b) fsfreezing
c) merely reporting
d) (I am not sure if "lvremove" should really be seriously considered)

At this point you have basically exhausted any default options you may have that are "general". No one actually needs more than that. What becomes interesting now is the logic underpinning these decisions. This logic needs some time to write, and this is the thing that administrators will put off. So they will live without any intelligence in the automatic response and will just live with the risk of a volume filling up, without having written the logic that could activate the above measures. That's the problem.

So what I am advocating -- and I am not disregarding Mr. Zdenek's bug ;-) [1]; in fact I think this "lverror" would be very welcome (paraphrasing here), even though personally I would want to employ a filesystem mechanism if I am doing this with a userland tool anyway, but sure, why not -- is complementary and orthogonal to the issue of where the logic is coming from, and that logic also requires a lot of resources to write. Even though you could probably hack it together in some 15 minutes, you then need testing etc... I think it would just be a lot more pleasant if this logic framework already existed, was tried and tested, did the job correctly, and could easily be employed by anyone else.

So I mean to say that currently we are only talking about space reservation. You can only do this in a limited number of ways:
- a % of the total volume size;
- a fixed amount configured per volume.

And that's basically it. The former merely requires each volume to be 'flagged' as 'critical', as suggested. The latter requires some number to be defined, and then flagging is unnecessary. The script would ensure that:
- not ALL thin volumes are 'critical';
- as long as a single volume is non-critical, the operation can continue;
- all critical volumes are aggregated into a required amount of free space;
- the check is done against the currently available free space;
- the action on the non-critical volumes is performed if necessary.

That's it. Anyone could use this. The "Big vs. Small" model is a little bit more involved and requires a little bit more logic, and I would not mind writing it, but it follows along the same lines. *I* say that in this department, *only* these two things are needed, plus potentially the lverror thing. So I don't really see this wild growth of different ideas. Personally I would like the "set manual size" option more than the "use percentage" one above. I would not want to flag volumes as critical, I would just want to set their reserved space. I would prefer if I could set this on the LVM volumes themselves, rather than in the script. If the script used a percentage, I would want to be able to configure the percentage outside the script as well. I would want the script to do the heavy lifting of knowing how to extract these values from the LVM volumes, along with some information on how to put them there (using tags and all of that is not all that common knowledge, I think). Basically, I want the script to know how to set and retrieve properties from the LVM volumes. Then I want it to be easy to see the reserved space (potentially) (although this can
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac wrote on 14-09-2017 21:05:
> Basically the user-land tool takes a runtime snapshot of kernel metadata (so it gets you information from some frozen point in time), then it processes the input data (up to 16GiB!) and outputs some number - like what are the real unique blocks allocated in a thinLV.

That is immensely expensive indeed.

> Typically a snapshot may share some blocks - or could already have provisioned all blocks in case the shared blocks were already modified.

I understand, and it's good technology.

>> Yes I mean my own 'system'; I generally of course know how much data is on it and there is no automatic data generation.
> However lvm2 is not a 'Xen oriented' tool only. We need to provide a universal tool - everyone can adapt it to their needs.

I said that to indicate that prediction problems are not currently that important for me, but they definitely would be important in other scenarios or for other people. You twist my words around to imply that I am trying to make myself special, while I was making myself unspecial: I was just being modest there.

> Since your needs are different from others' needs.

Yes, and we were talking about the problems of prediction, thank you.

>> But if I do create snapshots (which I do every day), when the root and boot snapshots fill up (they are on regular lvm) they get dropped, which is nice,
> old snapshots are a different technology for a different purpose.

Again, what I was saying was to support the notion that having snapshots that may grow a lot can be a problem. I am not sure the purpose of non-thin vs. thin snapshots is all that different, though. They are both copy-on-write in a certain sense. I think it is the same tool with different characteristics.

> With 'plain' lvs the output is just an orientational number: basically the highest referenced chunk for a given thin volume. This is a great approximation of size for a single thinLV, but somewhat 'misleading' for thin devices created as snapshots... (having shared blocks)

I understand. The above number for "snapshots" was just what was missing after summing up the volumes. So I had no way to know snapshot usage. I just calculated all used extents per volume; the missing extents I attributed to snapshots. So I think it is a very good approximation.

> So you have no precise idea how many blocks are shared or uniquely owned by a device.

Okay. But all the numbers were probably attributed to the correct volume. I did not count the usage of the snapshot volumes. Whether blocks are shared or unique is irrelevant from the point of view of wanting to know the total consumption of the "base" volume. In the above, 6 extents were not accounted for (24 MB), so I just assumed those would be sitting in snapshots ;-).

> Removal of a snapshot might mean you release NOTHING from your thin-pool, if all snapshot blocks were shared with some other thin volumes

Yes, but that was not indicated in the above figure either. It was just 24 MB that would be freed ;-). Snapshots can only become a culprit if you start overwriting a lot of data, I guess.

>> If you say that any additional allocation checks would be infeasible because they would take too much time per request (which still seems odd, because the checks wouldn't be that computation intensive, and even for 100 gigabytes you'd only have 25,000 checks at default extent size) -- of course you asynchronously collect the data.
> Processing the mapping of up to 16GiB of metadata will not happen in milliseconds and consumes memory and CPU...

I get that, if that is the case. That's just the sort of thing that in the past I have kept track of continuously (in unrelated work), such that every mutation also updated the metadata without having to recalculate it... I am meaning to say that if this is indeed the case, and it is indeed this expensive, then clearly what I want is not possible with that scheme. I mean to say that I cannot argue with this design. You are the experts. I would have to go in and learn first to be able to say anything about it ;-), so I can only defer to your expertise. Of course. But the import of what you're saying is that the number of blocks uniquely owned by any snapshot is not known at any one point in time, and needs to be derived from the entire map. Okay. Thus reducing allocation would hardly be possible, you say, because the information is not known anyway. Well, pardon me for digging this deeply. It just seemed so alien that this thing wouldn't be possible. I mean, it seems so alien that you cannot keep track of those numbers at runtime without having to calculate them using aggregate measures. It seems like information you would want the system to have at all times. I am just still incredulous that this isn't being done... But I am not well versed in kernel concurrency measures, so I am hardly qualified to comment on any of that. In any case, thank you for you
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac schreef op 13-09-2017 21:35: We are moving here in right direction. Yes - current thin-provisiong does not let you limit maximum number of blocks individual thinLV can address (and snapshot is ordinary thinLV) Every thinLV can address exactly LVsize/ChunkSize blocks at most. So basically the only options are allocation check with asynchronously derived intel that might be a few seconds late, as a way to execute some standard and general "prioritizing" policy, and an interventionalist policy that will (fs)freeze certain volumes depending on admin knowledge about what needs to happen in his/her particular instance. This is part of the problem: you cannot calculate in advance what can happen, because by design, mayhem should not ensue, but what if your predictions are off? Great - 'prediction' - we getting on the same page - prediction is big problem Yes I mean my own 'system' I generally of course know how much data is on it and there is no automatic data generation. Matthew Patton referenced quotas in some email, I didn't know how to do it as quickly when I needed it so I created a loopback mount from a fixed sized container to 'solve' that issue when I did have an unpredictable data source... :p. But if I do create snapshots (which I do every day) when the root and boot snapshots fill up (they are on regular lvm) they get dropped which is nice, but particularly the big data volume if I really were to move a lot of data around I might need to first get rid of the snapshots or else I don't know what will happen or when. Also my system (yes I am an "outdated moron") does not have thin_ls tool yet so when I was last active here and you mentioned that tool (thank you for that, again) I created this little script that would give me also info: $ sudo ./thin_size_report.sh [sudo] password for xen: Executing self on linux/thin Individual invocation for linux/thin name pct size - data54.34% 21.69g sites4.60% 1.83g home 6.05% 2.41g - + volumes 64.99% 25.95g snapshots0.09% 24.00m - + used65.08% 25.97g available 34.92% 13.94g - + pool size 100.00% 39.91g The above "sizes" are not volume sizes but usage amounts. And the % are % of total pool size. So you can see I have 1/3 available on this 'overprovisioned' thin pool ;-). But anyway. Being able to set a maximum snapshot size before it gets dropped could be very nice. You can't do that IN KERNEL. The only tool which is able to calculate real occupancy - is user-space thin_ls tool. Yes my tool just aggregated data from "lvs" invocations to calculate the numbers. If you say that any additional allocation checks would be infeasible because it would take too much time per request (which still seems odd because the checks wouldn't be that computation intensive and even for 100 gigabyte you'd only have 25.000 checks at default extent size) -- of course you asynchronously collect the data. So I don't know if it would be *that* slow provided you collect the data in the background and not while allocating. I am also pretty confident that if you did make a policy it would turn out pretty good. I mean I generally like the designs of the LVM team. I think they are some of the most pleasant command line tools anyway... But anyway. On the other hand if all you can do is intervene in userland, then all LVM team can do is provide basic skeleton for execution of some standard scripts. So all you need to do is to use the tool in user-space for this task. 
So maybe we can have an assortment of some 5 interventionalist policies like:

a) Govern a max snapshot size and drop snapshots when they exceed it
b) Freeze non-critical volumes when thin space drops below aggregate values appropriate for the critical volumes
c) Drop snapshots when thin space < 5%, starting with the biggest one
d) Also freeze the relevant snapshots in case (b)
e) Drop snapshots that exceed a configured max size, but only when a threshold is reached

So for example you configure a max size per snapshot. When a snapshot exceeds that size it gets flagged for removal, but removal only happens when the other condition is met (the threshold is reached). So you would have 5 different interventions you could use that could be considered somewhat standard, and the admin can just pick and choose or customize (a rough sketch of what policy (e) might look like follows below).

This is the main issue - these 'data' are pretty expensive to 'mine' out of data structures.

But how expensive is it to do it, say, every 5 seconds?

It's the user space utility which is able to 'parse' all the structure and take a 'global' picture. But of course it takes CPU and TIME and it's
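[A minimal sketch of what policy (e) could look like as a cron- or dmeventd-triggered script. All names are made up for illustration (the linux/thin pool, the 20 GiB per-snapshot limit, the 95% pool threshold), and it uses only standard lvs report fields plus lvremove.]

#!/bin/sh
# Policy (e) sketch: drop thin snapshots larger than MAX_SNAP_GIB,
# but only once the pool itself is above THRESHOLD percent full.
VG=linux
POOL=thin
MAX_SNAP_GIB=20     # illustrative per-snapshot limit
THRESHOLD=95        # illustrative pool usage threshold (%)

POOL_PCT=$(lvs --noheadings -o data_percent "$VG/$POOL" | tr -d ' ' | cut -d. -f1)
[ "$POOL_PCT" -lt "$THRESHOLD" ] && exit 0

# Snapshots are the LVs with a non-empty origin; estimate their used space.
lvs --noheadings --units g --nosuffix --separator '|' \
    -o lv_name,origin,lv_size,data_percent "$VG" |
awk -F'|' -v max="$MAX_SNAP_GIB" '
  $2 != "" && ($3 * $4 / 100) > max { gsub(/ /, "", $1); print $1 }' |
while read -r snap; do
    logger "thin policy: removing oversized snapshot $VG/$snap"
    lvremove -f "$VG/$snap"
done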
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac schreef op 13-09-2017 21:17: Please if you can show the case where the current upstream thinLV fails and you lose your data - we can finally start to fix something. Hum, I can only say "I owe you one" on this. I mean to say it will have to wait, but I hope to get to this at some point. I'm still unsure what problem you want to get resolved from pretty small group of people around dm/lvm2 - do you want from us to rework kernel page-cache ? I'm simply still confused what kind action you expect... Be specific with real world example. I think Brassow Jonathan's idea is very good to begin with (thank you sir ;-)). I get that you say that kernel space solution is impossible to implement (apart from not crashing the system, and I get that you say that this is no longer the case) because checking several things would prolong execution paths considerably, is what you say. And I realize that any such thing would need asynchronous checking and updating some values and then execution paths that need to check for such things which I guess could indeed by rather expensive to actually execute. I mean the only real kernel experience I have was trying to dabble with filename_lookup and path_lookupat or whatever it was called. I mean inode path lookups, which is a bit of the same thing. And indeed even a single extra check would have incurred a performance overhead. I mean the code to begin with differentiated between fast lookup and slow lookup and all of that. And particularly the fast lookup was not something you'd want to mess with, etc. But I absolutely have no issue to begin with I want to say with asynchronous 'intervention' even if it is not byte accurate, as you say in the other email. And I get that you prefer user-space tools doing the thing... And you say there that this information is hard to mine. And that the "thin_ls" tool does that. It's just that I don't want it to be 'random' and depending on your particular random sysadmin doing the right thing in isolation of all other random sysadmins having to do the right thing all in isolation of each other all writing the same code. At the very least if you recognise your responsibility, which you are doing now, we can have a bit of a framework that is delivered by upstream LVM so the thing comes out more "fully fleshed" and sysadmins have less work to do, even if they still have to customize the scripts or anything. Most ideal thing would definitely be something you "set up" and then the thing takes care of itself, ie. you only have to input some values and constraints. But intervention in forms of "fsfreeze" or whatever is very personal, I get that. And I get that previously auto-unmounting also did not really solve issues for everyone. So a general interventionalist policy that is going to work for everyone is hard to get. So the only thing that could work for everyone is if there is actually a block on new allocations. If that is not possible, then indeed I agree that a "one size fits all" approach is hardly possible. Intervention is system-specific. Regardless at least it should be easy to ensure that some constraints are enforced, that's all I'm asking. Regards, (I'll respond further in the other email). ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
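[For the "freeze the non-critical volumes" kind of intervention discussed above, the building block already exists in util-linux. A hedged sketch of what such a hook could do when the pool crosses a threshold; the mount points are placeholders.]

#!/bin/sh
# Sketch: freeze writes to non-critical filesystems while the thin pool is
# nearly full, so whatever space remains is kept for the critical volume.
NONCRITICAL_MOUNTS="/srv/data /srv/scratch"

case "$1" in
  freeze)
    for m in $NONCRITICAL_MOUNTS; do
        fsfreeze --freeze "$m" && logger "froze $m (thin pool low on space)"
    done ;;
  thaw)
    for m in $NONCRITICAL_MOUNTS; do
        fsfreeze --unfreeze "$m" && logger "thawed $m"
    done ;;
esac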
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac schreef op 12-09-2017 23:57: Users interested in thin-provisioning are really mostly interested in performance - especially on multicore machines with lots of fast storage with high IOPS throughput (some of them even expect it should be at least as good as linear)

Why don't you hold a survey? And not phrase it in terms of "Would you like to sacrifice performance for more safety?" But please. Ask people:

1) What area does the LVM team need to focus on for thin provisioning:
a) Performance and keeping performance intact
b) Safety and providing good safeguards against human and program error
c) User interface and command line tools
d) Monitoring and reporting software and systems
e) Graphical user interfaces
f) Integration into default distributions and support for booting/grub

And then allow people to score these things with a percentage, or to distribute some 20 points across these 6 points. Invent more points as needed. Give people 20 points to distribute across some 8 areas of interest. Then ask people what areas are most interesting to them. So topics could be:

(a) Performance
(b) Robustness
(c) Command line user interface
(d) Monitoring systems
(e) Graphical user interface
(f) Distribution support

So ask people. Don't assume. (The NetworkManager team did this pretty well, by the way. They were really interested in user perception some time ago.)

if you will keep thinking for a while you will at some point see the reasoning.

Only if your reasoning is correct. Not if your reasoning is wrong. I could also say to you, we could also say to you, "If you think longer on this you will see we are right". That would probably be more accurate, even.

Repeated again - whoever targets for 100% full thin-pool usage has misunderstood purpose of thin-provisioning.

Again, no one "targets" 100% full. It is just an eventuality we need to take care of. You design for failure. A nuclear plant that did not take operator drunkenness into account, and had no safety measures in place to ensure it would not lead to catastrophe, would be a very bad nuclear plant. Human error can be calculated into the design. In fact, it must be. DESIGN FOR HUMAN WEAKNESS. NOT EVERYONE IS PERFECT and human faults happen. If I were a customer and I were paying your bills, you would never respond like this.

We like some assurance that things do not descend into mayhem the moment someone somewhere slacks off and falls asleep. We like to design in advance so we do not have to keep a constant eye out. We build "structure" so that the structure works for us, instead of constant vigilance. Constant vigilance can fail. Structure cannot. Focus on "being", not "doing". ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac schreef op 12-09-2017 16:57: This bug has been reported (by me even to libblkid maintainer) AND already fixed already in past

I was the one who reported it. This was Karel Zak's message from 30 August 2016:

"On Fri, Aug 19, 2016 at 01:14:29PM +0200, Karel Zak wrote: On Thu, Aug 18, 2016 at 10:39:30PM +0200, Xen wrote: Would someone be willing to fix the issue that a Physical Volume from LVM2 (PV) when placed directly on disk (no partitions or partition tables) will not be

This is very unusual setup, but according to feedback from LVM guys it's supported, so I will improve blkid to support it too. Fixed in the git tree (for the next v2.29). Thanks. Karel"

So yes, I knew what I was talking about. At least slightly ;-). :p.

But to defend a bit libblkid maintainer side :) - this feature was not really well documented from lvm2 side...

That's fine.

You can sync every second to minimize amount of dirty pages. Lots of things - all of them will in some way or other impact system performance

He said no people would be hurt by such a measure except people who wanted to unpack and compile a kernel purely in page buffers ;-). So clearly you need to spend resources effectively and support both groups...

Sometimes it is better to use large RAM (common laptops have 32G of RAM nowadays)

Yes, and he said those people wanting to compile the kernel purely in memory (without using a RAM disk for it) have issues anyway... ;-). So no, it is not that clear that you need to support both groups. Certainly not by default. Or at least not in its default configuration for some dirty page file flag ;-). ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
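[For reference, the "sync every second" knob being discussed here is the kernel's normal writeback tuning. Something along these lines shortens how long dirty pages may sit in the page cache; the values are purely illustrative.]

# /etc/sysctl.d/99-writeback.conf (illustrative values)
vm.dirty_writeback_centisecs = 100     # flusher thread wakes up every second
vm.dirty_expire_centisecs = 200        # dirty data older than 2s gets written out
vm.dirty_background_ratio = 5          # start background writeback early
vm.dirty_ratio = 20                    # hard limit before writers are throttled

# apply with:
sysctl --system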
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac schreef op 12-09-2017 16:37: On block layer - there are many things black & white If you don't know which process 'create' written page, nor if you write i.e. filesystem data or metadata or any other sort of 'metadata' information, you can hardly do any 'smartness' logic on thin block level side. You can give any example to say that something is black and white somewhere, but I made a general point there, nothing specific. The philosophy with DM device is - you can replace then online with something else - i.e. you could have a linear LV which is turned to 'RAID" and than it could be turned to 'Cache RAID' and then even to thinLV - all in one raw on life running system. I know. So what filesystem should be doing in this case ? I believe in most of these systems you cite the default extent size is still 4MB, or am I mistaken? Should be doing complex question of block-layer underneath - checking current device properties - and waiting till the IO operation is processed - before next IO comes in the process - and repeat the some in very synchronous slow logic ??Can you imagine how slow this would become ? You mean a synchronous way of checking available space in thin volume by thin pool manager? We are targeting 'generic' usage not a specialized case - which fits 1 user out of 100 - and every other user needs something 'slightly' different That is completely exaggerative. I think you will find this issue comes up often enough to think that it is not one out of 100 and besides unless performance considerations are at the heart of your ...reluctance ;-) no one stands to lose anything. So only question is design limitations or architectural considerations (performance), not whether it is a wanted feature or not (it is). I don't think there is anything related... Thin chunk-size ranges from 64KiB to 1GiB Thin allocation is not by default in extent-sizes? The only inter-operation is the main filesystem (like extX & XFS) are getting fixed for better reactions for ENOSPC... and WAY better behavior when there are 'write-errors' - surprisingly there were numerous faulty logic and expectation encoded in them... Well that's good right. But I did read here earlier about work between ExtFS team and LVM team to improve allocation characteristics to better align with underlying block boundaries. If zpools - are 'equally' fast as thins - and gives you better protection, and more sane logic the why is still anyone using thins??? I don't know. I don't like ZFS. Precisely because it is a 'monolith' system that aims to be everything. Makes it more complex and harder to understand, harder to get into, etc. Of course if you slow down speed of thin-pool and add way more synchronization points and consume 10x more memory :) you can get better behavior in those exceptional cases which are only hit by unexperienced users who tends to intentionally use thin-pools in incorrect way. I'm glad you like us ;-). Yes apologies here, I responded to this thing earlier (perhaps a year ago) and the systems I was testing on was 4.4 kernel. So I cannot currently confirm and probably is already solved (could be right). Back then the crash was kernel messages on TTY and then after some 20-30 there is by default 60sec freeze, before unresized thin-pool start to reject all write to unprovisioned space as 'error' and switches to out-of-space state. There is though a difference if you are out-of-space in data or metadata - the later one is more complex... I can't say whether it was that or not. 
I am pretty sure the entire system froze for longer than 60 seconds. In page cache there are no thing logically separated - you have 'dirty' pages you need to write somewhere - and if you writes leads to errors, and system reads errors back instead of real-data - and your execution code start to run on completely unpredictable data-set - well 'clean' reboot is still very nice outcome IMHO Well even if that means some dirty pages are lost before the application discovers it, any read or write errors should at some point lead to the application to shut down right. I think for most applications the most sane behaviour would simply be to shut down. Unless there is more sophisticated error handling. I am not sure what we are arguing about at this point. Application needs to go anyway. If I had a system crashing because I wrote to some USB device that was malfunctioning, that would not be a good thing either. Well try to BOOT from USB :) and detach and then compare... Mounting user data and running user-space tools out of USB is uncomparable... Systems would also grind to a halt from user-data and not system files. I know booting from USB can be 1000x slower than user data. But shared page cache for all devices is bad design, period. AFAIK - this is still not resolved issue... That's a shame. You can have
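[As far as I know, the 60-second queueing behaviour mentioned above is tunable in newer lvm2: a full pool can be made to fail writes to unprovisioned space immediately instead of queueing them. A hedged example; the VG and pool names are placeholders.]

# Fail I/O to unprovisioned space immediately instead of queueing it
# for the timeout (available in lvm2 >= 2.02.115 or thereabouts):
lvchange --errorwhenfull y vg/thinpool

# Or set it at pool creation time:
lvcreate -L 10G -T vg/thinpool --errorwhenfull y

# There is also a global default in lvm.conf (activation/error_when_full).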
Re: [linux-lvm] Reserve space for specific thin logical volumes
Zdenek Kabelac schreef op 12-09-2017 13:46: What's wrong with BTRFS I don't think you are a fan of it yourself. Either you want fs & block layer tied together - that the btrfs/zfs approach Gionatan's responses used only Block layer mechanics. or you want layered approach with separate 'fs' and block layer (dm approach) Of course that's what I want or I wouldn't be here. If you are advocating here to start mixing 'dm' with 'fs' layer, just because you do not want to use 'btrfs' you'll probably not gain main traction here... You know Zdenek, it often appears to me your job here is to dissuade people from having any wishes or wanting anything new. But if you look a little bit further, you will see that there is a lot more possible within the space that you define, than you think in a black & white vision. "There are more things in Heaven and Earth, Horatio, than is dreamt of in your philosophy" ;-). I am pretty sure many of the impossibilities you cite spring from a misunderstanding of what people want, you think they want something extreme, but it is often much more modest than that. Although personally I would not mind communication between layers in which providing layer (DM) communicates some stuff to using layer (FS) but 90% of the time that is not even needed to implement what people would like. Also we see ext4 being optimized around 4MB block sizes right? To create better allocation. So that's example of "interoperation" without mixing layers. I think Gionatan has demonstrated that pure block layer functionality, is possible to have more advanced protection ability that does not need any knowledge about filesystems. We need to see EXACTLY which kind of crash do you mean. If you are using some older kernel - then please upgrade first and provide proper BZ case with reproducer. Yes apologies here, I responded to this thing earlier (perhaps a year ago) and the systems I was testing on was 4.4 kernel. So I cannot currently confirm and probably is already solved (could be right). Back then the crash was kernel messages on TTY and then after some 20-30 seconds total freeze. After I copied too much data to (test) thin pool. Probably irrelevant now if already fixed. BTW you can imagine an out-of-space thin-pool with thin volume and filesystem as a FS, where some writes ends with 'write-error'. If you think there is OS system which keeps running uninterrupted, while number of writes ends with 'error' - show them :) - maybe we should stop working on Linux and switch to that (supposedly much better) different OS I don't see why you seem to think that devices cannot be logically separated from each other in terms of their error behaviour. If I had a system crashing because I wrote to some USB device that was malfunctioning, that would not be a good thing either. I have said repeatedly that the thin volumes are data volumes. Entire system should not come crashing down. I am sorry if I was basing myself on older kernels in those messages, but my experience dates from a year ago ;-). Linux kernel has had more issues with USB for example that are unacceptable, and even Linus Torvalds himself complained about it. Queues filling up because of pending writes to USB device and entire system grinds to a halt. Unacceptable. You can have different pools and you can use rootfs with thins to easily test i.e. system upgrades Sure but in the past GRUB2 would not work well with thin, I was basing myself on that... 
I do not see real issue with using thin rootfs myself but grub-probe didn't work back then and OpenSUSE/GRUB guy attested to Grub not having thin support for that. Most thin-pool users are AWARE how to properly use it ;) lvm2 tries to minimize (data-lost) impact for misused thin-pools - but we can't spend too much effort there Everyone would benefit from more effort being spent there, because it reduces the problem space and hence the burden on all those maintainers to provide all types of safety all the time. EVERYONE would benefit. But if you advocate for continuing system use of out-of-space thin-pool - that I'd probably recommend start sending patches... as an lvm2 developer I'm not seeing this as best time investment but anyway... Not necessarily that the system continues in full operation, applications are allowed to crash or whatever. Just that system does not lock up. But you say these are old problems and now fixed... I am fine if filesystem is told "write error". Then filesystem tells application "write error". That's fine. But it might be helpful if "critical volumes" can reserve space in advance. That is what Gionatan was saying...? Filesystem can also do this itself but not knowing about thin layer it has to write random blocks to achieve this. I.e. filesystem may guess about thin layout underneath and just write 1 byte to each block it wants to allocate. But
Re: [linux-lvm] Reserve space for specific thin logical volumes
Gionatan Danti schreef op 08-09-2017 12:35: Hi list, as by the subject: is it possible to reserve space for specific thin logical volumes? This can be useful to "protect" critical volumes from having their space "eaten" by other, potentially misconfigured, thin volumes. Another, somewhat more convoluted, use case is to prevent snapshot creation when thin pool space is too low, causing the pool to fill up completely (with all the associated dramas for the other thin volumes). For my 'ideals' thin space reservation (which would be like allocation in advance) would definitely be a welcome thing. You can also think of it in terms of a default pre-allocation setting. I.e. every volume keeps a bit of space over-allocated while only doing so if there is actually room in the thin volume (some kind of lazy allocation?). Of course not trying to steal your question here and I do not know if any such thing is possible but it might be and I wouldn't mind hearing the answer as well. No offense intended. Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
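[Short of real reserved space, one crude user-space approximation of the "pre-allocation" idea above is a ballast file on the critical filesystem: actually writing it forces the thin chunks underneath to be provisioned now, and deleting it later (plus fstrim, if discards are passed down to the pool) hands that space back in an emergency. A sketch only; sizes and paths are illustrative.]

# Pre-provision ~2 GiB worth of chunks for the critical volume by actually
# writing data (fallocate alone would not necessarily touch the thin layer):
dd if=/dev/zero of=/critical/.ballast bs=1M count=2048 oflag=direct conv=fsync

# Emergency: hand the space back to the pool.
rm /critical/.ballast
fstrim -v /critical      # only reclaims chunks if discards reach the thin pool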
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Zdenek Kabelac schreef op 22-04-2017 23:17: That is awesome, that means an errors=remount-ro mount will cause a remount right? Well 'remount-ro' will fail but you will not be able to read anything from volume as well.

Well, that is still preferable to anything else. It is preferable to a system crash, I mean. So if there is no other last resort, I think this is really the only last resort that exists? Or maybe one of the other things Gionatan suggested.

Currently lvm2 can't support that much variety and complexity...

I think it's simpler, but okay, sure... I think pretty much anyone would prefer a volume-read-errors system rather than a kernel-hang system. It is just not of the same magnitude of disaster :p.

The explanation here is simple - when you create a new thinLV - there is currently full suspend - and before 'suspend' pool is 'unmonitored' after resume again monitored - and you get your warning logged again.

Right, yes, that's what syslog says. It does make it a bit annoying to be watching for messages, but I guess it means filtering for the monitoring messages too, if you want to filter out the recurring message, or checking current thin pool usage before you send anything. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
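[For completeness, the errors=remount-ro behaviour referred to here is the ordinary ext2/3/4 mount option, or the matching filesystem default. The device and mount point below are placeholders.]

# Per mount, in /etc/fstab:
/dev/linux/data  /srv/data  ext4  defaults,errors=remount-ro  0  2

# Or baked into the filesystem so every mount inherits it:
tune2fs -e remount-ro /dev/linux/data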
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Zdenek Kabelac schreef op 18-04-2017 12:17: Already got lost in lots of posts. But there is tool 'thin_ls' which can be used for detailed info about used space by every single thin volume. It's not support directly by 'lvm2' command (so not yet presented in shiny cool way via 'lvs -a') - but user can relatively easily run this command on his own on life pool. See for usage of dmsetup message /dev/mapper/pool 0 [ reserve_metadata_snap | release_metadata_snap ] and 'man thin_ls' Just don't forget to release snapshot of thin-pool kernel metadata once it's not needed... There are two ways: polling a number through some block device command or telling the filesystem through a daemon. Remounting the filesystem read-only is one such "through a daemon" command. Unmount of thin-pool has been dropped from upstream version >169. It's now delegated to user script executed on % checkpoints (see 'man dmeventd') So I write something useless again ;-). Always this issue with versions... So Let's see, Debian Unstable (Sid) still has version 168 as does Testing (Stretch). Ubuntu Zesty Zapus (17.04) has 167. So for the foreseeable future both those distributions won't have that feature at least. I heard you speak of those scripts yes but I did not know when or what yet, thanks. I guess my script could be run directly from the script execution in the future then. Thanks for responding though, much obliged. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
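[The sequence being described would look roughly like this. Treat it as a sketch: the device-mapper node names depend on your VG/pool names and setup, and the exact thin_ls options and output fields can vary by thin-provisioning-tools version.]

# Take a read-only snapshot of the pool's kernel metadata...
dmsetup message /dev/mapper/vg-thinpool-tpool 0 reserve_metadata_snap

# ...inspect per-thin-device usage from that metadata snapshot...
thin_ls --metadata-snap /dev/mapper/vg-thinpool_tmeta

# ...and release it again (do not forget this step).
dmsetup message /dev/mapper/vg-thinpool-tpool 0 release_metadata_snap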
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Stuart D. Gathman schreef op 13-04-2017 19:32: On Thu, 13 Apr 2017, Xen wrote: Stuart Gathman schreef op 13-04-2017 17:29: IMO, the friendliest thing to do is to freeze the pool in read-only mode just before running out of metadata. It's not about metadata but about physical extents. In the thin pool. Ok. My understanding is that *all* the volumes in the same thin-pool would have to be frozen when running out of extents, as writes all pull from the same pool of physical extents. Yes, I simply tested with a small thin pool not used for anything else. The volumes were not more than a few hundred megabytes big, so easy to fill up. Putting a file copy to one of the volumes that the pool couldn't handle, the system quickly crashed. Upon reboot it was neatly filled 100% and I could casually remove the volumes or whatever. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Stuart Gathman schreef op 13-04-2017 17:29: IMO, the friendliest thing to do is to freeze the pool in read-only mode just before running out of metadata. It's not about metadata but about physical extents. In the thin pool. While still involving application level data loss (the data it was just trying to write), and still crashing the system (the system may be up and pingable and maybe even sshable, but is "crashed" for normal purposes) Then it's not crashed. Only some application that may make use of the data volume may be crashed, but not the entire system. The point is that errors and some filesystem that has errors=remount-ro, is okay. If a regular snapshot that is mounted fills up, the mount is dropped. System continues operating, as normal. , it is simple to understand and recover. A sysadmin could have a plain LV for the system volume, so that logs and stuff would still be kept, and admin logins work normally. There is no panic, as the data is there read-only. Yeah a system panic in terms of some volume becoming read-only is perfectly acceptable. However the kernel going entirely mayhem, is not. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Zdenek Kabelac schreef op 13-04-2017 16:33: Hello. Just let's repeat. Full thin-pool is NOT in any way comparable to full filesystem. Full filesystem has ALWAYS room for its metadata - it's not pretending it's bigger - it has 'finite' space and expects this space to just BE there. Now when you have thin-pool - it causes quite a lot of trouble across a number of layers. These are solvable and being fixed. But rule #1 still applies - do not run your thin-pool out of space - it will not always heal easily without losing data - there is no simple, straightforward way to fix it (especially when the user cannot ADD any new space he promised to have). So monitoring the pool and taking action ahead of time is always a superior solution to any later post-mortem system restores.

Yes, that's what I said. If your thin pool runs out, your system will crash. Thanks for alluding to the fact that this will also happen if a thin snapshot causes this (obviously). Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
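[The "monitor and act ahead of time" advice usually boils down to dmeventd plus the autoextend settings in lvm.conf. A typical configuration, with illustrative numbers:]

# /etc/lvm/lvm.conf (activation section) - illustrative values
activation {
    # dmeventd monitors thin pools and acts on these thresholds
    monitoring = 1
    # when the pool reaches 80% data usage, grow it by 20% (if the VG has free space)
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent = 20
}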
Re: [linux-lvm] Snapshot behavior on classic LVM vs ThinLVM
Gionatan Danti schreef op 13-04-2017 12:20: Hi, anyone with other thoughts on the matter? I wondered why a single thin LV does work for you in terms of not wasting space or being able to make more efficient use of "volumes" or client volumes or whatever. But a multitude of thin volumes won't. See, you only compared multiple non-thin with a single-thin. So my question is: did you consider multiple thin volumes? ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] LVM thin pool advice
David Shaw schreef op 15-02-2017 1:33: Is there some way to cap the amount of data that the snapshot can allocate from the pool? Also, is there some way to allocate enough metadata space that it can't run out? By way of analogy, using the old snapshot system, if the COW is sufficiently large (larger than the volume being snapshotted), it cannot overflow because even if every block of the original volume is dirtied, the COW can handle all of it. Is there some similar way to size the metadata space of a thin pool such that overflow is "impossible"? Personally I do not know the current state of affairs but the response I've often got here is that there is no such mechanic and it is up to the administrator to find out. Maybe this is a bit ghastly to say it like this, my apologies. I would very much like to be called wrong here. The problem is although the LVM monitor (I think) does respond, or can be configured to respond to a "thin pool fillup" it does so as a kind of daemon, a watch-dog, but it is not an in-system guard. Typically what I've found in the past is that a fill-up will just hang your system. So I am probably very wrong about some things so I would rather let the developers answer. But as you've found it, the snapshot for a thin volume is always allocated with the same size as the origin volume. That means unless you have double the space available, your system can crash. I have personally once ventured -- but I am just some by-stander right -- that a proper solution would have to involve inter-layer communication between filesystems and block devices, but that is even outside of the problem here. The problem as far as I can see it is that there is very unexpected behaviour when the thin pool fills up. Zdenek once pointed out that the allocator does not have a full map of what is available. For efficiency reasons, it goes "in search" of the next block to allocate. (Next extent). It does so in response to a filesystem read or write (a write, supposedly). The filesystem knows of no limits in the thin pool and expects sufficient behaviour. The block layer (in this case LVM) can respond with failure or success but I do not know how it is handled or what results it produces when the thin pool is full and no new blocks can be allocated. However I expect your system to freeze when the snapshot allocates more space than is available. I think the designated behaviour is for the snapshot to be dropped but I doubt this happens? After all the snapsnot might be mounted, etc?... It seems to me the first thing to do is to create safety margins, but then... I do not develop this thing right now :p. I think what is required is advance-allocation where each (individual) volume allocates a pre-defined number of blocks in advance. Then, any out of space message from the thin volume manager would implicate the pre-allocation and not the actual allocation for the filesystem. You create a bit of a buffer. In time. Once the individual pool allocator knows the thin pool is having problems, but it still has extents available to itself that it pre-allocated, it can already start informing the filesystem -- ideally -- that there is mayhem to be coming. But also it means that a snapshot could recognise problems ahead of time and be told that it needs to start failing if a certain minimum of free space is not to be found. But also, all of this requires that the central thin volume manager knows ahead of time, or in any case, at any single moment, how many extents are available. 
If this is concurrently done and there are many such allocators operating, all of them would need to operate on synchronized numbers of available space. Particularly when space is running out I feel there should be some sort of emergency mode where restrictions start to apply. It is just unacceptable to me that the system will crash when space runs out. In case of a depleted thin pool, any snapshot should really be discarded by default I feel. Otherwise the entire thin pool should be readily frozen. But why the system should crash on this is beyond me. My apologies for this perhaps petulant message. I just think it should not be understated how important it is that a system does not crash, and I just was indicating that in the past the message has often been that it is _your_ job to create safety. But this is slightly impossible. This would indicate... well whatever. The failure case of a filled-up thin pool should not be relegated to the shadows. I hope to be made wrong here and good luck with your endeavour. I would suggest that a thin pool is very sexy ;-). But thus far there are no safeguards. Please be advised that I do not know if such limits currently exist that you ask of. I have just been told here that the thin snapshot is of equal size to origin volume and there is nothing you can do about it? Regards.
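[On the metadata half of the original question, sizing the metadata area so that running out is practically impossible is something lvm2 does let you control. A hedged example; all sizes and names are illustrative.]

# Ask thin-provisioning-tools for an estimate given chunk size, pool size
# and the maximum number of thin volumes/snapshots you expect:
thin_metadata_size --block-size 64k --pool-size 1t --max-thins 1000 --unit m

# Then create the pool with a generously oversized metadata LV
# (pool metadata is capped at roughly 16 GiB anyway):
lvcreate -L 1T --thinpool pool --poolmetadatasize 4G vg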
Re: [linux-lvm] inner VG inside chroot not visible inside chroot
Xen schreef op 07-01-2017 1:48: Why does it not show the mounted LV in the "lvs" table and why does it not show the PV that IS visible in the "lvs" table in the output of "vgdisplay -v" ? I am sorry. I had followed Zdenek's advice at some point and had edited some config file to set filters. It was no help at the time for whatever reason but I had forgotten I had set it and it had made it to the new system. At which point it caused the behaviour described above. Being new to filters I could not imagine what was causing it ;-). So much time lost again... and my filesystem is now messed up because of the way the old LVM (133) was constantly "exchanging" physical volumes. The filesystem would enter read-only state but not before causing gigantic corruption. Only to files opened or in use at the time (so mostly cache files etc) but still. lost+found now contains 505 files/dirs on this disk. And now I am left with the task of returning this filesystem to a proper state... it doesn't really get better, this life. Sorry. Another problem to fix and no time for any of it. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] how to change UUID of PV of duplicate partition (followup)
Zdenek Kabelac schreef op 06-01-2017 21:06: You can always 'fix it' you way. Just setup device filter in lvm.conf so you either see diskA or diskB In you particular case: filter = [ "r|/dev/sdb4|" ] or filter = [ "r|/dev/sdc4|" ] And set/change things with your existing lvm2 version Oh right, thank you man. I was looking for that but barring an option available on pvchange or vgimportclone itself I just decided to go the easy package upgrade route with David's help or suggestion. I knew that should have been possible, so thanks. I just hadn't gotten around to diving into the config file yet. Much appreciated. Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] how to change UUID of PV of duplicate partition (followup)
David Teigland schreef op 06-01-2017 20:19: On Fri, Jan 06, 2017 at 08:10:20PM +0100, Xen wrote: This is what I mean: Found duplicate PV 3U9ac3Ah5lcZUf03Iwm0cgMMaKxdflg0: using /dev/sdb4 not /dev/sdc4 The handling of duplicate PVs has been entirely redone in recent versions. The problems you are having are well known and should now be fixed. Oh right, I was going to write my LVM version, but did not manage to produce it yet, sorry :p. Good to know. Yeah I am using 16.04 from Ubuntu's version. LVM version: 2.02.133(2) (2015-10-30) Library version: 1.02.110 (2015-10-30) Driver version: 4.34.0 Sigh... I guess I'll have to make some time to start using a recent version then on this system. Perhaps it was due time. Always these old versions... ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] duplicate pv change uuid
David Teigland schreef op 06-01-2017 20:04: How does one do this, again? vgimportclone is meant to do this. It strategically uses filters to isolate and modify the intended devs. Thanks. I thought running it in the current situation would be enough. For some reason I am not getting my own emails though. Basically I am not getting list emails, only direct replies atm. So I will CC myself I guess. Found duplicate PV 3U9ac3Ah5lcZUf03Iwm0cgMMaKxdflg0: using /dev/sdb4 not /dev/sdc4 Using duplicate PV /dev/sdb4 without holders, replacing /dev/sdc4 Volume group containing /dev/sdb4 has active logical volumes Physical volume /dev/sdb4 not changed 0 physical volumes changed / 1 physical volume not changed This was after running vgimportclone /dev/sdc4 It immediately replaces the one I target with the one I am using, (don't need to change, don't need) and then says it's active, duh. I guess I'll just need to reboot a live DVD or actually OpenSUSE DVDs are faster. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
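[For the record, the sequence that avoids the "active logical volumes" failure above is roughly this, run from a live/rescue environment where nothing in either copy of the VG is active. The new VG name is a placeholder; the device name is the one from the messages above.]

# From a rescue system with the clone attached but its VG not active:
vgchange -an                                    # make sure no LVs from either copy are active
vgimportclone --basevgname backupvg /dev/sdc4   # new PV UUID, VG UUID and VG name for the clone
vgscan
vgchange -ay backupvg                           # both copies can now coexist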
[linux-lvm] how to change UUID of PV of duplicate partition (followup)
I mean, This is what I mean: Found duplicate PV 3U9ac3Ah5lcZUf03Iwm0cgMMaKxdflg0: using /dev/sdb4 not /dev/sdc4 Using duplicate PV /dev/sdb4 without holders, replacing /dev/sdc4 Volume group containing /dev/sdb4 has active logical volumes Physical volume /dev/sdb4 not changed 0 physical volumes changed / 1 physical volume not changed It immediately replaced the good PV with the bad PV (that I was trying to change) so I cannot actually get to the "bad" PV (which is duplicate) to change it without booting an external system in which I can effect one disk in isolation. But, after running that command my root filesystem was now mounted read-only instantly so even just attaching the disk basically causes the entire system to instantly fail. Real good right. Probably my entire fault right :-/. "Let's cause this system to crash, we'll attach a harddisk." "Job done!" Actually I guess in this case it replaced the bad with the good but behind the scenes something else happened as well. This time it is hiding /dev/sdc4, the other time it was hiding /dev/sdb4, it seems to be random. Basically any eSata system that a disk gets attached to could cause the operating system to fail. The same would probably be true of regular USB disks. Even inserting a USB stick could crash a system like this. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] pvcreate: vfat signature detected on /dev/sda5
John L. Poole schreef op 10-12-2016 4:35: zeta jlpoole # pvcreate /dev/sda5 WARNING: vfat signature detected on /dev/sda5 at offset 54. Wipe it? [y/n]: n Aborted wiping of vfat. 1 existing signature left on the device. Aborting pvcreate on /dev/sda5. zeta jlpoole # Is the warning something I need to be concerned about when creating a physical volume on /dev/sda5? I'm wondering if it was looking at another partition and bubbling up just as a precaution. I am assuming this is a pretty standard thing; and nothing you need to worry about. If you wanted to be sure, you could create a dd of the 2nd partition, or of the first 131 MB. Then after PV create you could do a diff on the saved data and the real data still on disk, but I am going to assume -- as a layman -- that LVM is not messing up the messages; it is really talking about sda5, and there is nothing going to happen. I encounter spurious signatures all the time, most of the time I do not even worry how it got there. Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
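[If you want to see exactly what signature pvcreate is complaining about before wiping anything, wipefs can show it without modifying the device; the offset matches the one in the warning.]

# List all signatures found on the partition (read-only, no changes):
wipefs /dev/sda5

# If the vfat signature really is stale, erase just that one by its offset:
wipefs --offset 54 /dev/sda5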
Re: [linux-lvm] boot volume in activated containing PV does not activate
Linda A. Walsh schreef op 07-11-2016 17:58: Xen wrote: Now at boot (prior to running of systemd) I activate both coll/msata-lv and msata/root. --- Systemd requires that it be PID 1 (the first process to run on your system). I'm told that it won't run unless that is true. So how can you do anything before systemd has been run? Hey lover :p. Because I do it in the initrd. I'm not sure how much of that is "handed over" to systemd but it makes a big difference. For example, pre-activating one half of my (embedded) mirrors was the key to making sure the entire mirror is activated properly upon boot. If not, I would have to run something like vgchange --refresh or lvchange --refresh or something of the kind. This works fine when both mirrors are present. As a test I removed the primary (SSD) mirror of the root volume (and the boot volume). Now the system still boots (off of another disk which has grub on it, but only the grub core and boot image, nothing else, so it still references my VG) and the root volume still gets activated but the boot volume doesn't get activated anymore. What can cause this? It ought to get activated by udev rules. You remember I patched the udev file to ensure a PV directly on disk always gets activated but this is not that disk. Could it be that the msata/boot volume doesn't get activated because the PV had already been activated in the initrd but only as an LV and not its volumes? Sounds very similar to opensuse's requirement that /usr be on the same partition as root -- if not, then you have to boot using a ramdisk, which mounts /usr, and then does the real system boot, so of course, booting directly from disk (which is what my machine does) is not supported. I also mount "/usr" as a first step in my boot process, but that disallows a systemd boot, which some define as systemd being pid 1. So you skip an initrd right(or initramfs). My Ubuntu thing just mounts itself as /, then pivots to the real root after activating some stuff. On Ubuntu (and Debian) you create hooks and scripts and then they just get embedded into initramfs and that just gets loaded prior to systemd doing anything. It was said that LVM (or actually, SystemD and UDev) just processes all things again; all devices all passed through the udev system once more, and so everything is getting activated or at least processed (creates a udev trigger for something to happen). The strange thing was -- and I was just a bit confused just a moment about about a similar Grub question I asked somewhere -- that... while booting from my regular msata thing that had the first half of the mirrors, and while pre-activating the outer-embedding PV (and its LV) that is used for that second half, this was enough to get the boot volume completely activated without explicitly doing so. Meaning, the second PV to "msata" volume group was now activated and this PV contains both the second half of msata/root and the second half of msata/boot. As a consequence, when boot is activated by systemd, it finds the volume (PV) just fine. But now for some strange reason it does not, so now that I remove the msata disk, the PV for it (the second half that is still there) is still getting activated but now that the first half is not found, and triggered, so to speak... the second half suddenly does not prompt a systemd activation anymore. Actually systemd is no part of it at all, it is just the udev rule that rules the pvchange. 
So *apparently* there is no udev rule being run on the second-half PV (that contains both second halves), and the *root* one has already been activated by my initrd (ramfs), so there's no problem there. The boot one (which is just one half at this point) does not get activated at all. Which leads me to believe the udev rule misses it.

I'm told systemd has changed the meaning of 'fstab' to not be a table of disks to mount, but to be a table of disks to keep mounted (even if umount'd by an administrator). Since udev's functionality was incorporated into systemd, might it not be doing something similar -- trying to maintain the boot-state in effect before you added the mirror?

I don't think so. At this point there is no mirror because I am only providing one half. I am pretty sure the boot state is maintained if I were to preactivate the boot LV in the initrd(ramfs).

I am sorry for not having responded to your email of last January. I guess I'm living in a slow-time bubble. There was a reason I couldn't respond, I don't remember what it was. Seems like 3 days ago for me. Nothing much happens except for terrible things. Everything moves as fast as my feet can take me, which is not very fast. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] creating DD copies of disks
Michael D. Setzer II schreef op 17-09-2016 17:18: With windows systems, there seem to be more issues. On the same machine it works fine, but moving to systems that are just a little different sometimes results in various messages. Windows 10 has less issues than before. You can move a system from a "regular" disk to a "firmware raid" disk (same disk with a little bit of firmware data at the end, and run from a different controller) and if your raid drivers are already installed, it will boot without issue whatsoever. Anyway, thanks for your response. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] creating DD copies of disks
Xen schreef op 17-09-2016 16:16: I just won't be able to use the old system until I change it back. Time to test, isn't it. Indeed both Linux and Windows have no issues with the new disk, I think. I was making a backup onto a bigger-sized disk, but neither Windows nor Linux have an issue with it. They just see the old partition table of the old disk and are fine with that. Now I only need to know if the Linux system will run with the new disk, but I'm sure there won't be issues there now. The system is not actually *on* that disk. Call it a data disk for an SSD system. So perhaps you can see why I would want to have the two disks loaded at the same time: - if they work, I can copy data even after the DD (perhaps, to make some rsync copy as you indicate) but now I already have all the required structures (partition tables...) without any work. - I wasn't actually yet done with the old disk. This was also just research. Better make a backup first before you finalize things. I want to do more work on the "old" system before I finalize things. But having a backup sitting there that I can use is a plus. Having to not be able to use both disks at the same time is a huge detriment. It is also not that hard now to change the system back but I still don't know how I can manually change the UUIDs that a VG/LV references. It is a huge plus if I can just exchange one PV with another at will. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] creating DD copies of disks
Lars Ellenberg schreef op 17-09-2016 15:49: On Sat, Sep 17, 2016 at 09:29:16AM +0200, Xen wrote: I want to ask again: What is the proper procedure when duplicating a disk with DD? depends on what you define as "proper", what the desired outcome is supposed to look like. What exactly are you trying to do? If you intend to "clone" PVs of some LVM2 VG, and want to be able to activate that on the same system without first deactivating the "original", I suggest: 1) create consistent snapshot(s) or clone(s) of all PVs 2) import them with "vgimportclone", which is a shell script usually in /sbin/vgimportclone, that will do all the neccessary magic for you, creating new "uuid"s and renaming the vg(s). Right so that would mean first duplicating partition tables etc. I will check that out some day. At this point it is already done, mostly. I didn't yet know you could do that, or what a "clone" would be, so thank you. - my experience indicates that pvchange -ay on a PV that contains a duplicate VG, even if it has a different UUID, but with an identical name, creates problems. You don't say... I do say but this is very common and you can run into it without realizing, e.g. as you open some loopback image of some other system and you hadn't realized it would contain an identically named VG as your own system. The issue is not that the problem happens, the issue is that you can't recover from it. After both VGs are activated, in my experience, you are screwed. You may not be able to rename the 2nd PV, or even the first. I mean the VG sitting in that PV. Sometimes it means having to reboot the system and then doing it again while renaming your own VG prior to loading the alien one. This "you need foresight" situation is not very good. Perhaps you can deactivate the new VG and close the PV and clear it from the cache; I'm not sure, back then my "skill" was not as great. The problem really is that LVM will activate a second VG with the same name *just fine* without renaming it internally or even in display. However, once it is activated, you are at a loss. So it will happily, without you being able to know about it in advance, create a difficult to reverse situation for you. What if you *are* doing forensics (or recovery) as the Matthew person indicated? Are you now to give your own VGS completely unique names? Just so you can prevent any conflicts? Not a good situation. LVM should really auto-rename conflicting VGS that get loaded after activation of the original ones, however it is hard to pick which one that should be, perhaps. At least, maybe it should bolt before activating a duplicate and then require manual intervention. Or, just make it easier to recover from the situation. It is just extremely common if you ever open an image of another disk (particularly if it's your own) or if you are doing anything with default "ubuntu-vg" or "kubuntu-vg" systems, in that sense. I had a habit of calling my main VGs "Linux". Not any longer. I now try to specify the system they are from, no matter how ugly. Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] creating DD copies of disks
matthew patton schreef op 17-09-2016 15:04: What is the proper procedure when duplicating a disk with DD? If you're literally duplicating the entire disk, what on earth are you doing keeping it in the same system? That's very simple, I'm surprised you wouldn't see it. OF COURSE you remove it from the origin box if you expect to do anything with it. Why would you? That's like making a photo-copy of something and then moving to another house before you can read it. If anything, it's only technical limitations that would mandate such a thing. This also doesn't answer the question of what to do if you have VG with identical names. And I presume there are no active filesystems or frankly writable LVM components on the source disk while the DD is running? Nope. All VGS had been deactivated (was running from a bootable stick). Most times it's only the filesystems that contain interesting data an so a DD of the filesystem is understandable even though there are other tools like RSYNC which are more logical. Trouble is making a backup of a complex setup is also complex if you don't have the required tools for it and even "clonezilla" cannot really handle LVM that well. So you're down to manually writing scripts to do all of the steps that you need to do to back up the required data (e.g. LVM metadata, and such) and then the steps to recreate it when you restore a backup (if any). So in this case I was just making a backup of a disk because I might be needing to send the origin disk out for repair, so to speak. The disk contains various partitions and LVM structures. A clonezilla backup is possible, but cannot handle encryption. But because the new disk is meant to replace the old one (for a time) I need a literal copy anyway. Now of course I could clone the non-LVM partitions and then recreate volume groups etc. with different names, but this is arduous. In that case I would have unique UUIDs but would still need to change my new volume group names so the systems can coexist while the copy is running. At this point I'm not even sure well. Let's just say I need to ensure the operation of this disk in this system completely prior to dumping the old one. There are only two ways: disconnect the source disk (and try to boot from the new system, etc.) or run from usb stick and disconnect the source disk, in that case. But if issues arise, I may need the source disk as well. Why would there not be an option to have it loaded at the same time? They are separate disks, and should ideally not directly conflict. In the days prior to UUID, this never happened; there were never any conflicts in that sense (unless you use filesystem labels and partition labels of course). So I first want to settle into a peaceful coexistence because that is the most potent place to move forward from, I'm sure you understand. First cover the basics, then move on. One answer well. In any case it is clear that after changing the UUID of the PV and VG and changing the VG name, the duplicate disk can serve just fine for the activation of certain things, because LVM doesn't care what your VG is called, it will just find your LV by its UUID, if that makes sense. So the duplicate LVS still have identical UUIDs and hence still perform in the old way (and cannot really coexist now). However it seems not possible to change the UUID of a LV. https://www.redhat.com/archives/rhl-list/2008-March/msg00329.html Not answered to satisfaction. Why would you need to use two different systems to copy data between two disks? 
That seems hardly possible. I have now two VGS with different UUIDs: VG UUID jrDQRC-6tlI-n1xK-O7nh-xVAt-Y5SL-Ou8X7b VG UUID KyWokE-ddUN-8GXO-HgWA-5bqU-9HN2-57Qyho But when I allow the 2nd one (the new one) to be activated, and activate something else as well, its LVs will be used just fine as PV for something else, based on UUID and nothing else. Indeed, blkid will show them as having identical UUIDs. Now I had forgot to run pvchange -u on those LVs, so I guess Alistair was right in that thread. But the pvchange -u also instantly updated the VGS that referenced it; which is not so bad, but now the system will run with the new disk, and not the old disk. But that means part of the "migration" is at least complete from this point of view. So thank you. Now that Linux has no issues whatsoever I will have to see what Windows is going to do. It's nice to know that when you change the UUID of a LV that is used as PV for something else, that something else is updated automatically. That was part of my question: how do I inform the "user" of the changed PVS (UUIDS)? So what I have now is basically a duplicate disk but all the UUIDs are different (at least for the LVM). But generally I do not mount with UUID so for the partitions it is not really an issue now. The backup was also made for safety at this point. I just
[linux-lvm] creating DD copies of disks
I want to ask again: What is the proper procedure when duplicating a disk with DD? - after duplication you cannot update a PV with pvchange -u because it will supersede your duplicate with the original and not do anything So to do that you need to boot off a different system, deactivate loaded vgs (if any) and then pvchange -u the duplicate PV. - my experience indicates that pvchange -ay on a PV that contains a duplicate VG, even if it has a different UUID, but with an identical name, creates problems. I mean that anytime you load a VG with the same name you get issues. VG names are often standardized between installs, so that Ubuntu might have "ubuntu-vg" as the name and kubuntu might have "kubuntu-vg" as the name. So if you then load two of those disks in the same system, you instantly have issues. If the system were to auto- or temporary-rename an offending 2nd VG it wouldn't be so bad. But usually you have to vgrename rename your current VG in advance of loading a second disk. Which isn't exactly as intended, because now you are changing your local name to make room for a second system, when it should really be the other way around. In the end I feel I have to do: pvchange -u vgchange -u vgrename To get something that will at least not bug me when both systems are loaded at the same time. This then renders it impossible to use it as a backup because any other disk referencing the PV will not find it because the UUID has changed. Now you would first have to reverse these operations (particularly the vgrename and pvchange -u) towards the data of the first disk (the original) to be able to use the device again. All of that is not very resilient. Now both PV UUIDs and VG names have to be unique. Particularly I wonder how easy it is to point an existing VG to a disk that has a new (duplicate) PV and tell it: use that one from now on. I mean: how can I add a disk that has a duplicate PV with a different UUID and add it to the VG in such a way that it replaces the references that VG has for the old PV? But also: what ought you to do if creating a mirror copy? (duplicate copy). ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
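[One detail worth adding to the pvchange -u / vgchange -u / vgrename sequence: when two VGs share a name, vgrename can target the newly attached one by its UUID, so you do not have to rename your running VG first. Roughly like this; the UUID and the new name are illustrative, taken from vgs output on the system in question.]

# Find the UUID of the duplicate (not-yet-activated) VG:
vgs -o vg_name,vg_uuid

# Rename the duplicate by UUID so the two copies can coexist:
vgrename Zvlifi-Ep3t-e0Ng-U42h-o0ye-KHu1-nl7Ns4 backup_vg

# Then give its PV and VG fresh identities as well:
pvchange -u /dev/sdc4
vgchange -u backup_vg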
Re: [linux-lvm] lvm2 raid volumes
Heinz Mauelshagen schreef op 16-09-2016 16:13: Yes, looks like you don't have the 2nd PV accessible by the time the raid1 is being discovered and initially activated, hence the superblock can't be retrieved. These messages seem to be coming from the initramfs, so check which driver is missing/not loaded to access the 2nd PV. The fact that you gain complete access to the raid1 after boot (as you mention further down) tells us that this is the reason for the degraded activation, i.e. the disk driver is loaded after the root pivot. Please ensure it is available in the initramfs and loaded. Heinz Yes, thank you. The problem was that the VG containing the LV used as the 2nd PV was not getting activated at initramfs time. I solved it by creating some hooks that obtain a hierarchical PV list from the running system and then ensure that every PV in that list which is itself an LV gets activated before the root device. The issue is really that (on Ubuntu) LV activation is very selective in the initramfs. Of course it is an embedded or "enclosed" setup, so maybe it is not recommended. Regardless, the only issue was that LVs were being selectively activated (only root and swap). Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
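For reference, a minimal sketch of such a hook, assuming Ubuntu's initramfs-tools, assuming a local-top script is allowed to run after the distribution's lvm2 script, and assuming the stacked PV lives in a VG called vg0 (the script name and VG name are hypothetical):

#!/bin/sh
# /etc/initramfs-tools/scripts/local-top/activate-stacked-pvs  (hypothetical)
# Activate the VG whose LVs act as PVs for the root VG, so that both raid1
# legs exist before the root device is assembled.
PREREQ="lvm2"                      # name of the distro's lvm2 initramfs script, if present
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac
lvm vgchange -ay --sysinit vg0 || true

Followed by update-initramfs -u so the script actually ends up in the image.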
Re: [linux-lvm] LVM cache/dm-cache questions.
lejeczek schreef op 29-08-2016 16:22: I cannot debug now as for now I've given up the idea to encrypt this LV, but I would say it should be easily reproducible (maybe even a waste of time looking at my setup) I can definitely say I have encrypted a cached volume without issue. I have had a disk with two cached volumes, if you want to know, both caches originating from the same SSD. The main (origin) volumes were on an HDD in a simple, regular, non-thin LVM. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
[linux-lvm] force umount (2)
Here is the latest version of my force umount script lol.

Liberating the free from the oppressed.
Sending TERM signal to -bash
Sending TERM signal to sudo su
Session terminated, terminating shell...Sending TERM signal to su
...terminated.
sagemode@perfection:~$ Sending TERM signal to bash
Liberating the oppressed from the free.
There are programs remaining.
Connection to sagemode.net closed by remote host.
Connection to sagemode.net closed.

Sorry for being so late with it ;-). ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
[linux-lvm] rudimentary kill script
Here is a quick way to kill all processes that might have an open file on some volume you want to umount and close.

kill_all_dm() {
    # build the "D0x.." device field that lsof prints for this device-mapper device
    hexdev=$(dmsetup info -c -o major,minor $1 --noheadings | tr ':' ' ' | { read a b; printf "D0x%02x%02x\n" $a $b; })
    # collect the PIDs of every process holding something open on that device
    process_list=$(lsof -F cDn0 | tr '\0' '\t' | awk -F '\t' '{ if (substr($1,1,1) == "p") P=$1; else print P "\t" $0 }' | grep $hexdev | awk '{print $1}' | sed "s/^p//")
    [ -n "$process_list" ] && echo "$process_list" | xargs kill -9
}

It works on logical volumes and device mapper (paths) equally, but you must specify the path:

kill_all_dm /dev/linux/root

;-). Alternatively this will just list all the open processes:

find_processes() {
    hexdev=$(dmsetup info -c -o major,minor $1 --noheadings | tr ':' ' ' | { read a b; printf "D0x%02x%02x\n" $a $b; })
    process_list=$(lsof -F cDn0 | tr '\0' '\t' | awk -F '\t' '{ if (substr($1,1,1) == "p") P=$1; else print P "\t" $0 }' | grep $hexdev | awk '{print $1}' | sed "s/^p//")
    [ -n "$process_list" ] && echo "$process_list" | xargs ps
}

find_processes /dev/linux/root

___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] LVM cache/dm-cache questions.
lejeczek schreef op 26-08-2016 16:01: whatever you might call it, it works, luks encrypting, opening & mounting @boot - so I only wonder (which was my question) why not cache pool LVs. Is it not supported... would be great if a developer sees this question, I'm not sure just yet about filing a bug report. Ondrej has it down. You can only encrypt the combined volume, not the individual parts, unless you encrypt those at the PV level. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
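A minimal sketch of the layering that does work, assuming an already-cached LV vg/slow (the VG and LV names are made up): encrypt the combined cached LV, not the cache-pool sub-LVs.

cryptsetup luksFormat /dev/vg/slow        # encrypt the cached LV as a whole
cryptsetup open /dev/vg/slow slow_crypt
mkfs.ext4 /dev/mapper/slow_crypt          # the filesystem lives inside the LUKS container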
Re: [linux-lvm] LVM cache/dm-cache questions.
lejeczek schreef op 26-08-2016 14:45: well, I prefer to encrypt the LV itself, and I'm trying the same thing that has always worked with my "regular" LVs, yet "cache pool LVs" fail to encrypt with: Command failed with code 22. If you are going to encrypt an LV it will no longer be an LV but an (opened) LUKS container. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] lvm2 raid volumes
Heinz Mauelshagen schreef op 03-08-2016 15:10: The Cpy%Sync field tells you about the resynchronization progress, i.e. the initial mirroring of all data blocks in a raid1/10 or the initial calculation and storing of parity blocks in raid4/5/6. Heinz, can I perhaps ask you here, if I may. I have put a root volume on raid1. Maybe "of course", the second disk's LVM volumes are not available at system boot:

aug 15 14:09:19 xenpc2 kernel: device-mapper: raid: Loading target version 1.7.0
aug 15 14:09:19 xenpc2 kernel: device-mapper: raid: Failed to read superblock of device at position 1
aug 15 14:09:19 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 mirrors
aug 15 14:09:19 xenpc2 kernel: created bitmap (15 pages) for device mdX
aug 15 14:09:19 xenpc2 kernel: mdX: bitmap initialized from disk: read 1 pages, set 19642 of 30040 bits
aug 15 14:09:19 xenpc2 kernel: EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)

This could be because I am using a PV directly on the disk (no partition table) for *some* volumes (actually the first disk, the one that is booted from). However, I force a start of the LVM2 service by enabling it in systemd:

aug 15 14:09:19 xenpc2 systemd[1]: Starting LVM2...

This is further down the log, so LVM is actually started after the RAID is loading. At that point, normally, from my experience, only the root LV is available. Then at a certain point more devices become available:

aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/mapper/msata-boot.
aug 15 14:09:22 xenpc2 systemd[1]: Started LVM2.
aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/raid/tmp.
aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/raid/swap.
aug 15 14:09:22 xenpc2 systemd[1]: Found device /dev/raid/var.

But just before that happens, there are some more RAID1 errors:

aug 15 14:09:22 xenpc2 kernel: device-mapper: raid: Failed to read superblock of device at position 1
aug 15 14:09:22 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 mirrors
aug 15 14:09:22 xenpc2 kernel: created bitmap (1 pages) for device mdX
aug 15 14:09:22 xenpc2 kernel: mdX: bitmap initialized from disk: read 1 pages, set 320 of 480 bits
aug 15 14:09:22 xenpc2 kernel: device-mapper: raid: Failed to read superblock of device at position 1
aug 15 14:09:22 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 mirrors
aug 15 14:09:22 xenpc2 kernel: created bitmap (15 pages) for device mdX
aug 15 14:09:22 xenpc2 kernel: mdX: bitmap initialized from disk: read 1 pages, set 19642 of 30040 bits

Small wonder, if the device isn't there yet. There are no messages for it, but I will assume the mirror LVs came online at the same time as the other "raid" volume group LVs, which means the RAID errors preceded that. Hence, no secondary mirror volumes were available, so the raid could not be started complete, right?
However, after logging in, the Cpy%Sync behaviour seems normal:

boot msata rwi-aor--- 240,00m 100,00
root msata rwi-aor--- 14,67g 100,00

Devices are shown as:

boot msata rwi-aor--- 240,00m 100,00 boot_rimage_0(0),boot_rimage_1(0)
root msata rwi-aor--- 14,67g 100,00 root_rimage_0(0),root_rimage_1(0)

dmsetup table seems normal:

# dmsetup table | grep msata | sort
coll-msata--lv: 0 60620800 linear 8:36 2048
msata-boot: 0 491520 raid raid1 3 0 region_size 1024 2 252:14 252:15 - -
msata-boot_rimage_0: 0 491520 linear 8:16 4096
msata-boot_rimage_1: 0 491520 linear 252:12 10240
msata-boot_rimage_1-missing_0_0: 0 491520 error
msata-boot_rmeta_0: 0 8192 linear 8:16 495616
msata-boot_rmeta_1: 0 8192 linear 252:12 2048
msata-boot_rmeta_1-missing_0_0: 0 8192 error
msata-root: 0 30760960 raid raid1 3 0 region_size 1024 2 252:0 252:1 - -
msata-root_rimage_0: 0 30760960 linear 8:16 512000
msata-root_rimage_1: 0 30760960 linear 252:12 509952
msata-root_rimage_1-missing_0_0: 0 30760960 error
msata-root_rmeta_0: 0 8192 linear 8:16 503808
msata-root_rmeta_1: 0 8192 linear 252:12 501760
msata-root_rmeta_1-missing_0_0: 0 8192 error

But actually it's not normal, because each raid line should reference 4 devices, not two. Apologies. It only references the volumes of the first disk (image and meta). E.g. 252:0 and 252:1 are:

lrwxrwxrwx 1 root root 7 aug 15 14:09 msata-root_rmeta_0 -> ../dm-0
lrwxrwxrwx 1 root root 7 aug 15 14:09 msata-root_rimage_0 -> ../dm-1

Whereas the volumes from the other disk are:

lrwxrwxrwx 1 root root 7 aug 15 14:09 msata-root_rmeta_1 -> ../dm-3
lrwxrwxrwx 1 root root 7 aug 15 14:09 msata-root_rimage_1 -> ../dm-5

If I unmount /boot, lvchange -an msata/boot, lvchange -ay msata/boot, it loads correctly:

aug 15 14:56:23 xenpc2 kernel: md/raid1:mdX: active with 1 out of 2 mirrors
aug 15 14:56:23 xenpc2 kernel: created bitmap (1 pages) for device mdX
aug 15 14:56:23 xenpc2 kernel: mdX: bitmap initialized
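For reference, the reactivation sequence described above as a short sketch (the VG/LV names are taken from the log; the lvconvert line is only the usual next step if a leg stays failed, not something from this thread):

umount /boot
lvchange -an msata/boot
lvchange -ay msata/boot      # with both PVs present the mirror should come up with 2 of 2 legs
mount /boot
# if a leg remains marked as failed, lvconvert --repair msata/boot can rebuild it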
[linux-lvm] sata cable disconnect + hotplug after
What am I supposed to do when a SATA cable disconnects and the disk reconnects as another device? I had a disk at probably /dev/sda. At a certain point the filesystem had become read-only. I realized the cable must have disconnected, and after fixing it the same device was now at /dev/sdf. The device had gone missing from the system, but LVM would not find it again.

# pvscan
WARNING: Device for PV fEGbBn-tbIp-rL7y-m22b-1rQh-r9i5-Qwlqz7 not found or rejected by a filter.
PV unknown device   VG xenpc1   lvm2 [600,00 GiB / 158,51 GiB free]

pvck clearly found it and lvmdiskscan also found it. Nothing happened until I did pvscan --cache /dev/sdf:

# pvscan --cache /dev/sdf
# vgscan
Reading all physical volumes. This may take a while...
Duplicate of PV fEGbBn-tbIp-rL7y-m22b-1rQh-r9i5-Qwlqz7 dev /dev/sdf exists on unknown device 8:0
Found volume group "xenpc1" using metadata type lvm2

Now I was able to activate it again and it was no longer flagged as partial (but now it is a duplicate). The unknown device 8:0 is clearly going to be /dev/sda, which is no longer there. How can I drop this reference to 8:0, or should something else be done? Oh right, pvscan --cache without a parameter; I wonder if I can run that while the thing is still activated. I was beginning to think there'd be some hidden filter rule, but it was "just" the cache. Should this be resolved automatically? Is running pvscan --cache enough? ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
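For reference, the cache-refresh step as a sketch (this is just my reading of the behaviour described above, not a confirmed procedure):

pvscan --cache            # rescan all devices and rebuild lvmetad's view, dropping the stale 8:0 entry
pvs                       # the PV should now be listed on /dev/sdf only
vgchange -ay xenpc1       # re-activate if anything is still inactive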
Re: [linux-lvm] export/migrate - but only a LV - how?
Brassow Jonathan schreef op 22-07-2016 15:12: One advantage is that pvmove allows you to keep the device online... Right, that is pretty advanced. Good going. We are also working on a feature ATM called “duplicate”. It allows you to create a duplicate of any LV stack, create the duplicate where you want, and even change the segment type in the process. For example, you could move a whole RAID5 LV, or thin-p, in one go. I think one important "trouble spot" is that there is no operation yet(?) to create a duplicate of a PV that will not ruin the system unless you do pvchange -u and vgchange -u ;-). It took quite a bit of time for me to realize what was going on ;-). Is PVMOVE supposed to be a backup task? I don't think so? How are you supposed to back something up if you are planning to move your system to a new disk? ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
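For context, the online move being referred to looks roughly like this (a sketch; the VG, LV and device names are made up):

# move the extents of a single LV off one PV onto another while it stays in use
pvmove -n vg0/data /dev/sdb1 /dev/sdc1
# without -n, pvmove migrates every allocated extent it finds on /dev/sdb1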
Re: [linux-lvm] export/migrate - but only a LV - how?
Brassow Jonathan schreef op 18-07-2016 17:50: maybe pvmove the LV to a unique device and then vgsplit? brassow On Jul 12, 2016, at 10:23 AM, lejeczek wrote: .. if possible? Shouldn't you in general just recreate the LV with the same number of extents and then perform a dd? I realize an atomic move operation for an LV could conceptually be nice, but apart from the mental effort required to do this recreation manually, there is practically nothing in the way of doing it yourself. In the end, it would be nothing more than a shell script doing an lvcreate, dd, and lvremove? Regards. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
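A sketch of that shell script, assuming the source LV is vg1/data, the target VG is vg2, and nothing on the LV is mounted (all names are made up):

# recreate an LV of identical size elsewhere and copy it block for block
size_bytes=$(blockdev --getsize64 /dev/vg1/data)
lvcreate -L "${size_bytes}b" -n data vg2     # rounded up to a whole extent if needed
dd if=/dev/vg1/data of=/dev/vg2/data bs=64M conv=fsync
lvremove vg1/data                            # only once the copy has been verified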
Re: [linux-lvm] Copying a raw disk image to LVM2
emmanuel segura schreef op 11-07-2016 0:02: the lvm metadata is stored at the beginning of the physical volume, and a logical volume is a simple block device, so using dd you don't overwrite any lvm header. I must say I did experience something weird when copying a LUKS partition (encrypted). Apparently LUKS stores information about the device it is on (or was on), because I couldn't get it to re-adjust to a larger volume. cryptsetup resize is supposed to resize to the size of the underlying block device. I had to recreate my LUKS container before it would recognise the new size. E.g. I copied from a 2 GB volume to a 3 GB volume using dd. LUKS kept thinking it was still on a 2 GB volume. So resize didn't work: I could manually resize it to 3 GB, but an automatic resize would resize it back to 2 GB. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
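For anyone hitting the same thing, the sequence I would expect to work (a sketch, not a verified fix for the behaviour described above; the names are made up) is to close and reopen the container on the larger copy before resizing:

cryptsetup close data_crypt                  # drop the mapping sized from the old 2 GB device
cryptsetup open /dev/vg/bigger data_crypt    # reopen on the 3 GB copy
cryptsetup resize data_crypt                 # grow the mapping to fill the new block device
resize2fs /dev/mapper/data_crypt             # then grow the filesystem inside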
Re: [linux-lvm] Unexptected filesytem unmount with thin provision and autoextend disabled - lvmetad crashed?
matthew patton schreef op 18-05-2016 6:57: Just want to say your belligerent emails are ending up in the trash can. Not automatically, but mostly after scanning them. At the same time, perhaps it is worth noting that while all other emails from this list end up in my main mailbox just fine, yours (and yours alone) trigger the spam filter of my email provider, even though I have never trained it to treat your emails as spam. Basically, each and every time I will find your messages in my spam box. Makes you think, eh? But then, just for good measure, let me concisely respond to this one: For the FS to "know" which of its blocks can be scribbled on and which can't means it has to constantly poll the block layer (the next layer down may NOT necessarily be LVM) on every write. Goodbye performance. Simply false, and I explained already why: the filesystem is already being optimized for alignment with (possible) "thin" blocks (Zdenek has mentioned this) in order to allocate more efficiently (cause allocation) on the underlying layer. If it already has knowledge of this alignment, and it has knowledge of its own block usage, meaning it can easily discover which of the "alignment" blocks it has already written to itself, then it has all the data and all the knowledge needed to know which blocks (extents) are completely "free". Suppose you had a blockmap (bitmap) of 4KB blocks. Now suppose you have 4MB extents. Then every 1024 bits in the blockmap correspond to one bit in the extent map. You know this. To condense the free blockmap into a free extent map (bit "0" is free, bit "1" is in use):

for every extent:
    blockmap_segment = blockmap & (EXTENT_MASK << (extent_number * 1024))   # EXTENT_MASK = 1024 one-bits
    is_an_empty_extent = (blockmap_segment == 0)

So it knows clearly which extents are empty. Then it can simply be told not to write to those extents anymore. If the filesystem is already using discards (mount option) then in practice those extents will also be unallocated by thin LVM. So the filesystem knows which blocks (extents) will cause allocation, if it knows it is sitting on a thin device like that. However, it does mean the filesystem must know the 'hidden geometry' beneath its own blocks, so that it can know about stuff that won't work anymore. I'm pretty sure this was explained to you a couple weeks ago: it's called "integration". You dumb faced idiot. You know full well this information is already there. What are you trying to do here? Send me into the woods again? For a long time hard disks have shed their geometry data onto us, and filesystems can be created with geometry information (of a certain kind) in mind. Yes, these are creation flags. But extent alignment is also a creation flag. The extent alignment, or block size, does not suddenly change over time. Not that it should matter that much in principle. But this information can simply be had. It is no different from knowing the size of the block device to begin with. If the creation tools were LVM-aware (they don't have to be) the administrator could easily SET these parameters without any interaction with the block layer itself. They can already do this for flags such as:

stride=stride-size
    Configure the filesystem for a RAID array with stride-size filesystem blocks. This is the number of blocks read or written to disk before moving to the next disk. This mostly affects placement of filesystem metadata like bitmaps at mke2fs(2) time to avoid placing them on a single disk, which can hurt performance.
    It may also be used by the block allocator.
stripe_width=stripe-width
    Configure the filesystem for a RAID array with stripe-width filesystem blocks per stripe. This is typically stride-size * N, where N is the number of data disks in the RAID (e.g. a RAID 5 array has N+1 disks, RAID 6 has N+2). This allows the block allocator to prevent read-modify-write of the parity in a RAID stripe if possible when the data is written.

And LVM extent size is not going to be any different. Zdenek explained earlier: However, what is being implemented is better 'allocation' logic for pool chunk provisioning (for XFS ATM) - as rather 'dated' methods for deciding where to store incoming data do not apply efficiently with provisioned chunks. I.e. it's inefficient to provision 1M thin-pool chunks when the filesystem then uses just 1/2 of a provisioned chunk and allocates the next one. The smaller the chunk is, the better space efficiency gets (also needed with snapshots), but it may need lots of metadata and may cause fragmentation troubles. Geometry data has always been part of block device drivers, and I am sorry I cannot do better at this point (finding the required information on code interfaces is hard):

struct hd_geometry {
    unsigned char  heads;
    unsigned char  sectors;
    unsigned short cylinders;
    unsigned long  start;
};

Block devices also register
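As an illustration of the point about creation flags, this is what already exists for RAID geometry (a sketch; the numbers and device name are made up, and no equivalent knob for thin chunk size exists today as far as I know):

# ext4 can be told about the geometry underneath it at creation time
mkfs.ext4 -E stride=16,stripe_width=32 /dev/raid/data
# something analogous could, in principle, hand the thin-pool chunk/extent
# alignment to the filesystem without any runtime polling of the block layer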
Re: [linux-lvm] thin disk -- like overcomitted/virtual memory? (was Re: about the lying nature of thin)
Hey sweet Linda, this is beyond me at the moment. You go very far with this. Linda A. Walsh schreef op 10-05-2016 23:47: Isn't using a thin memory pool for disk space similar to using a virtual memory/swap space that is smaller than the combined sizes of all processes? I think there is a point to that, but for me the correspondence is in the idea that filesystems should perhaps have different modes of requesting memory (space), as you detail below. Virtual memory typically cannot be expanded automatically, although you could do it by hand. Even with virtual memory there is normally a hard limit, and unless you include shared memory, there is not really any relation with overprovisioned space, unless you start talking about prior allotment, and promises being given to processes (programs) that a certain amount of (disk) space is going to be available when it is needed.

So what you are talking about here, I think, is expectation and reservation. A process or application claims a certain amount of space in advance. The system agrees to it. Maybe the total amount of claimed space is greater than what is available. Now processes (through the filesystem) are notified whether the space they have reserved is actually going to be there, or whether they need to wait for that "robot cartridge retrieval system", and whether they want to wait or will quit. They knew they needed space and they reserved it in advance. The system had a way of knowing whether the promises could be met and the requests could be met. So the concept that keeps recurring here seems to be reservation of space in advance. That seems to be the holy grail now. I don't know, but I assume you could develop a good model for this like you are trying here.

Sparse files are difficult for me; I have never used them. I assume they could be considered sparse by nature and not likely to fill up. Filling up is of the same nature as expanding. The space they require is virtual space; their real space is the condensed space they actually take up. It is a different concept. You really need two measures for reporting on these files: real and virtual. So your filesystem might have 20G real space. Your sparse file is the only file. It uses 10G actual space. Its virtual file size is 2T. Free space is reported as 10G. Used space is given two measures: actual used space and virtual used space. The question is how you store these. I think you should store them condensed. As such, only the condensed blocks are given to the underlying block layer / LVM. I doubt you would want to create a virtual space from LVM such that your sparse files can use a huge filesystem in a non-condensed state sitting on that virtual space? But you can? Then the filesystem doesn't need to maintain blocklists or whatever, but keep in mind that normally a filesystem will take up a lot of space in inode structures and the like when the filesystem is huge but the actual volume is not. If you create one thin pool and a bunch of filesystems (thin volumes) of the same size, with default parameters, your entire thin pool will quickly fill up with just metadata structures. I don't know. I feel that sparse files are weird anyway, but if you use them, you'd want them to be condensed in the first place and existing in a sort of mapped state where virtual blocks are mapped to actual blocks. That doesn't need to be LVM, and it would feel odd there. That's not its purpose, right? So for sparse files you need a mapping at some point, but I wouldn't abuse LVM for that primarily.
I would say that is 80% filesystem and 20% LVM, or maybe even 60% custom system, 20% filesystem and 20% LVM. Many games pack their own filesystems, like we talked about earlier (when you discussed the inefficiency of many small files in relation to 4k block sizes). If I really wanted sparse storage personally, as an application data storage model, I would first develop this model myself. I would probably want to map it myself. Maybe I'd want a custom filesystem for that. Maybe a loopback-mounted custom filesystem, provided that its actual block file could grow. I would imagine allocating containers for it, and I would want the "real" filesystem to expand my containers or to create new instances of them. So instead of mapping my sectors directly, I would want to map them myself first, in a tiered system, and have the filesystem map the higher hierarchy level for me. E.g. I might have containers of 10G each allocated in advance, and when I need more, the filesystem allocates another one. So I map the virtual sectors to another virtual space, such that for my containers:

container virtual space / container size = outer container addressing
container virtual space % container size = inner container addressing

The outer container addressing goes to a filesystem structure telling me (or it) where to write my data to. The inner container addressing follows the normal procedure, and
[linux-lvm] LVM but no real partitions
Did people ever consider whether it would be worth being able to start a PV on a raw physical device with a small offset, to reserve room for a boot sector? If you could create a PV on a boot disk with a 2048-sector offset, you would have a bootable device with no partition table and only LVM. The same applies to LUKS: if you could put the LUKS header at sector 2048, you could boot a disk containing nothing but a LUKS container, since grub2 understands LUKS. I guess such a thing would be extremely easy to achieve, it just wouldn't be portable, since you may need to recompile both cryptsetup and GRUB, and possibly even dm-crypt. The total number of changes would probably not be more than 10 lines. Not sure. Anyway. Has anyone ever fancied an LVM system without real partitions? ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
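For what it's worth, LVM already has a knob pointing in this direction (a sketch, assuming a reasonably recent LVM2; whether a boot loader will actually use the reserved area is a separate question, and the report field names may vary by version):

# create a PV directly on the whole disk, keeping 1 MiB free for a boot loader
pvcreate --bootloaderareasize 1m /dev/sdx
pvs -o +pv_ba_start,pv_ba_size /dev/sdx      # show the reserved bootloader area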
Re: [linux-lvm] thin handling of available space
Mark Mielke schreef op 04-05-2016 3:25: Thanks for entertaining this discussion, Matthew and Zdenek. I realize this is an open source project, with passionate and smart people, whose time is precious. I don't feel I have the capability of really contributing code changes at this time, and I'm satisfied that the ideas are being considered even if they ultimately don't get adopted. Even the mandatory warning about snapshots exceeding the volume group size is something I can continue to deal with using scripting and filtering. I mostly want to make sure that my perspective is known and understood. You know, you really don't need to be this apologetic even if I mess up my own replies ;-). I think you have a right and a reason to say what you've said, and that's it. ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
Re: [linux-lvm] thin handling of available space
matthew patton schreef op 27-04-2016 12:26: It is not the OS' responsibility to coddle stupid sysadmins. If you're not watching for high-water marks in FS growth vis a vis the underlying, you're not doing your job. If there was anything more than the remotest chance that the FS would grow to full size it should not have been thin in the first place. Who says the only ones who would ever use, or consider using, thin would be sysadmins? Monitoring Linux is troublesome enough for most people and it really is a "job". You seem intent on making that job harder rather than easier, so you can be the type of person who has this expert knowledge while others don't? I remember a reason to crack down on sysadmins was that they didn't know how to use "vi" - if you can't use fucking vi, you're not a sysadmin. This is actually an inflated version of what a system administrator is, or could at all times be expected to be, because you are ensuring that problems are going to surface one way or another when this sysadmin is suddenly no longer capable of being that perfect guy 100% of the time. You are basically ensuring disaster by having that attitude. The guy who can battle against all odds and still prevail ;-).

More to the point: no one is getting coddled, because Linux is hard enough and it is usually the users who are getting coddled; strangely enough, the attitude exists that the average desktop user never needs to look under the hood. If something is ugly, who cares, the "average user" doesn't go there. The average user is oblivious to all system internals. The system administrator knows everything and can launch a space rocket with nothing more than matches and a few gallons of rocket fuel ;-). The autoextend mechanism is designed to prevent calamity when the filesystem(s) grow to full size. By your reasoning, it should not exist because it coddles admins. A real admin would extend manually. A real admin would specify the right size in advance. A real admin would use thin pools of thin pools that expand beyond your wildest dreams :p.

But on a more serious note, if there is no chance a filesystem will grow to full size, then it doesn't need to be that big. But there are more use cases for thin than hosting VMs for clients. I also believe thin pools have a use for desktop systems, when you see that the only real alternative is btrfs and some distros are going with it full-time. Btrfs also has thin provisioning in a sense, but on a different layer, which is why I don't like it. Thin pools, from my perspective, are the only valid snapshotting mechanism if you don't use btrfs or zfs or something of the kind. Even a simple desktop monitor, some applet with configured thin pool data, would of course alleviate a lot of the problems for a "casual desktop user". If you remotely administer your system with VNC or the like, that's the same. So I am saying there is no single use case for thin. Your response, Mr. Patton, falls along the lines of "I only want this to be used by my kind of people". "Don't turn it into something everyone or anyone can use". "Please let it be something special and niche". It seems pretty clear to me that a system that *requires* manual intervention and monitoring at all times is not a good system, particularly if the feedback on its current state cannot be retrieved from, or used by, other existing systems that guard against more or less the same type of things.
Besides, if your arguments here were valid, then https://bugzilla.redhat.com/show_bug.cgi?id=1189215 would never have existed. The FS already has a notion of 'reserved'. man(1) tune2fs -r Alright, thanks. But those blocks are manually reserved for a specific user; that's what they are for. That is what -u is for. These blocks are still available to the filesystem. You could call it calamity prevention as well: there will always be a certain amount of space for, say, the root user. And by the same measure you could say the tmpfs overflow mechanism for /tmp is not required either, because a real admin would never see his rootfs run out of disk space. Stuff happens. You ensure you are prepared for when it does, not stick your head in the sand and claim that real gurus never encounter those situations. The real question you should be asking is whether seeing thin pool data through the lens of the filesystems as well would enhance the monitoring aspect, or whether that is going to be a detriment. Regards. Addendum: https://utcc.utoronto.ca/~cks/space/blog/tech/SocialProblemsMatter There is a widespread attitude among computer people that it is a great pity that their beautiful solutions to difficult technical challenges are being prevented from working merely by some pesky social issues [read: human flaws], and that the problem is solved once the technical work is done. This
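For completeness, what the existing ext4 reservation actually does (a sketch; the device name is made up):

# reserve 5% of the blocks, usable only by root (this is the default behaviour)
tune2fs -m 5 /dev/vg0/data
# or reserve an explicit block count for a specific user
tune2fs -r 100000 -u backup /dev/vg0/data
# either way the space remains visible to the filesystem itself; it is a
# last-resort margin, not knowledge about a thin pool underneath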
[linux-lvm] 2 questions on LVM cache
1. Does LVM cache support discards of the underlying blocks (in the cache) when the filesystem discards those blocks? I was reading https://lwn.net/Articles/293658/ which makes it clear that years ago kernel developers were introducing discard behaviour into Linux filesystems with respect to flash devices and their need to copy for wear leveling. I know very little about it, but I have seen the "discard" flag mentioned so often with respect to SSDs that I must assume these discards are there. Are LVM cache blocks discarded when the filesystem layer discards those blocks? Where can I find this info? In https://www.redhat.com/archives/linux-lvm/2016-April/msg00030.html I mentioned that such a discard feature would be necessary in order for a filesystem to communicate to a block device layer which blocks are in use, and for a block device layer to communicate back a set of available blocks if these change dynamically. I forgot the other question lol. I'm interested in this solution to https://bugzilla.redhat.com/show_bug.cgi?id=1189215 but I will respond in my other email (LVM Thin: Handle out of space conditions better). ___ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
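For reference, the filesystem side of the question looks like this (a sketch with made-up names; whether dm-cache then passes those discards down to the cache and origin devices is exactly the open question above):

# online discards: the filesystem issues a discard for every freed block range
mount -o discard /dev/vg0/home /home
# or batched discards on a schedule instead of per-unlink
fstrim -v /home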