On 15.9.2017 at 09:34, Xen wrote:
Zdenek Kabelac wrote on 14-09-2017 21:05:
But if I do create snapshots (which I do every day), when the root and boot
snapshots fill up (they are on regular LVM) they get dropped, which is nice,
Old snapshots are a different technology for a different purpose.
Again, what I was saying was to support the notion that having snapshots that
may grow a lot can be a problem.
lvm2 makes them look the same - but underneath it's very different (and it's
not just a matter of age - they also target different purposes).
- old-snaps are good for short-lived, small snapshots - when a low number of
changes is expected and it's not a big issue if the snapshot is 'lost'.
- thin-snaps are ideal for long-lived objects, with the possibility to take
snaps of snaps of snaps, and you are guaranteed the snapshot will not 'just
disappear' while you modify your origin volume...
Both have very different resource requirements and performance...
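For illustration, a minimal sketch of the two variants with hypothetical names
(VG 'vg', regular LV 'root', thinLV 'thin_data'):

  # old-snap (COW): needs an explicit size and is dropped (invalidated)
  # once that size fills up
  lvcreate -s -L 1G -n root_snap vg/root

  # thin-snap: no size argument, shares chunks with its origin inside
  # the thin-pool and never 'just disappears'
  lvcreate -s -n data_snap vg/thin_data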
I am not sure the purpose of non-thin vs. thin snapshots is all that different
though.
They are both copy-on-write in a certain sense.
I think it is the same tool with different characteristics.
There are cases where it's quite a valid option to take an old-snap of a thinLV,
and it will pay off...
Exactly in the case where you use thin and you want to make sure your temporary
snapshot will not 'eat' all your thin-pool space - you want to let the snapshot die.
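For example (hypothetical names again), a size-bounded old-snap taken of a thinLV:

  # old-style COW snapshot of a thin volume: it can never consume more
  # than the 2G reserved here, and is simply invalidated if it overflows,
  # so it cannot exhaust the thin-pool on its own
  lvcreate -s -L 2G -n tmp_snap vg/thin_data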
Thin-pool still does not support shrinking - so if the thin-pool auto-grows to
a big size, there is no way for lvm2 to reduce the thin-pool size...
That's just the sort of thing that in the past I have been keeping track of
continuously (in unrelated stuff) such that every mutation also updated the
metadata without having to recalculate it...
Would you prefer to spend all your RAM keeping all the mapping information for
all the volumes, and to put very complex code into the kernel to parse information
which is technically already out-of-date the moment you get the result??
In 99.9% of runtime you simply don't need this info.
But the point of what you're saying is that the number of blocks uniquely owned
by any snapshot is not known at any one point in time.
As long as a 'thinLV' (i.e. your snapshot thinLV) is NOT active, there is
nothing in the kernel maintaining its dataset. You can have lots of thinLVs active
and lots of others inactive.
Well pardon me for digging this deeply. It just seemed so alien that this
thing wouldn't be possible.
I'd say it's very smart ;)
You only need a very small subset of the 'metadata' information for the individual
volumes.
It becomes a rather big enterprise to install thinp for anyone!!!
It's enterprise level software ;)
Because to get it running takes no time at all!!! But to get it running well
implies a huge investment.
In the most common scenarios, the user knows when he runs out of space - it will not
be a 'pleasant' experience - but the user's data should be safe.
And then it depends on how much energy/time/money the user wants to put into the
monitoring effort to minimize downtime.
As has been said - disk-space is quite cheap.
So if you monitor and add your new disk-space in time (enterprise...), you
have a smaller set of problems than if you constantly fight with a 100% full
thin-pool...
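For instance, once a new disk has been added, growing the pool by hand is just
(hypothetical names and sizes):

  vgextend vg /dev/sdX                        # new PV joins the VG
  lvextend -L +100G vg/pool                   # grow the thin-pool data volume
  lvextend --poolmetadatasize +1G vg/pool     # and its metadata, if needed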
You still have problems even when you have 'enough' disk-space ;)
e.g. you select a small chunk-size and you want to extend the thin-pool data volume
beyond its addressable capacity - each chunk-size has a final maximum data size...
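As a very rough back-of-the-envelope illustration (the kernel's
thin-provisioning.txt suggests about 48 bytes of metadata per mapped chunk, and
pool metadata tops out around 16GiB):

  # approximate ceiling on mappable data for a given chunk size
  chunk_size_kib=64                                  # hypothetical chunk size
  max_chunks=$(( 16 * 1024 * 1024 * 1024 / 48 ))     # metadata limit / bytes per chunk
  echo "$(( max_chunks * chunk_size_kib / 1024 / 1024 / 1024 )) TiB (approx.)"

so picking a small chunk-size directly lowers how far the data volume can ever
be extended.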
That means for me, and for others who may not be doing it professionally or in
a larger organisation, that the benefit of spending all that time may not weigh up
against the cost, and the result is that you stay stuck with a deeply
suboptimal situation with little or no reporting or fixing, all
because the initial investment is too high.
You can always use normal device - it's really about the choice and purpose...
While personally I also like the bigger versus smaller idea because you don't
have to configure it.
I'm still proposing to use different pools for different purposes...
Sometimes spreading the solution across existing logic is way easier
than trying to achieve some super-intelligent universal one...
The script is called at 50% fullness, then each time it crosses 55%, 60%, ...
95%, 100%. When it drops below a threshold, you are called again once
the boundary is crossed...
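A minimal sketch of such a script - assuming a recent lvm2 where the dmeventd
thin plugin runs a configured dmeventd/thin_command and exports the usage via
DMEVENTD_THIN_POOL_DATA / DMEVENTD_THIN_POOL_METADATA (see 'man dmeventd';
details vary per version):

  #!/bin/sh
  # called by dmeventd at each 5% step; take the integer part of the data usage %
  PCT="${DMEVENTD_THIN_POOL_DATA%%.*}"
  if [ "${PCT:-0}" -ge 95 ]; then
      logger "thin-pool is ${PCT}% full - time to drop disposable snapshots"
      # site-specific cleanup (lvremove of throw-away snapshots, etc.) goes here
  fi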
How do you know when it is at 50% fullness?
If you are a proud sponsor of your electricity provider and you like the
extra heating in your house, you can of course run this in a loop...
Thresholds are based on the mapped size for the whole thin-pool.
Thin-pool surely knows all the time how many blocks are allocated and free for
its data and metadata devices.
But didn't you just say you needed to process up to 16GiB to know this
information?
Of course the thin-pool has to be aware of how much free space it has.
You can somewhat imagine this as a 'hidden' volume holding the FREE space...
So to give you this 'info' about free blocks in the pool, only a very small
metadata subset is maintained - you don't need to know about all the other volumes...
If another volume is releasing or allocating chunks, your 'FREE space' gets
updated...
It's complex underneath and locking is very performance sensitive - but for
easy understanding you can possibly get the picture out of this...
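That running total is also what the user-space tools report, e.g. (hypothetical
pool name; the dm device name may differ on your system):

  # percent of data/metadata chunks used, straight from the pool's counters
  lvs -o lv_name,data_percent,metadata_percent vg/pool

  # or at the dm level, roughly: '... <used>/<total> metadata ... <used>/<total> data ...'
  dmsetup status vg-pool-tpool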
You may not know the size and attribution of each device but you do know the
overall size and availability?
The kernel supports one threshold setting - user-space (dmeventd) is
woken up when usage has passed it.
That value maps to the lvm.conf autoextend threshold.
As a 'secondary' source, dmeventd checks pool fullness every 10 seconds with a
single ioctl() call, compares how the fullness has changed, and provides you
with callbacks for those 50, 55, ... jumps
(as can be found in 'man dmeventd').
So for passing the autoextend threshold you get an instant call.
For all the others there is an up-to-10-second delay before discovery.
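The corresponding lvm.conf knobs look roughly like this (the values are only an
example):

  activation {
      monitoring = 1
      thin_pool_autoextend_threshold = 70    # the kernel-side threshold above
      thin_pool_autoextend_percent = 20      # grow the pool by 20% each time
  }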
In the single thin-pool all thins ARE equal.
But you could make them unequal ;-).
I cannot ;) - I'm lvm2 coder - dm thin-pool is Joe's/Mike's toy :)
In general, one can come up with many different kernel modules which take a
different approach to the problem.
Worth noting - RH now has Permabit in its portfolio - so there can be more than
one type of thin-provisioning supported in lvm2...
The Permabit solution has deduplication, compression and 4K blocks - but no
snapshots...
The goal was more to protect the other volumes: supposing that log writing
happened on a separate volume, the idea was for that log volume not to impact
the main volumes.
IMHO the best protection is a different pool for different thins...
You can more easily decide which pool may 'grow up'
and which one should rather be taken offline.
So your 'less important' data volumes may simply hit the wall hard,
while your 'strategically important' one avoids overprovisioning as much as
possible and keeps running.
Motto: keep it simple ;)
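For example (hypothetical names and sizes), splitting 'important' and 'scratch'
data into two pools is just:

  lvcreate -L 100G -T vg/pool_critical
  lvcreate -L  50G -T vg/pool_scratch
  lvcreate -V  80G -T vg/pool_critical -n root_data
  lvcreate -V 200G -T vg/pool_scratch  -n logs    # may overprovision and hit the wall alone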
So you have thin global reservation of say 10GB.
Your log volume is overprovisioned and starts eating up the 20GB you have
available and then runs into the condition that only 10GB remains.
The 10GB is a reservation maybe for your root volume. The system (scripts) (or
whatever) recognises that less than 10GB remains, that you have claimed it for
the root volume, and that the log volume is intruding upon that.
It then decides to freeze the log volume.
Of course you can play with 'fsfreeze' and other things - but all these things
are very special to individual users with their individual preferences.
Effectively, if you freeze your 'data' LV, you may as a reaction paralyze the
rest of your system - unless you know the 'extra' information about the user's
usage pattern.
But do not take this as something to discourage you from trying it - you may come
up with a perfect solution for your particular system, and some other user may
find it useful in a similar pattern...
It's just something that lvm2 can't give support globally.
But lvm2 will give you enough bricks for writing 'smart' scripts...
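A very rough sketch of such a 'brick' - everything here (names, mountpoint, the
10GB reservation) is a site-specific assumption, and whether freezing is wise at
all is exactly the caveat above:

  #!/bin/sh
  # freeze the 'log' filesystem once free space in the pool drops below
  # the ~10GB claimed for the root volume
  SIZE_G=$(lvs --noheadings --units g --nosuffix -o lv_size vg/pool | tr -d ' ' | cut -d. -f1)
  USED_PCT=$(lvs --noheadings -o data_percent vg/pool | tr -d ' ' | cut -d. -f1)
  FREE_G=$(( SIZE_G - SIZE_G * USED_PCT / 100 ))
  if [ "$FREE_G" -lt 10 ]; then
      fsfreeze -f /var/log    # stop the intruding volume from allocating more chunks
  fi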
Okay... I understand. I guess I was misled a bit by non-thin snapshot
behaviour (it filled up really fast without me understanding why, and I concluded
that it was doing 4MB copies).
Fast disks are now easily able to write gigabytes per second... :)
But attribution of an extent to a snapshot will still be done in extent-sized units,
right?
The allocation unit in a VG is the 'extent' - it ranges from 1 sector to 4GiB,
and the default is 4M - yes...
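For example, the extent size can be inspected and set like this ('vg_new' and
/dev/sdX are placeholders):

  vgs -o vg_name,vg_extent_size     # shows the default 4.00m
  vgcreate -s 8M vg_new /dev/sdX    # hypothetical VG with a different extent size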
So I don't think the problems of freezing are bigger than the problems of
rebooting.
With 'reboot' you know where you are - it's IMHO a fair condition for this.
With a frozen FS the system is paralyzed, and your 'fsfreeze' operation on
unimportant volumes has actually even eaten space from the thin-pool which could
possibly have been better used to store data for the important volumes...
And there is even a big danger that you will 'freeze' yourself already during the
call of fsfreeze (unless, of course, you put BIG margins around it).
"System is still running but some applications may have crashed. You will need
to unfreeze and restart in order to solve it, or reboot if necessary. But you
can still log into SSH, so maybe you can do it remotely without a console ;-)".
Compare with email:
Your system has run out of space, all actions to gain some more space have
failed - going to reboot into some 'recovery' mode.
So there is no issue with snapshots behaving differently. It's all the same,
and all data committed prior to the fill-up will be safe and will not change afterward.
Yes - 'snapshot' is user-land language - in the kernel, all thins map chunks...
If you can't map a new chunk, things are going to stop - and start erroring
out shortly...
Regards
Zdenek
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/