Zdenek Kabelac wrote on 12-09-2017 16:37:

On the block layer, many things are black & white....

If you don't know which process created a written page, nor whether you are
writing e.g. filesystem data, metadata or any other sort of 'metadata' information,
you can hardly apply any 'smart' logic at the thin block level.

You can always give an example of something being black and white somewhere, but I was making a general point there, nothing specific.

The philosophy with DM devices is that you can replace them online with
something else - i.e. you could have a linear LV which is turned into a RAID LV,
which could then be turned into a cached RAID LV, and then even into a thin LV -
all in a row, on a live running system.

I know.

So what should the filesystem be doing in this case?

I believe in most of these systems you cite the default extent size is still 4MB, or am I mistaken?

Should it be making complex queries of the block layer underneath - checking
current device properties and waiting until each IO operation is processed
before the next IO enters the process - repeating the same thing in very
synchronous, slow logic?? Can you imagine how slow this would become?

You mean a synchronous way of checking the available space in a thin volume by the thin pool manager?

We are targeting 'generic' usage, not a specialized case which fits 1 user
out of 1,000,000 - where every other user needs something 'slightly'
different....

That is a complete exaggeration.

I think you will find this issue comes up often enough that it is not one in 1,000,000, and besides, unless performance considerations are at the heart of your ...reluctance ;-), no one stands to lose anything.

So the only question is one of design limitations or architectural considerations (performance), not whether it is a wanted feature (it is).


I don't think there is anything related...
Thin chunk-size ranges from 64KiB to 1GiB....

Doesn't thin allocation happen in extent-sized units by default?
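
(For what it's worth, if I understand it right the two are independent knobs and can be checked directly; something like this should show both, assuming a VG named vg and a pool named vg/pool:)

    # VG extent size (default 4 MiB) - governs LV sizing, not thin allocation
    vgs -o vg_name,vg_extent_size vg

    # thin pool chunk size - the actual unit of thin allocation
    lvs -o lv_name,chunk_size vg/pool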

The only inter-operation is that the main filesystems (like extX & XFS) are
getting fixed for better reactions to ENOSPC...
and WAY better behavior when there are write errors - surprisingly,
there was a lot of faulty logic and many wrong expectations encoded in them...

Well, that's good, right? But I did read here earlier about work between the ext filesystem team and the LVM team to improve allocation characteristics to better align with underlying block boundaries.

If zpools are 'equally' as fast as thins, and give you better protection
and more sane logic, then why is anyone still using thins???

I don't know. I don't like ZFS, precisely because it is a 'monolithic' system that aims to be everything. That makes it more complex and harder to understand, harder to get into, etc.

Of course, if you slow down the thin pool, add way more
synchronization points and consume 10x more memory :) you can get
better behavior in those exceptional cases, which are only hit by
inexperienced users who tend to intentionally use thin pools in an
incorrect way.....

I'm glad you like us ;-).

Yes, apologies here: I responded to this earlier (perhaps a year ago) and the systems I was testing on ran a 4.4 kernel. So I cannot currently confirm it, and it probably has already been solved (you could be right).

Back then, the crash showed up as kernel messages on the TTY and then after some 20-30

There is by default a 60-second freeze before an unresized thin pool starts to reject
all writes to unprovisioned space with an error and switches to the
out-of-space state. There is, though, a difference between running
out of data space and running out of metadata space - the latter is more complex...

I can't say whether it was that or not. I am pretty sure the entire system froze for longer than 60 seconds.
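
(For reference, as far as I understand both the timeout and the error behaviour are tunable these days; roughly something like this, with vg/pool as a placeholder name:)

    # make the pool error out immediately instead of queueing IO for
    # 60 seconds when it runs out of space
    lvchange --errorwhenfull y vg/pool

    # or let dmeventd grow the pool before it fills up
    # (activation section of lvm.conf)
    thin_pool_autoextend_threshold = 70
    thin_pool_autoextend_percent = 20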

In the page cache nothing is logically separated - you have 'dirty' pages
you need to write somewhere, and if your writes lead to errors,
and the system reads errors back instead of real data, and your executing
code starts running on a completely unpredictable data set - well, a 'clean'
reboot is still a very nice outcome IMHO....

Well, even if that means some dirty pages are lost before the application discovers it, any read or write errors should at some point lead the application to shut down, right?

I think for most applications the most sane behaviour would simply be to shut down.

Unless there is more sophisticated error handling.

I am not sure what we are arguing about at this point.

Application needs to go anyway.


If I had a system crashing because I wrote to some USB device that was malfunctioning, that would not be a good thing either.

Well, try to BOOT from USB :), then detach it and compare...
Mounting user data on a USB device and running user-space tools off USB are not comparable...

Systems would also grind to a halt because of user data alone, not just system files.

I know booting from USB can be 1000x slower than user data.

But a shared page cache for all devices is bad design, period.

AFAIK this is still not a resolved issue...

That's a shame.

You can have different pools, and you can use a rootfs on thins to
easily test e.g. system upgrades....
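
(For reference, I assume that kind of upgrade test would look roughly like this, with vg/root as a placeholder thin root LV:)

    # take a thin snapshot of the root LV before the upgrade
    lvcreate -s --name root_preupgrade vg/root

    # if the upgrade goes wrong, roll back by merging the snapshot
    # back into the origin (takes effect on the next activation/reboot)
    lvconvert --merge vg/root_preupgrade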

Sure, but in the past GRUB2 would not work well with thin; I was basing myself on that...

/boot cannot be on thin.

/rootfs is not a problem - there will even be some great enhancements to Grub
to support this more easily and to switch between various snapshots...

That's great; I guess this is possible like with BTRFS?

But /rootfs was a problem. Grub-probe reported that it could not find the rootfs.

When I ran with a custom grub config it worked fine. It was only grub-probe that failed, nothing else (Kubuntu 16.04).
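
(If it helps anyone reproduce it: the check I would redo on a newer version is simply running the probes that grub-mkconfig relies on against the thin root, e.g.:)

    # what grub-probe thinks the root filesystem and its device stack are;
    # if these error out, grub-mkconfig fails as well
    grub-probe --target=fs /
    grub-probe --target=abstraction /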

EVERYONE would benefit.

Fortunately most users NEVER need it ;)

You're wrong. The assurance that a system will not crash (for instance), or of some sane behaviour in case of a fill-up, will put many minds at ease.

Since they properly operate the thin pool and understand its weak points....

Yes, they are all superhumans, right?

I am sorry for being so inferior ;-).


Not necessarily that the system continues in full operation; applications are allowed to crash or whatever. Just that the system does not lock up.

When you get bad data from your block device, your system's reaction
is unpredictable. If your /rootfs cannot store its metadata, the
sanest behavior is to stop - all other solutions are so complex and
complicated that spending resources on avoiding this state in the
first place is a much better use of effort...

About rootfs, I agree.

But the nominal distinction was between thin-as-system and thin-as-data.

If you say that thin-as-data is a specific use case that cannot be catered for, that is a bit odd. It is still 90% of use.

Once again - USE a different pool - solve problems at the proper level....
Do not over-provision critical volumes...
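
(I understand the suggestion; spelled out it would presumably look like this, with made-up names and sizes:)

    # dedicated pool for the critical volume, not over-provisioned:
    # virtual size == pool size, so it should never run out of data
    # space underneath it
    lvcreate -L 20G -T vg/critpool
    lvcreate -V 20G -T vg/critpool -n critical

    # everything else goes into a separate, over-provisioned pool
    lvcreate -L 100G -T vg/datapool
    lvcreate -V 80G -T vg/datapool -n data1
    lvcreate -V 80G -T vg/datapool -n data2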

Again what we want is a valid use case and a valid request.

If the system is designed so badly (or designed in such a way) that it cannot be achieved, that does not immediately make it a bad wish.

For example, if a problem is caused by the kernel's page cache being shared across all block devices at once, then someone wanting something that is impossible because of that design...

...does not make that person bad for wanting it.

It makes the kernel bad for not achieving it.


I am sure your programmers are good enough to implement asynchronous state updating for a thin pool that does not interfere with allocation, in the sense that it would lazily update stats; allocation constraints might then be based on older data (maybe seconds old), but that still doesn't make it useless.

It doesn't have to be perfect.

If my "critical volume" wants 1000 free extents, but it only has 988, that is not so great a problem.

Of course, I know, I hear you say "Use a different pool".

The whole idea for thin is resource efficiency.

There is no real reason that this "space reservation" can't happen.

Even if there are current design limitations, they might be there for a good reason - you are the arbiter of that.

Maybe it cannot be perfect, or it has to happen asynchronously.

It is better if a non-critical volume starts failing than a critical one.

Failure is imminent, but we can choose which fails first.
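
(To make that concrete: even today a rough userspace approximation seems possible - poll the pool fill level and degrade the non-critical volumes first. A crude sketch, with placeholder names and threshold:)

    #!/bin/sh
    # crude "reservation" approximation: once the pool passes a threshold,
    # push the non-critical filesystem to read-only so the critical one
    # keeps whatever space is left; names and threshold are examples
    POOL=vg/pool
    THRESHOLD=90
    NONCRIT_MNT=/srv/noncritical

    USED=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ' | cut -d. -f1)
    if [ "$USED" -ge "$THRESHOLD" ]; then
        mount -o remount,ro "$NONCRIT_MNT"
    fi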




I mean, your argument is no different from this:

"We need better man pages."

"REAL system administrators can use current man pages just fine."

"But any improvement would also benefit them, no need for them to do hard stuff when it can be easier."

"Since REAL system administrators can do their job as it is, our priorities lie elsewhere."

It's a stupid argument.

Any investment in user friendliness pays off for everyone.

Linux is often so impossible to use because no one makes that investment, even though it would have immeasurable benefits for everyone.

And then when someone does make the effort (e.g. a makefile that displays a help screen when run with no arguments), someone complains that it breaks the contract that "make" should start compiling instantly, thus using the "status quo" as a way to never improve anything.

In this case, a make "help screen" can save people literally hours of time, multiplied by at least a thousand people.


I.e. the filesystem may guess at the thin layout underneath and just write 1 byte to each block it wants to allocate.

:) So how do you resolve the error paths - i.e. how do you give back space
you have not actually used....
There are so many problems with this you can't even imagine...
Yeah - we've spent quite some time in the past analyzing those paths....

In this case it seems that if this is possible for regular files (and directories, in that sense), it should also be possible for "magic" files and directories that exist only to allocate some space somewhere. In any case it is an FS issue, not an LVM one.

Besides, you only strengthen my argument that it isn't FS that should be doing it.
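
(Just to illustrate what "write 1 byte to each block" boils down to at the block level - and why the error paths you mention matter - here is the brute-force version; device name and chunk size are assumptions, and it should obviously not be run against a mounted filesystem:)

    # force allocation of every chunk of a thin LV by rewriting one byte
    # per chunk; slow and purely illustrative - a filesystem would have to
    # do the equivalent for every block it wants to "reserve"
    DEV=/dev/vg/thinlv
    CHUNK=$((64 * 1024))                  # assume a 64KiB thin chunk size
    SIZE=$(blockdev --getsize64 "$DEV")

    off=0
    while [ "$off" -lt "$SIZE" ]; do
        dd if="$DEV" of="$DEV" bs=1 count=1 skip="$off" seek="$off" \
           conv=notrunc status=none
        off=$((off + CHUNK))
    done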


Please finally stop thinking about some 'reserved' storage for a
critical volume. It leads nowhere....

It leads to you trying to convince me it isn't possible.

But no matter how much you try to dissuade, it is still an acceptable use case and desire.



Do the right action at the right place.

For critical volumes, use non-overprovisioned pools - there is nothing
better you can do - seriously!

For Gionatan's use case the problem was the poor performance of a non-overprovisioned setup.



Maybe start to understand how the kernel works in practice ;)

Or how it doesn't work ;-).

Like,

I will give a stupid example.

Suppose using a pen is illegal.

Now lots of people want to use a pen, but they end up in jail.

Now you say "Wanting to use a pen is a bad desire, because of the consequences".

But it's pretty clear the desire won't go away.

And the real solution lies in changing the law.


In this case, people really want something and for good reasons. If there are structural reasons that it cannot be achieved, that is just that.

That doesn't mean the desires are bad.



You can forever keep saying "Do this instead" but that still doesn't ever make the prime desires bad.

"Don't use a pen, use a pencil. Problem solved."

Doesn't make wanting to use a pen a bad desire, nor does it make wanting some safe space in provisioning a bad desire ;-).


Otherwise you spend your life boring developers with ideas which simply
cannot work...

Or maybe changing their mind, who knows ;-).


So use 2 different POOLS, problem solved....

That was not possible for Gionatan's use case.

Myself, I do not use a critical volume, but I can imagine still wanting some space efficiency even when the "criticalness" differs from one volume to the next.




It is a proper desire, Zdenek. Even if LVM can't do it.


Well it's always about checking 'upstream' first and then bothering
your upstream maintainer...

If you knew about the pre-existing problems, you could have informed me.

In fact it has happened that you said something cannot be done, and then someone else said "Yes, this has been a problem, we have been working on it and problems should be resolved now in this version".



You spend most of your time denying that something is wrong.

And then someone else says "Yes, this has been an issue, it is resolved now".

If you communicate more clearly, then you also have fewer people bugging you.

We really cannot be solving problems of every possible deployed
combination of software.

The issue is more that at some point this was the main released version.

Main released kernel and main released LVM, in a certain sense.


Some of your colleagues are a little more forthcoming with acknowledgements that something has been failing.

This would considerably cut down the amount of time you spend being "bored" because you try to fight people who are trying to tell you something.

If you say "Oh yes, I think you mean this and that, yes that's a problem and we are working on it" or "Yes, that was the case before, this version fixes that" then


these long discussions also do not need to happen.

But you almost never say "Yes it's a problem", Zdenek.

That's why we always have these debates ;-).
