Re: [btrfs-progs] coreutils-like -i parameter, splitting permissions for various tasks

2018-02-11 Thread Axel Burri


On 2018-02-10 12:24, Tomasz Pala wrote:
> There is a serious flaw in btrfs subcommand handling. Since all of them
> are handled by a single 'btrfs' binary, there is no way to create any
> protection against accidental data loss for individual subcommands
> ('btrfs subvolume delete' is the only one I've found, but it is
> DANGEROUS).
> 
> Several protections are commonly used for other commands. For example,
> with zsh's hist_ignore_space option enabled (setopt hist_ignore_space)
> I have:
> 
> alias kill=' kill'
> alias halt=' halt'
> alias init=' init'
> alias poweroff=' poweroff'
> alias reboot=' reboot'
> alias shutdown=' shutdown'
> alias telinit=' telinit'
> 
> so that these commands are never saved into my shell history.
> 
> Another system-wide protection enabled by default might be coreutils.sh
> creating aliases:
> 
> alias cp=' cp --interactive --archive --backup=numbered --reflink=auto'
> alias mv=' mv --interactive --backup=numbered'
> alias rm=' rm --interactive --one-file-system --interactive=once'
> 
> All such countermeasures reduce the probability of fatal mistakes.
> 
> 
> There is no 'prompt before doing ANYTHING irreversible' option for btrfs,
> so everyone needs to take special care when typing commands. Since
> snapshotting and managing subvolumes is a daily routine, not anything
> special (like creating storage pools or managing devices), it should be
> more forgiving of user errors. Since there is no other (obvious)
> solution, I propose making "subvolume delete" ask for confirmation by
> default, unless used with a newly introduced option like -y (--yes).
> 
> 
> Moreover, since there might be different admin roles on the system,
> btrfs-progs should be split into separate tools, so one could have a
> quota-admin without permissions for managing devices, a backup-admin
> with access to all the subvolumes, or a maintenance-admin that could
> scrub or rebalance volumes. For backward compatibility, these tools
> could be invoked by a 'btrfs' wrapper binary.
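Until such a --yes/confirmation option exists, a shell wrapper can
approximate it locally. A minimal, hypothetical bash sketch (the function
shadows the real binary; prompt wording and scope are illustrative only):

  btrfs() {
      # Intercept only the dangerous subcommand; pass everything else through
      if [ "$1" = "subvolume" ] && [ "$2" = "delete" ]; then
          printf 'Really delete subvolume(s) %s? [y/N] ' "${*:3}"
          read -r reply
          [ "$reply" = "y" ] || return 1
      fi
      command btrfs "$@"
  }

Note it won't catch abbreviated forms like 'btrfs sub del', which the
real binary also accepts, so this is a convenience net, not a guarantee.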

FWIW, I maintain a little patchset on btrfs-progs which separates
specific btrfs command groups and thus can be used to set/restrict
privileged access:

https://github.com/digint/btrfs-progs-btrbk

It's far from complete, merely an ugly hack. I use it to constrain
btrfs actions (issued by btrbk) on remote machines within cron jobs.
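A similar effect is possible without patching by whitelisting individual
subcommands through sudo. A hypothetical /etc/sudoers.d snippet (user
name and binary path are assumptions, and sudoers wildcards match
loosely, so treat it as a sketch rather than a hardened policy):

  # Allow the 'backup' user to snapshot and send subvolumes, nothing else
  backup ALL=(root) NOPASSWD: /usr/bin/btrfs subvolume snapshot *, /usr/bin/btrfs send *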


Re: btrfs-cleaner / snapshot performance analysis

2018-02-11 Thread Hans van Kranenburg
On 02/11/2018 04:59 PM, Ellis H. Wilson III wrote:
> Thanks Tomasz,
> 
> Comments in-line:
> 
> On 02/10/2018 05:05 PM, Tomasz Pala wrote:
>> You won't have anything close to "accurate" in btrfs - quotas don't
>> include space wasted by fragmentation, which can allocate from tens to
>> thousands of times (sic!) more space than the files themselves.
>> Not in some worst-case scenarios, but in real-life situations...
>> I got a 10 MB db-file which was eating 10 GB of space after a week of
>> regular updates - withOUT snapshotting it. All described here.
> 
> The underlying filesystem this is replacing was an in-house developed
> COW filesystem, so we're aware of the difficulties of fragmentation. I'm
> more interested in the approximate space consumed across snapshots when
> considering CoW.  I realize it will be approximate.  Approximate is OK
> for us; no accounting for snapshot space consumed is not.

If your goal is to have an approximate idea for accounting, and you
don't need to be able to actually enforce limits, and if the filesystems
that you are using are as small as the 40GiB example you gave...

Why not just use `btrfs fi du <path> [<path>..]` now and then and update
your administration with the results, instead of making the filesystem
carry the burden of bookkeeping on every tiny change, all day long?
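For example, a minimal cron-style sketch (paths and log location are
made up for illustration):

  #!/bin/sh
  # Append a per-subvolume usage summary to a log, e.g. from a daily cron job
  date >> /var/log/btrfs-du.log
  btrfs filesystem du -s /mnt/pool/subvol-* >> /var/log/btrfs-du.log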

> Also, I don't see the thread you mentioned.  Perhaps you forgot to
> include the link, or an html link didn't come through properly?
> 
>>> course) or how many subvolumes/snapshots there are.  If I know that
>>> above N snapshots per subvolume performance tanks by M%, I can apply
>>> limits on the use-case in the field, but I am not aware of those kinds
>>> of performance implications yet.
>>
>> This doesn't work like that. It all depends on the data that are the
>> subject of the snapshots, and especially on how they are updated -
>> exactly how, including write patterns.
>>
>> I think you expect answers that can't be formulated - with an fs
>> architecture as advanced as ZFS or btrfs, its behavior can't be reduced
>> to simple answers like 'keep fewer than N snapshots'.
> 
> I was using an extremely simple heuristic to drive at what I was looking
> to get out of this.  I should have been more explicit that the example
> was not to be taken literally.
> 
>> This is the one exception with an easy answer: btrfs doesn't handle
>> databases with CoW. Period. Snapshotted or not, ANY database files
>> (systemd-journal, PostgreSQL, sqlite, db) are not handled well at all.
>> They slow the entire system down to the speed of a cheap SD card.
> 
> I will keep this in mind, thank you.  We do have a higher level above
> BTRFS that stages data.  I will consider implementing an algorithm that
> adds the nocow flag to a file once it has been written to enough to
> indicate it is a bad fit for the BTRFS COW algorithm.

Adding the nocow attribute to a file only works when it has just been
created and not yet written to, or when setting it on the containing
directory and letting new files inherit it. You can't just turn it on
for existing files that already have content.

https://btrfs.wiki.kernel.org/index.php/FAQ#Can_copy-on-write_be_turned_off_for_data_blocks.3F
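A sketch of the working pattern in shell (file and directory names are
assumptions):

  mkdir nocow-dir
  chattr +C nocow-dir                          # new files created inside inherit No_COW
  cp --reflink=never dbfile nocow-dir/dbfile   # full data copy picks up the attribute
  lsattr nocow-dir/dbfile                      # should show the 'C' flag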

>> Actually, if you do not use compression and don't need checksums of
>> data blocks, you may want to mount all your btrfs filesystems with
>> nocow (the nodatacow mount option) by default. This way the quotas
>> would be more accurate (no fragmentation _between_ snapshots) and
>> you'd have some decent performance with snapshots - if that is all
>> you care about.
> 
> CoW is still valuable for us as we're shooting to support on the order
> of hundreds of snapshots per subvolume,

Hundreds will get you into trouble even without qgroups.

> and without it (if BTRFS COW
> works the same as our old COW FS) that's going to be quite expensive to
> keep snapshots around.  So some hybrid solution is required here.

-- 
Hans van Kranenburg


Re: btrfs-cleaner / snapshot performance analysis

2018-02-11 Thread Hans van Kranenburg
On 02/11/2018 05:15 PM, Ellis H. Wilson III wrote:
> Thanks Hans.  Sorry for the top-post, but I'm boiling things down here
> so I don't have a clear line-item to respond to.  The take-aways I see
> here from my original queries are:
> 
> 1. Nobody has done a thorough analysis of the impact of snapshot
> manipulation WITHOUT qgroups enabled on foreground I/O performance
> 2. Nobody has done a thorough analysis of the impact of snapshot
> manipulation WITH qgroups enabled on foreground I/O performance

It's more that there is no simple list of clear-cut answers that apply
to every possible situation and type/pattern of work that you can throw
at a btrfs filesystem.

> 3. I need to look at the code to understand the interplay between
> qgroups, snapshots, and foreground I/O performance as there isn't
> existing architecture documentation to point me to that covers this

Well, the excellent write-up from Qu this morning gives some explanation
from the design point of view.

> 4. I should be cautioned that CoW in BTRFS can exhibit pathological (if
> expected) capacity consumption for very random-write-oriented datasets
> with or without snapshots, and nocow (or in my case transparently
> absorbing and coalescing writes at a higher tier) is my friend.

nocow only keeps the cows at a distance as long as you don't start
snapshotting (or cp --reflink'ing) those files... If you take a snapshot,
you force btrfs to keep the data around that is referenced by the
snapshot. That means every next write will be cowed once again, moo, so
small writes get redirected to a new location, causing fragmentation
again. The second and third write can go into the same (new) location as
the first new write, but as soon as you snapshot again, the whole thing
repeats.
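A minimal way to observe this, assuming a btrfs filesystem mounted at
/mnt/t with a subvolume 'vol' (all paths are hypothetical):

  touch /mnt/t/vol/file
  chattr +C /mnt/t/vol/file                 # nocow before any data is written
  dd if=/dev/zero of=/mnt/t/vol/file bs=1M count=100 conv=notrunc
  sync; filefrag /mnt/t/vol/file            # few extents: writes go in place
  btrfs subvolume snapshot /mnt/t/vol /mnt/t/snap
  dd if=/dev/zero of=/mnt/t/vol/file bs=4K count=1 seek=5000 conv=notrunc
  sync; filefrag /mnt/t/vol/file            # extent count grew: the shared
                                            # block was cowed to a new location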

> 5. I should be cautioned that CoW is broken across snapshots when
> defragmentation is run.
> 
> I will update a test system to the most recent kernel and will perform
> tests to answer #1 and #2.  I plan to share the results when I'm done.
> If I have time to write up my findings for #3, I will similarly share
> that.
> 
> Thanks to all for your input on this issue.

Have fun!

-- 
Hans van Kranenburg


Re: btrfs-cleaner / snapshot performance analysis

2018-02-11 Thread Adam Borowski
On Sun, Feb 11, 2018 at 12:31:42PM +0300, Andrei Borzenkov wrote:
> 11.02.2018 04:02, Hans van Kranenburg wrote:
> >> - /dev/sda6 / btrfs
> >> rw,relatime,ssd,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot
> >> 0 0
> > 
> > Note that atime changes cause writes to metadata, which means cowing
> > metadata blocks and unsharing them from a previous snapshot, merely by
> > using the filesystem, without even changing anything (!).
> 
> With relatime, atime is updated only once after a file is changed. So
> your description is not entirely accurate, and things should not be that
> dramatic unless files are continuously being changed.

Alas, that's untrue.  relatime updates happen if:
* the file has been written after it was last read, or
* the previous atime is older than 24 hours

Thus, you get at least one unshare per inode per day, which also happens
to be the most widespread frequency of both snapshotting and cron jobs.

Fortunately, most uses of atime are gone, so it's generally safe to
disable it completely.
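For example (device, mount point and subvolume are placeholders):

  mount -o remount,noatime /        # takes effect immediately
  # and persisted in /etc/fstab, e.g.:
  # /dev/sda6  /  btrfs  noatime,subvol=@  0  0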


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ The bill with 3 years prison for mentioning Polish concentration
⣾⠁⢰⠒⠀⣿⡁ camps is back.  What about KL Warschau (operating until 1956)?
⢿⡄⠘⠷⠚⠋⠀ Zgoda?  Łambinowice?  Most ex-German KLs?  If those were "soviet
⠈⠳⣄ puppets", Bereza Kartuska?  Sikorski's camps in UK (thanks Brits!)?