Re: BTRFS, relatime vs. noatime

2020-09-12 Thread Gordon Messmer

On 9/5/20 5:29 AM, Neal Becker wrote:

If BTRFS is to become fedora default, we should consider this?

"BTRFS relatime vs. noatime - Huge Performance Difference - linux" 
https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body



The *performance* difference is probably vastly overstated by people who 
haven't actually compared btrfs to other filesystems. At least, I keep 
seeing this come up, but haven't yet seen anyone offer a comparison to 
any other filesystem, and I suspect that it is a myth.


I copied 150,000 files from btrfs to ext4 with atime updates, and the 
same transfer with no atime updates. I did the same process with ext4 to 
ext4.


The performance impact of atime updates on ext4 (ext4 to ext4) was 9.6%
(31.65s with atime updates -> 28.614s without). The performance impact of
atime updates on btrfs (btrfs to ext4) was 3.86% (29.626s -> 28.482s).
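The percentages above are the time saved relative to the run *with* atime updates, i.e. (with - without) / with * 100. A small check of that arithmetic (the helper name `pct_saved` is just for illustration):

```sh
#!/bin/sh
# Relative time saved by skipping atime updates, as a percentage of the
# run WITH atime updates: (t_atime - t_noatime) / t_atime * 100
pct_saved() {
    awk -v t_atime="$1" -v t_noatime="$2" \
        'BEGIN { printf "%.2f\n", (t_atime - t_noatime) / t_atime * 100 }'
}

pct_saved 31.65 28.614    # ext4 -> ext4:  prints 9.59
pct_saved 29.626 28.482   # btrfs -> ext4: prints 3.86
```

Measured against the faster run instead, the same numbers come out at roughly 10.6% and 4.0%, so the choice of baseline matters when comparing such figures.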


This isn't a scientific test, and I'm not going to argue henceforth that 
btrfs performs better than ext4, but the test does suggest that btrfs 
with relatime isn't the "sky is falling" performance disaster that 
people are making it out to be.



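#!/bin/sh
# Run as root. Backing files for two loop devices: btrfs source, ext4 target.
bf1=/home/input
bf2=/home/output
# NOTE: bs=1M count=1 creates only a 1 MiB image, which is too small for
# mkfs.btrfs and for /home/flatpak; size the images to hold the copied tree.
dd if=/dev/zero of="$bf1" bs=1M count=1
dd if=/dev/zero of="$bf2" bs=1M count=1
losetup /dev/loop1 "$bf1"
losetup /dev/loop2 "$bf2"
mkfs.btrfs /dev/loop1
mkfs.ext4 /dev/loop2
mkdir /mnt/input
mkdir /mnt/output
mount /dev/loop1 /mnt/input
mount /dev/loop2 /mnt/output
rsync -aHS /home/flatpak /mnt/input
# Push atimes into the past so relatime will update them on the next read
find /mnt/input -exec touch -h -a -d 'Jan 1 2020' {} +
# Run 1: reading the source updates atime
time rsync -aHS /mnt/input/ /mnt/output
umount /mnt/output
mkfs.ext4 /dev/loop2
mount /dev/loop2 /mnt/output
# atime is already updated on /mnt/input, so it's not strictly necessary to
# remount it with noatime
# Run 2: atimes are now newer than ctime, so relatime skips the updates
time rsync -aHS /mnt/input/ /mnt/output
umount /mnt/input
umount /mnt/output
rmdir /mnt/input
rmdir /mnt/output
losetup -d /dev/loop1
losetup -d /dev/loop2
rm "$bf1"
rm "$bf2"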

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: BTRFS, relatime vs. noatime

2020-09-07 Thread Chris Murphy
On Mon, Sep 7, 2020 at 6:30 AM Kamil Paral  wrote:
>
> On Sun, Sep 6, 2020 at 1:37 AM Chris Murphy  wrote:
>>
>> But if you've got a snapshot once per day, times ten days, and this kind of 
>> aggressive search function touching every file? Maybe an extra 1-2G of 
>> metadata being pinned
>
>
> I don't follow. If you have one master copy and 10 snapshots, and you change 
> the metadata on the master copy, why would it generate more metadata (than 
> when having a single snapshot)? All those snapshots can share the same 
> metadata block (provided the file wasn't changed in the meantime) and just 
> the master copy would get a new metadata block. So it should be the same 
> amount of newly written blocks, regardless of how many snapshots you have. 
> What am I missing?

Take the case of no snapshots, and also I'll just use "blocks" to
refer to both metadata and data extents.

The normal "modify something" pattern for copy-on-write is to write
updated blocks into free space. It's never an overwrite. Even deleting
a file requires some free space to write blocks indicating the file's
deletion. Only once the new blocks are committed to stable media are
the stale blocks deallocated (turned into free space). The resulting
write pattern is: write changes into free space -> free space is
reduced -> delay -> remove references to the stale blocks -> free
space increases. If it's just atime updates happening, the net change
in used space is zero.

Now, let's snapshot. The effect of a snapshot on a copy-on-write file
system is that the "stale" blocks are preserved. They aren't
deallocated. This is why Btrfs snapshots are cheap. The snapshot
effectively prevents the deallocation and clean up steps. Since
there's no deallocation step, free space does not go up. The net
change in used space goes up upon metadata updates following the
snapshot. If I take no further snapshots, this is just a one time hit.
Subsequent changes resume the normal pattern: write
new->delay->deallocate stale = no net change. But upon snapshotting
again, I pin that subvolume's state at that moment in time.
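The pattern described above can be turned into a toy block-accounting model. This is a simplification for illustration only, not how Btrfs actually tracks extents: the subvolume has a fixed number of metadata blocks, a full atime pass rewrites all of them into free space, and the stale copies are freed unless a snapshot taken since the previous pass still references them. The function name `cow_used` and the one-letter plan syntax are made up for this sketch.

```sh
#!/bin/sh
# Toy model only (not Btrfs internals): count metadata blocks in use.
#   $1 = number of metadata blocks in the subvolume (say, one per file)
#   $2 = plan: a string of steps, 's' = take a snapshot,
#        'a' = atime pass that rewrites every block into free space
cow_used() {
    files=$1
    plan=$2
    used=$files          # blocks currently allocated
    snapshot_pending=0   # 1 if a snapshot references the current live blocks
    while [ -n "$plan" ]; do
        case $plan in
            s*) snapshot_pending=1 ;;
            a*)
                if [ "$snapshot_pending" -eq 1 ]; then
                    # New copies are written, but the snapshot pins the old ones.
                    used=$((used + files))
                    snapshot_pending=0
                fi
                # Otherwise: write new, free stale, so no net change.
                ;;
        esac
        plan=${plan#?}   # drop the step just processed
    done
    echo "$used"
}

cow_used 1000 aaa    # no snapshots:                      prints 1000
cow_used 1000 saa    # one snapshot, two passes:          prints 2000
cow_used 1000 ssa    # two snapshots, no change between:  prints 2000
cow_used 1000 sasa   # snapshot, pass, snapshot, pass:    prints 3000
```

The `ssa` case matches Kamil's intuition: snapshots taken with no changes in between share the same blocks. Under this model the cost grows with the number of snapshots that each capture a *different* set of atimes, which is the "snapshot daily, touch every file daily" pattern.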


--
Chris Murphy


Re: BTRFS, relatime vs. noatime

2020-09-07 Thread Neal Gompa
On Mon, Sep 7, 2020 at 8:30 AM Kamil Paral  wrote:
>
> On Sun, Sep 6, 2020 at 1:37 AM Chris Murphy  wrote:
>>
>> But if you've got a snapshot once per day, times ten days, and this kind of 
>> aggressive search function touching every file? Maybe an extra 1-2G of 
>> metadata being pinned
>
>
> I don't follow. If you have one master copy and 10 snapshots, and you change 
> the metadata on the master copy, why would it generate more metadata (than 
> when having a single snapshot)? All those snapshots can share the same 
> metadata block (provided the file wasn't changed in the meantime) and just 
> the master copy would get a new metadata block. So it should be the same 
> amount of newly written blocks, regardless of how many snapshots you have. 
> What am I missing?

When you have snapshots, you initially share all the same blocks.
However, as you continue to make changes, fewer of the blocks are
shared as new blocks are allocated for new changes. This is also true
for metadata, except that a pass of atime updates results in the
filesystem making new instances of the metadata for most of its files.
However, because you have snapshots, the old instances cannot be
deleted either. Thus, you wind up with essentially a new copy of the
filesystem metadata. This is amplified as you take more snapshots.

This is the *expensive* part of snapshots and the part that people
don't necessarily realize: when you take a snapshot, you're asking for
the preservation of the entire filesystem tree, with all its data. If
you take a snapshot, then change the atimes of all the files, take a
snapshot again, then change the atimes again, you wind up having two
whole unshared instances of the filesystem snapshotted. It's
essentially the worst-case scenario for filesystem metadata.

Of course, this is cheap to *make*, but expensive to *keep* as you run
out of space due to just having so many unique copies of filesystem
metadata. This is why it's often recommended that atimes are switched
off on parts of the filesystem you wish to aggressively snapshot. For
example, most of the time you don't really care about atimes for /usr
and /etc, so you can turn atimes off for those, while leaving them on
for /var and /home.

Now, we are not setting up snapshotting automatically in Fedora like
openSUSE does, so there's not as much pressure to deal with it now.
But if we make a straightforward way for people to set up
snapshotting, then part of that strategy needs to include dealing with
that problem.



-- 
真実はいつも一つ!/ Always, there's only one truth!


Re: BTRFS, relatime vs. noatime

2020-09-07 Thread Kamil Paral
On Sun, Sep 6, 2020 at 1:37 AM Chris Murphy  wrote:

> But if you've got a snapshot once per day, times ten days, and this kind
> of aggressive search function touching every file? Maybe an extra 1-2G of
> metadata being pinned
>

I don't follow. If you have one master copy and 10 snapshots, and you
change the metadata on the master copy, why would it generate more metadata
(than when having a single snapshot)? All those snapshots can share the
same metadata block (provided the file wasn't changed in the meantime) and
just the master copy would get a new metadata block. So it should be the
same amount of newly written blocks, regardless of how many snapshots you
have. What am I missing?


Re: BTRFS, relatime vs. noatime

2020-09-06 Thread Chris Murphy
On Sun, Sep 6, 2020 at 3:00 AM Roberto Ragusa  wrote:

> I've found atime useful in several cases. If you are unsure whether an
> application reads a configuration file, you just check the atime before
> and after running it (way easier than strace). If you are investigating
> what a suspect script or confused user has just done, you can search for
> recent atimes with find.

Sure, but for that to be reliable, you need 'atime'. You're changing
the default anyway.

fatrace? auditctl watch? inotify IN_ACCESS?

> After it took years to go back from noatime to a weak relatime, we are now
> going to lose it completely again.

atime/relatime make read-only operations into ones that write.
Possibly quite a lot of writes. Does it make sense for everyone to
have it enabled by default everywhere? Is that ongoing cost worth
some regular benefit?

> Did any filesystem developer ever think about storing atime in a different
> way, instead of in the usual inode metadata? Maybe a dedicated journal of
> overriding atime entries (column-based DB vs. the inode's row-based DB) to
> cope with "access many files" patterns.

Yes.

> And what happened to "lazytime"? It sounded like a great approach.

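Lazytime is a better default than relatime. But it just delays the
inevitable: the updated timestamps still reach disk, at the latest once
24 hours have passed since the inode was last written (or sooner, e.g.
on fsync or when the inode is written for some other reason). So it
doesn't solve the problem under discussion.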

Are atime updates generally useful? For most files, most users, most
of the time? If not, the default should be noatime. And programs that
need this information should use inotify.

-- 
Chris Murphy


Re: BTRFS, relatime vs. noatime

2020-09-06 Thread Roberto Ragusa

On 2020-09-06 01:35, Chris Murphy wrote:


I figured nothing was using it these days and it was a complete waste. If 
tracker uses atime, maybe I'll get more worried. But if it uses mtime, I'm not.



I've found atime useful in several cases. If you are unsure whether an
application reads a configuration file, you just check the atime before
and after running it (way easier than strace). If you are investigating
what a suspect script or confused user has just done, you can search for
recent atimes with find.
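The before/after atime check can be scripted. A minimal sketch (the function name `was_read_by` is hypothetical), assuming GNU coreutils `stat -c %X` and `touch -a -d`. It first pushes the file's atime into the past because, under relatime, a read does not update atime when the stored atime is already newer than mtime and ctime and less than 24 hours old; resetting it makes the next read visible. On a noatime mount it will always report "not read".

```sh
#!/bin/sh
# Report whether running a command read FILE, judged only by FILE's atime.
# Assumes GNU coreutils (stat -c, touch -d) and a relatime or strictatime
# mount; on a noatime mount atime never moves and this always says "not read".
was_read_by() {
    file=$1
    shift
    # Push atime into the past so relatime will record the next access.
    touch -a -d '2000-01-01' "$file"
    before=$(stat -c %X "$file")
    "$@" >/dev/null 2>&1    # run the command under test; exit status ignored
    after=$(stat -c %X "$file")
    if [ "$after" != "$before" ]; then
        echo "read"
    else
        echo "not read"
    fi
}

# Example (hypothetical file and command):
# was_read_by /etc/example.conf some-daemon --check-config
```

For the "what did this script just touch" case, GNU find's `-amin` test serves the same purpose, e.g. `find /etc -amin -5` lists files under /etc accessed within the last five minutes, subject to the same relatime caveat.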

After it took years to go back from noatime to a weak relatime, we are now
going to lose it completely again.

Did any filesystem developer ever think about storing atime in a different
way, instead of in the usual inode metadata? Maybe a dedicated journal of
overriding atime entries (column-based DB vs. the inode's row-based DB) to
cope with "access many files" patterns.

And what happened to "lazytime"? It sounded like a great approach.

Regards.

--
   Roberto Ragusa    mail at robertoragusa.it


Re: BTRFS, relatime vs. noatime

2020-09-05 Thread Chris Murphy
On Sat, Sep 5, 2020 at 11:22 AM Dominique Martinet
 wrote:
>
> Matthew Miller wrote on Sat, Sep 05, 2020:
> > On Sat, Sep 05, 2020 at 08:46:37AM -0400, Neal Gompa wrote:
> > > > "BTRFS relatime vs. noatime - Huge Performance Difference - linux" 
> > > > https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
> > > It's something that's being looked at, see
> > > https://pagure.io/fedora-btrfs/project/issue/9
> >
> > Huh. That's... unfortunate.
> >
> > I use atimes to keep ~/Downloads and ~/tmp from building up with cruft. I'm
> > sure there's plenty of other practical use cases. I guess with btrfs I
> > should make separate subvolumes for these or something?
>
> It really depends on what you plan on taking snapshots on -- for example
> if you don't plan on taking snapshots for your home, it won't cost all
> that much (basically where a classic filesystem would edit the atime in
> place, btrfs needs to copy it and remove the old one, but overall there
> really shouldn't be so much difference in how it feels)
>
> However, if there are snapshots, the metadata has to be copied over if the
> atime changes; it's really fundamental to CoW and snapshots... It will
> have to keep an extra copy of the metadata around every time there's a
> new snapshot with different atimes.
> And at this point, yes, it might make sense to have ~/tmp or whatever
> in a different subvolume, but I don't suppose regular users would want
> to have to think about this kind of thing.

Or even just 'chattr +A' on certain directories to exclude from atime
updates. It's something the desktop can just do as a courtesy.
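A sketch of what that courtesy could look like, using `chattr +A`, the "no atime updates" file attribute from chattr(1). The helper name and the `DRY_RUN` switch are made up for this example. `-R` applies the attribute to what is already in the directory; whether files created later inherit it from the directory depends on the filesystem, so treat that as an assumption to verify with `lsattr` on the target system.

```sh
#!/bin/sh
# Mark directories, and everything already inside them, with the 'A'
# attribute so reads no longer update their atime. Needs a filesystem that
# supports the attribute (e.g. ext4, btrfs). Set DRY_RUN=1 to only print
# the commands instead of running them.
exclude_from_atime() {
    for dir in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'chattr -R +A %s\n' "$dir"
        else
            chattr -R +A "$dir"
        fi
    done
}

# Example (hypothetical choice of directory):
# exclude_from_atime "$HOME/.cache"
```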


> (note I'm also a big user of atimes, for cruft but also for pointless
> reasons like just looking at what I was doing last year or sorting files
> by access times in my home all the time... So that just means being
> reasonable about snapshots for me :P)

Yeah, and if and when there is a snapshot/rollback regime in Fedora,
it isn't going to keep hundreds of snapshots around unless the user
configures it. Silverblue defaults to two deployments. And traditional
installs have three complete kernel installs, with a fourth partial
one in the form of a rescue boot option (which is just a nohostonly
initramfs). I figure somewhere around 3-5 root snapshots?

And /home snapshots are up to the user, they can configure it however
they want. I don't tend to have more than one per day, frequently only
one per 2 week period. This is very lightweight, and almost always
gets me the kind of "what if" retention behavior I want.

I got ensnared quite badly recently with a restorecon behavior, where
it insisted on relabeling everything in /mnt. And at the time I had
the top-level of a ~1T Btrfs file system with 100's of snapshots in
it, which to a program looks like 100 directories with 1TB each of
unique data, so it tried to relabel 100TB. Fortunately almost all of
those snapshots were read-only so the relabel caused no writes but it
took me a while to figure out what was going on with a tiny dnf update
that was taking an incredibly long time. (This is not an atime update
but it's the same concept and effect in terms of exploding metadata
updates.)

-- 
Chris Murphy


Re: BTRFS, relatime vs. noatime

2020-09-05 Thread Chris Murphy
On Sat, Sep 5, 2020 at 6:30 AM Neal Becker  wrote:

>
> If BTRFS is to become fedora default, we should consider this?
>
> "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
>
>

I doubt ordinary desktop workloads show a performance difference. An
overwriting file system will have a bunch of random writes to update its
metadata in place. Btrfs will delay writes, and it'll be one fairly
sequential write commit: it has no fixed metadata locations, so the writes
just go into free space. There might be more net writes in this case with
Btrfs, but it really depends on how many files are being updated and in
what time frame. With relatime, it's hard to say. But also, it won't happen
again right away either, at least to those same files.

The contrived case is to snapshot your root subvolume. Just one snapshot is
enough. Run 'btrfs fi us /' to check the data and metadata usage. Use -b
(--raw) if you want raw byte values for more precision. And now run
'grep -r beer /usr'. Give it a minute *after* the command returns, again due
to delayed metadata writes, and check usage again. What's going on is that
the prior atimes are pinned in the snapshot of root, while root has its
atimes all or mostly updated. That's maybe 100-200 megabytes of metadata
writes. Even on a hard drive you won't likely notice that write; it'll take
a couple of seconds over ~1 minute of commit time. But if you've got a
snapshot once per day, times ten days, and this kind of aggressive search
function touching every file? Maybe an extra 1-2G of metadata being pinned.
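The 1-2G figure is a simple multiplication. Written out as a helper (hypothetical name), under the assumption that every daily snapshot is followed by a scan that rewrites roughly the same 100-200 MB of metadata, and that each such rewrite stays pinned by a snapshot:

```sh
#!/bin/sh
# Rough estimate of metadata pinned by atime churn: each snapshot that
# captures a different set of atimes keeps one extra copy of the rewritten
# metadata. Arguments: snapshot count, low and high MB rewritten per scan.
pinned_metadata_range() {
    echo "$(($1 * $2))-$(($1 * $3)) MB"
}

pinned_metadata_range 10 100 200   # prints 1000-2000 MB, i.e. roughly 1-2 GB
```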

For what it's worth, the same thing happens with thin provisioning snapshots,
and with ZFS. It's a case worth understanding, and solving with some
selective noatime mounts, which, since noatime is a VFS mount option, can be
set per bind mount (and subvolume mounts are a pseudo-bind mount behind the
scenes).
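Because noatime is a per-mount VFS flag, a directory can be given its own bind mount and then remounted noatime, as described in mount(8). A minimal sketch that only prints the commands (the helper name is hypothetical; running them needs root, and they do not persist across reboots unless the equivalent entries go into fstab):

```sh
#!/bin/sh
# Print the commands that give each DIR its own bind mount with noatime.
# noatime is a per-mount flag, so the bind mount can carry it even when the
# underlying filesystem is mounted relatime.
noatime_bind_cmds() {
    for dir in "$@"; do
        printf 'mount --bind %s %s\n' "$dir" "$dir"
        printf 'mount -o remount,bind,noatime %s\n' "$dir"
    done
}

noatime_bind_cmds /usr
# mount --bind /usr /usr
# mount -o remount,bind,noatime /usr
```

For a Btrfs subvolume that already has its own mount, the same effect should come from adding noatime to that subvolume's mount options instead of creating an extra bind mount.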

I'd say it's not a problem per se. It's a possible optimization opportunity
if the problem is big enough to be worth carving out noatime mounts by
default.

I use noatime full time for / and /home. I just checked three computers'
/var/tmp and they are all less than 1MiB. My laptop, which I use the most
by far, has 64KiB on /var/tmp. *shrug* Something is cleaning it up without
needing atime updates. And I'm certainly not cleaning it up.

GNOME Shell trash management uses .trashinfo files to timestamp everything,
presumably to track their aging. I use it, and had forgotten about Trash
entirely until just now. 88MiB. And nothing in it has been there more than
7 days.
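Aging trash entries does not need atime: each .trashinfo file (format from the FreeDesktop.org Trash specification) records a DeletionDate. A sketch of an age check built on that key; the function name is hypothetical, and `~/.local/share/Trash/info` is the usual home-trash location rather than something stated in this thread.

```sh
#!/bin/sh
# List trash entries deleted before a cutoff, using the DeletionDate key
# (YYYY-MM-DDThh:mm:ss) instead of atime or mtime.
#   $1 = info directory, e.g. "$HOME/.local/share/Trash/info"
#   $2 = cutoff in the same YYYY-MM-DDThh:mm:ss format
trash_older_than() {
    info_dir=$1
    # Keep only the digits: fixed-width timestamps then compare as integers.
    cutoff=$(printf '%s' "$2" | tr -cd '0-9')
    for f in "$info_dir"/*.trashinfo; do
        [ -e "$f" ] || continue     # glob matched nothing
        deleted=$(sed -n 's/^DeletionDate=//p' "$f" | tr -cd '0-9')
        if [ -n "$deleted" ] && [ "$deleted" -lt "$cutoff" ]; then
            basename "$f" .trashinfo
        fi
    done
}

# Example:
# trash_older_than "$HOME/.local/share/Trash/info" 2020-08-29T00:00:00
```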

I figured nothing was using it these days and it was a complete waste. If
tracker uses atime, maybe I'll get more worried. But if it uses mtime, I'm
not.


-- 
Chris Murphy


Re: BTRFS, relatime vs. noatime

2020-09-05 Thread Dominique Martinet
Matthew Miller wrote on Sat, Sep 05, 2020:
> On Sat, Sep 05, 2020 at 08:46:37AM -0400, Neal Gompa wrote:
> > > "BTRFS relatime vs. noatime - Huge Performance Difference - linux" 
> > > https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
> > It's something that's being looked at, see
> > https://pagure.io/fedora-btrfs/project/issue/9
> 
> Huh. That's... unfortunate.
> 
> I use atimes to keep ~/Downloads and ~/tmp from building up with cruft. I'm
> sure there's plenty of other practical use cases. I guess with btrfs I
> should make separate subvolumes for these or something?

It really depends on what you plan on taking snapshots on -- for example
if you don't plan on taking snapshots for your home, it won't cost all
that much (basically where a classic filesystem would edit the atime in
place, btrfs needs to copy it and remove the old one, but overall there
really shouldn't be so much difference in how it feels)

However, if there are snapshots, the metadata has to be copied over if the
atime changes; it's really fundamental to CoW and snapshots... It will
have to keep an extra copy of the metadata around every time there's a
new snapshot with different atimes.
And at this point, yes, it might make sense to have ~/tmp or whatever
in a different subvolume, but I don't suppose regular users would want
to have to think about this kind of thing.

(note I'm also a big user of atimes, for cruft but also for pointless
reasons like just looking at what I was doing last year or sorting files
by access times in my home all the time... So that just means being
reasonable about snapshots for me :P)
-- 
Dominique


Re: BTRFS, relatime vs. noatime

2020-09-05 Thread Matthew Miller
On Sat, Sep 05, 2020 at 08:46:37AM -0400, Neal Gompa wrote:
> > "BTRFS relatime vs. noatime - Huge Performance Difference - linux" 
> > https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
> It's something that's being looked at, see
> https://pagure.io/fedora-btrfs/project/issue/9

Huh. That's... unfortunate.

I use atimes to keep ~/Downloads and ~/tmp from building up with cruft. I'm
sure there's plenty of other practical use cases. I guess with btrfs I
should make separate subvolumes for these or something?
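An atime-based cleanup like this usually comes down to a find(1) query; Matthew doesn't say what tooling he uses, so this is a hypothetical sketch. Under relatime, a read still refreshes atime when the stored atime is more than 24 hours old, so a day-granular test keeps working; under noatime, files would look stale even while they are being read.

```sh
#!/bin/sh
# List regular files under DIR whose atime is more than DAYS whole days ago
# (find's -atime rounding applies). Only lists candidates; deleting them is
# left to the caller.
stale_files() {
    find "$1" -type f -atime +"$2" -print
}

# Example: cleanup candidates in ~/Downloads untouched for 30+ days
# stale_files "$HOME/Downloads" 30
```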

-- 
Matthew Miller

Fedora Project Leader


Re: BTRFS, relatime vs. noatime

2020-09-05 Thread Neal Gompa
On Sat, Sep 5, 2020 at 8:30 AM Neal Becker  wrote:
>
>
> If BTRFS is to become fedora default, we should consider this?
>
> "BTRFS relatime vs. noatime - Huge Performance Difference - linux" 
> https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
>

It's something that's being looked at, see
https://pagure.io/fedora-btrfs/project/issue/9



-- 
真実はいつも一つ!/ Always, there's only one truth!


BTRFS, relatime vs. noatime

2020-09-05 Thread Neal Becker
If BTRFS is to become fedora default, we should consider this?

"BTRFS relatime vs. noatime - Huge Performance Difference - linux"
https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body

-- 
*Those who don't understand recursion are doomed to repeat it*