Re: BTRFS, relatime vs. noatime
On 9/5/20 5:29 AM, Neal Becker wrote:
> If BTRFS is to become fedora default, we should consider this?
>
> "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body

The *performance* difference is probably vastly overstated by people who haven't actually compared btrfs to other filesystems. At least, I keep seeing this come up, but haven't yet seen anyone offer a comparison to any other filesystem, and I suspect that it is a myth.

I copied 150,000 files from btrfs to ext4 with atime updates, and the same transfer with no atime updates. I did the same process with ext4 to ext4.

The performance impact of atime updates on ext4 was 9.6% (31.65s -> 28.614s).
The performance impact of atime updates on btrfs was 3.86% (29.626s -> 28.482s).

This isn't a scientific test, and I'm not going to argue henceforth that btrfs performs better than ext4, but the test does suggest that btrfs with relatime isn't the "sky is falling" performance disaster that people are making it out to be.
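As a quick check of the arithmetic, the two percentages appear to be the time saved relative to the run with atime updates. A minimal sketch that reproduces them from the timings quoted above (the helper name atime_cost_percent is made up for illustration):

```shell
#!/bin/sh
# Share of wall-clock time attributable to atime updates, relative to the
# run WITH atime updates: (t_atime - t_noatime) / t_atime * 100.
# atime_cost_percent is a hypothetical helper name, not part of any tool.
atime_cost_percent() {
    awk -v t_atime="$1" -v t_noatime="$2" \
        'BEGIN { printf "%.2f\n", (t_atime - t_noatime) / t_atime * 100 }'
}

echo "ext4:  $(atime_cost_percent 31.65 28.614)%"
echo "btrfs: $(atime_cost_percent 29.626 28.482)%"
```

Measured against the faster run instead, the figures would come out slightly higher, so it matters which baseline is used when comparing with other people's numbers.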
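    #!/bin/sh
    # Time an rsync from btrfs to ext4, first with atime updates on the
    # source, then again once the source atimes are already current.
    # Needs root; /dev/loop1 and /dev/loop2 must be unused.

    bf1=/home/input     # backing file for the source (btrfs)
    bf2=/home/output    # backing file for the destination (ext4)

    # As posted, count=1 gives 1 MiB images, which is too small to hold the
    # data set (and for mkfs.btrfs); raise count to fit your data before running.
    dd if=/dev/zero of="$bf1" bs=1M count=1
    dd if=/dev/zero of="$bf2" bs=1M count=1
    losetup /dev/loop1 "$bf1"
    losetup /dev/loop2 "$bf2"
    mkfs.btrfs /dev/loop1
    mkfs.ext4 /dev/loop2
    mkdir /mnt/input
    mkdir /mnt/output
    mount /dev/loop1 /mnt/input
    mount /dev/loop2 /mnt/output

    # Populate the source, then push every atime into the past so that
    # the next read triggers an atime update under the default relatime.
    rsync -aHS /home/flatpak /mnt/input
    find /mnt/input -exec touch -h -a -d 'Jan 1 2020' {} +

    # Run 1: reads update atime on the source.
    time rsync -aHS /mnt/input/ /mnt/output

    # Recreate an empty destination.
    umount /mnt/output
    mkfs.ext4 /dev/loop2
    mount /dev/loop2 /mnt/output

    # Run 2: atime is already updated on /mnt/input, so it's not strictly
    # necessary to remount it with noatime.
    time rsync -aHS /mnt/input/ /mnt/output

    # Clean up.
    umount /mnt/input
    umount /mnt/output
    rmdir /mnt/input
    rmdir /mnt/output
    losetup -d /dev/loop1
    losetup -d /dev/loop2
    rm "$bf1"
    rm "$bf2"

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org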
Re: BTRFS, relatime vs. noatime
On Mon, Sep 7, 2020 at 6:30 AM Kamil Paral wrote:
>
> On Sun, Sep 6, 2020 at 1:37 AM Chris Murphy wrote:
>>
>> But if you've got a snapshot once per day, times ten days, and this kind of aggressive search function touching every file? Maybe an extra 1-2G of metadata being pinned
>
> I don't follow. If you have one master copy and 10 snapshots, and you change the metadata on the master copy, why would it generate more metadata (than when having a single snapshot)? All those snapshots can share the same metadata block (provided the file wasn't changed in the meantime) and just the master copy would get a new metadata block. So it should be the same amount of newly written blocks, regardless of how many snapshots you have. What am I missing?

Take the case of no snapshots, and also I'll just use "blocks" to refer to both metadata and data extents.

The normal "modify something" pattern for copy-on-write is to write updated blocks into free space. It's never an overwrite. Even deleting a file requires some free space to write blocks indicating the file's deletion. Only once the new blocks are committed to stable media, are stale blocks deallocated (turned into free space).

The resulting write pattern is: write changes into free space -> free space is reduced -> delay -> remove references to the stale blocks -> free space increases. If it's just atime updates happening, the net change in used space is zero.

Now, let's snapshot. The effect of a snapshot on a copy-on-write file system is that the "stale" blocks are preserved. They aren't deallocated. This is why Btrfs snapshots are cheap. The snapshot effectively prevents the deallocation and clean up steps.

Since there's no deallocation step, free space does not go up. The net change in used space goes up upon metadata updates following the snapshot. If I take no further snapshots, this is just a one time hit. Subsequent changes resume the normal pattern: write new -> delay -> deallocate stale = no net change. But upon snapshotting again, I pin that subvolume's state at that moment in time.

-- 
Chris Murphy
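The accounting described above can be played through with a toy model. This is only a sketch under loud assumptions: every update pass rewrites all of the subvolume's metadata (the worst case), the unit is arbitrary, and simulate is a made-up name that has nothing to do with Btrfs internals.

```shell
#!/bin/sh
# Toy model of copy-on-write space accounting (illustrative only).
# simulate PASSES META SNAPSHOTS
#   PASSES:    number of full metadata-update passes (e.g. atime sweeps)
#   META:      metadata rewritten by one pass, in arbitrary units
#   SNAPSHOTS: a snapshot is taken before each of the first SNAPSHOTS passes
# Prints the metadata in use after the last pass has been committed.
simulate() {
    passes=$1 meta=$2 snapshots=$3
    used=$meta                        # the live tree's metadata
    i=0
    while [ "$i" -lt "$passes" ]; do
        used=$((used + meta))         # new blocks are written into free space
        if [ "$i" -ge "$snapshots" ]; then
            used=$((used - meta))     # no snapshot: stale blocks are deallocated
        fi                            # snapshot: stale blocks stay pinned
        i=$((i + 1))
    done
    echo "$used"
}

simulate 10 150 0     # no snapshots: no net change
simulate 10 150 1     # one snapshot: a one-time hit
simulate 10 150 10    # a snapshot before every pass: the hit repeats
```

The third case is the one from the question: each new snapshot pins whatever the live tree looked like at that moment, so ten snapshot-then-update cycles pin ten generations of metadata instead of one.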
Re: BTRFS, relatime vs. noatime
On Mon, Sep 7, 2020 at 8:30 AM Kamil Paral wrote:
>
> On Sun, Sep 6, 2020 at 1:37 AM Chris Murphy wrote:
>>
>> But if you've got a snapshot once per day, times ten days, and this kind of aggressive search function touching every file? Maybe an extra 1-2G of metadata being pinned
>
> I don't follow. If you have one master copy and 10 snapshots, and you change the metadata on the master copy, why would it generate more metadata (than when having a single snapshot)? All those snapshots can share the same metadata block (provided the file wasn't changed in the meantime) and just the master copy would get a new metadata block. So it should be the same amount of newly written blocks, regardless of how many snapshots you have. What am I missing?

When you have snapshots, you initially share all the same blocks. However, as you continue to make changes, fewer of the blocks are shared as new blocks are allocated for new changes. This is also true for metadata, except that this situation results in the filesystem making new instances of the metadata for most of its files. However, because you have snapshots, the old instances cannot be deleted either. Thus, you wind up with essentially a new copy of the filesystem metadata. This is amplified as you take more snapshots.

This is the *expensive* part of snapshots and the part that people don't necessarily realize: when you take a snapshot, you're asking for the preservation of the entire filesystem tree, with all its data. If you take a snapshot, then change the atimes of all the files, take a snapshot again, then change the atimes again, you wind up having two whole unshared instances of the filesystem snapshotted. It's essentially the worst case scenario for filesystem metadata. Of course, this is cheap to *make*, but expensive to *keep* as you run out of space due to just having so many unique copies of filesystem metadata.

This is why it's often recommended that atimes are switched off on parts of the filesystem you wish to aggressively snapshot. For example, most of the time you don't really care about atimes for /usr and /etc, so you can turn atimes off for that, while leaving it on for /var and /home.

Now, we are not setting up snapshotting automatically in Fedora like openSUSE does, so there's not as much pressure to deal with it now. But if we make a straightforward way for people to set up snapshotting, then part of that strategy needs to include dealing with that problem.

-- 
真実はいつも一つ!/ Always, there's only one truth!
Re: BTRFS, relatime vs. noatime
On Sun, Sep 6, 2020 at 1:37 AM Chris Murphy wrote:
> But if you've got a snapshot once per day, times ten days, and this kind of aggressive search function touching every file? Maybe an extra 1-2G of metadata being pinned

I don't follow. If you have one master copy and 10 snapshots, and you change the metadata on the master copy, why would it generate more metadata (than when having a single snapshot)? All those snapshots can share the same metadata block (provided the file wasn't changed in the meantime) and just the master copy would get a new metadata block. So it should be the same amount of newly written blocks, regardless of how many snapshots you have. What am I missing?
Re: BTRFS, relatime vs. noatime
On Sun, Sep 6, 2020 at 3:00 AM Roberto Ragusa wrote:
> I've found atime useful in several cases. If you are doubting about a configuration file being read or not by an application, you just check the atime before and after running it (way easier than strace). If you are investigating what a suspect script or confused user has just done, you can find for recent atime.

Sure, but for that to be reliable, you need 'atime'. You're changing the default anyway. fatrace? auditctl watch? inotify IN_ACCESS?

> After it took years to go back from noatime to a weak relatime, we are now going to lose it completely again.

atime/relatime make read-only operations into ones that write. Possibly quite a lot of writes. Does it make sense for everyone to have it enabled by default everywhere? Is that on-going cost worth some regular benefit?

> Did any filesystem developer ever think about storing atime in a different way, instead of usual inode metadata? Maybe a dedicated journal of overriding atime entries (column based DB vs inode's row based DB) to cope with "access many files" patterns.

Yes.

> And what happened to "lazytime"? It sounded like a great approach.

Lazytime is a better default than relatime. But it just delays the inevitable, because it updates the on-disk metadata once every 24 hours. So it doesn't solve the problem under discussion.

Are atime updates generally useful? For most files, most users, most of the time? If not, the default should be noatime. And programs that need this information should use inotify.

-- 
Chris Murphy
Re: BTRFS, relatime vs. noatime
On 2020-09-06 01:35, Chris Murphy wrote:
> I figured nothing was using it these days and it was a complete waste. If tracker uses atime, maybe I'll get more worried. But if it uses mtime, I'm not.

I've found atime useful in several cases. If you are doubting about a configuration file being read or not by an application, you just check the atime before and after running it (way easier than strace). If you are investigating what a suspect script or confused user has just done, you can find for recent atime.

After it took years to go back from noatime to a weak relatime, we are now going to lose it completely again.

Did any filesystem developer ever think about storing atime in a different way, instead of usual inode metadata? Maybe a dedicated journal of overriding atime entries (column based DB vs inode's row based DB) to cope with "access many files" patterns.

And what happened to "lazytime"? It sounded like a great approach.

Regards.

-- 
Roberto Ragusa    mail at robertoragusa.it
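For readers who have not used the trick, the before/after atime check could look roughly like this. It is a sketch only: it assumes GNU coreutils (stat -c, touch -d), the helper names atime_of and was_read are invented here, and on a noatime mount the atime never changes, so a negative result proves nothing.

```shell
#!/bin/sh
# Sketch of the "check atime before and after" technique.
# Assumes GNU coreutils; atime_of and was_read are hypothetical helper names.

# Print a file's access time as seconds since the epoch.
atime_of() {
    stat -c %X -- "$1"
}

# Run a command and succeed if FILE's atime changed while it ran.
# Usage: was_read FILE COMMAND [ARGS...]
was_read() {
    file=$1; shift
    before=$(atime_of "$file")
    "$@" >/dev/null 2>&1
    after=$(atime_of "$file")
    [ "$before" != "$after" ]
}

# Demo on a scratch file: push the atime into the past first, so that
# relatime (which skips updates to recent atimes) records the next read.
f=$(mktemp)
echo "key=value" > "$f"
touch -a -d '2020-01-01' "$f"
if was_read "$f" cat "$f"; then
    echo "atime changed: the file was read"
else
    echo "atime unchanged: not read, or the mount suppresses atime updates"
fi
rm -f "$f"
```

Resetting the atime beforehand matters under relatime, because a file whose atime is already recent will not be updated again for up to a day.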
Re: BTRFS, relatime vs. noatime
On Sat, Sep 5, 2020 at 11:22 AM Dominique Martinet wrote:
>
> Matthew Miller wrote on Sat, Sep 05, 2020:
> > On Sat, Sep 05, 2020 at 08:46:37AM -0400, Neal Gompa wrote:
> > > > "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> > > > https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
> > > It's something that's being looked at, see
> > > https://pagure.io/fedora-btrfs/project/issue/9
> >
> > Huh. That's... unfortunate.
> >
> > I use atimes to keep ~/Downloads and ~/tmp from building up with cruft. I'm sure there's plenty of other practical use cases. I guess with btrfs I should make separate subvolumes for these or something?
>
> It really depends on what you plan on taking snapshots on -- for example if you don't plan on taking snapshots for your home, it won't cost all that much (basically where a classic filesystem would edit the atime in place, btrfs needs to copy it and remove the old one, but overall there really shouldn't be so much difference in how it feels)
>
> However if there are snapshots the metadata has to be copied over if the atime changes, it's really a fundamental of cow and snapshots... It will have to keep an extra copy of the metadata around every time there's a new snapshot with different atimes.
> And at this point then yes it might make sense to have ~/tmp or whatever in a different subvolume, but I don't suppose regular users would want to have to think about this kind of thing.

Or even just 'chattr +A' on certain directories to exclude from atime updates. It's something the desktop can just do as a courtesy.

> (note I'm also a big user of atimes, for cruft but also for pointless reasons like just looking at what I was doing last year or sorting files by access times in my home all the time... So that just means being reasonable about snapshots for me :P)

Yeah, and at the point where there is a snapshot/rollback regime in Fedora, it isn't going to be keeping 100's of snapshots around unless the user configures it. Silverblue defaults to two deployments. And traditional installs have three complete kernel installs, with a fourth partial one in the form of a rescue boot option (which is just a nohostonly initramfs). I figure somewhere around 3-5 root snapshots?

And /home snapshots are up to the user, they can configure it however they want. I don't tend to have more than one per day, frequently only one per 2 week period. This is very lightweight, and almost always gets me the kind of "what if" retention behavior I want.

I got ensnared quite badly recently with a restorecon behavior, where it insisted on relabeling everything in /mnt. And at the time I had the top-level of a ~1T Btrfs file system with 100's of snapshots in it, which to a program looks like 100 directories with 1TB each of unique data, so it tried to relabel 100TB. Fortunately almost all of those snapshots were read-only so the relabel caused no writes, but it took me a while to figure out what was going on with a tiny dnf update that was taking an incredibly long time. (This is not an atime update but it's the same concept and effect in terms of exploding metadata updates.)

-- 
Chris Murphy
Re: BTRFS, relatime vs. noatime
On Sat, Sep 5, 2020 at 6:30 AM Neal Becker wrote:
>
> If BTRFS is to become fedora default, we should consider this?
>
> "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body

I doubt ordinary desktop workloads show a performance difference. An overwriting file system will have a bunch of random writes to update its metadata in place. Btrfs will delay writes and it'll be one fairly sequential write commit; it has no fixed metadata locations, so the writes just go into free space. There might be more net writes in this case with Btrfs, but it really depends on how many files are being updated and in what time frame. With relatime, it's hard to say. But also, it won't happen again right away either, at least to those same files.

The contrived case is to snapshot your root subvolume. Just one snapshot is enough. 'btrfs fi us /' to check the data and metadata usage. Use -b (--raw) if you want raw values for more precision. And now 'grep -r beer /usr'. Give it a minute *after* the command returns, again due to delayed metadata writes. And check usage again.

So what's going on is, the prior atimes are pinned in the snapshot of root. While root has its atimes all or mostly updated. That's maybe 100-200 megabytes of metadata writes. Even on a hard drive you won't likely notice that write, it'll take a couple seconds over ~1 minute of commit time. But if you've got a snapshot once per day, times ten days, and this kind of aggressive search function touching every file? Maybe an extra 1-2G of metadata being pinned.

For what it's worth, same thing happens with thin provisioning snapshots. And ZFS. It's a case worth understanding. And solving with some selective noatime mounts, which as a VFS mount, can be done per bind mount (and subvolume mounts are a pseudo-bind mount behind the scenes).

I'd say it's not a problem per se. It's a possible optimization opportunity if the problem is big enough to be worth carving out noatime mounts by default.

I use noatime full time for / and /home. I just checked three computers' /var/tmp and they are all less than 1MiB. My laptop, which I use the most by far, has 64KiB on /var/tmp. *shrug* Something is cleaning it up without needing atime updates. And I'm certainly not cleaning it up.

GNOME Shell trash management uses .trashinfo files to time stamp everything to track their aging I presume. I use it. And have forgotten about Trash entirely until just now. 88MiB. And nothing in it has been there more than 7 days.

I figured nothing was using it these days and it was a complete waste. If tracker uses atime, maybe I'll get more worried. But if it uses mtime, I'm not.

-- 
Chris Murphy
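The "1-2G" figure follows from the per-pass estimate multiplied by the number of snapshot-then-search cycles. A back-of-the-envelope sketch using only the numbers from the message above (pinned_mb is an invented helper name, and the 100-200 MB per pass is itself a rough guess, not a measurement):

```shell
#!/bin/sh
# Estimate of metadata pinned by snapshots, assuming every cycle consists of
# one snapshot followed by one pass that updates (nearly) all atimes.
# pinned_mb PER_PASS_MB CYCLES -- hypothetical helper, for illustration only.
pinned_mb() {
    per_pass_mb=$1 cycles=$2
    echo $((per_pass_mb * cycles))
}

echo "low estimate:  $(pinned_mb 100 10) MB"
echo "high estimate: $(pinned_mb 200 10) MB"
```

With relatime the pass can happen at most about once a day per file, which is why a daily snapshot schedule lines up with the worst case.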
Re: BTRFS, relatime vs. noatime
Matthew Miller wrote on Sat, Sep 05, 2020:
> On Sat, Sep 05, 2020 at 08:46:37AM -0400, Neal Gompa wrote:
> > > "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> > > https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
> > It's something that's being looked at, see
> > https://pagure.io/fedora-btrfs/project/issue/9
>
> Huh. That's... unfortunate.
>
> I use atimes to keep ~/Downloads and ~/tmp from building up with cruft. I'm sure there's plenty of other practical use cases. I guess with btrfs I should make separate subvolumes for these or something?

It really depends on what you plan on taking snapshots on -- for example if you don't plan on taking snapshots for your home, it won't cost all that much (basically where a classic filesystem would edit the atime in place, btrfs needs to copy it and remove the old one, but overall there really shouldn't be so much difference in how it feels)

However if there are snapshots the metadata has to be copied over if the atime changes, it's really a fundamental of cow and snapshots... It will have to keep an extra copy of the metadata around every time there's a new snapshot with different atimes.
And at this point then yes it might make sense to have ~/tmp or whatever in a different subvolume, but I don't suppose regular users would want to have to think about this kind of thing.

(note I'm also a big user of atimes, for cruft but also for pointless reasons like just looking at what I was doing last year or sorting files by access times in my home all the time... So that just means being reasonable about snapshots for me :P)

-- 
Dominique
Re: BTRFS, relatime vs. noatime
On Sat, Sep 05, 2020 at 08:46:37AM -0400, Neal Gompa wrote:
> > "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> > https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body
> It's something that's being looked at, see
> https://pagure.io/fedora-btrfs/project/issue/9

Huh. That's... unfortunate.

I use atimes to keep ~/Downloads and ~/tmp from building up with cruft. I'm sure there's plenty of other practical use cases. I guess with btrfs I should make separate subvolumes for these or something?

-- 
Matthew Miller
Fedora Project Leader
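The message does not say which tool does the cleanup, so the following is only one plausible shape for atime-based cruft detection, assuming GNU find and touch. The helper name stale_files and the 30-day threshold are illustrative choices, not anything from the thread.

```shell
#!/bin/sh
# List regular files under DIR that have not been accessed for more than
# DAYS days (find counts whole 24-hour periods).
# stale_files is a hypothetical helper name.
stale_files() {
    find "$1" -type f -atime +"$2"
}

# Demo in a scratch directory standing in for ~/Downloads.
demo=$(mktemp -d)
: > "$demo/fresh.iso"
: > "$demo/forgotten.iso"
touch -a -d '60 days ago' "$demo/forgotten.iso"   # GNU touch: set only atime
stale_files "$demo" 30
rm -rf "$demo"
```

On a noatime mount the atime stays at whatever it was when the file was created or last explicitly set, so a cleanup like this would eventually flag files that are still in regular use.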
Re: BTRFS, relatime vs. noatime
On Sat, Sep 5, 2020 at 8:30 AM Neal Becker wrote:
>
> If BTRFS is to become fedora default, we should consider this?
>
> "BTRFS relatime vs. noatime - Huge Performance Difference - linux"
> https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body

It's something that's being looked at, see https://pagure.io/fedora-btrfs/project/issue/9

-- 
真実はいつも一つ!/ Always, there's only one truth!
BTRFS, relatime vs. noatime
If BTRFS is to become fedora default, we should consider this?

"BTRFS relatime vs. noatime - Huge Performance Difference - linux"
https://www.reddit.com/r/linux/comments/imgler/btrfs_relatime_vs_noatime_huge_performance/?utm_source=amp&utm_medium=&utm_content=post_body

-- 
*Those who don't understand recursion are doomed to repeat it*