Re: btrfs system slow down with 100GB file

Chris Murphy Fri, 26 Mar 2021 07:30:19 -0700

On Thu, Mar 25, 2021 at 8:59 AM Roberto Ragusa <m...@robertoragusa.it> wrote:
>
> On 3/25/21 4:25 AM, Chris Murphy wrote:
>
> > It might be appropriate to set dirty_bytes to 500M across the board,
> > desktop and server. And dirty_background to 1/4 that. But all of these
> > are kinda rudimentary guides. What we really want is something that
> > knows what the throughput of the storage is, and is making sure there
> > isn't more than a few seconds of writeback needed at any given time.
> >
> > The default, dirty_ratio 20%, is high by today's memory standards. But
> > upstream will not change it. All kernel knobs are distro
> > responsibility to change from the defaults.
>
> I don't agree with the base reasoning.
> There is nothing wrong in having many gigabytes of dirty data in memory,
> if the machine has enough RAM to do it. It is one of the things that make
> the difference between Linux and toy systems.


The problem has been well understood for some time.
https://lwn.net/Articles/572911/

> "500M will be enough" sound like the historical "640k will be enough",
> because 500M could be flushed in a fraction of a second on modern SSDs.

Which would be an OK combination. What you don't want is 1G of dirty
data accumulating before your 80 MB/s drive starts writeback. If a
process then fsyncs a file, you've got a blocked task that must write
out all dirty data right now, and the whole storage stack down to the
drive is not going to easily stop doing that just because some other
program fsyncs a 100K cache file. There will be a delay.
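
To put rough numbers on that delay, here is a minimal Python sketch of
the arithmetic. It assumes decimal units and a sustained sequential
write rate; the 2 GB/s SSD figure and the function names are only
illustrative assumptions, not measurements.

```python
MB = 10**6
GB = 10**9


def writeback_seconds(dirty_bytes, throughput_bytes_per_s):
    """Seconds needed to drain dirty_bytes at a sustained write rate."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dirty_bytes / throughput_bytes_per_s


def dirty_limit_for(throughput_bytes_per_s, max_seconds):
    """Largest dirty-data budget that still drains within max_seconds."""
    return int(throughput_bytes_per_s * max_seconds)


if __name__ == "__main__":
    # 1G of dirty data in front of an 80 MB/s drive
    print(writeback_seconds(1 * GB, 80 * MB))    # 12.5
    # 500M of dirty data in front of an assumed 2 GB/s SSD
    print(writeback_seconds(500 * MB, 2 * GB))   # 0.25
    # Budget that keeps the 80 MB/s drive within 3 seconds of writeback
    print(dirty_limit_for(80 * MB, 3))           # 240000000
```

So the same 500M that is harmless on a fast SSD is several seconds of
blocked I/O on a slow drive, which is why a limit tied to device
throughput would beat any fixed number.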

Now, if this delay causes that program to stall from the user's
perspective, is that well behaved? I mean come on, we all know web
browser cache files are 100% throwaway garbage; they're there to make
things faster, not to cause problems, and yet here we are.

There's enough misuse of fsync by applications that there's a utility
to cause fsync to be dropped.
https://www.flamingspork.com/projects/libeatmydata/
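
As a rough illustration of the idea only: libeatmydata works by
preloading a library that turns fsync and friends into no-ops for the
whole process. The Python sketch below is an in-process analogue, not
how libeatmydata itself is implemented; eat_fsync and
write_cache_entry are hypothetical names made up for this example.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def eat_fsync():
    """Temporarily replace os.fsync with a no-op that records its calls."""
    real_fsync = os.fsync
    calls = []
    # Record the file descriptor and return without touching the disk.
    os.fsync = lambda fd: calls.append(fd)
    try:
        yield calls
    finally:
        os.fsync = real_fsync  # always restore the real syscall wrapper


def write_cache_entry(path, data):
    """A hypothetical 'paranoid' cache writer that fsyncs every entry."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with eat_fsync() as calls:
            write_cache_entry(os.path.join(tmp, "entry"), b"x" * 100_000)
        print(f"fsync calls swallowed: {len(calls)}")
```

The data still lands in the page cache and gets written back on the
kernel's normal schedule; it just no longer forces a flush on demand.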

In fact some folks run their web browser in a qemu-kvm guest with
cache mode "unsafe" so that all the fsyncs get dropped.


> What you really want is that if there are 40GB of outstanding data going
> to the disk, processes are still:
> 1) able to write to the disks without heavy latency (and delaying it in
> memory is exactly achieving that)
> 2) able to read the disks without heavy latency, which is something the
> disk scheduling code will care to provide (reads have priority over writes).

If you have 40G of dirty data and your program says "fsync it", you've
got 40G of data that has been ordered to be flushed to stable media.
Everything else wanting access is going to come close to stopping.
That's the way it works. You don't get to "fsync this very important
thing... but oh yeah wait I wanna read a b.s. chrome cache file, hold
on a sec. Ok thanks, now please continue syncing."

That's in effect something that multiqueue NVMe can do. So there's a
workaround.


> The kernel has even got per-device queues to avoid a slow USB drive to stall
> the I/O for the other devices.
>
> If the filesystem is not able to read 50kB because there are 10GB of
> dirty data in memory, the problem is in the filesystem code.

The defaults are crazy.
https://lwn.net/Articles/572921/
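
For a sense of scale, here is a small Python sketch of what the default
ratios (dirty_ratio 20%, dirty_background_ratio 10%) come to on
different machines. It is only an upper-bound estimate: the kernel
applies the ratios to available memory rather than total RAM, and the
RAM sizes below are just examples.

```python
GiB = 1024**3


def dirty_thresholds(mem_bytes, dirty_ratio=20, dirty_background_ratio=10):
    """Return (background, hard) dirty thresholds in bytes.

    Simplification: treats mem_bytes as the memory the ratios apply to,
    whereas the kernel uses available (free plus reclaimable) memory.
    """
    background = mem_bytes * dirty_background_ratio // 100
    hard = mem_bytes * dirty_ratio // 100
    return background, hard


if __name__ == "__main__":
    for ram_gib in (4, 16, 64):
        background, hard = dirty_thresholds(ram_gib * GiB)
        print(f"{ram_gib:>3} GiB RAM: background writeback at "
              f"~{background / GiB:.1f} GiB, writers throttled at "
              f"~{hard / GiB:.1f} GiB")
```

On a 64 GiB machine that allows on the order of 12 GiB of dirty data
before writers are throttled, which is minutes of writeback for a slow
device.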

Does this really make a difference, though, outside the slow USB stick
example? I don't know. It seems like it won't for fsync
heavy-handedness, because that'll take precedence.

-- 
Chris Murphy
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure
