On 2017-01-04 17:12, Janos Toth F. wrote:
I separated these 9 camera storages into 9 subvolumes (so now I have
10 subvols in total in this filesystem with the "root" subvol). It's
obviously way too early to talk about long term performance but now I
can tell that recursive defrag does NOT descend into "child"
subvolumes (it does not pick up the files located in these "child"
subvolumes when I point it to the "root" subvolume with the -r
option). That's very inconvenient (one might need to write a script
with a long static list of subvolumes and maintain it over time, or
write a script which acquires the list from the subvolume list command
and feeds it to the defrag command one-by-one).
OK, that's good to know. You might look at some way to parse the output of `btrfs subvolume list` to simplify writing such a script. Also, it's worth pointing out that there are other circumstances that will prevent defrag from operating on a file (I know it refuses to touch running executables, and I think it may also avoid files opened with O_DIRECT).
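As a rough, untested sketch of the kind of script I mean (it assumes the
filesystem is mounted at /mnt/data, that the paths printed by
`btrfs subvolume list` are relative to that mount point, and that none of
them contain spaces):

#!/bin/sh
# Defragment the top-level subvolume, then walk the subvolume list and
# defragment each child subvolume in turn.
MNT=/mnt/data
btrfs filesystem defragment -r "$MNT"
btrfs subvolume list "$MNT" | awk '{print $NF}' | while read -r sub; do
    btrfs filesystem defragment -r "$MNT/$sub"
done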

Because each subvolume is functionally its own tree, it has its own
locking for changes and other stuff, which means that splitting into
subvolumes will usually help with concurrency.  A lot of high-concurrency
performance benchmarks do significantly better if you split things into
individual subvolumes (and this drives a couple of the other kernel
developers crazy).  It's not well publicized, but this is actually
the recommended usage if you can afford the complexity and don't need
snapshots.

I am not a developer but this idea drives me crazy as well. I know
it's silly reasoning, but if you blindly extrapolate this idea you
come to the conclusion that every single file should be transparently
placed in its own unique subvolume (by some automatic background
task) and every directory should automatically be a subvolume. I guess
there must be some inconveniently sub-optimal behavior in the tree
handling which could theoretically be optimized (or the observed
performance improvement from the subvolume segregation is some kind of
measurement error which does not really translate into an actual
real-life overall performance benefit but only looks like one from the
specific perspective of the tests).
While it's annoying, it's also rather predictable from simple analysis of the code. Many metadata operations (and any append to a file requires a metadata operation) require eventually locking part of the tree, and that ends up being a point of contention. In general, I wouldn't say that _every_ file and _every_ directory would need this, as it's not often an issue on a lot of workloads either because the contention doesn't happen (serialized data transfer, WORM access patterns, etc), or because it's not happening frequently enough that it has a significant impact (most general desktop usage). That said, there are other benefits to using subvolumes that make them attractive for many of the cases where this type of thing helps (for example, I use dedicated subvolumes for any local VCS repositories I have, both because it isolates them from global contention on locks, and it lets me nuke them much quicker than rm -rf would).
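To make the VCS example concrete (the paths and the repository URL are just
placeholders, and depending on your mount options you may need root or the
user_subvol_rm_allowed option for the delete):

# Keep a repository checkout on its own subvolume so its metadata churn
# is isolated from the rest of the filesystem:
btrfs subvolume create ~/src/myproject
git clone <repository-url> ~/src/myproject
# Getting rid of it later is one fast operation instead of an rm -rf
# over a huge number of inodes:
btrfs subvolume delete ~/src/myproject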

As far as how much you're buffering for write-back, that should depend
entirely on how fast your RAM is relative to your storage device.  The
smaller the gap between your storage and your RAM in terms of speed, the
more you should be buffering (up to a point).  FWIW, I find that with
DDR3-1600 RAM and a good (~540MB/s sequential write) SATA3 SSD, about
160-320MB gets a near ideal balance of performance, throughput, and
fragmentation, but of course YMMV.

I don't think I share your logic on this. I usually consider the write
load random and I don't like my software possibly stalling while
there is plenty of RAM lying around to be used as a buffer until some
other tasks might stop thrashing the disks (i.e. "bigger is always
better").
Like I said, things may be different for you, but I find in general that unless I'm 100% disk-bound, I actually have fewer latency issues when I buffer less (up to a point; anything less than about 64MB on my hardware makes latency worse). Stalls happen more frequently, but each individual stall has much less impact on overall performance because the time is amortized across the whole operation. Throughput suffers a bit, but once you get past a certain point, increasing the buffering will actually hurt throughput because of how long things stall for. Less buffering also means you're less likely to thrash the read side of the page-cache, because your write cache will fluctuate in size less.
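For reference, the knobs I'm adjusting for this are the kernel's dirty
write-back thresholds (I'm assuming that's also what you're tuning; the
numbers below are just the 160-320MB range I mentioned for my own hardware):

# Start background write-back at ~160MB of dirty data and block writers
# once ~320MB has accumulated.  Setting the *_bytes variants overrides
# the corresponding *_ratio tunables.
sysctl -w vm.dirty_background_bytes=$((160 * 1024 * 1024))
sysctl -w vm.dirty_bytes=$((320 * 1024 * 1024))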

Out of curiosity, just on this part, have you tried using cgroups to keep
the memory usage better isolated?

No, I didn't even know cgroups can control the pagecache based on the
process which generates the cache-able IO.
I'm pretty sure they can cap the write-back buffering usage, but the tunable is kernel memory usage, and some old kernels didn't work with it (I forget when it actually started working correctly).
To be honest, I don't think it's worth the effort for me (I would need
to learn how to use cgroups, I have zero experience with that).
FWIW, it's probably worth learning to use cgroups, they're a great tool for isolating tasks from each other, and the memory controller is really the only one that's not all that intuitive.
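As a rough sketch of the kind of setup I mean (cgroup-v1 layout with the
memory controller mounted in the usual place; the group name and the 512M
limit are just examples, and I'm showing the total-memory cap here rather
than the kernel-memory knob mentioned above, since capping total memory
also bounds how much page cache the group can accumulate):

# Create a memory cgroup, cap it, and move the current shell into it so
# anything started from here (e.g. ffmpeg) inherits the limit.
mkdir /sys/fs/cgroup/memory/recording
echo 512M > /sys/fs/cgroup/memory/recording/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/recording/cgroup.procs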

Also, if you can get ffmpeg to spit out the stream on stdout, you could pipe
to dd and have that use Direct-IO.  The dd command should be something along
the lines of:
dd of=<filename> oflag=direct iflag=fullblock bs=<arbitrary large multiple
of node-size>
The oflag will force dd to open the output file with O_DIRECT, and the iflag
will force it to collect full blocks of data before writing them (the block
size is set by bs=; I'd recommend using a power of 2 that's a multiple of your
node-size, since larger values will increase latency but reduce fragmentation
and improve throughput).  This may still use a significant amount of RAM (the
pipe is essentially an in-memory buffer), and may crowd out other
applications, but I have no idea how much it may or may not help.
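Purely as a hypothetical illustration (the ffmpeg input and options stand in
for whatever you already use, the output path is made up, and bs=16M assumes
your node-size divides evenly into 16M):

ffmpeg -i rtsp://camera1/stream -c copy -f mpegts - \
    | dd of=/data/cam1/capture.ts oflag=direct iflag=fullblock bs=16M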

This I can try (when I have no better things to play with). Thank you.
Glad I could help.
