On Mon, Apr 17, 2017 at 4:55 PM, Hans van Kranenburg
<hans.van.kranenb...@mendix.com> wrote:
> On 04/17/2017 09:22 PM, Imran Geriskovan wrote:
>> [...]
>>
>> Going over the thread, the following questions come to my mind:
>>
>> - What exactly does the btrfs ssd option do relative to plain mode?
>
> There's quite an amount of information in the very recent threads:
> - "About free space fragmentation, metadata write amplification and (no)ssd"
> - "BTRFS as a GlusterFS storage back-end, and what I've learned from
>   using it as such."
> - "btrfs filesystem keeps allocating new chunks for no apparent reason"
> - ... and a few more
>
> I suspect there will be some "summary" mails at some point, but for now,
> I'd recommend crawling through these threads first.
>
> And now for your instant satisfaction, a short visual guide to the
> difference, which shows actual btrfs behaviour instead of our guesswork
> around it (taken from the second mail thread just mentioned):
>
> -o ssd:
>
> https://syrinx.knorrie.org/~knorrie/btrfs/keep/2017-01-19-noautodefrag-ichiban.mp4
>
> -o nossd:
>
> https://syrinx.knorrie.org/~knorrie/btrfs/keep/2017-04-08-ichiban-walk-nossd.mp4
I'm uncertain from these whether the option affects both metadata and data
writes, or just data. The latter makes some sense: a given data write event
tends to contain related files, which increases the chance that when those
files are deleted, a mostly freed-up erase block is left behind. That way
wear leveling has less work to do.

For metadata writes it makes less sense to me, and is inconsistent with what
I've seen from metadata chunk allocation. Pretty much any change means
dozens or more 16K nodes being COWed. E.g. a 2KiB write to the systemd
journal, even preallocated, means adding an EXTENT_DATA item, one of maybe
200 per node, which means that whole node must be COWed, its parent must be
written (the ROOT_ITEM, I think), then the tree root, and then the
superblock. I generally see about 30 16K nodes modified in about 4 minutes
with average logging. Even if it were 1 change per 4 minutes, and all 30
nodes were written to one 2MB block that is never written to again, the
metadata chunk would keep growing, and I don't see that. For weeks or
months I see a 512MB metadata chunk, and it never gets any bigger than
that.

Anyway, I think the ssd mount option still sounds plausibly useful. What
I'm skeptical of on SSD is defragmenting without compression, and also
nocow.

-- 
Chris Murphy
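The write-amplification arithmetic behind that COW chain can be sketched
roughly like this. All numbers here are illustrative assumptions (default
16K node size, a three-node path from leaf to tree root, a 4KiB superblock
update), not measured btrfs internals:

```python
# Rough sketch of btrfs metadata write amplification for a tiny append,
# following the COW chain described above. Sizes are assumptions for
# illustration only.

NODE_SIZE = 16 * 1024        # default btrfs metadata node size
SUPERBLOCK_WRITE = 4 * 1024  # assumed cost of one superblock update

def cow_write_bytes(tree_depth: int) -> int:
    """Bytes physically written when one leaf item changes.

    tree_depth counts the COWed nodes from the leaf up to the tree root
    (leaf + interior node(s) + root); the superblock is written once per
    commit on top of that.
    """
    return tree_depth * NODE_SIZE + SUPERBLOCK_WRITE

# The example from the text: a 2KiB journal append that COWs a
# three-level path (leaf -> parent -> tree root).
logical = 2 * 1024
physical = cow_write_bytes(tree_depth=3)
print(physical)            # 53248 bytes written
print(physical / logical)  # 26.0x amplification for this one change
```

So even a single small logged change turns into tens of kilobytes of
metadata writes, which is why the observed, stable 512MB metadata chunk is
hard to square with the ssd option batching such writes into fresh space.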