Re: btrfs fi defrag does not defrag files >256kB?

2016-07-28 Thread Duncan
Nicholas D Steeves posted on Thu, 28 Jul 2016 13:53:31 -0400 as excerpted:

> Additionally, I've read that -o autodefrag doesn't yet work well for
> large databases.  Would a supplementary targeted defrag policy be useful
> here?  For example: a general cron/systemd.trigger default of "-t 32M",
> and then another job for /var/lib/mysql/ with a policy of "-f -t 1G"? 
> Or did your findings also show that large databases did not benefit from
> larger target extent defrags?

Yes, the previous advice was that the autodefrag mount option didn't 
work well with large rewrite-pattern files like VM images and databases, 
but that changed at some point.  I'm not sure whether autodefrag has 
always worked this way and people simply weren't sure before, or whether 
the behavior actually changed, but in any case, these days it doesn't 
rewrite the entire file, only a (relatively) larger block of it than the 
individual 4 KiB block that would otherwise be rewritten.  (I'm not sure 
of the size, perhaps the same 256 KiB that's the kernel default for 
manual defrag?)

As such, it scales better than it would if the full gig-size (or 
whatever) file was being rewritten, altho there will still be some 
fragmentation.

And for the same reason, it's not as bad with snapshots as it might 
otherwise have been: it only cows/de-reflinks a bit more of the file 
than the write would have cowed anyway, so it doesn't duplicate the 
entire file, as some originally feared.

Tho the only way to be sure would be to try it.


Meanwhile, it's worth noting that autodefrag works best if on from the 
beginning, so fragmentation doesn't get ahead of it.  Here, I ensure 
autodefrag is on from the first time I mount it, while the filesystem is 
still empty.  That way, fragmentation should never get out of hand, 
fragmenting free space so badly that large free extents to defrag into 
/can't/ be found, as can happen if autodefrag isn't turned on until 
later and manual defrag hasn't been done regularly either.  There have 
been a few reports of people waiting to turn it on until the filesystem 
is highly fragmented, and then having several days of low performance as 
defrag tries to catch up.  If it's consistently on from the beginning, 
that shouldn't happen.

Of course that may mean backing up and recreating the filesystem fresh 
in order to have autodefrag on from the beginning, if you're looking at 
trying it on existing filesystems that are likely already highly 
fragmented.
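
For what it's worth, here's a minimal sketch of what "on from the 
beginning" looks like (device, mount point, and the extra noatime 
option are only placeholders):

    # hypothetical device and mount point
    mkfs.btrfs /dev/sdb1
    mount -o autodefrag /dev/sdb1 /mnt/data   # first mount, still empty

    # and keep it on across reboots via /etc/fstab:
    # /dev/sdb1  /mnt/data  btrfs  autodefrag,noatime  0  0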

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs fi defrag does not defrag files >256kB?

2016-07-28 Thread Nicholas D Steeves
On 28 July 2016 at 06:55, David Sterba  wrote:
> On Wed, Jul 27, 2016 at 01:19:01PM -0400, Nicholas D Steeves wrote:
>> > In that regard a defrag -t 32M recommendation is reasonable for a
>> > converted filesystem, tho you can certainly go larger... to 1 GiB as I
>> > said.
>>
>> I only mentioned btrfs-convert.asciidoc, because that's what led me to
>> the discrepancy between the default target extent size value, and a
>> recommended value.  I was searching for everything I could find on
>> defrag, because I had begun to suspect that it wasn't functioning as
>> expected.
>
> Historically, the 256K size comes from the kernel.  Defrag can create
> tons of data to write, and this is noticeable on the system.  However,
> the results of defragmentation at that size are not satisfactory to the
> user, so the recommended value is 32M.  I'd rather not change the kernel
> default, but we can increase the default threshold (-t) in the userspace
> tools.

Thank you, I just saw that commit too!  To minimize the impact that
btrfs fi defrag run from a background cron or systemd.trigger job has
on a running system, I've read that "-f" (flush data after the defrag
of each file) is beneficial.  Would it be even more beneficial to run
the defragmentation under ionice -c idle?
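
For example, something along these lines (the path is only an 
illustration):

    ionice -c3 btrfs fi defrag -r -f -t 32M /srv/data   # -c3 = idle class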

>> Is there any reason why defrag without -t cannot detect and default to
>> the data chunk size, or why it does not default to 1 GiB?
>
> The 1G value wouldn't be reached on an average filesystem where the
> free space is fragmented; besides, there are some smaller internal
> limits on extent sizes that can keep the user's target size from being
> reached.  The value 32M was found experimentally and tested on various
> systems, and it proved to work well.  With 64M the defragmentation was
> less successful, but as the value is only a hint, it's not wrong to use
> it.

Thank you for sharing these results :-)

>> In the same
>> way that balance's default behaviour is a full balance, shouldn't
>> defrag's default behaviour defrag whole chunks?  Does it not default
>> to 1 GiB because that would increase the number of cases where defrag
>> unreflinks and duplicates files--leading to an ENOSPC?
>
> Yes, this would also happen, unless the '-f' option is given (flush data
> after defragmenting each file).

When flushing data after defragmenting each file, one might still hit
an ENOSPC, right?  But because the writes are more atomic it will be
easier to recover from?

Additionally, I've read that -o autodefrag doesn't yet work well for
large databases.  Would a supplementary targeted defrag policy be
useful here?  For example: a general cron/systemd.trigger default of
"-t 32M", and then another job for /var/lib/mysql/ with a policy of
"-f -t 1G"?  Or did your findings also show that large databases did
not benefit from larger target extent defrags?
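
Concretely, I'm imagining something like the following sketch (the 
schedules and paths are placeholders, not a tested policy):

    # /etc/cron.d/btrfs-defrag (sketch)
    # general weekly pass with the recommended 32M target
    0 3 * * 0  root  ionice -c3 btrfs fi defrag -r -f -t 32M /srv
    # separate pass for the database directory, with a larger target
    0 4 * * 0  root  ionice -c3 btrfs fi defrag -r -f -t 1G /var/lib/mysql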

Best regards,
Nicholas


Re: btrfs fi defrag does not defrag files >256kB?

2016-07-28 Thread Duncan
David Sterba posted on Thu, 28 Jul 2016 12:55:55 +0200 as excerpted:

> On Wed, Jul 27, 2016 at 01:19:01PM -0400, Nicholas D Steeves wrote:
>> > In that regard a defrag -t 32M recommendation is reasonable for a
>> > converted filesystem, tho you can certainly go larger... to 1 GiB as
>> > I said.

And... I see that the progs v4.7-rc1 release has the 32M default.

=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs fi defrag does not defrag files >256kB?

2016-07-28 Thread David Sterba
On Wed, Jul 27, 2016 at 01:19:01PM -0400, Nicholas D Steeves wrote:
> > In that regard a defrag -t 32M recommendation is reasonable for a
> > converted filesystem, tho you can certainly go larger... to 1 GiB as I
> > said.
> 
> I only mentioned btrfs-convert.asciidoc, because that's what led me to
> the discrepancy between the default target extent size value, and a
> recommended value.  I was searching for everything I could find on
> defrag, because I had begun to suspect that it wasn't functioning as
> expected.

Historically, the 256K size comes from the kernel.  Defrag can create
tons of data to write, and this is noticeable on the system.  However,
the results of defragmentation at that size are not satisfactory to the
user, so the recommended value is 32M.  I'd rather not change the kernel
default, but we can increase the default threshold (-t) in the userspace
tools.

> Is there any reason why defrag without -t cannot detect and default to
> the data chunk size, or why it does not default to 1 GiB?

The 1G value wouldn't be reached on an average filesystem where the free
space is fragmented; besides, there are some smaller internal limits on
extent sizes that can keep the user's target size from being reached.
The value 32M was found experimentally and tested on various systems,
and it proved to work well.  With 64M the defragmentation was less
successful, but as the value is only a hint, it's not wrong to use it.

> In the same
> way that balance's default behaviour is a full balance, shouldn't
> defrag's default behaviour defrag whole chunks?  Does it not default
> to 1 GiB because that would increase the number of cases where defrag
> unreflinks and duplicates files--leading to an ENOSPC?

Yes, this would also happen, unless the '-f' option is given (flush data
after defragmenting each file).

> https://github.com/kdave/btrfsmaintenance/blob/master/btrfs-defrag.sh
> uses -t 32M; if a default target extent size of 1 GiB is too radical,
> why not set it to 32M?  If SLED ships btrfsmaintenance, then defrag -t
> 32M should be well-tested, no?

It is.
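
For reference, the gist of such a job is roughly the following sketch 
(the actual btrfs-defrag.sh at the URL above is authoritative; the 
paths and size filter here are only example settings):

    #!/bin/sh
    # Rough sketch of a periodic defrag job in the spirit of
    # btrfsmaintenance; not the actual script.
    DEFRAG_PATHS="/var /home"      # example paths to maintain
    MIN_SIZE="+1M"                 # skip small files
    for p in $DEFRAG_PATHS; do
        find "$p" -xdev -type f -size "$MIN_SIZE" \
            -exec btrfs fi defrag -t 32M -f '{}' \;
    done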


Re: btrfs fi defrag does not defrag files >256kB?

2016-07-27 Thread Duncan
Nicholas D Steeves posted on Wed, 27 Jul 2016 13:19:01 -0400 as excerpted:

> Is there any reason why defrag without -t cannot detect and default to
> the data chunk size, or why it does not default to 1 GiB?

I don't know the answer, but have wondered that myself.  256 KiB seems a 
rather small default to me.  I'd expect something in the MiB range at 
least, maybe the same 2 MiB that modern partitioners tend to use for 
alignment, and for the same reason: it tends to be a reasonable whole 
multiple of most erase-block sizes, so if the partition is aligned it 
should prevent unnecessary read-modify-write cycles on SSDs, and help 
with the shingled zones on SMR drives as well.

As to the question in the subject line, AFAIK btrfs fi defrag works on 
extents, not file size per se, so with the default 256 KiB target, yes, 
it will defrag files larger than that, but only the extents that are 
smaller than that.  If all the extents are 256 KiB or larger, defrag 
won't do anything with the file without a larger target option, unless 
the compress option is also used, in which case it rewrites everything 
it is pointed at in order to recompress it.
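
For example (target size, compression algorithm, and path are only 
illustrative):

    btrfs fi defrag -r -t 32M /path/to/dir   # only extents under 32M
    btrfs fi defrag -r -clzo /path/to/dir    # rewrites (and recompresses) everything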

Talking about compression, it's worth mentioning that filefrag doesn't 
understand btrfs compression either, and will count each 128 KiB 
(uncompressed size) compression block as a separate extent.  To get the 
true picture using filefrag, you need to use the verbose option (-v) and 
either eyeball the results manually or feed them into a script that 
processes the numbers and combines "extents" that are reported as 
immediately consecutive on the filesystem.
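
A hypothetical helper along those lines (just a sketch, not a tested 
tool; it assumes the usual filefrag -v column layout):

    #!/bin/sh
    # Count physically contiguous runs in "filefrag -v" output,
    # merging back-to-back "extents" into a single run.
    filefrag -v "$1" | awk '
        /^[[:space:]]*[0-9]+:/ {
            start = $4 + 0                 # physical start block
            end   = $5 + 0                 # physical end block
            if (!seen || start != prev_end + 1)
                runs++                     # gap => genuinely separate run
            prev_end = end
            seen = 1
        }
        END { print runs + 0, "physically contiguous run(s)" }'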

As such, given btrfs's opportunistic compression, filefrag turns out to 
be a good way of determining whether a file (of over 128 KiB in size) is 
actually compressed or not: if it is, filefrag will report multiple 
128 KiB extents, while if it's not, the extent sizes should be much less 
regular, likely with larger extents unless the file is often modified 
and rewritten in place.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: btrfs fi defrag does not defrag files >256kB?

2016-07-27 Thread Nicholas D Steeves
On 26 July 2016 at 21:10, Duncan <1i5t5.dun...@cox.net> wrote:
> Nicholas D Steeves posted on Tue, 26 Jul 2016 19:03:53 -0400 as excerpted:
>
>> Hi,
>>
>> I've been using btrfs fi defrag without the "-r -t 32M" option for
>> regular maintenance.  I just learned, in
>> Documentation/btrfs-convert.asciidoc, that there is a recommendation
>> to run with "-t 32M" after a conversion from ext2/3/4.  I then
>> cross-referenced this with btrfs-filesystem(8), and found that:
>>
>> Extents bigger than value given by -t will be skipped, otherwise
>> this value is used as a target extent size, but is only advisory
>> and may not be reached if the free space is too fragmented. Use 0
>> to take the kernel default, which is 256kB but may change in the
>> future.
>>
>> I understand the default behaviour of target extent size of 256kB to
>> mean only defragment small files and metadata.  Or does this mean that
>> the default behaviour is to defragment extent tree metadata >256kB,
>> and then defragment the (larger than 256kB) data from many extents
>> into a single extent?  I was surprised to read this!
>>
>> What's really happening with this default behaviour?  Should everyone
>> be using -t with a much larger value to actually defragment their
>> databases?
>
> Something about defrag's -t option should really be in the FAQ, as it is
> known to be somewhat confusing and to come up from time to time, tho this
> is the first time I've seen it in the context of convert.
>
> In general, you are correct in that the larger the value given to -t, the
> more defragging you should ultimately get.  There's a practical upper
> limit, however: the data chunk size, which is nominally 1 GiB (tho on a
> tiny btrfs it's smaller, and at TB scale it can be larger, 8 or 10 GiB
> IIRC).  32-bit btrfs-progs defrag also had a bug at one point that would
> (IIRC) kill the parameter if it was set to 2+ GiB -- that has been fixed
> by hard-coding the 32-bit max to 1 GiB, I believe.  The bug didn't affect
> 64-bit.  In any case, 1 GiB is fine, and often the largest btrfs can do
> anyway, since as I said that's the normal data chunk size.
>
> And btrfs defrag only deals with data.  There's no metadata defrag, tho
> balance -m (or whole filesystem) will normally consolidate the metadata
> into the fewest (nominally 256 MiB) metadata chunks possible as it
> rewrites them.

Thank you for this metadata consolidation tip!
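
(For anyone following along, that would be something like the 
following; the mount point is a placeholder:)

    # rewrite the metadata chunks, consolidating them in the process
    btrfs balance start -m /mnt/data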

> In that regard a defrag -t 32M recommendation is reasonable for a
> converted filesystem, tho you can certainly go larger... to 1 GiB as I
> said.
>

I only mentioned btrfs-convert.asciidoc, because that's what led me to
the discrepancy between the default target extent size value, and a
recommended value.  I was searching for everything I could find on
defrag, because I had begun to suspect that it wasn't functioning as
expected.

Is there any reason why defrag without -t cannot detect and default to
the data chunk size, or why it does not default to 1 GiB?  In the same
way that balance's default behaviour is a full balance, shouldn't
defrag's default behaviour defrag whole chunks?  Does it not default
to 1 GiB because that would increase the number of cases where defrag
unreflinks and duplicates files--leading to an ENOSPC?

https://github.com/kdave/btrfsmaintenance/blob/master/btrfs-defrag.sh
uses -t 32M; if a default target extent size of 1 GiB is too radical,
why not set it to 32M?  If SLED ships btrfsmaintenance, then defrag -t
32M should be well-tested, no?

Thank you,
Nicholas