Re: defragmenting best practice?

2017-12-10 Thread Timofey Titovets
2017-12-11 8:18 GMT+03:00 Dave :
> On Tue, Oct 31, 2017 someone wrote:
>>
>>
>> > 2. Put $HOME/.cache on a separate BTRFS subvolume that is mounted
>> > nocow -- it will NOT be snapshotted
>
> I did exactly this. It serves the purpose of avoiding snapshots.
> However, today I saw the following at
> https://wiki.archlinux.org/index.php/Btrfs
>
> Note: From Btrfs Wiki Mount options: within a single file system, it
> is not possible to mount some subvolumes with nodatacow and others
> with datacow. The mount option of the first mounted subvolume applies
> to any other subvolumes.
>
> That makes me think my nodatacow mount option on $HOME/.cache is not
> effective. True?
>
> (My subjective performance results have not been as good as hoped for
> with the tweaks I have tried so far.)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

True. For such "magic" dirs that you may want to mark as nocow, you
need to use chattr instead, like:
rm -rf ~/.cache
mkdir ~/.cache
chattr +C ~/.cache
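For reference, the same recipe can be exercised against a throwaway
path, with a check that the No_COW flag stuck. A hedged sketch:
CACHE_DIR is a stand-in for ~/.cache, and chattr +C only takes effect
on btrfs, hence the fallback message.

```shell
# Stand-in for ~/.cache; a temp path keeps the sketch harmless.
CACHE_DIR="$(mktemp -d)/cache"
mkdir -p "$CACHE_DIR"
# +C (nocow) only works on btrfs; elsewhere chattr reports an error.
chattr +C "$CACHE_DIR" 2>/dev/null \
  || echo "chattr +C unsupported here (not btrfs?)"
# On btrfs, a 'C' flag appears in the lsattr output.
lsattr -d "$CACHE_DIR" 2>/dev/null || true
```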

-- 
Have a nice day,
Timofey.


Re: defragmenting best practice?

2017-12-10 Thread Dave
On Tue, Oct 31, 2017 someone wrote:
>
>
> > 2. Put $HOME/.cache on a separate BTRFS subvolume that is mounted
> > nocow -- it will NOT be snapshotted

I did exactly this. It serves the purpose of avoiding snapshots.
However, today I saw the following at
https://wiki.archlinux.org/index.php/Btrfs

Note: From Btrfs Wiki Mount options: within a single file system, it
is not possible to mount some subvolumes with nodatacow and others
with datacow. The mount option of the first mounted subvolume applies
to any other subvolumes.

That makes me think my nodatacow mount option on $HOME/.cache is not
effective. True?

(My subjective performance results have not been as good as hoped for
with the tweaks I have tried so far.)


Re: defragmenting best practice?

2017-11-03 Thread Austin S. Hemmelgarn

On 2017-11-03 03:26, Kai Krakow wrote:

Am Thu, 2 Nov 2017 22:47:31 -0400
schrieb Dave :


On Thu, Nov 2, 2017 at 5:16 PM, Kai Krakow 
wrote:



You may want to try btrfs autodefrag mount option and see if it
improves things (tho, the effect may take days or weeks to apply if
you didn't enable it right from the creation of the filesystem).

Also, autodefrag will probably unshare reflinks on your snapshots.
You may be able to use bees[1] to work against this effect. Its
interaction with autodefrag is not well tested but it works fine
for me. Also, bees is able to reduce some of the fragmentation
during deduplication because it will rewrite extents back into
bigger chunks (but only for duplicated data).

[1]: https://github.com/Zygo/bees


I will look into bees. And yes, I plan to try autodefrag. (I already
have it enabled now.) However, I need to understand something about
how btrfs send-receive works in regard to reflinks and fragmentation.

Say I have 2 snapshots on my live volume. The earlier one of them has
already been sent to another block device by btrfs send-receive (full
backup). Now defrag runs on the live volume and breaks some percentage
of the reflinks. At this point I do an incremental btrfs send-receive
using "-p" (or "-c") with the diff going to the same other block
device where the prior snapshot was already sent.

Will reflinks be "made whole" (restored) on the receiving block
device? Or is the state of the source volume replicated so closely
that reflink status is the same on the target?

Also, is fragmentation reduced on the receiving block device?

My expectation is that fragmentation would be reduced and duplication
would be reduced too. In other words, does send-receive result in
defragmentation and deduplication too?


As far as I understand, btrfs send/receive doesn't create an exact
mirror. It just replays the block operations between generation
numbers. That is: If it finds new blocks referenced between
generations, it will write a _new_ block to the destination.
That is mostly correct, except it's not a block level copy.  To put it 
in a heavily simplified manner, send/receive will recreate the subvolume 
using nothing more than basic file manipulation syscalls (write(), 
chown(), chmod(), etc), the clone ioctl, and some extra logic to figure 
out the correct location to clone from.  IOW, it's functionally 
equivalent to using rsync to copy the data, and then deduplicating, 
albeit a bit smarter about when to deduplicate (and more efficient in 
that respect).


So, no, it won't reduce fragmentation or duplication. It just keeps
reflinks intact as long as such extents weren't touched within the
generation range. Otherwise they are rewritten as new extents.
A received subvolume will almost always be less fragmented than the 
source, since everything is received serially, and each file is written 
out one at a time.


Autodefrag and deduplication processes will as such probably increase
duplication at the destination. A developer may have a better clue, tho.
In theory, yes, but in practice, not so much.  Autodefrag generally 
operates on very small blocks of data (64k IIRC), and I'm pretty sure it 
has some heuristic that only triggers it on small random writes, so 
depending on the workload, it may not be triggering much (for example, 
it often won't trigger on cache directories, since those almost never 
have files rewritten in place).



Re: defragmenting best practice?

2017-11-03 Thread Kai Krakow
Am Thu, 2 Nov 2017 22:47:31 -0400
schrieb Dave :

> On Thu, Nov 2, 2017 at 5:16 PM, Kai Krakow 
> wrote:
> 
> >
> > You may want to try btrfs autodefrag mount option and see if it
> > improves things (tho, the effect may take days or weeks to apply if
> > you didn't enable it right from the creation of the filesystem).
> >
> > Also, autodefrag will probably unshare reflinks on your snapshots.
> > You may be able to use bees[1] to work against this effect. Its
> > interaction with autodefrag is not well tested but it works fine
> > for me. Also, bees is able to reduce some of the fragmentation
> > during deduplication because it will rewrite extents back into
> > bigger chunks (but only for duplicated data).
> >
> > [1]: https://github.com/Zygo/bees  
> 
> I will look into bees. And yes, I plan to try autodefrag. (I already
> have it enabled now.) However, I need to understand something about
> how btrfs send-receive works in regard to reflinks and fragmentation.
> 
> Say I have 2 snapshots on my live volume. The earlier one of them has
> already been sent to another block device by btrfs send-receive (full
> backup). Now defrag runs on the live volume and breaks some percentage
> of the reflinks. At this point I do an incremental btrfs send-receive
> using "-p" (or "-c") with the diff going to the same other block
> device where the prior snapshot was already sent.
> 
> Will reflinks be "made whole" (restored) on the receiving block
> device? Or is the state of the source volume replicated so closely
> that reflink status is the same on the target?
> 
> Also, is fragmentation reduced on the receiving block device?
> 
> My expectation is that fragmentation would be reduced and duplication
> would be reduced too. In other words, does send-receive result in
> defragmentation and deduplication too?

As far as I understand, btrfs send/receive doesn't create an exact
mirror. It just replays the block operations between generation
numbers. That is: If it finds new blocks referenced between
generations, it will write a _new_ block to the destination.

So, no, it won't reduce fragmentation or duplication. It just keeps
reflinks intact as long as such extents weren't touched within the
generation range. Otherwise they are rewritten as new extents.

Autodefrag and deduplication processes will as such probably increase
duplication at the destination. A developer may have a better clue, tho.


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-11-03 Thread Kai Krakow
Am Fri, 3 Nov 2017 08:58:22 +0300
schrieb Marat Khalili :

> On 02/11/17 04:39, Dave wrote:
> > I'm going to make this change now. What would be a good way to
> > implement this so that the change applies to the $HOME/.cache of
> > each user?  
> I'd make each user's .cache a symlink (should work but if it won't
> then bind mount) to a per-user directory in some separately mounted
> volume with necessary options.

On a systemd system, each user already has a private tmpfs location
at /run/user/$(id -u).

You could add to the central login script:

CACHE_DIR="/run/user/$(id -u)/cache"
mkdir -p "$CACHE_DIR" && ln -snf "$CACHE_DIR" "$HOME/.cache"

You should not run this as root (because of mkdir -p).

You could wrap it into an if statement (note that string comparison in
test is "!=", not the arithmetic "-ne"):

if [ "$(whoami)" != "root" ]; then
  ...
fi


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-11-03 Thread Kai Krakow
Am Thu, 2 Nov 2017 22:59:36 -0400
schrieb Dave :

> On Thu, Nov 2, 2017 at 7:07 AM, Austin S. Hemmelgarn
>  wrote:
> > On 2017-11-01 21:39, Dave wrote:  
> >> I'm going to make this change now. What would be a good way to
> >> implement this so that the change applies to the $HOME/.cache of
> >> each user?
> >>
> >> The simple way would be to create a new subvolume for each existing
> >> user and mount it at $HOME/.cache in /etc/fstab, hard coding that
> >> mount location for each user. I don't mind doing that as there are
> >> only 4 users to consider. One minor concern is that it adds an
> >> unexpected step to the process of creating a new user. Is there a
> >> better way?
> >>  
> > The easiest option is to just make sure nobody is logged in and run
> > the following shell script fragment:
> >
> > for dir in /home/* ; do
> > rm -rf $dir/.cache
> > btrfs subvolume create $dir/.cache
> > done
> >
> > And then add something to the user creation scripts to create that
> > subvolume.  This approach won't pollute /etc/fstab, will still
> > exclude the directory from snapshots, and doesn't require any
> > hugely creative work to integrate with user creation and deletion.
> >
> > In general, the contents of the .cache directory are just that,
> > cached data. Provided nobody is actively accessing it, it's
> > perfectly safe to just nuke the entire directory...  
> 
> I like this suggestion. Thank you. I had intended to mount the .cache
> subvolumes with the NODATACOW option. However, with this approach, I
> won't be explicitly mounting the .cache subvolumes. Is it possible to
> use "chattr +C $dir/.cache" in that loop even though it is a
> subvolume? And, is setting the .cache directory to NODATACOW the right
> choice given this scenario? From earlier comments, I believe it is,
> but I want to be sure I understood correctly.

It is important to apply "chattr +C" to the _empty_ directory: even
when used recursively, it does not apply to already existing,
non-empty files. But the +C attribute is inherited by newly created
files and directories, so simply run "chattr +C" on the empty
directory and you're all set.

BTW: You cannot mount subvolumes from an already mounted btrfs device
with different mount options. That is currently not implemented (except
for maybe a very few options). So the fstab approach probably wouldn't
have helped you (depending on your partition layout).

You can simply just create subvolumes within the location needed and
they are implicitly mounted. Then change the particular subvolume cow
behavior with chattr.
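Putting both points together, a hedged sketch of the whole recipe
(HOME_ROOT stands in for /home; off-btrfs, or without btrfs-progs, it
falls back to a plain directory so the commands still run):

```shell
# Per-user cache subvolumes, marked nocow while still empty so that
# newly created files inherit the +C attribute.
HOME_ROOT="$(mktemp -d)"                 # stand-in for /home
mkdir -p "$HOME_ROOT/alice" "$HOME_ROOT/bob"
for dir in "$HOME_ROOT"/*; do
    rm -rf "$dir/.cache"
    btrfs subvolume create "$dir/.cache" 2>/dev/null \
        || mkdir -p "$dir/.cache"        # fallback off-btrfs
    chattr +C "$dir/.cache" 2>/dev/null || true  # no-op off-btrfs
done
ls -d "$HOME_ROOT"/*/.cache
```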


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-11-02 Thread Marat Khalili

On 02/11/17 04:39, Dave wrote:

I'm going to make this change now. What would be a good way to
implement this so that the change applies to the $HOME/.cache of each
user?
I'd make each user's .cache a symlink (should work but if it won't then 
bind mount) to a per-user directory in some separately mounted volume 
with necessary options.


--

With Best Regards,
Marat Khalili


Re: defragmenting best practice?

2017-11-02 Thread Dave
On Thu, Nov 2, 2017 at 7:07 AM, Austin S. Hemmelgarn
 wrote:
> On 2017-11-01 21:39, Dave wrote:
>> I'm going to make this change now. What would be a good way to
>> implement this so that the change applies to the $HOME/.cache of each
>> user?
>>
>> The simple way would be to create a new subvolume for each existing
>> user and mount it at $HOME/.cache in /etc/fstab, hard coding that
>> mount location for each user. I don't mind doing that as there are
>> only 4 users to consider. One minor concern is that it adds an
>> unexpected step to the process of creating a new user. Is there a
>> better way?
>>
> The easiest option is to just make sure nobody is logged in and run the
> following shell script fragment:
>
> for dir in /home/* ; do
> rm -rf $dir/.cache
> btrfs subvolume create $dir/.cache
> done
>
> And then add something to the user creation scripts to create that
> subvolume.  This approach won't pollute /etc/fstab, will still exclude the
> directory from snapshots, and doesn't require any hugely creative work to
> integrate with user creation and deletion.
>
> In general, the contents of the .cache directory are just that, cached data.
> Provided nobody is actively accessing it, it's perfectly safe to just nuke
> the entire directory...

I like this suggestion. Thank you. I had intended to mount the .cache
subvolumes with the NODATACOW option. However, with this approach, I
won't be explicitly mounting the .cache subvolumes. Is it possible to
use "chattr +C $dir/.cache" in that loop even though it is a
subvolume? And, is setting the .cache directory to NODATACOW the right
choice given this scenario? From earlier comments, I believe it is,
but I want to be sure I understood correctly.


Re: defragmenting best practice?

2017-11-02 Thread Dave
On Thu, Nov 2, 2017 at 5:16 PM, Kai Krakow  wrote:

>
> You may want to try btrfs autodefrag mount option and see if it
> improves things (tho, the effect may take days or weeks to apply if you
> didn't enable it right from the creation of the filesystem).
>
> Also, autodefrag will probably unshare reflinks on your snapshots. You
> may be able to use bees[1] to work against this effect. Its interaction
> with autodefrag is not well tested but it works fine for me. Also, bees
> is able to reduce some of the fragmentation during deduplication
> because it will rewrite extents back into bigger chunks (but only for
> duplicated data).
>
> [1]: https://github.com/Zygo/bees

I will look into bees. And yes, I plan to try autodefrag. (I already
have it enabled now.) However, I need to understand something about
how btrfs send-receive works in regard to reflinks and fragmentation.

Say I have 2 snapshots on my live volume. The earlier one of them has
already been sent to another block device by btrfs send-receive (full
backup). Now defrag runs on the live volume and breaks some percentage
of the reflinks. At this point I do an incremental btrfs send-receive
using "-p" (or "-c") with the diff going to the same other block
device where the prior snapshot was already sent.

Will reflinks be "made whole" (restored) on the receiving block
device? Or is the state of the source volume replicated so closely
that reflink status is the same on the target?

Also, is fragmentation reduced on the receiving block device?

My expectation is that fragmentation would be reduced and duplication
would be reduced too. In other words, does send-receive result in
defragmentation and deduplication too?


Re: defragmenting best practice?

2017-11-02 Thread Kai Krakow
Am Tue, 31 Oct 2017 20:37:27 -0400
schrieb Dave :

> > Also, you can declare the '.firefox/default/' directory to be
> > NOCOW, and that "just works".  
> 
> The cache is in a separate location from the profiles, as I'm sure you
> know.  The reason I suggested a separate BTRFS subvolume for
> $HOME/.cache is that this will prevent the cache files for all
> applications (for that user) from being included in the snapshots. We
> take frequent snapshots and (afaik) it makes no sense to include cache
> in backups or snapshots. The easiest way I know to exclude cache from
> BTRFS snapshots is to put it on a separate subvolume. I assumed this
> would make several things related to snapshots more efficient too.
> 
> As far as the Firefox profile being declared NOCOW, as soon as we take
> the first snapshot, I understand that it will become COW again. So I
> don't see any point in making it NOCOW.

Ah well, not really. The files and directories will still be nocow -
however, the next write to any such file after a snapshot will make a
cow operation. So you still see the fragmentation effect, but to a much
lesser extent. But the files themselves will remain in nocow format.

You may want to try btrfs autodefrag mount option and see if it
improves things (tho, the effect may take days or weeks to apply if you
didn't enable it right from the creation of the filesystem).

Also, autodefrag will probably unshare reflinks on your snapshots. You
may be able to use bees[1] to work against this effect. Its interaction
with autodefrag is not well tested but it works fine for me. Also, bees
is able to reduce some of the fragmentation during deduplication
because it will rewrite extents back into bigger chunks (but only for
duplicated data).

[1]: https://github.com/Zygo/bees


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-11-02 Thread Austin S. Hemmelgarn

On 2017-11-02 14:09, Dave wrote:

On Thu, Nov 2, 2017 at 7:17 AM, Austin S. Hemmelgarn
 wrote:


And the worst performing machine was the one with the most RAM and a
fast NVMe drive and top of the line hardware.


Somewhat nonsensically, I'll bet that NVMe is a contributing factor in this
particular case.  NVMe has particularly bad performance with the old block
IO schedulers (though it is NVMe, so it should still be better than a SATA
or SAS SSD), and the new blk-mq framework just got scheduling support in
4.12, and only got reasonably good scheduling options in 4.13.  I doubt it's
the entirety of the issue, but it's probably part of it.


Thanks for that news. Based on that, I assume the advice here (to use
noop for NVMe) is now outdated?
https://stackoverflow.com/a/27664577/463994

Is the solution as simple as running a kernel >= 4.13? Or do I need to
specify which scheduler to use?

I just checked one computer:

uname -a
Linux morpheus 4.13.5-1-ARCH #1 SMP PREEMPT Fri Oct 6 09:58:47 CEST
2017 x86_64 GNU/Linux

$ sudo find /sys -name scheduler -exec grep . {} +
/sys/devices/pci:00/:00:1d.0/:08:00.0/nvme/nvme0/nvme0n1/queue/scheduler:[none]
mq-deadline kyber bfq

 From this article, it sounds like (maybe) I should use kyber. I see
kyber listed in the output above, so I assume that means it is
available. I also think [none] is the current scheduler being used, as
it is in brackets.

I checked this:
https://www.kernel.org/doc/Documentation/block/switching-sched.txt
Based on that, I assume I would do this at runtime:

echo kyber > 
/sys/devices/pci:00/:00:1d.0/:08:00.0/nvme/nvme0/nvme0n1/queue/scheduler

I assume this is equivalent:

echo kyber > /sys/block/nvme0n1/queue/scheduler

How would I set it permanently at boot time?
It's kind of complicated overall.  As of 4.14, there are four options 
for the blk-mq path.  The 'none' scheduler is the old behavior prior to 
4.13, and does no scheduling.  'mq-deadline' is the default AFAIK, and 
behaves like the old deadline I/O scheduler (not sure if it supports I/O 
priorities).  'bfq' is a blk-mq port of a scheduler originally designed 
to replace the default CFQ scheduler from the old block layer.  'kyber' 
I know essentially nothing about, I never saw the patches on LKML (not 
sure if I just missed them, or they only went to topic lists), and I've 
not tried it myself.


I have no personal experience with anything but the none scheduler on 
NVMe devices, so I can't really comment much more than saying that I've 
seen a huge difference on the SATA SSD's I use first when the deadline 
scheduler became the default and then again when I switched to BFQ on my 
systems, and the fact that I've seen reports of using the deadline 
scheduler improving things on NVMe.


As far as setting it at boot time, there's currently no kernel 
configuration option to set a default like there is for the old block 
interface, and I don't know of any kernel command line option to set it 
either, but a udev rule setting it as a attribute works reliably.  I'm 
using something like the following to set all my SATA devices to use BFQ 
by default:


KERNEL=="sd?", SUBSYSTEM=="block", ACTION=="add", 
ATTR{queue/scheduler}="bfq"
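An NVMe analog of that rule would presumably look like the following
(an untested assumption on my part: the match pattern and the choice
of 'kyber' are illustrative, and the rules-file name, e.g.
/etc/udev/rules.d/60-ioscheduler.rules, is arbitrary):

```
KERNEL=="nvme[0-9]n[0-9]", SUBSYSTEM=="block", ACTION=="add", ATTR{queue/scheduler}="kyber"
```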



While Firefox and Linux in general have their performance "issues",
that's not relevant here. I'm comparing the same distros, same Firefox
versions, same Firefox add-ons, etc. I eventually tested many hardware
configurations: different CPU's, motherboards, GPU's, SSD's, RAM, etc.
The only remaining difference I can find is that the computer with
acceptable performance uses LVM + EXT4 while all the others use BTRFS.

With all the great feedback I have gotten here, I'm now ready to
retest this after implementing all the BTRFS-related suggestions I
have received. Maybe that will solve the problem or maybe this mystery
will continue...


Hmm, if you're only using SSD's, that may partially explain things.  I don't
remember if it was mentioned earlier in this thread, but you might try
adding 'nossd' to the mount options.  The 'ssd' mount option (which gets set
automatically if the device reports as non-rotational) impacts how the block
allocator works, and that can have a pretty insane impact on performance.


I will test the "nossd" mount option.
If you're not seeing any difference on the newest kernels (I hadn't 
realized you were running 4.13 on anything), you might not see any 
impact from doing this.  I'd also suggest running a full balance prior 
to testing _after_ switching the option, part of the performance impact 
is due to the resultant on-disk layout.



Additionally, independently from that, try toggling the 'discard' mount
option.  If you have it enabled, disable it, if you have it disabled, enable
it.  Inline discards can be very expensive on some hardware, especially
older SSD's, and discards happen pretty frequently in a COW filesystem.


I have been following this advice, so I have never enabled discard for
an NVMe drive. Do you think it is worth testing?

Re: defragmenting best practice?

2017-11-02 Thread Dave
On Thu, Nov 2, 2017 at 7:17 AM, Austin S. Hemmelgarn
 wrote:

>> And the worst performing machine was the one with the most RAM and a
>> fast NVMe drive and top of the line hardware.
>
> Somewhat nonsensically, I'll bet that NVMe is a contributing factor in this
> particular case.  NVMe has particularly bad performance with the old block
> IO schedulers (though it is NVMe, so it should still be better than a SATA
> or SAS SSD), and the new blk-mq framework just got scheduling support in
> 4.12, and only got reasonably good scheduling options in 4.13.  I doubt it's
> the entirety of the issue, but it's probably part of it.

Thanks for that news. Based on that, I assume the advice here (to use
noop for NVMe) is now outdated?
https://stackoverflow.com/a/27664577/463994

Is the solution as simple as running a kernel >= 4.13? Or do I need to
specify which scheduler to use?

I just checked one computer:

uname -a
Linux morpheus 4.13.5-1-ARCH #1 SMP PREEMPT Fri Oct 6 09:58:47 CEST
2017 x86_64 GNU/Linux

$ sudo find /sys -name scheduler -exec grep . {} +
/sys/devices/pci:00/:00:1d.0/:08:00.0/nvme/nvme0/nvme0n1/queue/scheduler:[none]
mq-deadline kyber bfq

From this article, it sounds like (maybe) I should use kyber. I see
kyber listed in the output above, so I assume that means it is
available. I also think [none] is the current scheduler being used, as
it is in brackets.

I checked this:
https://www.kernel.org/doc/Documentation/block/switching-sched.txt
Based on that, I assume I would do this at runtime:

echo kyber > 
/sys/devices/pci:00/:00:1d.0/:08:00.0/nvme/nvme0/nvme0n1/queue/scheduler

I assume this is equivalent:

echo kyber > /sys/block/nvme0n1/queue/scheduler

How would I set it permanently at boot time?

>> While Firefox and Linux in general have their performance "issues",
>> that's not relevant here. I'm comparing the same distros, same Firefox
>> versions, same Firefox add-ons, etc. I eventually tested many hardware
>> configurations: different CPU's, motherboards, GPU's, SSD's, RAM, etc.
>> The only remaining difference I can find is that the computer with
>> acceptable performance uses LVM + EXT4 while all the others use BTRFS.
>>
>> With all the great feedback I have gotten here, I'm now ready to
>> retest this after implementing all the BTRFS-related suggestions I
>> have received. Maybe that will solve the problem or maybe this mystery
>> will continue...
>
> Hmm, if you're only using SSD's, that may partially explain things.  I don't
> remember if it was mentioned earlier in this thread, but you might try
> adding 'nossd' to the mount options.  The 'ssd' mount option (which gets set
> automatically if the device reports as non-rotational) impacts how the block
> allocator works, and that can have a pretty insane impact on performance.

I will test the "nossd" mount option.

> Additionally, independently from that, try toggling the 'discard' mount
> option.  If you have it enabled, disable it, if you have it disabled, enable
> it.  Inline discards can be very expensive on some hardware, especially
> older SSD's, and discards happen pretty frequently in a COW filesystem.

I have been following this advice, so I have never enabled discard for
an NVMe drive. Do you think it is worth testing?

Solid State Drives/NVMe - ArchWiki
https://wiki.archlinux.org/index.php/Solid_State_Drives/NVMe

Discards:
Note: Although continuous TRIM is an option (albeit not recommended)
for SSDs, NVMe devices should not be issued discards.


Re: defragmenting best practice?

2017-11-02 Thread Austin S. Hemmelgarn

On 2017-11-01 20:09, Dave wrote:

On Wed, Nov 1, 2017 at 1:48 PM, Peter Grandi  wrote:

When defragmenting individual files on a BTRFS filesystem with
COW, I assume reflinks between that file and all snapshots are
broken. So if there are 30 snapshots on that volume, that one
file will suddenly take up 30 times more space... [ ... ]


Defragmentation works by effectively making a copy of the file
contents (simplistic view), so the end result is one copy with
29 reflinked contents, and one copy with defragmented contents.


The clarification is much appreciated.


Can you also give an example of using find, as you suggested
above? [ ... ]


Well, one way is to use 'find' as a filtering replacement for
'defrag' option '-r', as in for example:

  find "$HOME" -xdev '(' -name '*.sqlite' -o -name '*.mk4' ')' \
    -type f -print0 | xargs -0 btrfs fi defrag

Another one is to find the most fragmented files first, or all
files of at least 1M with at least, say, 100 fragments, as in:

  find "$HOME" -xdev -type f -size +1M -print0 | xargs -0 filefrag \
    | perl -n -e 'print "$1\0" if (m/(.*): ([0-9]+) extents/ && $2 > 100)' \
    | xargs -0 btrfs fi defrag

But there are many 'find' web pages and that is not quite a
Btrfs related topic.
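As a side note, the extent-count filter in that perl one-liner is
easier to tweak as a small standalone function. A hedged sketch
(filter_fragmented is a hypothetical helper, not from the thread; it
parses filefrag-style lines "path: N extents found", and paths
containing ": " would confuse the field split):

```shell
# Print files whose filefrag extent count exceeds a threshold.
filter_fragmented() {
    awk -F': ' -v t="$1" '$2 + 0 > t { print $1 }'
}

# Sample filefrag-style input in place of real output:
printf '%s\n' \
  "/home/u/places.sqlite: 240 extents found" \
  "/home/u/video.mk4: 3 extents found" \
  | filter_fragmented 100
# prints: /home/u/places.sqlite
```

With real data you would feed it from `find ... | xargs filefrag` and
pipe the surviving paths on to btrfs fi defrag.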


Your examples were perfect. I have experience using find in similar
ways. I can take it from there. :-)


Background: I'm not sure why our Firefox performance is so terrible


As I always say, "performance" is not the same as "speed", and
probably your Firefox "performance" is sort of OKish even if the
"speed" is terrile, and neither is likely related to the profile
or the cache being on Btrfs.


Here's what happened. Two years ago I installed Kubuntu (with Firefox)
on two desktop computers. One machine performed fine. Like you said,
"sort of OKish" and that's what we expect with the current state of
Linux. The other machine was substantially worse. We ran side-by-side
real-world tests on these two machines for months.

Initially I did a lot of testing, troubleshooting and reconfiguration
trying to get the second machine to perform as well as the first. I
never had success. At first I thought it was related to the GPU (or
driver). Then I thought it was because the first machine used the z170
chipset and the second was X99 based. But that wasn't it. I have never
solved the problem and I have been coming back to it periodically
these last two years. In that time I have tried different distros from
opensuse to Arch, and a lot of different hardware.

Furthermore, my new machines have the same performance problem. The
most interesting example is a high end machine with 256 GB of RAM. It
showed substantially worse desktop application performance than any
other computer here. All are running the exact same version of Firefox
with the exact same add-ons. (The installations are carbon copies of
each other.)

What originally caught my attention was earlier information in this thread:

Am Wed, 20 Sep 2017 07:46:52 -0400
schrieb "Austin S. Hemmelgarn" :


  Fragmentation: Files with a lot of random writes can become
heavily fragmented (10000+ extents) causing excessive multi-second
spikes of CPU load on systems with an SSD or large amount of RAM. On
desktops this primarily affects application databases (including
Firefox). Workarounds include manually defragmenting your home
directory using btrfs fi defragment. Auto-defragment (mount option
autodefrag) should solve this problem.

Upon reading that I am wondering if fragmentation in the Firefox
profile is part of my issue. That's one thing I never tested
previously. (BTW, this system has 256 GB of RAM and 20 cores.)

Almost certainly.  Most modern web browsers are brain-dead and insist
on using SQLite databases (or traditional DB files) for everything,
including the cache, and the usage for the cache in particular kills
performance when fragmentation is an issue.


It turns out that the first machine (which performed well enough) was
the last one which was installed using LVM + EXT4. The second machine
(the one with the original performance problem) and all subsequent
machines have used BTRFS.

And the worst performing machine was the one with the most RAM and a
fast NVMe drive and top of the line hardware.
Somewhat nonsensically, I'll bet that NVMe is a contributing factor in 
this particular case.  NVMe has particularly bad performance with the 
old block IO schedulers (though it is NVMe, so it should still be better 
than a SATA or SAS SSD), and the new blk-mq framework just got 
scheduling support in 4.12, and only got reasonably good scheduling 
options in 4.13.  I doubt it's the entirety of the issue, but it's 
probably part of it.


While Firefox and Linux in general have their performance "issues",
that's not relevant here. I'm comparing the same distros, same Firefox
versions, same Firefox add-ons, etc. I eventually tested many hardware
configurations: different CPU's, motherboards, GPU's, SSD's, RAM, etc.

Re: defragmenting best practice?

2017-11-02 Thread Austin S. Hemmelgarn

On 2017-11-01 21:39, Dave wrote:

On Wed, Nov 1, 2017 at 8:21 AM, Austin S. Hemmelgarn
 wrote:


The cache is in a separate location from the profiles, as I'm sure you
know.  The reason I suggested a separate BTRFS subvolume for
$HOME/.cache is that this will prevent the cache files for all
applications (for that user) from being included in the snapshots. We
take frequent snapshots and (afaik) it makes no sense to include cache
in backups or snapshots. The easiest way I know to exclude cache from
BTRFS snapshots is to put it on a separate subvolume. I assumed this
would make several things related to snapshots more efficient too.


Yes, it will, and it will save space long-term as well since $HOME/.cache is
usually the most frequently modified location in $HOME. In addition to not
including this in the snapshots, it may also improve performance.  Each
subvolume is its own tree, with its own locking, which means that you can
generally improve parallel access performance by splitting the workload
across multiple subvolumes.  Whether it will actually provide any real
benefit in that respect is heavily dependent on the exact workload however,
but it won't hurt performance.


I'm going to make this change now. What would be a good way to
implement this so that the change applies to the $HOME/.cache of each
user?

The simple way would be to create a new subvolume for each existing
user and mount it at $HOME/.cache in /etc/fstab, hard coding that
mount location for each user. I don't mind doing that as there are
only 4 users to consider. One minor concern is that it adds an
unexpected step to the process of creating a new user. Is there a
better way?

The easiest option is to just make sure nobody is logged in and run the 
following shell script fragment:


for dir in /home/* ; do
    rm -rf "$dir/.cache"
    btrfs subvolume create "$dir/.cache"
done

And then add something to the user creation scripts to create that 
subvolume.  This approach won't pollute /etc/fstab, will still exclude 
the directory from snapshots, and doesn't require any hugely creative 
work to integrate with user creation and deletion.
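If the nocow goal from earlier in the thread still applies, a hedged variant of the same loop can mark each fresh subvolume NOCOW and restore ownership. Per-subvolume nodatacow mount options don't work, so chattr +C on the still-empty directory is the usual route; the ownership handling below is an assumption about the setup, not something prescribed in this thread.

```shell
# Sketch: recreate each user's .cache as a NOCOW-flagged subvolume.
# Run only while nobody is logged in; ownership handling is illustrative.
for dir in /home/* ; do
    user=$(basename "$dir")
    rm -rf "$dir/.cache"
    btrfs subvolume create "$dir/.cache"
    chattr +C "$dir/.cache"        # new files inherit the NOCOW attribute
    chown "$user": "$dir/.cache"   # the subvolume is created owned by root
done
```

The +C attribute only takes full effect on files created after it is set, which is why it is applied to the empty directory before any cache files exist.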


In general, the contents of the .cache directory are just that, cached 
data.  Provided nobody is actively accessing it, it's perfectly safe to 
just nuke the entire directory (I actually do this on a semi-regular 
basis on my systems just because it helps save space).  In fact, based 
on the FreeDesktop.org standards, if this does break anything, it's a 
bug in the software in question.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: defragmenting best practice?

2017-11-01 Thread Dave
On Wed, Nov 1, 2017 at 8:21 AM, Austin S. Hemmelgarn
 wrote:

>> The cache is in a separate location from the profiles, as I'm sure you
>> know.  The reason I suggested a separate BTRFS subvolume for
>> $HOME/.cache is that this will prevent the cache files for all
>> applications (for that user) from being included in the snapshots. We
>> take frequent snapshots and (afaik) it makes no sense to include cache
>> in backups or snapshots. The easiest way I know to exclude cache from
>> BTRFS snapshots is to put it on a separate subvolume. I assumed this
>> would make several things related to snapshots more efficient too.
>
> Yes, it will, and it will save space long-term as well since $HOME/.cache is
> usually the most frequently modified location in $HOME. In addition to not
> including this in the snapshots, it may also improve performance.  Each
> subvolume is its own tree, with its own locking, which means that you can
> generally improve parallel access performance by splitting the workload
> across multiple subvolumes.  Whether it will actually provide any real
> benefit in that respect is heavily dependent on the exact workload however,
> but it won't hurt performance.

I'm going to make this change now. What would be a good way to
implement this so that the change applies to the $HOME/.cache of each
user?

The simple way would be to create a new subvolume for each existing
user and mount it at $HOME/.cache in /etc/fstab, hard coding that
mount location for each user. I don't mind doing that as there are
only 4 users to consider. One minor concern is that it adds an
unexpected step to the process of creating a new user. Is there a
better way?


Re: defragmenting best practice?

2017-11-01 Thread Peter Grandi
> Another one is to find the most fragmented files first or all
> files of at least 1M with at least say 100 fragments as in:

> find "$HOME" -xdev -type f -size +1M -print0 | xargs -0 filefrag \
> | perl -n -e 'print "$1\0" if (m/(.*): ([0-9]+) extents/ && $1 > 100)' \
> | xargs -0 btrfs fi defrag

That should have "&& $2 > 100".


Re: defragmenting best practice?

2017-11-01 Thread Dave
On Wed, Nov 1, 2017 at 1:48 PM, Peter Grandi  wrote:
>> When defragmenting individual files on a BTRFS filesystem with
>> COW, I assume reflinks between that file and all snapshots are
>> broken. So if there are 30 snapshots on that volume, that one
>> file will suddenly take up 30 times more space... [ ... ]
>
> Defragmentation works by effectively making a copy of the file
> contents (simplistic view), so the end result is one copy with
> 29 reflinked contents, and one copy with defragmented contents.

The clarification is much appreciated.

>> Can you also give an example of using find, as you suggested
>> above? [ ... ]
>
> Well, one way is to use 'find' as a filtering replacement for
> 'defrag' option '-r', as in for example:
>
>   find "$HOME" -xdev '(' -name '*.sqlite' -o -name '*.mk4' ')' \
> -type f  -print0 | xargs -0 btrfs fi defrag
>
> Another one is to find the most fragmented files first or all
> files of at least 1M with at least say 100 fragments as in:
>
>   find "$HOME" -xdev -type f -size +1M -print0 | xargs -0 filefrag \
> | perl -n -e 'print "$1\0" if (m/(.*): ([0-9]+) extents/ && $1 > 100)' \
> | xargs -0 btrfs fi defrag
>
> But there are many 'find' web pages and that is not quite a
> Btrfs related topic.

Your examples were perfect. I have experience using find in similar
ways. I can take it from there. :-)

>> Background: I'm not sure why our Firefox performance is so terrible
>
> As I always say, "performance" is not the same as "speed", and
> probably your Firefox "performance" is sort of OKish even if the
> "speed" is terrile, and neither is likely related to the profile
> or the cache being on Btrfs.

Here's what happened. Two years ago I installed Kubuntu (with Firefox)
on two desktop computers. One machine performed fine. Like you said,
"sort of OKish" and that's what we expect with the current state of
Linux. The other machine was substantially worse. We ran side-by-side
real-world tests on these two machines for months.

Initially I did a lot of testing, troubleshooting and reconfiguration
trying to get the second machine to perform as well as the first. I
never had success. At first I thought it was related to the GPU (or
driver). Then I thought it was because the first machine used the z170
chipset and the second was X99 based. But that wasn't it. I have never
solved the problem and I have been coming back to it periodically
these last two years. In that time I have tried different distros from
opensuse to Arch, and a lot of different hardware.

Furthermore, my new machines have the same performance problem. The
most interesting example is a high end machine with 256 GB of RAM. It
showed substantially worse desktop application performance than any
other computer here. All are running the exact same version of Firefox
with the exact same add-ons. (The installations are carbon copies of
each other.)

What originally caught my attention was earlier information in this thread:

Am Wed, 20 Sep 2017 07:46:52 -0400
schrieb "Austin S. Hemmelgarn" :

> >  Fragmentation: Files with a lot of random writes can become
> > heavily fragmented (10000+ extents) causing excessive multi-second
> > spikes of CPU load on systems with an SSD or large amount of RAM. On
> > desktops this primarily affects application databases (including
> > Firefox). Workarounds include manually defragmenting your home
> > directory using btrfs fi defragment. Auto-defragment (mount option
> > autodefrag) should solve this problem.
> >
> > Upon reading that I am wondering if fragmentation in the Firefox
> > profile is part of my issue. That's one thing I never tested
> > previously. (BTW, this system has 256 GB of RAM and 20 cores.)
> Almost certainly.  Most modern web browsers are brain-dead and insist
> on using SQLite databases (or traditional DB files) for everything,
> including the cache, and the usage for the cache in particular kills
> performance when fragmentation is an issue.

It turns out that the first machine (which performed well enough) was
the last one which was installed using LVM + EXT4. The second machine
(the one with the original performance problem) and all subsequent
machines have used BTRFS.

And the worst performing machine was the one with the most RAM and a
fast NVMe drive and top of the line hardware.

While Firefox and Linux in general have their performance "issues",
that's not relevant here. I'm comparing the same distros, same Firefox
versions, same Firefox add-ons, etc. I eventually tested many hardware
configurations: different CPU's, motherboards, GPU's, SSD's, RAM, etc.
The only remaining difference I can find is that the computer with
acceptable performance uses LVM + EXT4 while all the others use BTRFS.

With all the great feedback I have gotten here, I'm now ready to
retest this after implementing all the BTRFS-related suggestions I
have received. Maybe that will solve the problem or maybe this mystery

Re: defragmenting best practice?

2017-11-01 Thread Dave
On Wed, Nov 1, 2017 at 9:31 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Dave posted on Tue, 31 Oct 2017 17:47:54 -0400 as excerpted:
>
>> 6. Make sure Firefox is running in multi-process mode. (Duncan's
>> instructions, while greatly appreciated and very useful, left me
>> slightly confused about pulseaudio's compatibility with multi-process
>> mode.)
>
> Just to clarify:
>
> There's no problem with native pulseaudio and firefox multi-process
> mode.

Thank you for clarifying. And I appreciate your detailed explanation.

> Back when I posted that, a not e10s-enabled extension was actually quite
> likely, as e10s was still rather new.  It's probably somewhat less so
> now, and firefox is of course on to the next big change, dropping the old
> "legacy chrome" extension support, in favor of the newer and generally
> Chromium-compatible WebExtensions/WE API, with firefox 57, to be released
> mid-month (Nov 14).

I am now running Firefox 57 beta and I'll be doing my testing with
that using only WebExtensions.


Re: defragmenting best practice?

2017-11-01 Thread Peter Grandi
> When defragmenting individual files on a BTRFS filesystem with
> COW, I assume reflinks between that file and all snapshots are
> broken. So if there are 30 snapshots on that volume, that one
> file will suddenly take up 30 times more space... [ ... ]

Defragmentation works by effectively making a copy of the file
contents (simplistic view), so the end result is one copy with
29 reflinked contents, and one copy with defragmented contents.

> Can you also give an example of using find, as you suggested
> above? [ ... ]

Well, one way is to use 'find' as a filtering replacement for
'defrag' option '-r', as in for example:

  find "$HOME" -xdev '(' -name '*.sqlite' -o -name '*.mk4' ')' \
-type f  -print0 | xargs -0 btrfs fi defrag

Another one is to find the most fragmented files first or all
files of at least 1M with at least say 100 fragments as in:

  find "$HOME" -xdev -type f -size +1M -print0 | xargs -0 filefrag \
| perl -n -e 'print "$1\0" if (m/(.*): ([0-9]+) extents/ && $1 > 100)' \
| xargs -0 btrfs fi defrag

But there are many 'find' web pages and that is not quite a
Btrfs related topic.
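Either pipeline can be dry-run safely before pointing it at real data by building a scratch tree and substituting 'echo' for the defrag command (everything below is a throwaway illustration):

```shell
# Dry-run of the name-based selection: 'echo' stands in for
# 'btrfs fi defrag', so nothing on disk is modified.
tmp=$(mktemp -d)
mkdir -p "$tmp/profile"
touch "$tmp/profile/places.sqlite" "$tmp/profile/feeds.mk4" "$tmp/profile/notes.txt"
find "$tmp" -xdev '(' -name '*.sqlite' -o -name '*.mk4' ')' \
    -type f -print0 | xargs -0 -n1 echo
rm -rf "$tmp"
```

The .sqlite and .mk4 files are printed and notes.txt is skipped, confirming the filter before the real defrag run.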

> [ ... ] The easiest way I know to exclude cache from
> BTRFS snapshots is to put it on a separate subvolume. I assumed this
> would make several things related to snapshots more efficient too.

Only slightly.

> Background: I'm not sure why our Firefox performance is so terrible

As I always say, "performance" is not the same as "speed", and
probably your Firefox "performance" is sort of OKish even if the
"speed" is terrile, and neither is likely related to the profile
or the cache being on Btrfs: most JavaScript based sites are
awfully horrible regardless of browser:

  http://www.sabi.co.uk/blog/13-two.html?130817#130817

and if Firefox makes a special contribution it tends to leak
memory on several odd but common cases:

  
https://utcc.utoronto.ca/~cks/space/blog/web/FirefoxResignedToLeaks?showcomments

Plus it tends to cache too much, e.g. recently close tabs.

But Firefox is not special because most web browsers are not
designed to run for a long time without a restart, and
Chromium/Chrome simply have a different set of problem sites.
Maybe the new "Quantum" Firefox 57 will improve matters because
it has a far more restrictive plugin API.

The overall problem is insoluble, hipster UX designers will be
the second against the wall when the revolution comes :-).


Re: defragmenting best practice?

2017-11-01 Thread Duncan
Dave posted on Tue, 31 Oct 2017 17:47:54 -0400 as excerpted:

> 6. Make sure Firefox is running in multi-process mode. (Duncan's
> instructions, while greatly appreciated and very useful, left me
> slightly confused about pulseaudio's compatibility with multi-process
> mode.)

Just to clarify:

There's no problem with native pulseaudio and firefox multi-process 
mode.  As that's what most people will be using, and what firefox 
upstream ships for, chances are very high that you're just fine there, 
tho there's some small chance you have some other problem.

My specific problem was that I do *NOT* have pulseaudio installed here, 
as I've never found I needed it and it adds more complication to my 
configuration than the limited benefit I'd get out of it justifies.  
Straight alsa has been fine for me.

(Explanatory note: Being on gentoo/~amd64, aka testing, I do a lot more 
updating than stable users, and because it's gentoo, all those updates 
are build from sources, so every single extra package I have installed 
has a very real cost in terms of repeated update builds over time.  Put a 
bit differently, building and updating from sources tends to rather 
strongly encourage the best security practice of only installing what you 
actually need, because you have to rebuild it at every update.  And I 
don't need pulseaudio enough to be worth the cost to keep it updated, so 
I don't have it installed.  It really is that simple.  Binary-based 
distro users have rather trivial update costs in comparison, so having a 
few extra packages installed that they don't actually use, isn't such a 
big deal for them.  Which is of course fortunate, since dependencies are 
often determined at build-time, and binary-based distros tend to enable 
relatively more of them because /someone/ uses them, even if it's a 
minority, so they tend to carry around more dependencies than the normal 
user will need, simply to support the few that do.  And because the cost 
is relatively lower, users, except for the ones that pay enough attention 
to the security aspect of the wider attack surface, don't generally care 
as much as they would if they were forced to build and update all of them 
from sources!)

So when firefox upstream dropped support for alsa and began requiring 
pulseaudio for users that actually wanted their browser to play sound, I 
had two choices.  I could try to find a workaround that would fake firefox 
into believing that I had pulseaudio, or I could switch back to building 
firefox from sources instead of simply installing the upstream provided 
binaries, since gentoo's firefox build scripts still have the alsa 
support option that upstream firefox refused to support or ship any 
longer.

As with most people and their browsers, firefox is the most security-
exposed app I run, and it sometimes takes gentoo a few days after an 
upstream firefox release to get a working build out, during which users 
waiting on gentoo's package build are exposed to already widely known and 
patched by upstream security issues.  That was more risk than I wanted to 
take, thus my choice of switching to the upstream firefox binaries in the 
first place, since they were available, indeed, autoupdated, on release 
day.  Additionally, a firefox build takes awhile, much longer than most 
other packages, and now requires rust, itself an expensive to build 
package (tho fortunately it doesn't upgrade on the fast cycle that firefox 
does).

So I wasn't particularly happy about being forced back to waiting for 
gentoo to get around to updating its firefox builds several days after 
upstream, and then taking the time to build them myself, making it 
worthwhile to look for a workaround.

And as it happens, there's a /sort/ of workaround called apulse, a much 
simpler and smaller package than pulseaudio itself, that's basically just 
a pulseaudio API wrapper around alsa.

And when I first installed apulse and tested firefox with it, sure 
enough, I got firefox sound back! =:^)  I thought I had my workaround and 
that it was a satisfactory solution.

Unfortunately, apulse appears not to be multi-process-safe, and as firefox 
went more and more multi-process in the announcements, etc, at first I 
couldn't figure out what was keeping firefox single-process for me.

After some research on the web, I found the settings to /force/ firefox-
multi-process, and tried them.  But firefox would then only work in local 
mode (about: pages, basically).  Every time I tried to actually go to a 
normal URL, the multi-process tabs would crash before it rendered a 
thing!  The original firefox UI shell was still running, but with an 
error message indicating the tab crash instead of the page I wanted.

After some troubleshooting I figured out it was apulse.  If I moved the 
apulse library out of the way so firefox couldn't find it, I could browse 
the web in multiprocess mode just fine... except I was of course missing 
audio again. =:^(

So apulse wasn't the 

Re: defragmenting best practice?

2017-11-01 Thread Austin S. Hemmelgarn
  Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
  with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
  will break up the ref-links of COW data (for example files copied with
  cp --reflink, snapshots or de-duplicated data). This may cause
  considerable increase of space usage depending on the broken up
  ref-links.

I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
snapshots.
Therefore, I'd better avoid calling "btrfs filesystem defragment -r"?

What is the defragmenting best practice?
Avoid it completely?


My question is the same as the OP in this thread, so I came here to
read the answers before asking.

Based on the answers here, it sounds like I should not run defrag at
all. However, I have a performance problem I need to solve, so if I
don't defrag, I need to do something else.

Here's my scenario. Some months ago I built an over-the-top powerful
desktop computer / workstation and I was looking forward to really
fantastic performance improvements over my 6 year old Ubuntu machine.
I installed Arch Linux on BTRFS on the new computer (on an SSD). To my
shock, it was no faster than my old machine. I focused a lot on
Firefox performance because I use Firefox a lot and that was one of
the applications in which I was most looking forward to better
performance.

I tried everything I could think of and everything recommended to me
in various forums (except switching to Windows) and the performance
remained very disappointing.

Then today I read the following:

 Gotchas - btrfs Wiki
 https://btrfs.wiki.kernel.org/index.php/Gotchas

 Fragmentation: Files with a lot of random writes can become
heavily fragmented (10000+ extents) causing excessive multi-second
spikes of CPU load on systems with an SSD or large amount of RAM. On
desktops this primarily affects application databases (including
Firefox). Workarounds include manually defragmenting your home
directory using btrfs fi defragment. Auto-defragment (mount option
autodefrag) should solve this problem.

Upon reading that I am wondering if fragmentation in the Firefox
profile is part of my issue. That's one thing I never tested
previously. (BTW, this system has 256 GB of RAM and 20 cores.)

Furthermore, on the same BTRFS Wiki page, it mentions the performance
penalties of many snapshots. I am keeping 30 to 50 snapshots of the
volume that contains the Firefox profile.

Would these two things be enough to turn top-of-the-line hardware into
a mediocre-performing desktop system? (The system performs fine on
benchmarks -- it's real life usage, particularly with Firefox where it
is disappointing.)

After reading the info here, I am wondering if I should make a new
subvolume just for my Firefox profile(s) and not use COW and/or not
keep snapshots on it and mount it with the autodefrag option.
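If that dedicated-subvolume route is taken, the mount could look roughly like the fstab fragment below. The UUID and subvolume name are placeholders, and per the mount-option caveat quoted elsewhere in this thread, a nodatacow option here would apply to the whole filesystem, not just this subvolume:

```
# Illustrative /etc/fstab line; UUID and subvolume name are placeholders.
UUID=xxxxxxxx-xxxx  /home/user/.mozilla  btrfs  subvol=@ff-profile,autodefrag,noatime  0  0
```

Marking the profile directory NOCOW would instead be done with chattr +C on the empty directory, as discussed earlier in the thread.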

As part of this strategy, I could send snapshots to another disk using
btrfs send-receive. That way I would have the benefits of snapshots
(which are important to me), but by not keeping any snapshots on the
live subvolume I could avoid the performance problems.

What would you guys do in this situation?





Re: defragmenting best practice?

2017-11-01 Thread Sean Greenslade
On Tue, Oct 31, 2017 at 05:47:54PM -0400, Dave wrote:
> I'm following up on all the suggestions regarding Firefox performance
> on BTRFS. 
>
> 
>
> 5. Firefox profile sync has not worked well for us in the past, so we
> don't use it.
> 6. Our machines generally have plenty of RAM so we could put the
> Firefox cache (and maybe profile) into RAM using a technique such as
> https://wiki.archlinux.org/index.php/Firefox/Profile_on_RAM. However,
> profile persistence is important.

> 4. Put the Firefox cache in RAM
> 
> 5. If needed, consider putting the Firefox profile in RAM

Have you looked into profile-sync-daemon?

https://wiki.archlinux.org/index.php/profile-sync-daemon

It basically does the "keep the profile in RAM but also sync it to HDD"
for you. I've used it for years, it works quite well.
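For reference, the core pattern it automates can be sketched in a few lines of shell. This is NOT profile-sync-daemon's actual implementation, just the "profile on RAM, periodically synced back" idea; all paths are illustrative:

```shell
# Sketch of the profile-on-RAM pattern (not profile-sync-daemon itself).
disk="$HOME/.mozilla/firefox/profile.disk"   # persistent on-disk copy
ram="/dev/shm/firefox-profile"               # RAM-backed live copy
mkdir -p "$ram"
rsync -a --delete "$disk/" "$ram/"           # seed the RAM copy at login
ln -sfn "$ram" "$HOME/.mozilla/firefox/profile"
# The browser runs against the symlink; then periodically (cron/timer):
rsync -a --delete "$ram/" "$disk/"           # write changes back to disk
```

The periodic write-back is what preserves profile persistence across reboots, which was one of the stated constraints.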

--Sean



Re: defragmenting best practice?

2017-10-31 Thread Dave
On Tue, Oct 31, 2017 at 7:06 PM, Peter Grandi <p...@btrfs.list.sabi.co.uk> 
wrote:
>
> Also nothing forces you to defragment a whole filesystem, you
> can just defragment individual files or directories by using
> 'find' with it.

Thanks for that info. When defragmenting individual files on a BTRFS
filesystem with COW, I assume reflinks between that file and all
snapshots are broken. So if there are 30 snapshots on that volume,
that one file will suddenly take up 30 times more space... Is that
correct? Or are the reflinks only broken between the live file and the
latest snapshot? Or is it something between, based on how many times
the file has changed?

>
> My top "$HOME" fragmented files are the aKregator RSS feed
> databases, usually a few hundred fragments each, and the
> '.sqlite' files for Firefox. Occasionally like just now I do
> this:
>
>   tree$  sudo filefrag .firefox/default/*.sqlite | sort -t: -k 2n | tail -4
>   .firefox/default/cleanup.sqlite: 43 extents found
>   .firefox/default/content-prefs.sqlite: 67 extents found
>   .firefox/default/formhistory.sqlite: 87 extents found
>   .firefox/default/places.sqlite: 3879 extents found
>
>   tree$  sudo btrfs fi defrag .firefox/default/*.sqlite
>
>   tree$  sudo filefrag .firefox/default/*.sqlite | sort -t: -k 2n | tail -4
>   .firefox/default/webappsstore.sqlite: 1 extent found
>   .firefox/default/favicons.sqlite: 2 extents found
>   .firefox/default/kinto.sqlite: 2 extents found
>   .firefox/default/places.sqlite: 44 extents found

That's a very helpful example.

Can you also give an example of using find, as you suggested above?
I'm generally familiar with using find to execute specific commands,
but an example is appreciated in this case.

> > 2. Put $HOME/.cache on a separate BTRFS subvolume that is mounted nocow -- 
> > it will NOT be snapshotted

> Also, you can declare the '.firefox/default/' directory to be NOCOW, and that 
> "just works".

The cache is in a separate location from the profiles, as I'm sure you
know.  The reason I suggested a separate BTRFS subvolume for
$HOME/.cache is that this will prevent the cache files for all
applications (for that user) from being included in the snapshots. We
take frequent snapshots and (afaik) it makes no sense to include cache
in backups or snapshots. The easiest way I know to exclude cache from
BTRFS snapshots is to put it on a separate subvolume. I assumed this
would make several things related to snapshots more efficient too.

As far as the Firefox profile being declared NOCOW, as soon as we take
the first snapshot, I understand that it will become COW again. So I
don't see any point in making it NOCOW.

Thanks for your reply. I appreciate any other feedback or suggestions.

Background: I'm not sure why our Firefox performance is so terrible
but here's my original post from Sept 20. (I could repost the earlier
replies too if needed.) I've been waiting to have a window of
opportunity to try to fix our Firefox performance again, and now I
have that chance.

>On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>> When I do a
>> btrfs filesystem defragment -r /directory
>> does it defragment really all files in this directory tree, even if it
>> contains subvolumes?
>> The man page does not mention subvolumes on this topic.
>
>No answer so far :-(
>
>But I found another problem in the man-page:
>
>  Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
>  with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
>  will break up the ref-links of COW data (for example files copied with
>  cp --reflink, snapshots or de-duplicated data). This may cause
>  considerable increase of space usage depending on the broken up
>  ref-links.
>
>I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
>snapshots.
>Therefore, I'd better avoid calling "btrfs filesystem defragment -r"?
>
>What is the defragmenting best practice?
>Avoid it completely?

My question is the same as the OP in this thread, so I came here to
read the answers before asking.

Based on the answers here, it sounds like I should not run defrag at
all. However, I have a performance problem I need to solve, so if I
don't defrag, I need to do something else.

Here's my scenario. Some months ago I built an over-the-top powerful
desktop computer / workstation and I was looking forward to really
fantastic performance improvements over my 6 year old Ubuntu machine.
I installed Arch Linux on BTRFS on the new computer (on an SSD). To my
shock, it was no faster than my old machine. I focused a lot on
Firefox performance because I use Firefox a lot and that was one of
the applications in which I was most looking forward to better
performance.

I tried everything I could think of and everything recommended to me
in various forums (except switching to Windows) and the performance
remained very disappointing.

Re: defragmenting best practice?

2017-10-31 Thread Peter Grandi
> I'm following up on all the suggestions regarding Firefox performance
> on BTRFS. [ ... ]

I haven't read that yet, so maybe I am missing something, but I
use Firefox with Btrfs all the time and I haven't got issues.

[ ... ]
> 1. BTRFS snapshots have proven to be too useful (and too important to
>our overall IT approach) to forego.
[ ... ]
> 3. We have large amounts of storage space (and can add more), but not
>enough to break all reflinks on all snapshots.

Firefox profiles get fragmented only in the databases contained
in them, and they are tiny, as in dozens of MB. That's usually
irrelevant.

Also nothing forces you to defragment a whole filesystem, you
can just defragment individual files or directories by using
'find' with it.

My top "$HOME" fragmented files are the aKregator RSS feed
databases, usually a few hundred fragments each, and the
'.sqlite' files for Firefox. Occasionally like just now I do
this:

  tree$  sudo filefrag .firefox/default/*.sqlite | sort -t: -k 2n | tail -4
  .firefox/default/cleanup.sqlite: 43 extents found
  .firefox/default/content-prefs.sqlite: 67 extents found
  .firefox/default/formhistory.sqlite: 87 extents found
  .firefox/default/places.sqlite: 3879 extents found

  tree$  sudo btrfs fi defrag .firefox/default/*.sqlite

  tree$  sudo filefrag .firefox/default/*.sqlite | sort -t: -k 2n | tail -4
  .firefox/default/webappsstore.sqlite: 1 extent found
  .firefox/default/favicons.sqlite: 2 extents found
  .firefox/default/kinto.sqlite: 2 extents found
  .firefox/default/places.sqlite: 44 extents found

> 2. Put $HOME/.cache on a separate BTRFS subvolume that is mounted
> nocow -- it will NOT be snapshotted

The cache can be simply deleted, and usually files in it are not
updated in place, so don't get fragmented, so no worry.

Also, you can declare the '.firefox/default/' directory to be
NOCOW, and that "just works". I haven't even bothered with that.


Re: defragmenting best practice?

2017-10-31 Thread Dave
I'm following up on all the suggestions regarding Firefox performance
on BTRFS. I have time to make these changes now, but I am having
trouble figuring out what to do. The constraints are:

1. BTRFS snapshots have proven to be too useful (and too important to
our overall IT approach) to forego.
2. We do not see any practical alternative (for us) to the incremental
backup strategy
(https://btrfs.wiki.kernel.org/index.php/Incremental_Backup)
3. We have large amounts of storage space (and can add more), but not
enough to break all reflinks on all snapshots.
4. We can transfer snapshots to backup storage (and thereby retain
minimal snapshots on the live volume)
5. Our team is standardized on Firefox. (Switching to Chromium is not
an option for us.)
6. Firefox profile sync has not worked well for us in the past, so we
don't use it.
7. Our machines generally have plenty of RAM, so we could put the
Firefox cache (and maybe profile) into RAM using a technique such as
https://wiki.archlinux.org/index.php/Firefox/Profile_on_RAM. However,
profile persistence is important.

The most common recommendations were to switch to Chromium, defragment
and don't use snapshots. As the constraints above illustrate, we
cannot do those things.

The tentative solution I have come up with is:

1. Continue using snapshots, but retain the minimal number possible on
the live volume. Move historical snapshots to a backup device using
btrfs send-receive.
(https://btrfs.wiki.kernel.org/index.php/Incremental_Backup)
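The send-receive step can be sketched as follows. All snapshot names
and mount points below are hypothetical placeholders, and the commands
are echoed as a preview; remove the 'echo' to run them as root.

```shell
# Incremental backup sketch: transfer only the delta between two
# read-only snapshots. Paths and names are hypothetical.
SRC=/mnt/live/.snapshots
DST=/mnt/backup/.snapshots
PREV=home.2017-12-10   # snapshot already present on both sides
CUR=home.2017-12-11    # new read-only snapshot to transfer
# 'echo' keeps these as a preview; remove it to run (as root).
echo btrfs subvolume snapshot -r /mnt/live/home "$SRC/$CUR"
echo "btrfs send -p $SRC/$PREV $SRC/$CUR | btrfs receive $DST"
```

Once the new snapshot exists on the backup side, older snapshots can be
deleted from the live volume while the backup retains full history.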

2. Put $HOME/.cache on a separate BTRFS subvolume that is mounted
nocow -- it will NOT be snapshotted

3. Put most of $HOME on a "home" volume but separate all user
documents to another volume (i.e., "documents").

3.a. The "home" volume will retain only the one most recent snapshot
on that live volume. (More backup history will be retained on a backup
volume.) This home volume can be defragmented. With one snapshot,
that will double our space usage, which is acceptable.

3.b. The documents volume will be snapshotted hourly and 36 hourly
snapshots plus daily, weekly and monthly snapshots retained. Therefore
it will NOT be defragmented, as that would not be practical or
space-wise possible.
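The hourly snapshot plus 36-snapshot retention window could be sketched
like this. The paths, naming scheme, and the cron/systemd-timer hookup
are all assumptions, and the btrfs commands are echoed as a preview;
remove the 'echo' to run them as root.

```shell
# Hourly read-only snapshot of the documents subvolume, keeping the
# newest 36. Paths and naming are hypothetical.
VOL=/mnt/pool/documents
SNAPDIR=/mnt/pool/.snapshots
KEEP=36
STAMP=$(date +%Y-%m-%d_%H%M)
echo btrfs subvolume snapshot -r "$VOL" "$SNAPDIR/documents.$STAMP"
# Timestamped names sort chronologically: list newest first, then
# delete everything past the $KEEP most recent.
echo "ls -1dr $SNAPDIR/documents.* | tail -n +$((KEEP + 1)) | xargs -r -n1 btrfs subvolume delete"
```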

3.c. The root volume (operating system, etc.) will follow a strategy
similar to home, but will also retain pre- and post- update snapshots.

4. Put the Firefox cache in RAM

5. If needed, consider putting the Firefox profile in RAM

6. Make sure Firefox is running in multi-process mode. (Duncan's
instructions, while greatly appreciated and very useful, left me
slightly confused about pulseaudio's compatibility with multi-process
mode.)

7. Check various Firefox performance tweaks such as these:
https://wiki.archlinux.org/index.php/Firefox/Tweaks

Can anyone guess whether this will be sufficient to solve our severe
performance problems? Do these steps make sense? Will any of these
steps lead to new problems? Should I proceed to give them a try? Or
can anyone suggest a better set of steps to test?

Notes:

In regard to snapshots, we must retain about 36 hourly snapshots of
user documents, for example. We have to have pre- and post- package
upgrade snapshots from at least the most recent operating system &
application package update. And we have to retain several daily,
weekly and monthly snapshots of system directories and some other
locations.) Most of these snapshots can be retained on backup storage
devices.

Regarding Firefox profile sync, it does not have an intelligent method
for resolving conflicts, for example. We found too many unexpected
changes when using sync, so we do not use it.

On Thu, Sep 21, 2017 at 7:09 AM, Duncan <1i5t5.dun...@cox.net> wrote:
> Dave posted on Wed, 20 Sep 2017 02:38:13 -0400 as excerpted:
>
>> Here's my scenario. Some months ago I built an over-the-top powerful
>> desktop computer / workstation and I was looking forward to really
>> fantastic performance improvements over my 6 year old Ubuntu machine. I
>> installed Arch Linux on BTRFS on the new computer (on an SSD). To my
>> shock, it was no faster than my old machine. I focused a lot on Firefox
>> performance because I use Firefox a lot and that was one of the
>> applications in which I was most looking forward to better performance.
>>
>> I tried everything I could think of and everything recommended to me in
>> various forums (except switching to Windows) and the performance
>> remained very disappointing.
>>
>> Then today I read the following:
>>
>> Gotchas - btrfs Wiki https://btrfs.wiki.kernel.org/index.php/Gotchas
>>
>> Fragmentation: Files with a lot of random writes can become
>> heavily fragmented (10000+ extents) causing excessive multi-second
>> spikes of CPU load on systems with an SSD or large amount of RAM. On
>> desktops this primarily affects application databases (including
>> Firefox). Workarounds include manually defragmenting your home directory
>> using btrfs fi defragment. Auto-defragment (mount option autodefrag)
should solve this problem.

Re: defragmenting best practice?

2017-09-22 Thread Marc Joliet
Am Freitag, 22. September 2017, 13:22:52 CEST schrieb Austin S. Hemmelgarn:
> > I'm not sure where Firefox puts its cache, I only use it on very rare
> > occasions. But I think it's going to .cache/mozilla last time looked
> > at it.
> 
> I'm pretty sure that is correct.

FWIW, on my system Firefox's cache looks like this:

% du -hsc (find .cache/mozilla/firefox/ -type f) | wc -l
9008


   
% du -hsc (find .cache/mozilla/firefox/ -type f) | sort -h | tail
5,4M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/83CEC8ADA08D9A9658458AB872BE107A216E71C6
5,5M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/C60061B33D3BB91ED45951C922BAA1BB40022CB7
5,7M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/0900D9EA8E3222EB8690348C2482C69308B15A20
5,7M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/F8E90D121B884360E36BCB1735CC5A8B1B7A744B
5,8M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/903C4CD01ABD74E353C7484C6E21A053AAC5DCC2
6,1M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/3A0D4193B009700155811D14A28DBE38C37C0067
6,1M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/startupCache/scriptCache-current.bin
6,5M	.cache/mozilla/firefox/cb236e4s.default-1464421886682/cache2/entries/304405168662C3624D57AF98A74345464F32A0DB
8,8M	.cache/mozilla/firefox/ik7qsfwb.Temp/cache2/entries/BD7CA4125B3AA87D6B16C987741F33C65DBFFFDD
427M	insgesamt

So lots of files, many of which are (I suppose) relatively large, but do not 
look "everything in one database" large to me.

(This is with Firefox 55.0.2.)

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: defragmenting best practice?

2017-09-22 Thread Austin S. Hemmelgarn

On 2017-09-21 16:10, Kai Krakow wrote:

Am Wed, 20 Sep 2017 07:46:52 -0400
schrieb "Austin S. Hemmelgarn" :


  Fragmentation: Files with a lot of random writes can become
heavily fragmented (10000+ extents) causing excessive multi-second
spikes of CPU load on systems with an SSD or large amount of RAM. On
desktops this primarily affects application databases (including
Firefox). Workarounds include manually defragmenting your home
directory using btrfs fi defragment. Auto-defragment (mount option
autodefrag) should solve this problem.

Upon reading that I am wondering if fragmentation in the Firefox
profile is part of my issue. That's one thing I never tested
previously. (BTW, this system has 256 GB of RAM and 20 cores.)

Almost certainly.  Most modern web browsers are brain-dead and insist
on using SQLite databases (or traditional DB files) for everything,
including the cache, and the usage for the cache in particular kills
performance when fragmentation is an issue.


At least in Chrome, you can turn on the simple cache backend, which, I
think, uses many small files instead of one huge file. This suits btrfs
much better:
That's correct.  The traditional cache in Chrome and Chromium uses a 
single SQLite database for storing all the cache data and metadata (just 
like Firefox did last time I checked).  The simple cache backend instead 
uses the filesystem to handle allocations and uses directory hashing to 
speed up lookups of items, which actually means that even without BTRFS 
involved, it will usually be faster (both because it allows concurrent 
access, unlike SQLite, and because it's generally faster to traverse a 
multi-level directory hash than to parse an SQL statement).
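As a toy illustration (not Chrome's actual on-disk scheme), a hashed
directory layout maps a cache key straight to a path, with no database
in the lookup path at all:

```shell
# Toy hashed-cache lookup: derive a bucket directory and file name
# from the key. Illustrative only; Chrome's real format differs.
KEY="https://example.com/logo.png"
HASH=$(printf '%s' "$KEY" | cksum | cut -d' ' -f1)  # simple checksum as a stand-in
BUCKET=$(( HASH % 256 ))                            # fixed fan-out keeps directories small
ENTRY="cache/$BUCKET/$HASH"
echo "$ENTRY"
```

Each entry is its own small file, so the filesystem handles allocation
and concurrent access, and no single file accumulates random writes.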


chrome://flags/#enable-simple-cache-backend


And then I suggest also doing this (as your login user):

$ cd $HOME
$ mv .cache .cache.old
$ mkdir .cache
$ chattr +C .cache
$ rsync -av .cache.old/ .cache/
$ rm -Rf .cache.old

This makes caches for most applications nocow. Chrome performance was
completely fixed for me by doing this.

I'm not sure where Firefox puts its cache, I only use it on very rare
occasions. But I think it's going to .cache/mozilla last time looked
at it.

I'm pretty sure that is correct.


You may want to close all apps before converting the cache directory.
At a minimum, you'll have to restart them to get them to use the new 
location.


Also, I don't see any downsides in making this nocow. That directory
could easily be also completely volatile. If something breaks due to no
longer protected by data csum, just clean it out.
Indeed, anything that is storing data here that can't be regenerated 
from some other source is asking for trouble, sane backup systems don't 
include ~/.cache, and it's quite often one of the first things 
recommended for deletion when trying to free up disk space.



Re: defragmenting best practice?

2017-09-21 Thread Kai Krakow
Am Thu, 21 Sep 2017 22:10:13 +0200
schrieb Kai Krakow :

> Am Wed, 20 Sep 2017 07:46:52 -0400
> schrieb "Austin S. Hemmelgarn" :
> 
> > >  Fragmentation: Files with a lot of random writes can become
> > > heavily fragmented (10000+ extents) causing excessive multi-second
> > > spikes of CPU load on systems with an SSD or large amount of RAM.
> > > On desktops this primarily affects application databases
> > > (including Firefox). Workarounds include manually defragmenting
> > > your home directory using btrfs fi defragment. Auto-defragment
> > > (mount option autodefrag) should solve this problem.
> > > 
> > > Upon reading that I am wondering if fragmentation in the Firefox
> > > profile is part of my issue. That's one thing I never tested
> > > previously. (BTW, this system has 256 GB of RAM and 20 cores.)
> > Almost certainly.  Most modern web browsers are brain-dead and
> > insist on using SQLite databases (or traditional DB files) for
> > everything, including the cache, and the usage for the cache in
> > particular kills performance when fragmentation is an issue.  
> 
> At least in Chrome, you can turn on the simple cache backend, which, I
> think, uses many small files instead of one huge file. This suits btrfs
> much better:
> 
> chrome://flags/#enable-simple-cache-backend
> 
> 
> And then I suggest also doing this (as your login user):
> 
> $ cd $HOME
> $ mv .cache .cache.old
> $ mkdir .cache
> $ lsattr +C .cache

Oops, of course that's chattr, not lsattr

> $ rsync -av .cache.old/ .cache/
> $ rm -Rf .cache.old
> 
> This makes caches for most applications nocow. Chrome performance was
> completely fixed for me by doing this.
> 
> I'm not sure where Firefox puts its cache, I only use it on very rare
> occasions. But I think it's going to .cache/mozilla last time looked
> at it.
> 
> You may want to close all apps before converting the cache directory.
> 
> Also, I don't see any downsides in making this nocow. That directory
> could easily be also completely volatile. If something breaks due to
> no longer protected by data csum, just clean it out.


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-09-21 Thread Dave
These are great suggestions. I will test several of them (or all of
them) and report back with my results once I have done the testing.
Thank you! This is a fantastic mailing list.

P.S. I'm inclined to stay with Firefox, but I will definitely test
Chromium vs Firefox after making a series of changes based on the
suggestions here. I would hate to see the market lose the option of
Firefox because everyone goes to Chrome/Chromium.


Re: defragmenting best practice?

2017-09-21 Thread Kai Krakow
Am Wed, 20 Sep 2017 07:46:52 -0400
schrieb "Austin S. Hemmelgarn" :

> >  Fragmentation: Files with a lot of random writes can become
> > heavily fragmented (10000+ extents) causing excessive multi-second
> > spikes of CPU load on systems with an SSD or large amount of RAM. On
> > desktops this primarily affects application databases (including
> > Firefox). Workarounds include manually defragmenting your home
> > directory using btrfs fi defragment. Auto-defragment (mount option
> > autodefrag) should solve this problem.
> > 
> > Upon reading that I am wondering if fragmentation in the Firefox
> > profile is part of my issue. That's one thing I never tested
> > previously. (BTW, this system has 256 GB of RAM and 20 cores.)  
> Almost certainly.  Most modern web browsers are brain-dead and insist
> on using SQLite databases (or traditional DB files) for everything, 
> including the cache, and the usage for the cache in particular kills 
> performance when fragmentation is an issue.

At least in Chrome, you can turn on the simple cache backend, which, I
think, uses many small files instead of one huge file. This suits btrfs
much better:

chrome://flags/#enable-simple-cache-backend


And then I suggest also doing this (as your login user):

$ cd $HOME
$ mv .cache .cache.old
$ mkdir .cache
$ lsattr +C .cache
$ rsync -av .cache.old/ .cache/
$ rm -Rf .cache.old

This makes caches for most applications nocow. Chrome performance was
completely fixed for me by doing this.

I'm not sure where Firefox puts its cache, I only use it on very rare
occasions. But I think it's going to .cache/mozilla last time looked
at it.

You may want to close all apps before converting the cache directory.

Also, I don't see any downsides in making this nocow. That directory
could easily be also completely volatile. If something breaks due to no
longer protected by data csum, just clean it out.


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-09-21 Thread Sean Greenslade
On September 19, 2017 11:38:13 PM PDT, Dave  wrote:
>>On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
> 
>Here's my scenario. Some months ago I built an over-the-top powerful
>desktop computer / workstation and I was looking forward to really
>fantastic performance improvements over my 6 year old Ubuntu machine.
>I installed Arch Linux on BTRFS on the new computer (on an SSD). To my
>shock, it was no faster than my old machine. I focused a lot on
>Firefox performance because I use Firefox a lot and that was one of
>the applications in which I was most looking forward to better
>performance.
>
> 
>
>What would you guys do in this situation?

Check out profile sync daemon:

https://wiki.archlinux.org/index.php/profile-sync-daemon

It keeps the active profile files in a ramfs, periodically syncing them back to 
disk. It works quite well on my 7 year old netbook.
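For reference, enabling it per user might look like the sketch below.
The service name matches the Arch packaging and is an assumption for
other distros; the commands are echoed as a preview.

```shell
# Enable profile-sync-daemon for the current user (service name as
# packaged on Arch Linux; verify for your distribution).
SVC=psd.service
echo systemctl --user enable --now "$SVC"
# Show which browser profiles psd would manage before committing:
echo psd preview
```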

--Sean



Re: defragmenting best practice?

2017-09-21 Thread Duncan
Dave posted on Wed, 20 Sep 2017 02:38:13 -0400 as excerpted:

> Here's my scenario. Some months ago I built an over-the-top powerful
> desktop computer / workstation and I was looking forward to really
> fantastic performance improvements over my 6 year old Ubuntu machine. I
> installed Arch Linux on BTRFS on the new computer (on an SSD). To my
> shock, it was no faster than my old machine. I focused a lot on Firefox
> performance because I use Firefox a lot and that was one of the
> applications in which I was most looking forward to better performance.
> 
> I tried everything I could think of and everything recommended to me in
> various forums (except switching to Windows) and the performance
> remained very disappointing.
> 
> Then today I read the following:
> 
> Gotchas - btrfs Wiki https://btrfs.wiki.kernel.org/index.php/Gotchas
> 
> Fragmentation: Files with a lot of random writes can become
> heavily fragmented (10000+ extents) causing excessive multi-second
> spikes of CPU load on systems with an SSD or large amount of RAM. On
> desktops this primarily affects application databases (including
> Firefox). Workarounds include manually defragmenting your home directory
> using btrfs fi defragment. Auto-defragment (mount option autodefrag)
> should solve this problem.
> 
> Upon reading that I am wondering if fragmentation in the Firefox profile
> is part of my issue. That's one thing I never tested previously. (BTW,
> this system has 256 GB of RAM and 20 cores.)
> 
> Furthermore, on the same BTRFS Wiki page, it mentions the performance
> penalties of many snapshots. I am keeping 30 to 50 snapshots of the
> volume that contains the Firefox profile.
> 
> Would these two things be enough to turn top-of-the-line hardware into a
> mediocre-preforming desktop system? (The system performs fine on
> benchmarks -- it's real life usage, particularly with Firefox where it
> is disappointing.)
> 
> After reading the info here, I am wondering if I should make a new
> subvolume just for my Firefox profile(s) and not use COW and/or not keep
> snapshots on it and mount it with the autodefrag option.
> 
> As part of this strategy, I could send snapshots to another disk using
> btrfs send-receive. That way I would have the benefits of snapshots
> (which are important to me), but by not keeping any snapshots on the
> live subvolume I could avoid the performance problems.
> 
> What would you guys do in this situation?

[FWIW this is my second try at a reply, my first being way too detailed 
and going off into the weeds somewhere, so I killed it.]

That's an interesting scenario indeed, and perhaps I can help, since my 
config isn't near as high end as yours, but I run firefox on btrfs on 
ssds, and have no performance complaints.  The difference is very likely 
due to one or more of the following (FWIW I'd suggest a 4-3-1-2 order, 
tho only 1 and 2 are really btrfs related):

1) I make sure I consistently mount with autodefrag, from the first mount 
after the filesystem is created in ordered to first populate it, on.  The 
filesystem never gets fragmented, forcing writes to highly fragmented 
free space, in the first place.  (With the past and current effect of the 
ssd mount option under discussion to change, it's possible I'll get more 
fragmentation in the future after ssd doesn't try so hard to find 
reasonably large free-space chunks to write into, but it has been fine so 
far.)

2) Subvolumes and snapshots seemed to me more trouble than they were 
worth, particularly since it's the same filesystem anyway, and if it's 
damaged, it'll take all the subvolumes and snapshots with it.  So I don't 
use them, preferring instead to use real partitioning and more smaller 
fully separate filesystems, some of which aren't mounted by default (and 
root mounted read-only by default), so there's little chance they'll be 
damaged in a crash or filesystem bug damage scenario.  And if there /is/ 
any damage, it's much more limited in scope since all my data eggs aren't 
in the same basket, so maintenance such as btrfs check and scrub take far 
less time (and check far less memory) than they would were it one big 
pool with snapshots.  And if recovery fails too, the backups are likewise 
small filesystems the same size as the working copies, so copying the 
data back over takes far less time as well (not to mention making the 
backups takes less time in the first place, so it's easier to regularly 
update them).

3) Austin mentioned the firefox cache.  I honestly wouldn't know on it, 
since I have firefox configured to use a tmpfs for its cache, so it 
operates at memory speed and gets cleared along with its memory at every 
reboot or tmpfs umount.  My inet speed is fast enough I don't really need 
cache anyway, but it's nice to have it, operating at memory speed, within 
a single boot session... and to have it cleared on reboot.
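One way to get a similar setup (assumed, not necessarily Duncan's exact
configuration) is a per-user tmpfs-backed directory plus a single
about:config preference:

```shell
# Sketch: point Firefox's disk cache at RAM. /run/user/$UID is
# typically already a tmpfs on systemd systems (an assumption here).
CACHE_DIR="/run/user/$(id -u)/firefox-cache"
echo mkdir -p "$CACHE_DIR"
# Then, in about:config, set:
#   browser.cache.disk.parent_directory = <the directory above>
# or skip the disk cache entirely with browser.cache.disk.enable=false
# and rely on the memory cache (browser.cache.memory.enable=true).
```

Because the directory lives on a tmpfs, it operates at memory speed and
is cleared automatically at reboot, exactly the behavior described above.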


4) This one was the biggest one for me for awhile.

Is firefox running in multi-process mode?  

Re: defragmenting best practice?

2017-09-20 Thread Austin S. Hemmelgarn

On 2017-09-20 02:38, Dave wrote:

On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:

When I do a
btrfs filesystem defragment -r /directory
does it defragment really all files in this directory tree, even if it
contains subvolumes?
The man page does not mention subvolumes on this topic.


No answer so far :-(

But I found another problem in the man-page:

  Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
  with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
  will break up the ref-links of COW data (for example files copied with
  cp --reflink, snapshots or de-duplicated data). This may cause
  considerable increase of space usage depending on the broken up
  ref-links.

I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
snapshots.
Therefore, I better should avoid calling "btrfs filesystem defragment -r"?

What is the defragmenting best practice?
Avoid it completly?


My question is the same as the OP in this thread, so I came here to
read the answers before asking. However, it turns out that I still
need to ask something. Should I ask here or start a new thread? (I'll
assume here, since the topic is the same.)

Based on the answers here, it sounds like I should not run defrag at
all. However, I have a performance problem I need to solve, so if I
don't defrag, I need to do something else.

Here's my scenario. Some months ago I built an over-the-top powerful
desktop computer / workstation and I was looking forward to really
fantastic performance improvements over my 6 year old Ubuntu machine.
I installed Arch Linux on BTRFS on the new computer (on an SSD). To my
shock, it was no faster than my old machine. I focused a lot on
Firefox performance because I use Firefox a lot and that was one of
the applications in which I was most looking forward to better
performance.

I tried everything I could think of and everything recommended to me
in various forums (except switching to Windows) and the performance
remained very disappointing.
Switching to Windows won't help any more than switching to ext4 would. 
If you were running Chrome, it might (Chrome actually has better 
performance on Windows than Linux by a small margin last time I 
checked), but Firefox gets pretty much the same performance on both 
platforms.


Then today I read the following:

 Gotchas - btrfs Wiki
 https://btrfs.wiki.kernel.org/index.php/Gotchas

 Fragmentation: Files with a lot of random writes can become
heavily fragmented (10000+ extents) causing excessive multi-second
spikes of CPU load on systems with an SSD or large amount of RAM. On
desktops this primarily affects application databases (including
Firefox). Workarounds include manually defragmenting your home
directory using btrfs fi defragment. Auto-defragment (mount option
autodefrag) should solve this problem.

Upon reading that I am wondering if fragmentation in the Firefox
profile is part of my issue. That's one thing I never tested
previously. (BTW, this system has 256 GB of RAM and 20 cores.)
Almost certainly.  Most modern web browsers are brain-dead and insist on 
using SQLite databases (or traditional DB files) for everything, 
including the cache, and the usage for the cache in particular kills 
performance when fragmentation is an issue.


Furthermore, on the same BTRFS Wiki page, it mentions the performance
penalties of many snapshots. I am keeping 30 to 50 snapshots of the
volume that contains the Firefox profile.

Would these two things be enough to turn top-of-the-line hardware into
a mediocre-performing desktop system? (The system performs fine on
benchmarks -- it's real life usage, particularly with Firefox where it
is disappointing.)
Even ignoring fragmentation and reflink issues (it's reflinks, not 
snapshots that are the issue, snapshots just have tons of reflinks), 
BTRFS is slower than ext4 or XFS simply because it is doing far 
more work.  The difference should have limited impact on an 
SSD if you get a handle on the other issues though.


After reading the info here, I am wondering if I should make a new
subvolume just for my Firefox profile(s) and not use COW and/or not
keep snapshots on it and mount it with the autodefrag option.

As part of this strategy, I could send snapshots to another disk using
btrfs send-receive. That way I would have the benefits of snapshots
(which are important to me), but by not keeping any snapshots on the
live subvolume I could avoid the performance problems.

What would you guys do in this situation?
Personally?  Use Chrome or Chromium and turn on the simple cache backend 
(chrome://flags/#enable-simple-cache-backend) which doesn't have issues 
with fragmentation because it doesn't use a database file to store the 
cache and lets the filesystem handle the allocations.  The difference in 
performance in Chrome itself from flipping this switch is pretty amazing 
to be honest.  They're also faster than Firefox in 

Re: defragmenting best practice?

2017-09-20 Thread Dmitry Kudriavtsev
I've had a very similar issue with the performance of my laptop dropping to 
very low levels, eventually solved by uninstalling Snapper, deleting snapshots, 
and then defragmenting the drive.

This seems to be a common concern, I also had it happen on my desktop.

Dmitry

---

Thank you,
Dmitry Kudriavtsev

https://dkudriavtsev.xyz
inexpensivecomputers.net

September 19 2017 11:38 PM, "Dave" <davestechs...@gmail.com> wrote:
>> On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>>> When I do a
>>> btrfs filesystem defragment -r /directory
>>> does it defragment really all files in this directory tree, even if it
>>> contains subvolumes?
>>> The man page does not mention subvolumes on this topic.
>> 
>> No answer so far :-(
>> 
>> But I found another problem in the man-page:
>> 
>> Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
>> with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
>> will break up the ref-links of COW data (for example files copied with
>> cp --reflink, snapshots or de-duplicated data). This may cause
>> considerable increase of space usage depending on the broken up
>> ref-links.
>> 
>> I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
>> snapshots.
>> Therefore, I better should avoid calling "btrfs filesystem defragment -r"?
>> 
>> What is the defragmenting best practice?
>> Avoid it completly?
> 
> My question is the same as the OP in this thread, so I came here to
> read the answers before asking. However, it turns out that I still
> need to ask something. Should I ask here or start a new thread? (I'll
> assume here, since the topic is the same.)
> 
> Based on the answers here, it sounds like I should not run defrag at
> all. However, I have a performance problem I need to solve, so if I
> don't defrag, I need to do something else.
> 
> Here's my scenario. Some months ago I built an over-the-top powerful
> desktop computer / workstation and I was looking forward to really
> fantastic performance improvements over my 6 year old Ubuntu machine.
> I installed Arch Linux on BTRFS on the new computer (on an SSD). To my
> shock, it was no faster than my old machine. I focused a lot on
> Firefox performance because I use Firefox a lot and that was one of
> the applications in which I was most looking forward to better
> performance.
> 
> I tried everything I could think of and everything recommended to me
> in various forums (except switching to Windows) and the performance
> remained very disappointing.
> 
> Then today I read the following:
> 
> Gotchas - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/Gotchas
> 
> Fragmentation: Files with a lot of random writes can become
> heavily fragmented (10000+ extents) causing excessive multi-second
> spikes of CPU load on systems with an SSD or large amount of RAM. On
> desktops this primarily affects application databases (including
> Firefox). Workarounds include manually defragmenting your home
> directory using btrfs fi defragment. Auto-defragment (mount option
> autodefrag) should solve this problem.
> 
> Upon reading that I am wondering if fragmentation in the Firefox
> profile is part of my issue. That's one thing I never tested
> previously. (BTW, this system has 256 GB of RAM and 20 cores.)
> 
> Furthermore, on the same BTRFS Wiki page, it mentions the performance
> penalties of many snapshots. I am keeping 30 to 50 snapshots of the
> volume that contains the Firefox profile.
> 
> Would these two things be enough to turn top-of-the-line hardware into
> a mediocre-performing desktop system? (The system performs fine on
> benchmarks -- it's real life usage, particularly with Firefox where it
> is disappointing.)
> 
> After reading the info here, I am wondering if I should make a new
> subvolume just for my Firefox profile(s) and not use COW and/or not
> keep snapshots on it and mount it with the autodefrag option.
> 
> As part of this strategy, I could send snapshots to another disk using
> btrfs send-receive. That way I would have the benefits of snapshots
> (which are important to me), but by not keeping any snapshots on the
> live subvolume I could avoid the performance problems.
> 
> What would you guys do in this situation?


Re: defragmenting best practice?

2017-09-20 Thread Dave
>On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>> When I do a
>> btrfs filesystem defragment -r /directory
>> does it defragment really all files in this directory tree, even if it
>> contains subvolumes?
>> The man page does not mention subvolumes on this topic.
>
>No answer so far :-(
>
>But I found another problem in the man-page:
>
>  Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
>  with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
>  will break up the ref-links of COW data (for example files copied with
>  cp --reflink, snapshots or de-duplicated data). This may cause
>  considerable increase of space usage depending on the broken up
>  ref-links.
>
>I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
>snapshots.
>Therefore, I had better avoid calling "btrfs filesystem defragment -r"?
>
>What is the defragmenting best practice?
>Avoid it completely?

My question is the same as the OP in this thread, so I came here to
read the answers before asking. However, it turns out that I still
need to ask something. Should I ask here or start a new thread? (I'll
assume here, since the topic is the same.)

Based on the answers here, it sounds like I should not run defrag at
all. However, I have a performance problem I need to solve, so if I
don't defrag, I need to do something else.

Here's my scenario. Some months ago I built an over-the-top powerful
desktop computer / workstation and I was looking forward to really
fantastic performance improvements over my 6 year old Ubuntu machine.
I installed Arch Linux on BTRFS on the new computer (on an SSD). To my
shock, it was no faster than my old machine. I focused a lot on
Firefox performance because I use Firefox a lot and that was one of
the applications in which I was most looking forward to better
performance.

I tried everything I could think of and everything recommended to me
in various forums (except switching to Windows) and the performance
remained very disappointing.

Then today I read the following:

Gotchas - btrfs Wiki
https://btrfs.wiki.kernel.org/index.php/Gotchas

Fragmentation: Files with a lot of random writes can become
heavily fragmented (10000+ extents) causing excessive multi-second
spikes of CPU load on systems with an SSD or large amount of RAM. On
desktops this primarily affects application databases (including
Firefox). Workarounds include manually defragmenting your home
directory using btrfs fi defragment. Auto-defragment (mount option
autodefrag) should solve this problem.
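Before acting on this, one could first check whether the profile databases are actually fragmented -- a rough sketch (the profile path is the typical default and may well differ):

```shell
# List extent counts for the SQLite files in a Firefox profile.
# (~/.mozilla/firefox/*.default* is the usual default profile path.)
filefrag ~/.mozilla/firefox/*.default*/*.sqlite

# A file with thousands of extents is a defrag candidate; note that on
# a snapshotted subvolume defragmenting breaks reflinks and can grow
# disk usage accordingly.
btrfs filesystem defragment -f ~/.mozilla/firefox/*.default*/places.sqlite
```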

Upon reading that I am wondering if fragmentation in the Firefox
profile is part of my issue. That's one thing I never tested
previously. (BTW, this system has 256 GB of RAM and 20 cores.)

Furthermore, on the same BTRFS Wiki page, it mentions the performance
penalties of many snapshots. I am keeping 30 to 50 snapshots of the
volume that contains the Firefox profile.

Would these two things be enough to turn top-of-the-line hardware into
a mediocre-performing desktop system? (The system performs fine on
benchmarks -- it's real-life usage, particularly with Firefox, where it
is disappointing.)

After reading the info here, I am wondering if I should make a new
subvolume just for my Firefox profile(s) and not use COW and/or not
keep snapshots on it and mount it with the autodefrag option.

As part of this strategy, I could send snapshots to another disk using
btrfs send-receive. That way I would have the benefits of snapshots
(which are important to me), but by not keeping any snapshots on the
live subvolume I could avoid the performance problems.
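Sketched out, that strategy might look like the following (the subvolume path and backup mount point are hypothetical; chattr +C only affects files created after it is set, so it goes on the empty directory first):

```shell
# One-time setup: a dedicated nocow subvolume for the profile.
btrfs subvolume create /home/user/.mozilla
chattr +C /home/user/.mozilla

# Periodic backup: read-only snapshot, send it to the backup disk, then
# drop the local snapshot so none linger on the live subvolume.
btrfs subvolume snapshot -r /home/user/.mozilla /home/user/.mozilla-snap
btrfs send /home/user/.mozilla-snap | btrfs receive /mnt/backup/
btrfs subvolume delete /home/user/.mozilla-snap
```

Incremental transfers would pass the previous snapshot via "btrfs send -p"; also note that nocow files are neither checksummed nor compressed by btrfs.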

What would you guys do in this situation?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: defragmenting best practice?

2017-09-15 Thread Tomasz Kłoczko
On 15 September 2017 at 18:08, Kai Krakow  wrote:
[..]
> According to Tomasz, your tests should not run at vastly different
> speeds because fragmentation has no impact on performance, quod est
> demonstrandum... I think we will not get to the "erat" part.

No, that is not precisely what I'm trying to say.
However, seeing that there is no precise, fully repeatable methodology
for performing the proposed test, I have huge doubts about whether the
reported effect has anything to do with fragmentation, or whether it is
a side effect of using COW (which allows gluing a number of random
updates into larger sequential write IOs).

kloczek
-- 
Tomasz Kłoczko  LinkedIn: http://lnkd.in/FXPWxH
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: defragmenting best practice?

2017-09-15 Thread Kai Krakow
Am Fri, 15 Sep 2017 16:11:50 +0200
schrieb Michał Sokołowski :

> On 09/15/2017 03:07 PM, Tomasz Kłoczko wrote:
> > [...]
> > Case #1
> > 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu cow2
> > storage -> guest BTRFS filesystem  
> > SQL table row insertions per second: 1-2
> >
> > Case #2
> > 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu raw
> > storage -> guest EXT4 filesystem
> > SQL table row insertions per second: 10-15
> > Q -1) Why are you comparing btrfs against ext4 on top of btrfs,
> > which is doing its own COW operations at the bottom of such a
> > sandwich, if we are SUPPOSED to be talking about the impact of
> > fragmentation on top of btrfs?
> 
> Tomasz,
> you seem to be convinced that fragmentation does not matter. I find
> this (admittedly extreme) example says otherwise.

Sorry to jump this, but did you at least set the qemu image to nocow?
Otherwise this example is totally flawed because you're testing qemu
storage layer mostly and not btrfs.

A better test would've been to test qemu raw on btrfs cow vs on btrfs
nocow, with both the same file system inside the qemu image.

But you are modifying multiple parameters at once during the test, and
I expect each one has a huge impact on performance, yet only one is
specific to btrfs, which you apparently did not test this way.

Personally, I find that running qemu cow2 on btrfs cow helps nothing
except producing really bad performance. Make one of the two layers
nocow and it should become better.

If you want to give some better numbers, please reduce this test to
just one cow layer, the one at the top layer: btrfs host fs. Copy the
image somewhere else to restore from, and ensure (using filefrag) that
the starting situation matches each test run.

Don't change any parameters of the qemu layer at each test. And run a
file system inside which doesn't do any fancy stuff, like ext2 or ext3
without journal. Use qemu raw storage.

Then test again with cow vs nocow on the host side.

Create a nocow copy of your image (use size of the source image for
truncate):

# rm -f qemu-image-nocow.raw
# touch qemu-image-nocow.raw
# chattr +C -c qemu-image-nocow.raw
# dd if=source-image.raw of=qemu-image-nocow.raw bs=1M
# btrfs fi defrag -f qemu-image-nocow.raw
# filefrag -v qemu-image-nocow.raw

Create a cow copy of your image:

# rm -f qemu-image-cow.raw
# touch qemu-image-cow.raw
# chattr -C -c qemu-image-cow.raw
# dd if=source-image.raw of=qemu-image-cow.raw bs=1M
# btrfs fi defrag -f qemu-image-cow.raw
# filefrag -v qemu-image-cow.raw

Given that host btrfs is mounted datacow,compress=none and without
autodefrag, and you don't touch the source image contents during tests.

Now run your test script inside both qemu machines, take your
measurements and check fragmentation again after the run.

filefrag should report no more fragments than before the test for the
first test, but should report an order of magnitude more for the second test.

Now copy (cp) both one at a time to a new file and measure the time. It
should be slower for the highly fragmented version.

Don't forget to run tests with and without flushed caches so we get
cold and warm numbers.
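For example (a sketch; dropping caches requires root):

```shell
# Cold read: flush dirty pages, then drop the page cache.
sync
echo 3 > /proc/sys/vm/drop_caches
time cat qemu-image-cow.raw > /dev/null    # cold

# Warm read: the file is now cached in RAM.
time cat qemu-image-cow.raw > /dev/null    # warm
```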

In this scenario, qemu would only be the application to modify the raw
image files and you're actually testing the impact of fragmentation of
btrfs.

You could also make a reflink copy of the nocow test image and do a
third test to see that it introduces fragmentation now, tho probably
much lower than for the cow test image. You can verify the numbers with
filefrag.

According to Tomasz, your tests should not run at vastly different
speeds because fragmentation has no impact on performance, quod est
demonstrandum... I think we will not get to the "erat" part.


-- 
Regards,
Kai

Replies to list-only preferred.




Re: defragmenting best practice?

2017-09-15 Thread Peter Grandi
[ ... ]
 Case #1
 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs
 -> qemu cow2 storage -> guest BTRFS filesystem
 SQL table row insertions per second: 1-2

 Case #2
 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs
 -> qemu raw storage -> guest EXT4 filesystem
 SQL table row insertions per second: 10-15
[ ... ]

>> Q 0) What do you think you are measuring here?

> CoW fragmentation's impact on SQL write performance.

That's not what you are measuring: you are measuring the impact on
speed of configurations "designed" (perhaps unintentionally) for
maximum flexibility, lowest cost, and complete disregard for
speed.

[ ... ]

> It was quick and dirty task to find, prove and remove
> performance bottleneck at minimal cost.

This is based on the usual confusion between "performance" (the
result of several tradeoffs) and "speed". When you report "row
insertions per second" you are reporting a rate, that is a
"speed", not "performance", which is always multi-dimensional.
http://www.sabi.co.uk/blog/15-two.html?151023#151023

In the cases above speed is low, but I think that, taking into
account flexibility and cost, performance is pretty good.

> AFAIR removing storage cow2 and guest BTRFS storage gave us ~
> 10 times boost.

"Oh doctor, if I stop stabbing my hand with a fork it no longer
hurts, but running while carrying a rucksack full of bricks is
still slower than with a rucksack full of feathers".

[ ... ]


Re: defragmenting best practice?

2017-09-15 Thread Michał Sokołowski
On 09/15/2017 03:07 PM, Tomasz Kłoczko wrote:
> [...]
> Case #1
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu cow2 storage
> -> guest BTRFS filesystem
> SQL table row insertions per second: 1-2
>
> Case #2
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu raw storage ->
> guest EXT4 filesystem
> SQL table row insertions per second: 10-15
> Q -1) Why are you comparing btrfs against ext4 on top of btrfs, which
> is doing its own COW operations at the bottom of such a sandwich, if
> we are SUPPOSED to be talking about the impact of fragmentation on
> top of btrfs?

Tomasz,
you seem to be convinced that fragmentation does not matter. I find
this (admittedly extreme) example says otherwise.

> Q 0) What do you think you are measuring here?

CoW fragmentation's impact on SQL write performance.

> Q 1) How did you produce those time measurements? The time command?
> Looking at a watch?

The time command (real) on a bash script inserting 1000 rows (an index
and a 128-byte random string).
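For concreteness, the generator side of such a script could be sketched like this (the table and column names here are invented; the timed step is the SQL client run at the end):

```shell
#!/bin/sh
# Generate 1000 INSERT statements, each with a sequential index and a
# 128-byte random alphanumeric payload.
N=1000
OUT=inserts.sql
: > "$OUT"
i=1
while [ "$i" -le "$N" ]; do
  val=$(LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 128)
  printf "INSERT INTO bench (id, payload) VALUES (%d, '%s');\n" \
    "$i" "$val" >> "$OUT"
  i=$((i + 1))
done
# Measured step (not run here), e.g.:
#   time psql -q -f inserts.sql      # PostgreSQL
#   time mysql dbname < inserts.sql  # MySQL
```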

> Q 2) Why are there ranges of timings? Did you repeat some operations
> a few times (how many times, and with or without dropping caches or
> rebooting)?

Yes, we repeated it, with and without flushing caches (it didn't seem
to have any impact). I cannot remember whether there were any reboots.
The big time ranges are because I don't have the exact numbers on me.
It was a quick-and-dirty task to find, prove and remove a performance
bottleneck at minimal cost. AFAIR, removing the cow2 storage and the
guest BTRFS filesystem gave us a ~10x boost. Surprisingly for us, this
boost seems to be consistent (it has not degraded noticeably over time
-- 2 months since the change).

> Q 3) What kind of SQL engine? With what kind of settings? With what
> kind of tables (indexes? foreign keys?)? What kind of transaction
> semantics?

PostgreSQL and MySQL both gave us those results. *

> Q 4) Where is the example set of inserts which I can replay in my
> setup? Did you drop caches before the batch of inserts? (Do you know
> that every insert also generates some read IOs, so whether something
> is already cached before the batch of inserts is *crucial*?) Did you
> restart the SQL engine?
> Q 5) Were both tests executed on the same box? If not, which kernel
> versions were used?

Same distribution, machine and kernel. *

> Q 6) Effectively how many IOs were done during those tests? How did
> you measure those numbers (dtrace? perf? systemtap?)

I didn't check that. *

> Q 7) Why are you running your tests over qemu? Was anything else
> running on the host system during those tests?

Because of the "production" environment's location. No, there was
nothing else running.

*) If you're really interested (which I doubt), then I can put an
example environment somewhere and gather more data.





Re: defragmenting best practice?

2017-09-15 Thread Tomasz Kłoczko
On 15 September 2017 at 11:54, Michał Sokołowski  wrote:
[..]
>> Just please give some example which I can try to replay which will
>> show that we have similar results.
>
> Case #1
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu cow2 storage
> -> guest BTRFS filesystem
> SQL table row insertions per second: 1-2
>
> Case #2
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu raw storage ->
> guest EXT4 filesystem
> SQL table row insertions per second: 10-15

Q -1) Why are you comparing btrfs against ext4 on top of btrfs, which
is doing its own COW operations at the bottom of such a sandwich, if we
are SUPPOSED to be talking about the impact of fragmentation on top of
btrfs?
Q 0) What do you think you are measuring here?
Q 1) How did you produce those time measurements? The time command?
Looking at a watch?
Q 2) Why are there ranges of timings? Did you repeat some operations a
few times (how many times, and with or without dropping caches or
rebooting)?
Q 3) What kind of SQL engine? With what kind of settings? With what
kind of tables (indexes? foreign keys?)? What kind of transaction
semantics?
Q 4) Where is the example set of inserts which I can replay in my
setup? Did you drop caches before the batch of inserts? (Do you know
that every insert also generates some read IOs, so whether something is
already cached before the batch of inserts is *crucial*?) Did you
restart the SQL engine?
Q 5) Were both tests executed on the same box? If not, which kernel
versions were used?
Q 6) Effectively how many IOs were done during those tests? How did you
measure those numbers (dtrace? perf? systemtap?)
Q 7) Why are you running your tests over qemu? Was anything else
running on the host system during those tests?
.
.
.
I can probably make this list of questions 2 or 3 times longer.

kloczek
--
Tomasz Kłoczko | LinkedIn: http://lnkd.in/FXPWxH


Re: defragmenting best practice?

2017-09-15 Thread Austin S. Hemmelgarn

On 2017-09-14 22:26, Tomasz Kłoczko wrote:

On 14 September 2017 at 19:53, Austin S. Hemmelgarn
 wrote:
[..]

While it's not for BTRFS, a tool called e4rat might be of interest to you
regarding this.  It reorganizes files on an ext4 filesystem so that stuff
used by the boot loader is right at the beginning of the device, and I've
known people to get insane performance improvements (on the order of 20x in
some pathologically bad cases) in the time taken from the BIOS handing
things off to GRUB to GRUB handing execution off to the kernel.


Do you know that what you've just written has nothing to do with
fragmentation? Intentionally or not, you are just trying to change the
subject.
As hard as it may be to believe, this _is_ relevant to the part of your 
reply that I was responding to, namely:


> By how much is it possible to improve boot time?

Note that discussion of file ordering impacting boot times is almost 
always centered around the boot loader, and _not_ userspace (because as 
you choose to focus on in changing the subject for the rest of this 
message, it's trivially possible to improve performance in userspace 
with some really simple tweaks).


You wanted examples regarding reordering of data in a localized manner 
improving boot time, so I gave _the_ reference for this on Linux (e4rat 
is the only publicly available tool I know of that does this).


[..]

This shouldn't need examples.  It's trivial math combined with basic
knowledge of hardware behavior.  Every request to a device has a minimum
amount of overhead.  On traditional hard drives, this is usually dominated
by seek latency, but on SSD's, the request setup, dispatch, and completion
are the dominant factor.  Assuming you have a 2 micro-second overhead
per-request (not an exact number, just chosen for demonstration purposes
because it makes the math easy), and a 1GB file, the time difference between
reading ten 100MB extents and reading ten thousand 100kB extents is just
short of 0.02 seconds, or a factor of about one thousand (which, no surprise
here, is the factor of difference between the number of extents).


So to produce a few seconds of delay during boot you would need to make
a few hundred thousand, if not millions, more IOs than when reading
everything using ideal long sequential reads.
No, that isn't what I was talking about.  Quit taking things out of 
context and assuming all of someone's reply is about only part of yours.


This was responding solely to this:

> That may be an issue with using extents.
> Again: please show some results from some test unit which anyone will
> be able to replay and confirm (or not) that this effect really exists.

And has nothing to do with boot time.


Almost every package upgrade rewrites some files in full, which with
COW produces fully contiguous areas per file.
You know... there are not so many files in a typical distribution
installation to produce such a measurable impact.

On my current laptop I have a lot of devel and debug stuff installed
and still I have only

$ rpm -qal | wc -l
276489

files (from which only small fractions are ELF DSOs or executables)
installed by:

$ rpm -qa | wc -l
2314

packages.

I can bet that even during a very complicated boot process only a few
hundred files will be touched (by read IOs). None of those files will
be read sequentially, because this is not how executable content is
usually loaded into the buffer cache. Simply raising the block device
read-ahead may improve boot time enough without putting all blocks in
perfect order. All you need is to run "blockdev --setra N" early
enough, where N is greater than the default 256 blocks. All this can be
done without thinking about fragmentation.
As I mentioned above, the primary argument for reordering data for boot 
is largely related to the boot-loader, which doesn't have intelligent 
I/O scheduling and doesn't do read ahead, and is primarily about usage 
with traditional hard drives, where seek latency caused by lack of data 
locality actually does have a significant (and well documented) impact.



It seems you don't know that Linux by default reads data from a block
device in chunks of at least 256 blocks, because such an IO size is
part of the default read-ahead settings. You can change those settings
just for boot time and you will have a far lower number of IOs, yet
still no dramatic improvement like a several-times-shorter boot.
Fragmentation will in such a case be a secondary factor.
All this could be done without bothering about fragmentation.
The block-level read-ahead done by the kernel has near zero impact on 
performance unless your data is already highly local (not necessarily 
ordered, but at least all in the same place), which will almost never be 
the case on BTRFS when dealing with an active data set because of its 
copy on write semantics.


In other words, you are still talking about some theoretically
possible results which will be falsified if you try, at least once, to
do some real tests and measurements.

Re: defragmenting best practice?

2017-09-15 Thread Peter Grandi
> Case #1
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu cow2 storage
> -> guest BTRFS filesystem
> SQL table row insertions per second: 1-2

"Doctor, if I stab my hand with a fork it hurts a lot: can you
cure that?"

> Case #2
> 2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu raw
> storage -> guest EXT4 filesystem
> SQL table row insertions per second: 10-15

"Doctor, I can't run as fast with a backpack full of bricks as
without it: can you cure that?"

:-)


Re: defragmenting best practice?

2017-09-15 Thread Michał Sokołowski
On 09/14/2017 07:48 PM, Tomasz Kłoczko wrote:
> On 14 September 2017 at 16:24, Kai Krakow  wrote:
> [..]
>> > Getting e.g. boot files into read order or at least nearby improves
>> > boot time a lot. Similar for loading applications.
> [...]
> Just please give some example which I can try to replay which will
> show that we have similar results.

Case #1
2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu cow2 storage
-> guest BTRFS filesystem
SQL table row insertions per second: 1-2

Case #2
2x 7200 rpm HDD -> md raid 1 -> host BTRFS rootfs -> qemu raw storage ->
guest EXT4 filesystem
SQL table row insertions per second: 10-15






Re: defragmenting best practice?

2017-09-14 Thread Tomasz Kłoczko
On 14 September 2017 at 19:53, Austin S. Hemmelgarn
 wrote:
[..]
> While it's not for BTRFS, a tool called e4rat might be of interest to you
> regarding this.  It reorganizes files on an ext4 filesystem so that stuff
> used by the boot loader is right at the beginning of the device, and I've
> known people to get insane performance improvements (on the order of 20x in
> some pathologically bad cases) in the time taken from the BIOS handing
> things off to GRUB to GRUB handing execution off to the kernel.

Do you know that what you've just written has nothing to do with
fragmentation? Intentionally or not, you are just trying to change the
subject.

[..]
> This shouldn't need examples.  It's trivial math combined with basic
> knowledge of hardware behavior.  Every request to a device has a minimum
> amount of overhead.  On traditional hard drives, this is usually dominated
> by seek latency, but on SSD's, the request setup, dispatch, and completion
> are the dominant factor.  Assuming you have a 2 micro-second overhead
> per-request (not an exact number, just chosen for demonstration purposes
> because it makes the math easy), and a 1GB file, the time difference between
> reading ten 100MB extents and reading ten thousand 100kB extents is just
> short of 0.02 seconds, or a factor of about one thousand (which, no surprise
> here, is the factor of difference between the number of extents).

So to produce a few seconds of delay during boot you would need to make
a few hundred thousand, if not millions, more IOs than when reading
everything using ideal long sequential reads.
Almost every package upgrade rewrites some files in full, which with
COW produces fully contiguous areas per file.
You know... there are not so many files in a typical distribution
installation to produce such a measurable impact.
On my current laptop I have a lot of devel and debug stuff installed
and still I have only

$ rpm -qal | wc -l
276489

files (from which only small fractions are ELF DSOs or executables)
installed by:

$ rpm -qa | wc -l
2314

packages.

I can bet that even during a very complicated boot process only a few
hundred files will be touched (by read IOs). None of those files will
be read sequentially, because this is not how executable content is
usually loaded into the buffer cache. Simply raising the block device
read-ahead may improve boot time enough without putting all blocks in
perfect order. All you need is to run "blockdev --setra N" early
enough, where N is greater than the default 256 blocks. All this can be
done without thinking about fragmentation.
It seems you don't know that Linux by default reads data from a block
device in chunks of at least 256 blocks, because such an IO size is
part of the default read-ahead settings. You can change those settings
just for boot time and you will have a far lower number of IOs, yet
still no dramatic improvement like a several-times-shorter boot.
Fragmentation will in such a case be a secondary factor.
All this could be done without bothering about fragmentation.
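For reference, inspecting and raising the read-ahead looks like this (the device name is just an example; per blockdev(8) the unit is 512-byte sectors, so the default of 256 corresponds to 128 KiB):

```shell
# Show the current read-ahead for a device, in 512-byte sectors.
blockdev --getra /dev/sda

# Raise it (here to 4096 sectors = 2 MiB); requires root.
blockdev --setra 4096 /dev/sda
```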

In other words, you are still talking about some theoretically
possible results which will be falsified if you try, at least once, to
do some real tests and measurements.
The last time I did boot time measurements, it was about sequential
start of all services vs. maximum parallelization. And yes, by this it
was possible to improve boot time several times over. All without
bothering about fragmentation.

The current Fedora systemd base service definitions can be improved in
many places by adding more dependencies and executing many small
services in parallel. All those corrections can be done without even
thinking about fragmentation. Because this base set of systemd services
comes with the systemd source code, those improvements can be done for
almost all systemd-based Linux distros.

kloczek


Re: defragmenting best practice?

2017-09-14 Thread Kai Krakow
Am Thu, 14 Sep 2017 18:48:54 +0100
schrieb Tomasz Kłoczko :

> On 14 September 2017 at 16:24, Kai Krakow 
> wrote: [..]
> > Getting e.g. boot files into read order or at least nearby improves
> > boot time a lot. Similar for loading applications.  
> 
> By how much is it possible to improve boot time?
> Just please give some example which I can try to replay which will
> show that we have similar results.
> I still have one of my laptops with a spindle on a btrfs root fs (and
> no other filesystems in use) so I would be able to confirm that my
> numbers are close enough to your numbers.

I need to create a test setup because this system uses bcache. The
difference (according to systemd-analyze) between warm bcache and no
bcache at all ranges from 16-30s boot time vs. 3+ minutes boot time.

I could turn off bcache, do a boot trace, try to rearrange boot files,
boot again. However, that is not very reproducible as the current file
layout is not defined. It'd be better to setup a separate machine where
I could start over from a "well defined" state before applying
optimization steps to see the differences between different strategies.
At least readahead is not very helpful, I tested that in the past. It
reduces boot time just by a few seconds, maybe 20-30, thus going from
3+ minutes to 2+ minutes.

I still have an old laptop lying around: Single spindle, should make a
good test scenario. I'll have to see if I can get it back into shape.
It will take me some time.


> > Shake tries to
> > improve this by rewriting the files - and this works because file
> > systems (given enough free space) already do a very good job at
> > doing this. But constant system updates degrade this order over
> > time.  
> 
> OK. Please prepare some database, import some data whose size is a
> few times the unused RAM (best if this multiplication factor is at
> least 10). Then do some batch of selects, measuring the distribution
> of latencies of those queries.

Well, this is pretty easy. Systemd-journald is a real beast when it
comes to cow fragmentation. Results can be easily generated and
reproduced. There are long traces of discussions in the systemd mailing
list and I simply decided to make the files nocow right from the start
and that fixed it for me. I can simply revert it and create benchmarks.
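The nocow setup amounts to something like this (a sketch: +C only affects newly created files, so it must be in place before journald creates new journal files; existing files keep their cow layout):

```shell
# Mark the journal directory nocow so new journal files inherit +C.
mkdir -p /var/log/journal
chattr +C /var/log/journal

# Rotating forces journald to start fresh (now nocow) journal files.
journalctl --rotate
```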


> This will give you some data about. not fragmented data.

Well, I would probably do it the other way around: Generate a
fragmented journal file (as that is how journald creates the file over
time), then rewrite it by some manner to reduce extents, then run
journal operations again on this file. Does it bother you to turn this
around?


> Then at the next stage apply some number of update queries, reboot
> the system or drop all caches, and repeat the same set of selects.
> After this, all you need to do is compare the distributions of the
> latencies.

Which tool to use to measure which latencies?

Speaking of latencies: What's of interest here is perceived
performance resulting mostly from seek overhead (except probably in the
journal file case which just overwhelmes by the pure amount of
extents). I'm not sure if measuring VFS latencies would provide any
useful insights here. VFS probably works fast enough still in this
case.


> > It really doesn't matter if some big file is laid out in 1
> > allocation of 1 GB or in 250 allocations of 4MB: It really doesn't
> > make a big difference.
> >
> > Recombining extents into bigger ones, tho, can make a big
> > difference in an aging btrfs, even on SSDs.  
> 
> That may be an issue with using extents.

I can't follow why you argue that a file with thousands of extents vs
a file of the same size with only a few extents would make no
difference to operate on. And of course this has to do with extents.
But btrfs uses extents. Do you suggest using ZFS instead?

Due to how cow works, the effect would probably be less or barely
noticable for writes, but read scanning through the file becomes slow
with clearly more "noise" from the moving heads.


> Again: please show some results from some test unit which anyone will
> be able to replay and confirm (or not) that this effect really exists.
> 
> If the problem really exists and is related to extents, you should
> have a real-scenario explanation of why ZFS does not use extents.

That was never the discussion. You brought in the ZFS point. I read
about the design reasoning behind ZFS when it appeared and started gain
public interest years back.


> btrfs is not too far from the classic approach to FS design because
> it still uses allocation structures.
> This is not the case in the context of ZFS, because that technology
> has no information about what is already allocated.

What about btrfs free space tree? Isn't that more or less the same? But
I don't believe that makes a significant difference for desktop-sized
storages. I think introduction of free space tree was due to
performance of many-TB file systems up to 

Re: defragmenting best practice?

2017-09-14 Thread Austin S. Hemmelgarn

On 2017-09-14 13:48, Tomasz Kłoczko wrote:

On 14 September 2017 at 16:24, Kai Krakow  wrote:
[..]

Getting e.g. boot files into read order or at least nearby improves
boot time a lot. Similar for loading applications.


By how much is it possible to improve boot time?
Just please give some example which I can try to replay which will show
that we have similar results.
I still have one of my laptops with a spindle on a btrfs root fs (and
no other filesystems in use) so I would be able to confirm that my
numbers are close enough to your numbers.
While it's not for BTRFS, a tool called e4rat might be of interest to 
you regarding this.  It reorganizes files on an ext4 filesystem so that 
stuff used by the boot loader is right at the beginning of the device, 
and I've known people to get insane performance improvements (on the 
order of 20x in some pathologically bad cases) in the time taken from 
the BIOS handing things off to GRUB to GRUB handing execution off to the 
kernel.



Shake tries to
improve this by rewriting the files - and this works because file
systems (given enough free space) already do a very good job at doing
this. But constant system updates degrade this order over time.


OK. Please prepare some database and import some data whose size is a
few times the unused RAM (ideally this multiplication factor is at
least 10). Then run a batch of SELECTs, measuring the latency
distribution of those queries.
This will give you some data about non-fragmented data.
In the next stage apply some number of UPDATE queries, then
reboot the system or drop all caches and repeat the same set of
SELECTs.
After this, all you need to do is compare the latency distributions.


It really doesn't matter if some big file is laid out in 1 allocation
of 1 GB or in 250 allocations of 4 MB: it really doesn't make a big
difference.

Recombining extents into bigger ones, though, can make a big difference
in an aging btrfs, even on SSDs.


That may be an issue with using extents.
Again: please show some results from a test unit which anyone can
replay, to confirm whether this effect really exists.
This shouldn't need examples.  It's trivial math combined with basic 
knowledge of hardware behavior.  Every request to a device has a minimum 
amount of overhead.  On traditional hard drives, this is usually 
dominated by seek latency, but on SSDs, the request setup, dispatch, 
and completion are the dominant factors.  Assuming you have a 2 
microsecond overhead per request (not an exact number, just chosen for 
demonstration purposes because it makes the math easy), and a 1GB file, 
the overhead difference between reading ten 100MB extents and reading 
ten thousand 100kB extents is just short of 0.02 seconds, a factor of 
about one thousand (which, no surprise here, is the factor of difference 
between the number of extents).
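A quick sanity check of the arithmetic above; the 2 microsecond per-request overhead is the illustrative assumption from the paragraph, not a measured figure:

```python
# Back-of-the-envelope model of fixed per-request overhead when reading
# a 1GB file split into different numbers of extents. The 2 microsecond
# figure is the assumed overhead from the text, chosen for easy math.
PER_REQUEST_OVERHEAD_S = 2e-6  # seconds per request (assumption)

def total_overhead(extent_count):
    """Fixed overhead for a file read issued as one request per extent."""
    return extent_count * PER_REQUEST_OVERHEAD_S

few = total_overhead(10)        # ten 100MB extents
many = total_overhead(10_000)   # ten thousand 100kB extents

# The difference is just short of 0.02 s, and the ratio matches the
# factor-of-1000 difference in extent counts.
print(f"10 extents:    {few * 1e3:.3f} ms overhead")
print(f"10000 extents: {many * 1e3:.3f} ms overhead")
print(f"difference:    {(many - few) * 1e3:.3f} ms, ratio ~{many / few:.0f}x")
```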


If the problem really exists and is related to extents, you should have
a real-scenario explanation of why ZFS is not using extents.
Extents have nothing to do with it.  What matters is how much of the 
file data is contiguous (and therefore can be read as a single request) 
and how smart the FS is about figuring that out.  Extents help figure 
that out, but the primary reason to use them is to save space encoding 
block allocations within a file (go take a look at how ext2 handles 
allocations, and then compare that to ext4; the difference in space 
savings is insane).

btrfs is not too far from the classic approach to FS design because it
still uses allocation structures.
This is not the case for ZFS, because that technology has no
information about what is already allocated.
ZFS uses free lists, so by negation whatever is not on the free list is
already allocated.
I'm not trying to claim that ZFS is better, only that by changing the
allocation strategy you may avoid being hit by something like an
extent bottleneck (which still needs to be proven)

There are at least a few very good reasons why it is sometimes
necessary to change strategy from allocation structures to free lists.
First: ZFS free list management is very similar to the Linux kernel's
SLAB memory allocator.
Have you ever heard that someone needs to defragment system memory
because fragmented memory adds some additional latency to memory
access?
Another consequence is that with growing file sizes and numbers of
files or directories, FS metadata grows rapidly with the size and
number of such objects. With free lists there is no such growth, and
all structures grow with linear correlation.
Caching free list data in memory takes much less space than caching
b-trees.
The last point is the effort of deallocating something in an FS with
allocation structures versus free lists.
In the classic approach the number of such operations grows with the
depth of the b-trees.
With a free list, all you need to do is compare the ctime of the
allocated block with the volume or snapshot ctime to make 

Re: defragmenting best practice?

2017-09-14 Thread Tomasz Kłoczko
On 14 September 2017 at 16:24, Kai Krakow  wrote:
[..]
> Getting e.g. boot files into read order or at least nearby improves
> boot time a lot. Similar for loading applications.

By how much is it possible to improve boot time?
Please give some example which I can try to replay, so we can see
whether we get similar results.
I still have one of my laptops with a spindle and btrfs as the root fs
(and no other FSes in use), so I could confirm whether my numbers
are close enough to your numbers.

> Shake tries to
> improve this by rewriting the files - and this works because file
> systems (given enough free space) already do a very good job at doing
> this. But constant system updates degrade this order over time.

OK. Please prepare some database and import some data whose size is a
few times the unused RAM (ideally this multiplication factor is at
least 10). Then run a batch of SELECTs, measuring the latency
distribution of those queries.
This will give you some data about non-fragmented data.
In the next stage apply some number of UPDATE queries, then
reboot the system or drop all caches and repeat the same set of
SELECTs.
After this, all you need to do is compare the latency distributions.
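As a toy illustration of that recipe (my sketch, not the poster's: sqlite3 and a tiny on-disk table stand in for "some database", the dataset is nowhere near 10x RAM, and no page-cache drop is done, so this only outlines the methodology of comparing latency distributions before and after random rewrites):

```python
import os
import random
import sqlite3
import statistics
import tempfile
import time

def run_selects(cur, keys):
    """Time each point SELECT and return the list of latencies."""
    latencies = []
    for k in keys:
        t0 = time.perf_counter()
        cur.execute("SELECT val FROM t WHERE id = ?", (k,)).fetchone()
        latencies.append(time.perf_counter() - t0)
    return latencies

path = os.path.join(tempfile.mkdtemp(), "bench.db")
db = sqlite3.connect(path)
cur = db.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val BLOB)")
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, os.urandom(256)) for i in range(10_000)])
db.commit()

keys = [random.randrange(10_000) for _ in range(1_000)]
before = run_selects(cur, keys)  # latencies on freshly written data

# Apply a batch of random UPDATEs (in a real test: reboot or drop
# caches here), then repeat the identical SELECT workload.
cur.executemany("UPDATE t SET val = ? WHERE id = ?",
                [(os.urandom(256), random.randrange(10_000))
                 for _ in range(5_000)])
db.commit()
after = run_selects(cur, keys)

print(f"median latency before: {statistics.median(before):.2e} s")
print(f"median latency after:  {statistics.median(after):.2e} s")
```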

> It really doesn't matter if some big file is laid out in 1 allocation
> of 1 GB or in 250 allocations of 4MB: It really doesn't make a big
> difference.
>
> Recombining extents into bigger ones, though, can make a big difference
> in an aging btrfs, even on SSDs.

That may be an issue with using extents.
Again: please show some results from a test unit which anyone can
replay, to confirm whether this effect really exists.

If the problem really exists and is related to extents, you should have
a real-scenario explanation of why ZFS is not using extents.
btrfs is not too far from the classic approach to FS design because it
still uses allocation structures.
This is not the case for ZFS, because that technology has no
information about what is already allocated.
ZFS uses free lists, so by negation whatever is not on the free list is
already allocated.
I'm not trying to claim that ZFS is better, only that by changing the
allocation strategy you may avoid being hit by something like an
extent bottleneck (which still needs to be proven)

There are at least a few very good reasons why it is sometimes
necessary to change strategy from allocation structures to free lists.
First: ZFS free list management is very similar to the Linux kernel's
SLAB memory allocator.
Have you ever heard that someone needs to defragment system memory
because fragmented memory adds some additional latency to memory
access?
Another consequence is that with growing file sizes and numbers of
files or directories, FS metadata grows rapidly with the size and
number of such objects. With free lists there is no such growth, and
all structures grow with linear correlation.
Caching free list data in memory takes much less space than caching
b-trees.
The last point is the effort of deallocating something in an FS with
allocation structures versus free lists.
In the classic approach the number of such operations grows with the
depth of the b-trees.
With a free list, all you need to do is compare the ctime of the
allocated block with the volume or snapshot ctime to make a decision
about whether or not to return the block to the free list.
No matter how many snapshots, volumes, files or directories there are,
it will always be *just one compare* of the block and vol/snapshot
ctime.
With only one compare needed comes much more predictable behavior of
the whole FS, and simpler code making such decisions.
In other words, ZFS internally uses the well-known SLAB allocator,
caching some data about the best possible locations for allocation
units of sizes in powers of two, like the *kmalloc* SLABs you can see
in /proc/slabinfo on Linux.
This is why in ZFS the number of volumes and snapshots has zero
impact on the average speed of interactions through the VFS layer.
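A toy sketch of the single-compare decision described above (invented names; an illustration of the idea, not actual ZFS code):

```python
def can_return_to_free_list(block_birth, latest_snapshot_birth=None):
    """ZFS-style free decision, reduced to one comparison.

    A block born *after* the latest snapshot cannot be referenced by any
    snapshot, so it may go straight back to the free list. A block born
    before (or at) the snapshot must be kept, since the snapshot may
    still reference it. The cost is one compare no matter how many
    snapshots, volumes, files, or directories exist.
    """
    if latest_snapshot_birth is None:  # no snapshots at all
        return True
    return block_birth > latest_snapshot_birth

print(can_return_to_free_list(120, latest_snapshot_birth=100))  # True
print(can_return_to_free_list(80, latest_snapshot_birth=100))   # False
```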

If you are able to present a real impact of fragmentation (again
*if*), this may trigger other actions.
So far, AFAIK, no one has been able to deliver real numbers or
scenarios showing such impact.
And *if* such impact really exists, one of the solutions may be to
mimic what ZFS is doing (maybe there are other solutions).

So please show us a test unit exposing the problem, with a measurement
methodology demonstrating the pathology related to fragmentation.

> Bees is, btw, not about defragmentation: I have some OS containers
> running and I want to deduplicate data after updates.

Deduplication done in userspace has natural consequences in the form
of security issues.
An executable doing such things needs full access to everything, and
some API/ABI allowing it to fiddle with btrfs content must be exposed,
which adds a second batch of security-related risks.

Have a look at how deduplication works in the case of ZFS, without
offline deduplication.

>> In other words if someone 

Re: defragmenting best practice?

2017-09-14 Thread Kai Krakow
Am Thu, 14 Sep 2017 17:24:34 +0200
schrieb Kai Krakow :

Errors corrected, see below...


> Am Thu, 14 Sep 2017 14:31:48 +0100
> schrieb Tomasz Kłoczko :
> 
> > On 14 September 2017 at 12:38, Kai Krakow 
> > wrote: [..]  
> > >
> > > I suggest you only ever defragment parts of your main subvolume or
> > > rely on autodefrag, and let bees do optimizing the snapshots.  
> 
> Please read that again including the parts you omitted.
> 
> 
> > > Also, I experimented with adding btrfs support to shake, still
> > > working on better integration but currently lacking time... :-(
> > >
> > > Shake is an adaptive defragger which rewrites files. With my
> > > current patches it clones each file, and then rewrites it to its
> > > original location. This approach is currently not optimal as it
> > > simply bails out if some other process is accessing the file and
> > > leaves you with an (intact) temporary copy you need to move back
> > > in place manually.
> > 
> > If you really want a real and *ideal* distribution of the data
> > across the physical disk, first you need to build a time travel
> > device. This device will allow you to put all blocks which need to
> > be read in perfect order (to read all data sequentially without
> > seeking). However, it will only work in the case of spindles,
> > because with SSDs there is no seek time.
> > Please let us know when you will write the drivers/timetravel/ Linux
> > kernel driver. When such a driver is available, I promise I'll
> > write all necessary btrfs code myself in a matter of a few days (it
> > will be a piece of cake compared to building such a device).
> > 
> > But seriously ..  
> 
> Seriously: Defragmentation on spindles is IMHO not about getting the
> perfect continuous allocation but providing better spatial layout of
> the files you work with.
> 
> Getting e.g. boot files into read order or at least nearby improves
> boot time a lot. Similar for loading applications. Shake tries to
> improve this by rewriting the files - and this works because file
> systems (given enough free space) already do a very good job at doing
> this. But constant system updates degrade this order over time.
> 
> It really doesn't matter if some big file is laid out in 1 allocation
> of 1 GB or in 250 allocations of 4MB: It really doesn't make a big
> difference.
> 
> Recombining extents into bigger ones, though, can make a big
> difference in an aging btrfs, even on SSDs.
> 
> Bees is, btw, not about defragmentation: I have some OS containers
> running and I want to deduplicate data after updates. It seems to do a
> good job here, better than other deduplicators I found. And if some
> defrag tools destroyed your snapshot reflinks, bees can also help
> here. On its way it may recombine extents so it may improve
> fragmentation. But usually it probably defragments because it needs
 ^^^
It fragments!

> to split extents that a defragger combined.
> 
> But well, I think getting 100% continuous allocation is really not the
> achievement you want to get, especially when reflinks are a primary
> concern.
> 
> 
> > The only context/scenario in which you may want to reduce
> > fragmentation is when something needs to allocate a continuous area
> > smaller than the total free space but larger than the largest free
> > chunk. Something like this happens only when the volume is working
> > at almost 100% allocated space.
> > In such a scenario even your bees cannot do much, as there may not
> > be enough free space to move some other data into larger chunks to
> > defragment the FS's physical space.  
> 
> Bees does not do that.
> 
> 
> > If your workload keeps writing
> > new data to the FS, such defragmentation may give you (maybe) a few
> > more seconds, and just after that the FS will be 100% full.
> > 
> > In other words, if someone thinks that such a defragmentation
> > daemon solves any problems, he/she may be 100% right .. such a
> > person is only *thinking* that this is the truth.  
> 
> Bees is not about that.
> 
> 
> > kloczek
> > PS. Do you know the first MacGyver rule? -> "If it ain't broke,
> > don't fix it".  
> 
> Do you know the saying "think first, then act"?
> 
> 
> > So first show that fragmentation is hurting the latency of access
> > to btrfs data, and that it is possible to measure such impact.
> > Before you start measuring this you need to learn how to sample,
> > for example, VFS layer latency. Do you know how to do this to
> > deliver such proof?  
> 
> You didn't get the point. You only read "defragmentation" and your
> alarm lights lit up. You even think bees would be a defragmenter. It
> probably is more the opposite because it introduces more fragments in
> exchange for more reflinks.
> 
> 
> > PS2. The same "discussions" about fragmentation happened in the
> > past, about 10+ years ago, after ZFS was introduced. Just to let you
> > know that after the initial ZFS introduction up to now was 

Re: defragmenting best practice?

2017-09-14 Thread Kai Krakow
Am Thu, 14 Sep 2017 14:31:48 +0100
schrieb Tomasz Kłoczko :

> On 14 September 2017 at 12:38, Kai Krakow 
> wrote: [..]
> >
> > I suggest you only ever defragment parts of your main subvolume or
> > rely on autodefrag, and let bees do optimizing the snapshots.

Please read that again including the parts you omitted.


> > Also, I experimented with adding btrfs support to shake, still
> > working on better integration but currently lacking time... :-(
> >
> > Shake is an adaptive defragger which rewrites files. With my current
> > patches it clones each file, and then rewrites it to its original
> > location. This approach is currently not optimal as it simply bails
> > out if some other process is accessing the file and leaves you with
> > an (intact) temporary copy you need to move back in place
> > manually.  
> 
> If you really want a real and *ideal* distribution of the data
> across the physical disk, first you need to build a time travel
> device. This device will allow you to put all blocks which need to be
> read in perfect order (to read all data sequentially without seeking).
> However, it will only work in the case of spindles, because with SSDs
> there is no seek time.
> Please let us know when you will write the drivers/timetravel/ Linux
> kernel driver. When such a driver is available, I promise I'll
> write all necessary btrfs code myself in a matter of a few days (it
> will be a piece of cake compared to building such a device).
> 
> But seriously ..

Seriously: Defragmentation on spindles is IMHO not about getting the
perfect continuous allocation but providing better spatial layout of
the files you work with.

Getting e.g. boot files into read order or at least nearby improves
boot time a lot. Similar for loading applications. Shake tries to
improve this by rewriting the files - and this works because file
systems (given enough free space) already do a very good job at doing
this. But constant system updates degrade this order over time.

It really doesn't matter if some big file is laid out in 1 allocation
of 1 GB or in 250 allocations of 4MB: It really doesn't make a big
difference.

Recombining extents into bigger ones, though, can make a big difference
in an aging btrfs, even on SSDs.

Bees is, btw, not about defragmentation: I have some OS containers
running and I want to deduplicate data after updates. It seems to do a
good job here, better than other deduplicators I found. And if some
defrag tools destroyed your snapshot reflinks, bees can also help here.
On its way it may recombine extents so it may improve fragmentation.
But usually it probably defragments because it needs to split extents
that a defragger combined.

But well, I think getting 100% continuous allocation is really not the
achievement you want to get, especially when reflinks are a primary
concern.


> The only context/scenario in which you may want to reduce
> fragmentation is when something needs to allocate a continuous area
> smaller than the total free space but larger than the largest free
> chunk. Something like this happens only when the volume is working at
> almost 100% allocated space.
> In such a scenario even your bees cannot do much, as there may not be
> enough free space to move some other data into larger chunks to
> defragment the FS's physical space.

Bees does not do that.


> If your workload keeps writing
> new data to the FS, such defragmentation may give you (maybe) a few
> more seconds, and just after that the FS will be 100% full.
> 
> In other words, if someone thinks that such a defragmentation daemon
> solves any problems, he/she may be 100% right .. such a person is
> only *thinking* that this is the truth.

Bees is not about that.


> kloczek
> PS. Do you know the first MacGyver rule? -> "If it ain't broke, don't
> fix it".

Do you know the saying "think first, then act"?


> So first show that fragmentation is hurting the latency of access to
> btrfs data, and that it is possible to measure such impact.
> Before you start measuring this you need to learn how to sample,
> for example, VFS layer latency. Do you know how to do this to deliver
> such proof?

You didn't get the point. You only read "defragmentation" and your
alarm lights lit up. You even think bees would be a defragmenter. It
probably is more the opposite because it introduces more fragments in
exchange for more reflinks.


> PS2. The same "discussions" about fragmentation happened in the past,
> about 10+ years ago, after ZFS was introduced. Just to let you know
> that after the initial ZFS introduction up to now not a single line
> of ZFS code has been written to handle active fragmentation, and no
> one has been able to prove that something about active defragmentation
> needs to be done in the case of ZFS.

Btrfs has autodefrag to reduce the number of fragments by rewriting
small portions of the file being written to. This is needed; otherwise
the feature wouldn't be there. Why? Have you tried working with 1GB
files broken into 

Re: defragmenting best practice?

2017-09-14 Thread Tomasz Kłoczko
On 14 September 2017 at 12:38, Kai Krakow  wrote:
[..]
>
> I suggest you only ever defragment parts of your main subvolume or rely
> on autodefrag, and let bees do optimizing the snapshots.
>
> Also, I experimented with adding btrfs support to shake, still working
> on better integration but currently lacking time... :-(
>
> Shake is an adaptive defragger which rewrites files. With my current
> patches it clones each file, and then rewrites it to its original
> location. This approach is currently not optimal as it simply bails out
> if some other process is accessing the file and leaves you with an
> (intact) temporary copy you need to move back in place manually.

If you really want a real and *ideal* distribution of the data
across the physical disk, first you need to build a time travel device.
This device will allow you to put all blocks which need to be read in
perfect order (to read all data sequentially without seeking).
However, it will only work in the case of spindles, because with SSDs
there is no seek time.
Please let us know when you will write the drivers/timetravel/ Linux
kernel driver.
When such a driver is available, I promise I'll write all the necessary
btrfs code myself in a matter of a few days (it will be a piece of cake
compared to building such a device).

But seriously ..
The only context/scenario in which you may want to reduce
fragmentation is when something needs to allocate a continuous area
smaller than the total free space but larger than the largest free
chunk. Something like this happens only when the volume is working at
almost 100% allocated space.
In such a scenario even your bees cannot do much, as there may not be
enough free space to move some other data into larger chunks to
defragment the FS's physical space. If your workload keeps writing new
data to the FS, such defragmentation may give you (maybe) a few more
seconds, and just after that the FS will be 100% full.

In other words, if someone thinks that such a defragmentation daemon
solves any problems, he/she may be 100% right .. such a person is
only *thinking* that this is the truth.

kloczek
PS. Do you know the first MacGyver rule? -> "If it ain't broke, don't fix it".
So first show that fragmentation is hurting the latency of access to
btrfs data, and that it is possible to measure such impact.
Before you start measuring this you need to learn how to sample,
for example, VFS layer latency. Do you know how to do this to deliver
such proof?
PS2. The same "discussions" about fragmentation happened in the past,
about 10+ years ago, after ZFS was introduced. Just to let you know
that after the initial ZFS introduction up to now not even a single
line of ZFS code has been written to handle active fragmentation, and
no one has been able to prove that something about active
defragmentation needs to be done in the case of ZFS.
Why? Because it all stands on the shoulders of a clever enough
*allocation algorithm*. Only this and nothing more.
PS3. Please can we stop this/EOT?
--
Tomasz Kłoczko | LinkedIn: http://lnkd.in/FXPWxH
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: defragmenting best practice?

2017-09-14 Thread Austin S. Hemmelgarn

On 2017-09-14 03:54, Duncan wrote:

Austin S. Hemmelgarn posted on Tue, 12 Sep 2017 13:27:00 -0400 as
excerpted:


The tricky part though is that differing workloads are impacted
differently by fragmentation.  Using just four generic examples:

* Mostly sequential write focused workloads (like security recording
systems) tend to be impacted by free space fragmentation more than data
fragmentation.  Balancing filesystems used for such workloads is likely
to give a noticeable improvement, but defragmenting probably won't give
much.
* Mostly sequential read focused workloads (like a streaming media
server)
tend to be the most impacted by data fragmentation, but aren't generally
impacted by free space fragmentation.  As a result, defrag will help
here a lot, but balance won't as much.
* Mostly random write focused workloads (like most database systems or
virtual machines) are often impacted by both free space and data
fragmentation, and are a pathological case for CoW filesystems.  Balance
and defrag will help here, but they won't help for long.
* Mostly random read focused workloads (like most non-multimedia desktop
usage) are not impacted much by either aspect, but if you're on a
traditional hard drive they can be impacted significantly by how the
data is spread across the disk.  Balance can help here, but only because
it improves data locality, not because it compacts free space.


This is a very useful analysis, particularly given the examples.  Maybe
put it on the wiki under the defrag discussion?  (Assuming something like
it isn't already there.  I've not looked in a while.)

I've actually been meaning to write up something more thorough about 
this online (probably as a Gist).  When I finally get around to that 
(probably in the next few weeks), I'll try to make sure a link ends up 
on the defrag page on the wiki.



Re: defragmenting best practice?

2017-09-14 Thread Kai Krakow
Am Tue, 12 Sep 2017 18:28:43 +0200
schrieb Ulli Horlacher <frams...@rus.uni-stuttgart.de>:

> On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
> > When I do a 
> > btrfs filesystem defragment -r /directory
> > does it defragment really all files in this directory tree, even if
> > it contains subvolumes?
> > The man page does not mention subvolumes on this topic.  
> 
> No answer so far :-(
> 
> But I found another problem in the man-page:
> 
>   Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as
> well as with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or
> >= 3.13.4 will break up the ref-links of COW data (for example files
> >copied with
>   cp --reflink, snapshots or de-duplicated data). This may cause
>   considerable increase of space usage depending on the broken up
>   ref-links.
> 
> I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
> snapshots.
> Therefore, I better should avoid calling "btrfs filesystem defragment
> -r"?
> 
> What is the defragmenting best practice?
> Avoid it completly?

You may want to try https://github.com/Zygo/bees. It is a daemon that
watches file system generation changes, scans the blocks, and then
recombines them. Of course, this process somewhat defeats the
purpose of defragging in the first place as it will undo some of the
defragmenting.

I suggest you only ever defragment parts of your main subvolume or rely
on autodefrag, and let bees do optimizing the snapshots.

Also, I experimented with adding btrfs support to shake, still working
on better integration but currently lacking time... :-(

Shake is an adaptive defragger which rewrites files. With my current
patches it clones each file, and then rewrites it to its original
location. This approach is currently not optimal as it simply bails out
if some other process is accessing the file and leaves you with an
(intact) temporary copy you need to move back in place manually.

Shake works very well with the idea of detecting how fragmented, how
old, and how far away from an "ideal" position a file is, and it
exploits standard Linux file system behavior to optimally place files
by rewriting them. It then records its status per file in extended
attributes. It also works with non-btrfs file systems. My patches try
to avoid defragging files with shared extents, so this may help your
situation. However, it will still shuffle files around if they are too
far from their ideal position, thus destroying shared extents. A future
patch could use extent recombining and skip shared extents in that
process. But first I'd like to clean out some of the rough edges
together with the original author of shake.

Look here: https://github.com/unbrice/shake and also check out the pull
requests and comments there. You shouldn't currently run shake
unattended and only on specific parts of your FS you feel need
defragmenting.


-- 
Regards,
Kai

Replies to list-only preferred.



Re: defragmenting best practice?

2017-09-14 Thread Duncan
Austin S. Hemmelgarn posted on Tue, 12 Sep 2017 13:27:00 -0400 as
excerpted:

> The tricky part though is that differing workloads are impacted
> differently by fragmentation.  Using just four generic examples:
> 
> * Mostly sequential write focused workloads (like security recording
> systems) tend to be impacted by free space fragmentation more than data
> fragmentation.  Balancing filesystems used for such workloads is likely
> to give a noticeable improvement, but defragmenting probably won't give
> much.
> * Mostly sequential read focused workloads (like a streaming media
> server)
> tend to be the most impacted by data fragmentation, but aren't generally
> impacted by free space fragmentation.  As a result, defrag will help
> here a lot, but balance won't as much.
> * Mostly random write focused workloads (like most database systems or
> virtual machines) are often impacted by both free space and data
> fragmentation, and are a pathological case for CoW filesystems.  Balance
> and defrag will help here, but they won't help for long.
> * Mostly random read focused workloads (like most non-multimedia desktop
> usage) are not impacted much by either aspect, but if you're on a
> traditional hard drive they can be impacted significantly by how the
> data is spread across the disk.  Balance can help here, but only because
> it improves data locality, not because it compacts free space.

This is a very useful analysis, particularly given the examples.  Maybe 
put it on the wiki under the defrag discussion?  (Assuming something like 
it isn't already there.  I've not looked in a while.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: defragmenting best practice?

2017-09-12 Thread Austin S. Hemmelgarn

On 2017-09-12 12:28, Ulli Horlacher wrote:

On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:

When I do a
btrfs filesystem defragment -r /directory
does it defragment really all files in this directory tree, even if it
contains subvolumes?
The man page does not mention subvolumes on this topic.


No answer so far :-(
I hadn't seen your original mail, otherwise I probably would have 
responded.  Sorry about that.


On the note of the original question:
I'm pretty sure that it does recursively operate on nested subvolumes. 
The documentation doesn't say otherwise, and not doing so would be 
non-intuitive to people who don't know anything about subvolumes.


But I found another problem in the man-page:

   Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
   with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
   will break up the ref-links of COW data (for example files copied with
   cp --reflink, snapshots or de-duplicated data). This may cause
   considerable increase of space usage depending on the broken up
   ref-links.

I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
snapshots.
Therefore, I better should avoid calling "btrfs filesystem defragment -r"?

What is the defragmenting best practice?

That really depends on what you're doing.

First, you need to understand that defrag won't break _all_ reflinks, 
just the particular instances you point it at.  So, if you have 
subvolume A, and snapshots S1 and S2 of that subvolume A, then running 
defrag on _just_ subvolume A will break the reflinks between it and the 
snapshots, but S1 and S2 will still share any data they were originally 
with each other.  If you then take a third snapshot of A, it will share 
data with A, but not with S1 or S2 (because A is no longer sharing data 
with S1 or S2).
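The sharing behavior described above can be modeled in a few lines; this is a toy simulation (extent ids standing in for shared on-disk extents), not btrfs code:

```python
import itertools

# Each "volume" maps file offsets to extent ids; a snapshot copies the
# mapping (sharing the ids), and defrag rewrites a volume with fresh ids.
_ids = itertools.count()

def new_volume(n_extents=4):
    return {off: next(_ids) for off in range(n_extents)}

def snapshot(vol):
    return dict(vol)  # reflink copy: same extent ids as the source

def defrag(vol):
    for off in vol:   # rewrite in place: new extents, sharing broken
        vol[off] = next(_ids)

def shared_extents(a, b):
    return sum(a[off] == b[off] for off in a)

A = new_volume()
S1, S2 = snapshot(A), snapshot(A)
defrag(A)            # breaks sharing between A and S1/S2...
S3 = snapshot(A)     # ...but S1/S2 still share, and S3 shares with A

print(shared_extents(A, S1), shared_extents(S1, S2), shared_extents(A, S3))
# -> 0 4 4
```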


Given this behavior, you have in turn three potential cases when talking 
about persistent snapshots:


1. You care about minimizing space used, but aren't as worried about 
performance.  In this case, the only option is to not run defrag at all.
2. You care about performance, but not space usage.  In this case, 
defragment everything.
3. You care about both space usage and performance.  In this case, I 
would personally suggest defragmenting only the source subvolume (so 
only subvolume A in the above explanation), and doing so on a schedule 
that coincides with snapshot rotation.  The idea is to defrag just 
before you take a snapshot, and at a frequency that gives a good balance 
between space usage and performance.  As a general rule, if you take 
this route, start by doing the defrag on either a monthly basis if 
you're doing daily or weekly snapshots, or with every fourth snapshot if 
not, and then adjust the interval based on how that impacts your space 
usage.
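As a concrete sketch of option 3: defragment only the source subvolume,
then immediately take the next read-only snapshot, so each new snapshot
shares extents with the freshly defragmented source.  All paths and the
naming scheme below are illustrative, not anything btrfs prescribes:

```shell
#!/bin/sh
# Sketch: defrag the source subvolume, then snapshot it, in that order.
# Intended to run from cron just before the regular snapshot rotation.
SRC=${SRC:-/mnt/data/home}              # hypothetical source subvolume
SNAPDIR=${SNAPDIR:-/mnt/data/snapshots} # hypothetical snapshot directory
STAMP=$(date +%Y%m%d)
DEFRAG_CMD="btrfs filesystem defragment -r $SRC"
SNAP_CMD="btrfs subvolume snapshot -r $SRC $SNAPDIR/home-$STAMP"
echo "$DEFRAG_CMD"                      # printed, not executed; as root,
echo "$SNAP_CMD"                        # run both commands in this order
```

The ordering is the whole point: defragging after the snapshot would break
the reflinks to it; defragging just before means the snapshot is taken of
already-defragmented data.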


Additionally, you can compact free space without defragmenting data or 
breaking reflinks by running a full balance on the filesystem.
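A full balance rewrites every chunk and can take a long time; the usage
filters restrict it to mostly-empty chunks, which is where most of the
free-space compaction comes from anyway.  A sketch, with a hypothetical
mount point and an arbitrary 50% threshold:

```shell
# Rewrite only data and metadata chunks that are at most 50% full,
# packing their contents into fewer chunks and returning the freed
# chunks to unallocated space. Reflinks between snapshots are preserved.
MNT=/mnt/data                           # hypothetical mount point
BAL_CMD="btrfs balance start -dusage=50 -musage=50 $MNT"
echo "$BAL_CMD"                         # printed, not executed; run as root
```

Drop the filters for a true full balance if the filtered run doesn't
reclaim enough.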


The tricky part though is that differing workloads are impacted 
differently by fragmentation.  Using just four generic examples:


* Mostly sequential write focused workloads (like security recording 
systems) tend to be impacted by free space fragmentation more than data 
fragmentation.  Balancing filesystems used for such workloads is likely 
to give a noticeable improvement, but defragmenting probably won't give 
much.
* Mostly sequential read focused workloads (like a streaming media 
server) tend to be the most impacted by data fragmentation, but aren't 
generally impacted by free space fragmentation.  As a result, defrag 
will help here a lot, but balance won't as much.
* Mostly random write focused workloads (like most database systems or 
virtual machines) are often impacted by both free space and data 
fragmentation, and are a pathological case for CoW filesystems.  Balance 
and defrag will help here, but they won't help for long.
* Mostly random read focused workloads (like most non-multimedia desktop 
usage) are not impacted much by either aspect, but if you're on a 
traditional hard drive they can be impacted significantly by how the 
data is spread across the disk.  Balance can help here, but only because 
it improves data locality, not because it compacts free space.



defragmenting best practice?

2017-09-12 Thread Ulli Horlacher
On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
> When I do a 
> btrfs filesystem defragment -r /directory
> does it really defragment all files in this directory tree, even if it
> contains subvolumes?
> The man page does not mention subvolumes on this topic.

No answer so far :-(

But I found another problem in the man-page:

  Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
  with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
  will break up the ref-links of COW data (for example files copied with
  cp --reflink, snapshots or de-duplicated data). This may cause
  considerable increase of space usage depending on the broken up
  ref-links.

I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
snapshots.
Therefore, I had better avoid calling "btrfs filesystem defragment -r"?

What is the defragmenting best practice?
Avoid it completely?



-- 
Ullrich Horlacher          Server und Virtualisierung
Rechenzentrum TIK
Universitaet Stuttgart     E-Mail: horlac...@tik.uni-stuttgart.de
Allmandring 30a            Tel:    ++49-711-68565868
70569 Stuttgart (Germany)  WWW:    http://www.tik.uni-stuttgart.de/
REF:<20170831070558.gb5...@rus.uni-stuttgart.de>