On Sun, Mar 22, 2015 at 09:11:03AM +0100, Marc Cousin wrote:
> I've noticed this problem for a while (I started to use snapper a
> while ago): while destroying snapshots, it's almost impossible to do
> IO on the volume.

The snapshots get cleaned in the background, which usually touches lots
of data (depending on the "age" of the extents, IOW the level of sharing
among the live and deleted snapshots).

The slowdown is caused by contention on the metadata (locking,
reading from disk, scattered blocks, lots of seeking).

Snapper might add to that if you have

EMPTY_PRE_POST_CLEANUP="yes"

as it reads the pre/post snapshots and deletes them if the diff is
empty. This adds some IO stress.
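If you want to rule that out, you could check and temporarily disable
the option in the snapper config. The path below assumes the usual
layout with a config named 'root', adjust to your setup:

  # check the current value
  grep EMPTY_PRE_POST_CLEANUP /etc/snapper/configs/root

  # turn the pre/post diffing off for a test, then watch if the IO
  # stalls change
  sed -i 's/^EMPTY_PRE_POST_CLEANUP=.*/EMPTY_PRE_POST_CLEANUP="no"/' \
      /etc/snapper/configs/root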

> The btrfs cleaner is 100% active:
> 
>  1501 root      20   0       0      0      0 R 100,0  0,0   9:10.40 
> [btrfs-cleaner]    

That points to the snapshot cleaning, but the cleaner thread does more
than that. It may also process delayed file deletion and defragmentation
work queued when 'autodefrag' is enabled.
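You can see whether autodefrag is in effect from the mount options
(mount point taken from your report):

  # look for 'autodefrag' among the active options
  grep /mnt/btrfs /proc/mounts

  # if your kernel supports it, remount without it to take that work
  # out of the picture for a test
  mount -o remount,noautodefrag /mnt/btrfs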

> As soon as it terminates its job, the filesystem becomes usable again.
> But while waiting it is extremely unresponsive. Any program doing a
> write there hangs.
> 
> Some more information: the 3 disks are 2.7TB, the raid is RAID1
> 
> # btrfs fi df /mnt/btrfs
> Data, RAID1: total=3.18TiB, used=3.14TiB
> System, RAID1: total=32.00MiB, used=480.00KiB
> Metadata, RAID1: total=6.00GiB, used=4.32GiB

Space looks OK, there's enough data and metadata headroom, so it's not
affected by low-space conditions.

> What is "funny" is that the filesystem seems to be working again when
> there is some IO activity and btrfs-cleaner gets to a lower cpu usage
> (around 70%).

Possibly an effect of scheduling (both CPU and IO): the other process
gets a slice and slows down the cleaner, which otherwise hogs the
system.
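If you want to see what the IO side is doing while this happens, it may
be worth checking which scheduler the disks use and watching them with
iostat (device name below is just a placeholder):

  # active io scheduler for one of the disks
  cat /sys/block/sda/queue/scheduler

  # per-device utilization and latencies while the cleaner runs
  iostat -x 2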

> By the way, there are quite a few snapshots there:
> 
> # btrfs subvolume  list /mnt/btrfs | wc -l
> 142
> 
> and I think snapper tries to destroy around 10 of them on one go.

The snapshots get cleaned in the order of deletion, and if there is some
amount of sharing, the metadata blocks are probably cached. So it may
actually help to delete them in a group.
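If your btrfs-progs is recent enough, you can also see how many deleted
subvolumes are still queued for cleaning:

  # lists subvolumes that have been deleted but not yet cleaned
  btrfs subvolume list -d /mnt/btrfs | wc -l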

> I can do whatever test you want, as long as I keep the data on my disks :)

So far it looks like the effects of filesystem aging in the presence of
snapshots. Right now, I think we could try to somehow adjust the IO
scheduling priority while the cleaner processes the deleted subvolumes,
but that work is unfortunately done in an asynchronous manner and the
metadata are read by other threads, so it could be a fairly intrusive
patch.
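As a crude experiment from userspace, you could try lowering the
priority of the cleaner thread itself (pid taken from your top output).
Because the metadata reads happen in other threads, this may not help
much, but it's cheap to try:

  # put the cleaner into the idle io class (honored by CFQ)
  ionice -c 3 -p 1501

  # and/or lower its cpu priority
  renice 19 -p 1501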

Or, introduce an interface that would set the subvolume cleaning
behaviour, eg. only when idle, normal, or immediate. The dead and
uncleaned subvolumes pin the space, so we cannot make 'idle' the
default.
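Roughly what I have in mind, purely hypothetical, nothing like this
exists today:

  # hypothetical per-filesystem knob, not implemented -- only to
  # illustrate the idea; 'normal' or 'immediate' would be the other
  # values
  echo idle > /sys/fs/btrfs/<UUID>/cleaner_mode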