On 25/03/2015 02:19, David Sterba wrote:
> 
> The snapshots get cleaned in the background, which usually touches lots
> of data (depending on the "age" of the extents, IOW the level of sharing
> among the live and deleted snapshots).
> 
> The slowdown is caused by contention on the metadata (locking,
> reading from disk, scattered blocks, lots of seeking).
> 
> Snapper might add to that if you have
> 
> EMPTY_PRE_POST_CLEANUP="yes"
> 
> as it reads the pre/post snapshots and deletes them if the diff is
> empty. This adds some IO stress.

I couldn't find a clear explanation in the documentation. Does it mean
that when there is absolutely no difference between two snapshots, one
of them is deleted? And that snapper computes a diff between them to
determine that?

If so, yes, I can remove it, I don't care about that :)
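
For reference, the knob lives in snapper's per-config file. A minimal
sketch of disabling it, assuming the default config name "root" (adjust
to your setup):

```
# /etc/snapper/configs/root
# When set to "yes", snapper diffs each pre/post snapshot pair in the
# background and deletes the pair if nothing changed between them --
# that diff is the extra read IO mentioned above.
EMPTY_PRE_POST_CLEANUP="no"
```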

> 
>> The btrfs cleaner is 100% active:
>>
>>  1501 root      20   0       0      0      0 R 100,0  0,0   9:10.40 [btrfs-cleaner]
> 
> That points to the snapshot cleaning, but the cleaner thread does more
> than that. It may also process delayed file deletion and work scheduled
> if 'autodefrag' is on.

Autodefrag is enabled. These are mechanical drives, so I'd rather keep
it on, shouldn't I?
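
Autodefrag is just a mount option, so it can be toggled at runtime to
test whether it contributes to the cleaner load; a hedged sketch (mount
point taken from the thread, fstab fields are assumptions):

```
# Temporarily disable autodefrag; a remount only changes the option set:
mount -o remount,noautodefrag /mnt/btrfs

# To make it persistent, adjust the options in /etc/fstab, e.g.:
# UUID=<fs-uuid>  /mnt/btrfs  btrfs  defaults,noautodefrag  0  0
```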

> 
>> What is "funny" is that the filesystem seems to be working again when
>> there is some IO activity and btrfs-cleaner gets to a lower cpu usage
>> (around 70%).
> 
> Possibly a behaviour caused by scheduling (both cpu and io), the other
> process gets a slice and slows down the cleaner that hogs the system.

I have almost no IO on these disks while the problem occurs (I included
iostat output in my first email). Only one CPU core is at 100% load.
That's why it looked to me more like a locking or serialization issue.

> 
>> By the way, there are quite a few snapshots there:
>>
>> # btrfs subvolume  list /mnt/btrfs | wc -l
>> 142
>>
>> and I think snapper tries to destroy around 10 of them on one go.
> 
> The snapshots get cleaned in the order of deletion, and if there is some
> amount of sharing, the metadata blocks are probably cached. So it may
> actually help to delete them in a group.

There is a lot of sharing between the snapshots; only a few files change
between them. I only see the severe slowdown while the kernel thread is
at 100%. When its usage is lower (and there is disk activity), there is
a slight slowdown, but it is completely bearable.
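
Since the discussion hinges on how much is shared and whether deleting
snapshots as a group helps, here is a hedged sketch of how both can be
checked with stock tools (the config name and snapshot numbers are
placeholders, not from this thread):

```
# Show shared vs. exclusive data per snapshot subvolume:
btrfs filesystem du -s /mnt/btrfs/.snapshots/*/snapshot

# Delete a contiguous range of snapper snapshots in one batch, so the
# cleaner can reuse metadata blocks that are still cached:
snapper -c root delete 120-130
```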

> 
>> I can do whatever test you want, as long as I keep the data on my disks :)
> 
> So far it looks like effects of filesystem aging in the presence of
> snapshots. Right now, I think we could try to somehow adjust the io
> scheduling priority in case the cleaner processes the deleted
> subvolumes, but this is unfortunately done in an asynchronous manner
> and the metadata are read by other threads, so this could be a fairly
> intrusive patch.
I have almost no IO when the problem occurs.


Regards
--