Re: [ceph-users] CephFS deletion performance

2019-09-18 Thread Hector Martin
On 17/09/2019 17.46, Yan, Zheng wrote:
> When a snapshotted directory is deleted, the MDS moves the directory into
> a stray directory. You have ~57k strays per stray directory; each time the
> MDS has a cache miss for a stray, it needs to load an entire stray
> dirfrag. This is very inefficient, because a stray dirfrag contains lots
> of items, most of which are irrelevant to the one being looked up.

Okay, clearly the current snapshot solution won't work for us then, so
I'm moving the snapshotting to the target filesystem we use for backups
so that the production filesystem at least doesn't suffer from this
issue. Are there any plans to change this to make it more efficient?

How does the MDS decide when to clear out stray files after snapshots
are deleted? I've been removing a bunch, and while the stray count has
gone down, it hasn't been going down as fast as I expect... I'm worried
we may have a leak and some strays are never getting cleaned up. I guess
I'll see once I catch up on snapshot deletions.
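
In case it's useful to anyone watching the same thing: the stray counters
can be polled from the MDS admin socket. A rough Python sketch of that,
assuming the mds_cache perf counters (num_strays, strays_created,
strays_enqueued, strays_reintegrated) are exposed in this release, and with
"mds.a" standing in for the real daemon name:

# Sketch: poll the MDS admin socket and print the stray-related counters.
# Assumes the mds_cache perf counters exist in this release; "mds.a" is a
# placeholder daemon name.
import json
import subprocess
import time

MDS = "mds.a"  # placeholder; use your MDS daemon name
KEYS = ("num_strays", "strays_created", "strays_enqueued",
        "strays_reintegrated")

def stray_counters():
    out = subprocess.check_output(["ceph", "daemon", MDS, "perf", "dump"])
    cache = json.loads(out).get("mds_cache", {})
    return {k: cache.get(k) for k in KEYS}

while True:
    print(time.strftime("%H:%M:%S"), stray_counters())
    time.sleep(60)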

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub


Re: [ceph-users] CephFS deletion performance

2019-09-17 Thread Yan, Zheng
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin  wrote:
>
> On 13/09/2019 16.25, Hector Martin wrote:
> > Is this expected for CephFS? I know data deletions are asynchronous, but
> > not being able to delete metadata/directories without an undue impact on
> > overall filesystem performance is somewhat problematic.
>
> I think I'm getting a feeling for who the culprit is here. I just
> noticed that listing directories in a snapshot that were subsequently
> deleted *also* performs horribly, and kills cluster performance too.
>
> We just had a partial outage due to this; a snapshot+rsync run was
> triggered while a round of deletions was happening, and as far as I can
> tell, when it caught up to newly deleted files, MDS performance tanked as
> it repeatedly had to open stray dirs under the hood. In fact, the
> inode/dentry metrics (opened/closed) skyrocketed during that period, from
> the normal ~1Kops from multiple parallel rsyncs to ~15Kops.
>
> As I mentioned in a prior message to the list, we have ~570k stray files
> due to snapshots. It makes sense that deleting a directory/file means
> moving it to a stray directory (each already holding ~57k files), and
> accessing a deleted file via a snapshot means accessing the stray
> directory. Am I right in thinking that these operations are at least
> O(n) in the number of strays, and in fact may iterate over or otherwise
> touch every single file in the stray directories? (This would explain the
> sudden 15Kops spike in inode/dentry activity.) It seems that with such
> bloated stray dirs, anything that touches them behind the scenes makes
> the MDS completely hiccup and grind away, affecting performance for any
> other clients.
>
> I guess at this point we'll have to drastically cut down the time span
> for which we keep CephFS snapshots. Maybe I'll move the snapshot
> history-keeping to the backup target; at least then it won't affect
> production data. But since we plan on eventually using the other cluster
> for production too, that would mean we need to use multi-FS in order to
> isolate the workloads...
>

When a snapshotted directory is deleted, the MDS moves the directory into
a stray directory. You have ~57k strays per stray directory; each time the
MDS has a cache miss for a stray, it needs to load an entire stray
dirfrag. This is very inefficient, because a stray dirfrag contains lots
of items, most of which are irrelevant to the one being looked up.
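
If you want to see how big those stray dirfrags actually are on disk, you
can count the dirfrag objects in the metadata pool. A rough sketch with
the python-rados bindings, assuming the usual layout where rank 0's stray
dirs are inodes 0x600-0x609 stored as objects named "<hex ino>.<frag>",
and assuming the metadata pool is called "cephfs_metadata" (adjust both
for your cluster):

# Sketch: count dirfrag objects backing rank 0's stray directories.
# Assumes stray dirs are inodes 0x600-0x609 and the pool name below;
# note that listing a large metadata pool this way can take a while.
import rados

POOL = "cephfs_metadata"  # placeholder; use your metadata pool name
STRAY_INOS = {"%x" % ino for ino in range(0x600, 0x60a)}

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
counts = {}
for obj in ioctx.list_objects():
    ino = obj.key.split(".")[0]
    if ino in STRAY_INOS:
        counts[ino] = counts.get(ino, 0) + 1
for ino in sorted(counts):
    print("stray dir 0x%s: %d dirfrag object(s)" % (ino, counts[ino]))
ioctx.close()
cluster.shutdown()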


> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub


Re: [ceph-users] CephFS deletion performance

2019-09-14 Thread Hector Martin
On 13/09/2019 16.25, Hector Martin wrote:
> Is this expected for CephFS? I know data deletions are asynchronous, but 
> not being able to delete metadata/directories without an undue impact on
> overall filesystem performance is somewhat problematic.

I think I'm getting a feeling for who the culprit is here. I just
noticed that listing directories in a snapshot that were subsequently
deleted *also* performs horribly, and kills cluster performance too.

We just had a partial outage due to this; a snapshot+rsync run was
triggered while a round of deletions was happening, and as far as I can
tell, when it caught up to newly deleted files, MDS performance tanked as
it repeatedly had to open stray dirs under the hood. In fact, the
inode/dentry metrics (opened/closed) skyrocketed during that period, from
the normal ~1Kops from multiple parallel rsyncs to ~15Kops.
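
Roughly speaking, rates like that can be derived by sampling the MDS perf
counters. A rough sketch, assuming the mds_mem "ino+"/"ino-"/"dn+"/"dn-"
counters are present in this release, with "mds.a" as a placeholder
daemon name:

# Sketch: sample the MDS inode/dentry open/close counters every 10s and
# print per-second rates. Assumes the mds_mem counters exist; "mds.a" is
# a placeholder daemon name.
import json
import subprocess
import time

MDS = "mds.a"
KEYS = ("ino+", "ino-", "dn+", "dn-")
INTERVAL = 10

def sample():
    out = subprocess.check_output(["ceph", "daemon", MDS, "perf", "dump"])
    mem = json.loads(out)["mds_mem"]
    return {k: mem[k] for k in KEYS}

prev = sample()
while True:
    time.sleep(INTERVAL)
    cur = sample()
    rates = {k: (cur[k] - prev[k]) / INTERVAL for k in KEYS}
    print(time.strftime("%H:%M:%S"), rates)
    prev = cur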

As I mentioned in a prior message to the list, we have ~570k stray files
due to snapshots. It makes sense that deleting a directory/file means
moving it to a stray directory (each already holding ~57k files), and
accessing a deleted file via a snapshot means accessing the stray
directory. Am I right in thinking that these operations are at least
O(n) in the number of strays, and in fact may iterate over or otherwise
touch every single file in the stray directories? (This would explain the
sudden 15Kops spike in inode/dentry activity.) It seems that with such
bloated stray dirs, anything that touches them behind the scenes makes
the MDS completely hiccup and grind away, affecting performance for any
other clients.

I guess at this point we'll have to drastically cut down the time span
for which we keep CephFS snapshots. Maybe I'll move the snapshot
history-keeping to the backup target; at least then it won't affect
production data. But since we plan on eventually using the other cluster
for production too, that would mean we need to use multi-FS in order to
isolate the workloads...

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub


[ceph-users] CephFS deletion performance

2019-09-13 Thread Hector Martin
We have a cluster running CephFS with metadata on SSDs and data split
between SSDs and HDDs (the main pool is on HDDs; some subtrees are on an
SSD pool).


We're seeing quite poor deletion performance, especially for 
directories. It seems that previously empty directories are often 
deleted quickly, but unlinkat() on any directory that used to contain 
data often takes upwards of a second. Stracing a simple `rm -r`:


unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.002668>
unlinkat(6, "INBOX", AT_REMOVEDIR)  = 0 <2.045551>
unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.005872>
unlinkat(6, "Trash", AT_REMOVEDIR)  = 0 <1.918497>
unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.012609>
unlinkat(6, "Spam", AT_REMOVEDIR)   = 0 <1.743648>
unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.016548>
unlinkat(6, "Sent", AT_REMOVEDIR)   = 0 <2.295136>
unlinkat(5, "mailboxes", AT_REMOVEDIR)  = 0 <0.735630>
unlinkat(4, "mdbox", AT_REMOVEDIR)  = 0 <0.686786>

(all those dbox-Mails subdirectories are empty children of the 
folder-name directories)
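
For anyone who wants to reproduce this without strace, a rough Python
equivalent that recursively deletes a tree and times each rmdir() would be
something like this (the path at the bottom is just a placeholder):

# Sketch: delete a tree bottom-up and print how long each rmdir() takes,
# mirroring the strace timings above. The path below is a placeholder.
import os
import time

def timed_rmtree(path):
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            timed_rmtree(entry.path)
        else:
            os.unlink(entry.path)
    t0 = time.monotonic()
    os.rmdir(path)
    print("rmdir %-60s %.6fs" % (path, time.monotonic() - t0))

timed_rmtree("/mnt/cephfs/test/tree")  # placeholder path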


It also seems that these deletions have a huge impact on cluster
performance, across hosts. This is the global MDS op latency impact of
running first one, then six parallel 'rm -r' instances from a host that is
otherwise idle:


https://mrcn.st/t/Screenshot_20190913_161500.png

(I had to stop the 6-parallel run because it was completely trashing 
cluster performance for live serving machines; I wound up with load 
average >900 on one of them).


The OSD SSDs/HDDs are not significantly busier during the deletions, nor
is CPU usage on the MDS noticeably higher at that time, so I'm not sure
what the bottleneck is here.
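
For anyone trying to pin this down further, the in-flight op list from the
MDS admin socket is probably the next thing to look at. A rough sketch,
assuming `ceph daemon mds.<name> dump_ops_in_flight` returns the usual
JSON with an "ops" list, and with "mds.a" as a placeholder daemon name:

# Sketch: print the oldest in-flight MDS ops and how long they've been
# pending. Assumes dump_ops_in_flight returns JSON with an "ops" list
# containing "age" and "description"; "mds.a" is a placeholder name.
import json
import subprocess

MDS = "mds.a"

out = subprocess.check_output(["ceph", "daemon", MDS, "dump_ops_in_flight"])
ops = json.loads(out).get("ops", [])
for op in sorted(ops, key=lambda o: float(o.get("age", 0)), reverse=True)[:20]:
    print("%8.3fs  %s" % (float(op.get("age", 0)), op.get("description", "?")))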


Is this expected for CephFS? I know data deletions are asynchronous, but 
not being able to delete metadata/directories without an undue impact on
overall filesystem performance is somewhat problematic.



--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub