Re: [ceph-users] CephFS deletion performance
On 17/09/2019 17.46, Yan, Zheng wrote:
> when a snapshotted directory is deleted, the MDS moves the directory
> into the stray directory. You have 57k strays; each time the MDS has a
> cache miss for a stray, it needs to load a stray dirfrag. This is very
> inefficient, because a stray dirfrag contains lots of items, and most
> of those items are useless.

Okay, clearly the current snapshot solution won't work for us then, so I'm
moving the snapshotting to the target filesystem we use for backups, so that
at least the production filesystem doesn't suffer from this issue. Are there
any plans to make this more efficient?

How does the MDS decide when to clear out stray files after snapshots are
deleted? I've been removing a bunch, and while the stray count has gone down,
it hasn't been going down as fast as I expected... I'm worried we may have a
leak and that some strays are never getting cleaned up. I guess I'll see once
I catch up on snapshot deletions.

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
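For anyone else watching their stray count drain after snapshot deletions, it can be polled from the MDS perf counters. A minimal sketch, assuming access to the MDS admin socket on the MDS host and a hypothetical MDS id of "a" (the `num_strays` counter is reported under the `mds_cache` section of `perf dump`):

```python
import json
import subprocess

def parse_num_strays(perf_dump: dict) -> int:
    # The stray-dentry count is reported by the MDS under mds_cache.
    return perf_dump["mds_cache"]["num_strays"]

def fetch_num_strays(mds_id: str) -> int:
    # Queries a live MDS over its admin socket; must run on the MDS host
    # with permission to use `ceph daemon`.
    out = subprocess.check_output(
        ["ceph", "daemon", f"mds.{mds_id}", "perf", "dump"])
    return parse_num_strays(json.loads(out))
```

Polling `fetch_num_strays("a")` in a loop gives a quick view of whether strays are actually being purged while catching up on snapshot deletions, or whether the count has plateaued.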
Re: [ceph-users] CephFS deletion performance
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin wrote:
>
> On 13/09/2019 16.25, Hector Martin wrote:
> > Is this expected for CephFS? I know data deletions are asynchronous, but
> > not being able to delete metadata/directories without an undue impact on
> > the whole filesystem's performance is somewhat problematic.
>
> I think I'm getting a feeling for who the culprit is here. I just
> noticed that listing directories in a snapshot that were subsequently
> deleted *also* performs horribly, and kills cluster performance too.
>
> We just had a partial outage due to this; a snapshot+rsync run was
> triggered while a round of deletions was happening, and as far as I can
> tell, when it caught up to newly deleted files, MDS performance tanked
> as it repeatedly had to open stray dirs under the hood. In fact, the
> inode/dentry metrics (opened/closed) skyrocketed during that period,
> from the normal ~1Kops from multiple parallel rsyncs to ~15Kops.
>
> As I mentioned in a prior message to the list, we have ~570k stray
> files due to snapshots. It makes sense that deleting a directory/file
> means moving it to a stray directory (each holding ~57k files already),
> and that accessing a deleted file via a snapshot means accessing the
> stray directory. Am I right in thinking that these operations are at
> least O(n) in the number of strays, and may in fact iterate over or
> otherwise touch every single file in the stray directories? (This would
> explain the sudden 15Kops spike in inode/dentry activity.) It seems
> that with such bloated stray dirs, anything that involves them behind
> the scenes makes the MDS completely hiccup and grind away, affecting
> performance for all other clients.
>
> I guess at this point we'll have to drastically cut down the time span
> for which we keep CephFS snapshots. Maybe I'll move the snapshot
> history keeping to the backup target; at least then it won't affect
> production data. But since we plan on eventually using the other
> cluster for production too, that would mean we need to use multi-FS in
> order to isolate the workloads...

When a snapshotted directory is deleted, the MDS moves the directory into the
stray directory. You have 57k strays; each time the MDS has a cache miss for
a stray, it needs to load a stray dirfrag. This is very inefficient, because
a stray dirfrag contains lots of items, and most of those items are useless.
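Since the mitigation being discussed is cutting down snapshot retention, a minimal rotation sketch may be useful. Assumptions: CephFS snapshots are managed by mkdir/rmdir inside the directory's hidden `.snap` subdirectory, snapshot names sort chronologically (e.g. ISO-8601 timestamps), and `keep=7` is an arbitrary illustrative choice:

```python
import os

def snapshots_to_prune(names, keep):
    # Given snapshot names that sort chronologically, return the ones to
    # delete, keeping only the newest `keep`.
    ordered = sorted(names)
    return ordered[:-keep] if keep else ordered

def prune_snapshots(root, keep=7):
    # Drop old snapshots of a CephFS directory by rmdir'ing entries in
    # its hidden .snap subdirectory.
    snapdir = os.path.join(root, ".snap")
    for name in snapshots_to_prune(os.listdir(snapdir), keep):
        os.rmdir(os.path.join(snapdir, name))
```

Keeping the retention window short bounds how long deleted files linger as strays pinned by old snapshots.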
Re: [ceph-users] CephFS deletion performance
On 13/09/2019 16.25, Hector Martin wrote:
> Is this expected for CephFS? I know data deletions are asynchronous, but
> not being able to delete metadata/directories without an undue impact on
> the whole filesystem's performance is somewhat problematic.

I think I'm getting a feeling for who the culprit is here. I just noticed
that listing directories in a snapshot that were subsequently deleted *also*
performs horribly, and kills cluster performance too.

We just had a partial outage due to this; a snapshot+rsync run was triggered
while a round of deletions was happening, and as far as I can tell, when it
caught up to newly deleted files, MDS performance tanked as it repeatedly had
to open stray dirs under the hood. In fact, the inode/dentry metrics
(opened/closed) skyrocketed during that period, from the normal ~1Kops from
multiple parallel rsyncs to ~15Kops.

As I mentioned in a prior message to the list, we have ~570k stray files due
to snapshots. It makes sense that deleting a directory/file means moving it
to a stray directory (each holding ~57k files already), and that accessing a
deleted file via a snapshot means accessing the stray directory. Am I right
in thinking that these operations are at least O(n) in the number of strays,
and may in fact iterate over or otherwise touch every single file in the
stray directories? (This would explain the sudden 15Kops spike in
inode/dentry activity.) It seems that with such bloated stray dirs, anything
that involves them behind the scenes makes the MDS completely hiccup and
grind away, affecting performance for all other clients.

I guess at this point we'll have to drastically cut down the time span for
which we keep CephFS snapshots. Maybe I'll move the snapshot history keeping
to the backup target; at least then it won't affect production data. But
since we plan on eventually using the other cluster for production too, that
would mean we need to use multi-FS in order to isolate the workloads...
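The O(n) intuition above can be put into a back-of-the-envelope model. This is a toy, not MDS code; it assumes the strays are spread evenly over the MDS's ten stray directories and that a cache miss reads in an entire stray dirfrag:

```python
def items_loaded_per_miss(total_strays: int,
                          stray_dirs: int = 10,
                          frags_per_dir: int = 1) -> int:
    # On a cache miss the MDS loads the whole containing dirfrag, so the
    # work per miss scales with the fragment's size, not with the single
    # dentry that was actually wanted.
    return total_strays // (stray_dirs * frags_per_dir)
```

With ~570k strays and no extra fragmentation, that is on the order of 57k dentries dragged in per miss, which would be consistent with the 15Kops inode/dentry spikes; either shrinking the stray population or splitting the dirfrags further shrinks the per-miss cost.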
[ceph-users] CephFS deletion performance
We have a cluster running CephFS with metadata on SSDs and data split between
SSDs and HDDs (the main pool is on HDDs; some subtrees are on an SSD pool).

We're seeing quite poor deletion performance, especially for directories. It
seems that previously empty directories are often deleted quickly, but
unlinkat() on any directory that used to contain data often takes upwards of
a second. Stracing a simple `rm -r`:

unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.002668>
unlinkat(6, "INBOX", AT_REMOVEDIR) = 0 <2.045551>
unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.005872>
unlinkat(6, "Trash", AT_REMOVEDIR) = 0 <1.918497>
unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.012609>
unlinkat(6, "Spam", AT_REMOVEDIR) = 0 <1.743648>
unlinkat(7, "dbox-Mails", AT_REMOVEDIR) = 0 <0.016548>
unlinkat(6, "Sent", AT_REMOVEDIR) = 0 <2.295136>
unlinkat(5, "mailboxes", AT_REMOVEDIR) = 0 <0.735630>
unlinkat(4, "mdbox", AT_REMOVEDIR) = 0 <0.686786>

(All those dbox-Mails subdirectories are empty children of the folder-name
directories.)

It also seems that these deletions have a huge impact on cluster performance,
across hosts. This is the global MDS op latency impact of doing first 1, then
6 parallel `rm -r` instances from a host that is otherwise not doing anything
else:

https://mrcn.st/t/Screenshot_20190913_161500.png

(I had to stop the 6-parallel run because it was completely trashing cluster
performance for live serving machines; I wound up with a load average >900 on
one of them.)

The OSD SSDs/HDDs are not significantly busier during the deletions, nor is
CPU usage on the MDS much higher at that time, so I'm not sure what the
bottleneck is here. Is this expected for CephFS? I know data deletions are
asynchronous, but not being able to delete metadata/directories without an
undue impact on whole-filesystem performance is somewhat problematic.
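To collect per-directory timings like the strace output above without wading through a full trace, a small harness along these lines works. It is a sketch meant to be pointed at a scratch tree on the CephFS mount, not at live data:

```python
import os
import time

def timed_rmdir(path: str) -> float:
    # Time a single rmdir(), mirroring the per-call latencies that
    # strace -T reports for unlinkat(..., AT_REMOVEDIR).
    t0 = time.monotonic()
    os.rmdir(path)
    return time.monotonic() - t0

def rm_tree_timed(root: str) -> None:
    # Bottom-up removal like `rm -r`, printing how long each directory
    # removal took; directories that used to contain data should stand out.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
        print(f"{dirpath}: rmdir took {timed_rmdir(dirpath):.6f}s")
```

Running this against a copy of the mdbox tree should reproduce the pattern above: fast rmdir for always-empty leaf directories, seconds-long rmdir for their previously populated parents.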