[ceph-users] Clients failing to respond to capability release
Hi,

I've seen this issue mentioned in the past, but with older releases, so I'm wondering if anybody has any pointers.

The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all clients are working fine, with the exception of our backup server. This is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1] (so I suspect a newer Ceph version?).

The backup server has multiple (12) CephFS mount points. One of them, the busiest, regularly causes this error on the cluster:

  HEALTH_WARN 1 clients failing to respond to capability release
  [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
      mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing
      to respond to capability release client_id: 521306112

And occasionally, which may be unrelated but occurs at the same time:

  [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
      mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs

The second one clears itself, but the first sticks until I can unmount the filesystem on the client after the backup completes.

It appears that whilst it's in this stuck state there may be one or more directory trees that are inaccessible to all clients. The backup server is walking the whole tree but never gets stuck itself, so either the inaccessible directory entry appears after it has gone past, or it's not affected. Maybe the backup server is holding a directory when it shouldn't?

It may be that an upgrade to Quincy resolves this, since it's more likely to be in line with the kernel client, version-wise, but I don't want to knee-jerk upgrade just to try and fix this problem.

Thanks for any advice.

Tim.

[1] The reason for the newer kernel is that backup performance from CephFS was terrible with older kernels. This newer kernel does at least resolve that issue.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
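[Editor's note: for anyone chasing the same MDS_CLIENT_LATE_RELEASE warning, the usual first step is to see which session is sitting on a pile of caps. A minimal sketch, assuming the JSON shape of `ceph tell mds.0 session ls` (the `id` and `num_caps` field names are from memory and worth double-checking on your release):

```python
import json

def clients_by_caps(session_ls_json, threshold=10000):
    """Return (client_id, num_caps) pairs for sessions holding more caps
    than threshold, worst offender first."""
    sessions = json.loads(session_ls_json)
    heavy = [(s["id"], s["num_caps"]) for s in sessions
             if s.get("num_caps", 0) > threshold]
    return sorted(heavy, key=lambda t: -t[1])

# Illustrative sample shaped like (an assumption about) `session ls` output;
# the client id matches the one from the health warning above.
sample = json.dumps([
    {"id": 521306112, "num_caps": 1250000,
     "client_metadata": {"hostname": "backupserver"}},
    {"id": 4221, "num_caps": 150,
     "client_metadata": {"hostname": "hpc-node-01"}},
])
print(clients_by_caps(sample))  # the backup server's session tops the list
```

In practice you would pipe the real `ceph tell mds.0 session ls` output into this instead of the embedded sample.]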
[ceph-users] Re: Clients failing to respond to capability release
Hi Stefan,

On Wed, Sep 20, 2023 at 11:00:12AM +0200, Stefan Kooman wrote:
> On 19-09-2023 13:35, Tim Bishop wrote:
> > The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> > clients are working fine, with the exception of our backup server. This
> > is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> > (so I suspect a newer Ceph version?).
> >
> > The backup server has multiple (12) CephFS mount points. One of them,
> > the busiest, regularly causes this error on the cluster:
> >
> > HEALTH_WARN 1 clients failing to respond to capability release
> > [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
> >     mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing
> >     to respond to capability release client_id: 521306112
> >
> > And occasionally, which may be unrelated, but occurs at the same time:
> >
> > [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >     mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
> >
> > The second one clears itself, but the first sticks until I can unmount
> > the filesystem on the client after the backup completes.
>
> You are not alone. We also have a backup server running 22.04 and 6.2 and
> occasionally hit this issue. We hit this with mainly 5.12.19 clients and a
> 6.2 backup server. We're on 16.2.11.
>
> Sidenote: for those of you who are wondering why you would want to use the
> latest (greatest?) linux kernel for CephFS ... this is why. To try to get
> rid of 1) slow requests because of some deadlock / locking issue, 2)
> clients failing to respond to capability release, and 3) bug fixes /
> improvements (thx devs!).
>
> Questions:
>
> Do you have the filesystem read only mounted and given the backup server
> CephFS client read only caps on the MDS?

Yes, mounted read-only, and the caps for the client are read-only for the MDS.

I do have multiple mounts from the same CephFS filesystem though, and I've been wondering if that could be causing more parallel requests from the backup server. I'd been thinking about doing it through a single mount, but then all the paths change, which doesn't make the backups overly happy.

> Are you running a multiple active MDS setup?

No. We tried it for a while, but after seeing some issues like this we backtracked to a single active MDS to rule out multiple active being the issue.

> > It appears that whilst it's in this stuck state there may be one or more
> > directory trees that are inaccessible to all clients. The backup server
> > is walking the whole tree but never gets stuck itself, so either the
> > inaccessible directory entry appears after it has gone past, or it's
> > not affected. Maybe the backup server is holding a directory when it
> > shouldn't?
>
> We have seen both cases, yet most of the time the backup server would not
> be able to make progress and be stuck on a file.

Interesting. Backups have never got stuck for us, whilst we regularly, pretty much daily, see the above-mentioned error. But because nothing we're directly running gets stuck, I only find out a directory somewhere is inaccessible if a user reports it to us from one of our other client machines, usually an HPC node.

> > It may be that an upgrade to Quincy resolves this, since it's more
> > likely to be in line with the kernel client, version-wise, but I don't
> > want to knee-jerk upgrade just to try and fix this problem.
>
> We are testing with 6.5 kernel clients (see other recent threads about
> this). We have not seen this issue there (but time will tell, it does not
> happen *that* often, but hit other issues).
>
> The MDS server itself is indeed older than the newer kernel clients. It
> might certainly be a factor. And that raises the question what kind of
> interoperability / compatibility tests (if any) are done between CephFS
> (kernel) clients and MDS server versions. This might be a good "focus
> topic" for a ceph User + Dev meeting ...
>
> > Thanks for any advice.
>
> You might want to try a 6.5.x kernel on the clients. But you might run
> into other issues. Not sure about that, these might be only relevant for
> one of our workloads; only one way to find out ...

I've been sticking with what's available in Ubuntu - the 6.2 kernel is part of their HWE enablement stack, which is handy. It won't be long until 23.10 is out with the 6.5 kernel though. I'll definitely give it a try then.

> Enable debug logging on the MDS to gather logs that might shine some light
> on what is happening wit
[ceph-users] Re: Clients failing to respond to capability release
his automatically and
> now we have no incidents of this.
>
> Yesterday for example, we had an incident of multiple MDS warning
> messages: MDS_SLOW_REQUEST, MDS_TRIM, MDS_CLIENT_RECALL, and
> MDS_CLIENT_LATE_RELEASE. This was caused by a non-responsive hard-drive
> leading to a build-up of the MDS cache and trims being unable to
> complete. We managed to narrow it down to the hard-drive holding the
> inode which the blocked client was waiting on an rdlock for, and
> restarted the OSD for that drive. Notably this hard-drive had no errors
> with smartctl or elsewhere, and only had the following slow ops message
> in the OSD systemctl status / log:
>
> osd.281 123892 get_health_metrics reporting 4 slow ops, oldest is
> osd_op(client.49576654.0:9101 3.d9bs0 3.ee6f2d9b (undecoded)
> ondisk+read+known_if_redirected e123889)
>
> Whilst restarting the hard-drive that the client with the oldest blocked
> op was waiting for did "clear" the /sys/kernel/debug/ceph/*/osdc queue,
> the oldest blocked MDS op then became an "internal op fragmentdir:mds.0:1",
> which restarting the active MDS cleared. Alas, this resulted in another
> blocked getattr op at the flag point "failed to authpin, dir is being
> fragmented", which was similarly tackled by restarting the MDS that had
> just taken over. That finally left only two clients failing to respond to
> cap releases on inodes they were holding (despite having rebooted in the
> meantime), where performing a "ceph tell mds.N session kill CLIENT_ID"
> removed them from the session map and allowed the MDS's cache to become
> manageable again, thereby clearing all of these warning messages.
>
> We've had this problem since the beginning of this year and upgrading
> from Octopus to Quincy has unfortunately not solved it. We've only really
> been able to address it by undergoing an aggressive campaign of replacing
> hard-drives which were reaching the end of their lives. This has
> substantially reduced the number of problems we've had in relation to
> this.
>
> We would be very interested to hear about the rest of the community's
> experience in relation to this, and I would recommend looking at your
> underlying OSDs, Tim, to see whether there are any timeout or
> uncorrectable errors. We would also be very eager to hear if these
> approaches are sub-optimal and whether anyone else has any insight into
> our problems. Sorry as well for resurrecting an old thread, but we
> thought our experiences may be helpful for others!
>
> Kindest regards,
>
> Ivan Clayson
>
> On 19/09/2023 12:35, Tim Bishop wrote:
> > Hi,
> >
> > I've seen this issue mentioned in the past, but with older releases. So
> > I'm wondering if anybody has any pointers.
> >
> > The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all
> > clients are working fine, with the exception of our backup server. This
> > is using the kernel CephFS client on Ubuntu 22.04 with kernel 6.2.0 [1]
> > (so I suspect a newer Ceph version?).
> >
> > The backup server has multiple (12) CephFS mount points. One of them,
> > the busiest, regularly causes this error on the cluster:
> >
> > HEALTH_WARN 1 clients failing to respond to capability release
> > [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
> >     mds.mds-server(mds.0): Client backupserver:cephfs-backupserver failing
> >     to respond to capability release client_id: 521306112
> >
> > And occasionally, which may be unrelated, but occurs at the same time:
> >
> > [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
> >     mds.mds-server(mds.0): 1 slow requests are blocked > 30 secs
> >
> > The second one clears itself, but the first sticks until I can unmount
> > the filesystem on the client after the backup completes.
> >
> > It appears that whilst it's in this stuck state there may be one or more
> > directory trees that are inaccessible to all clients. The backup server
> > is walking the whole tree but never gets stuck itself, so either the
> > inaccessible directory entry is caused after it has gone past, or it's
> > not affected. Maybe the backup server is holding a directory when it
> > shouldn't?
> >
> > It may be that an upgrade to Quincy resolves this, since it's more
> > likely to be in line with the kernel client version wise, but I don't
> > want to knee-jerk upgrade just to try and fix this problem.
> >
> > Thanks for any advice.
> >
> > Tim.
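[Editor's note: Ivan's procedure above starts by finding the oldest blocked op and its flag point. A rough sketch of that triage step, assuming the approximate JSON shape of `ceph daemon mds.X dump_blocked_ops` (an `ops` list with `age`, `flag_point` and `description` fields — the field names and the sample below are assumptions, not captured output):

```python
import json

def oldest_blocked_op(dump_json):
    """Pick the longest-blocked op, so you know which client/inode to chase."""
    ops = json.loads(dump_json).get("ops", [])
    if not ops:
        return None
    worst = max(ops, key=lambda op: op.get("age", 0))
    return worst.get("age"), worst.get("flag_point"), worst.get("description")

# Illustrative sample only; flag point text echoes the one Ivan quotes above.
sample = json.dumps({"ops": [
    {"age": 31.2, "flag_point": "failed to authpin, dir is being fragmented",
     "description": "client_request(client.49576654:9101 getattr ...)"},
    {"age": 4.0, "flag_point": "acquired locks",
     "description": "client_request(client.4221:77 lookup ...)"},
]})
print(oldest_blocked_op(sample))
```

From the returned description you can then look up the client in the session map and, as a last resort, use the `session kill` Ivan describes.]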
[ceph-users] Advice on balancing data across OSDs
Hi all,

ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable)

We're having an issue with the spread of data across our OSDs. We have 108 OSDs in our cluster, all with identical disk sizes, the same number in each server, and the same number of servers in each rack. So I'd hoped we'd end up with a pretty balanced distribution of data across the disks. However, the fullest is at 85% full and the most empty is at 40% full.

I've included the osd df output below, along with pool and crush rules.

I've also looked at the reweight-by-utilization command, which would apparently help:

# ceph osd test-reweight-by-utilization
moved 16 / 5715 (0.279965%)
avg 52.9167
stddev 7.20998 -> 7.15325 (expected baseline 7.24063)
min osd.45 with 31 -> 31 pgs (0.585827 -> 0.585827 * mean)
max osd.22 with 70 -> 68 pgs (1.32283 -> 1.28504 * mean)

oload 120
max_change 0.05
max_change_osds 4
average_utilization 0.6229
overload_utilization 0.7474
osd.22 weight 1. -> 0.9500
osd.23 weight 1. -> 0.9500
osd.53 weight 1. -> 0.9500
osd.78 weight 1. -> 0.9500
no change

But I'd like to make sure I understand why the problem is occurring first, so I can rule out a configuration issue, since it feels like the cluster shouldn't be getting into this state in the first place.

I have some suspicions that the number of PGs may be a bit low on some pools, but autoscale-status is set to "on" or "warn" for every pool, so it's happy with the current numbers. Does it play nice with CephFS?

Thanks for any advice.
Tim.
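[Editor's note: to quantify the imbalance before and after any change, a quick sketch that pulls the %USE column out of `ceph osd df` rows (taken as the 4th field from the end, per the column layout in the listing below; adjust if your release formats it differently — the second sample row is illustrative, not from the thread):

```python
def use_spread(osd_df_lines):
    """Min / max / mean of the %USE column of `ceph osd df` output rows."""
    uses = [float(line.split()[-4]) for line in osd_df_lines if line.strip()]
    return min(uses), max(uses), sum(uses) / len(uses)

rows = [
    # osd.22, the fullest OSD from the listing below:
    "22  hdd  3.63199  1.0  3.6 TiB  3.1 TiB  3.1 TiB  450 MiB  7.6 GiB  557 GiB  85.04  1.37  70  up",
    # hypothetical near-empty OSD, matching the ~40% figure mentioned above:
    "45  hdd  3.63199  1.0  3.6 TiB  1.5 TiB  1.5 TiB  120 MiB  3.1 GiB  2.2 TiB  40.12  0.64  31  up",
]
lo, hi, avg = use_spread(rows)
print(f"spread: {hi - lo:.2f} percentage points")  # 44.92 here
```

Feeding the full `ceph osd df` output through this makes it easy to see whether a balancer or reweight change actually tightened the spread.]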
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL     %USE   VAR   PGS  STATUS
22  hdd    3.63199  1.0       3.6 TiB  3.1 TiB  3.1 TiB  450 MiB  7.6 GiB  557 GiB   85.04  1.37  70   up
23  hdd    3.63199  1.0       3.6 TiB  2.9 TiB  2.9 TiB  459 MiB  7.5 GiB  759 GiB   79.64  1.28  64   up
53  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  703 MiB  8.0 GiB  823 GiB   77.91  1.25  66   up
78  hdd    3.63799  1.0       3.6 TiB  2.8 TiB  2.8 TiB  187 MiB  5.9 GiB  851 GiB   77.15  1.24  61   up
26  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  432 MiB  7.7 GiB  854 GiB   77.07  1.24  61   up
39  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  503 MiB  7.2 GiB  874 GiB   76.55  1.23  65   up
42  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.7 TiB  439 MiB  6.6 GiB  909 GiB   75.59  1.21  60   up
101 hdd    3.63820  1.0       3.6 TiB  2.7 TiB  2.7 TiB  306 MiB  7.1 GiB  913 GiB   75.50  1.21  61   up
87  hdd    3.63820  1.0       3.6 TiB  2.7 TiB  2.7 TiB  539 MiB  7.5 GiB  921 GiB   75.27  1.21  61   up
59  hdd    3.63799  1.0       3.6 TiB  2.7 TiB  2.7 TiB  721 MiB  7.9 GiB  957 GiB   74.30  1.19  64   up
79  hdd    3.63799  1.0       3.6 TiB  2.7 TiB  2.7 TiB  950 MiB  9.0 GiB  970 GiB   73.95  1.19  58   up
34  hdd    3.63199  1.0       3.6 TiB  2.7 TiB  2.7 TiB  202 MiB  6.0 GiB  974 GiB   73.85  1.19  57   up
60  hdd    3.63799  1.0       3.6 TiB  2.7 TiB  2.6 TiB  668 MiB  7.2 GiB  1009 GiB  72.91  1.17  59   up
18  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.6 TiB  453 MiB  6.5 GiB  1021 GiB  72.59  1.17  60   up
74  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  693 MiB  7.5 GiB  1.0 TiB   72.12  1.16  62   up
19  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.6 TiB  655 MiB  7.9 GiB  1.0 TiB   71.71  1.15  63   up
69  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  445 MiB  6.2 GiB  1.0 TiB   71.70  1.15  65   up
43  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.6 TiB  170 MiB  4.7 GiB  1.0 TiB   71.62  1.15  63   up
97  hdd    3.63820  1.0       3.6 TiB  2.6 TiB  2.6 TiB  276 MiB  5.7 GiB  1.0 TiB   71.33  1.15  66   up
67  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  430 MiB  6.3 GiB  1.0 TiB   71.22  1.14  54   up
68  hdd    3.63799  1.0       3.6 TiB  2.6 TiB  2.6 TiB  419 MiB  6.6 GiB  1.1 TiB   70.68  1.13  58   up
31  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.5 TiB  419 MiB  5.2 GiB  1.1 TiB   70.16  1.13  63   up
48  hdd    3.63199  1.0       3.6 TiB  2.6 TiB  2.5 TiB  211 MiB  5.0 GiB  1.1 TiB   70.13  1.13  56   up
73  hdd    3.63799  1.0       3.6 TiB  2.5 TiB  2.5 TiB  765 MiB  7.1 GiB  1.1 TiB   69.52  1.12  57   up
98  hdd    3.63820  1.0       3.6 TiB  2.5 TiB  2.5 TiB  552 MiB  7.1 GiB  1.1 TiB   68.72  1.10  60   up
58  hdd    3.63799  1.0       3.6 TiB  2.5 TiB  2.5 TiB  427 MiB  6.3 GiB  1.2 TiB   68.39  1.10  53   up
14  hdd    3.63199  1.0       3.6 TiB  2.5 TiB  2.5 TiB  409 MiB  6.0 GiB  1.2 TiB   68.06  1.09  65   up
47  hdd    3.63199  1.0       3.6 TiB  2.5 TiB  2.5 TiB  166 MiB  5.5 GiB  1.2 TiB   67.84  1.09  55   up
9   hdd    3.63199  1.0       3.6 TiB  2.5 TiB  2.5 TiB  419 MiB  5.9 GiB  1.2 TiB   67.78  1.09  58   up
90  hdd    3.63820  1.0       3.6 TiB  2.5 TiB  2.5 TiB  277 MiB  6.3 GiB  1.2 TiB   67.56  1.08  57
[ceph-users] Re: Advice on balancing data across OSDs
Hi Josh,

On Mon, Oct 24, 2022 at 07:20:46AM -0600, Josh Baergen wrote:
> > I've included the osd df output below, along with pool and crush rules.
>
> Looking at these, the balancer module should be taking care of this
> imbalance automatically. What does "ceph balancer status" say?

# ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.038795",
    "last_optimize_started": "Mon Oct 24 15:35:43 2022",
    "mode": "upmap",
    "optimize_result": "Optimization plan created successfully",
    "plans": []
}

Looks healthy?

This cluster is on Pacific but has been upgraded through numerous previous releases, so it is possible some settings have been inherited and are not the same as the defaults on a new cluster.

Tim.
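[Editor's note: an empty plan from the upmap balancer despite visible imbalance can simply mean it considers per-pool PG placement already within its `upmap_max_deviation` tolerance (a real mgr option, though its default value and the exact `ceph config get mgr mgr/balancer/upmap_max_deviation` invocation are worth verifying on your release). A simplified sketch of the comparison involved — the real balancer works per pool, this just looks at raw PG counts:

```python
def max_pg_deviation(pg_counts):
    """Largest absolute deviation of any OSD's PG count from the mean."""
    mean = sum(pg_counts) / len(pg_counts)
    return max(abs(c - mean) for c in pg_counts)

# osd.45 (31 PGs) and osd.22 (70 PGs) are the extremes reported by
# `ceph osd test-reweight-by-utilization` earlier in the thread.
extremes = [31, 70]
print(max_pg_deviation(extremes))  # 19.5 against the mean of these two samples
```

If the measured deviation is well above the configured tolerance and the plan is still empty, that points at something else (e.g. per-pool PG counts too low for the balancer to do better).]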
[ceph-users] Re: Advice on balancing data across OSDs
Hi Joseph,

Here are some of the larger pools. Notably, the largest (pool 51, 32 TiB of CephFS data) doesn't have the highest number of PGs.

POOL    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
pool28  28  256  9.9 TiB  2.61M    30 TiB   43.28  13 TiB
pool29  29  256  9.5 TiB  2.48M    28 TiB   42.13  13 TiB
pool36  36  128  6.0 TiB  1.58M    18 TiB   31.67  13 TiB
pool39  39  256  20 TiB   5.20M    60 TiB   60.37  13 TiB
pool43  43  32   1.9 TiB  503.92k  5.7 TiB  12.79  13 TiB
pool46  46  32   1.3 TiB  236.34k  3.9 TiB  9.14   13 TiB
pool47  47  128  4.0 TiB  1.04M    12 TiB   23.35  13 TiB
pool51  51  128  32 TiB   32.47M   55 TiB   58.30  26 TiB
pool53  53  128  3.3 TiB  864.88k  9.9 TiB  20.21  13 TiB
pool57  57  128  14 TiB   3.55M    21 TiB   34.80  26 TiB

It does sound like I need to increase that, but I had assumed the autoscaler would have produced a warning if that was the case... it certainly has for some pools in the past, and I've adjusted as per its recommendations.

Tim.

On Mon, Oct 24, 2022 at 09:24:58AM -0400, Joseph Mundackal wrote:
> Hi Tim,
> You might want to check your pool utilization and see if there are
> enough PGs in that pool. Higher GB per PG can result in this scenario.
>
> I am also assuming that you have the balancer module turned on ("ceph
> balancer status" should tell you that as well).
>
> If you have enough PGs in the bigger pools and the balancer module is on,
> you shouldn't have to manually reweight OSDs.
>
> -Joseph
>
> On Mon, Oct 24, 2022 at 9:13 AM Tim Bishop wrote:
> > Hi all,
> >
> > ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific
> > (stable)
> >
> > We're having an issue with the spread of data across our OSDs. We have
> > 108 OSDs in our cluster, all identical disk size, same number in each
> > server, and the same number of servers in each rack. So I'd hoped we'd
> > end up with a pretty balanced distribution of data across the disks.
> > However, the fullest is at 85% full and the most empty is at 40% full.
> >
> > I've included the osd df output below, along with pool and crush rules.
> > I've also looked at the reweight-by-utilization command which would
> > apparently help:
> >
> > # ceph osd test-reweight-by-utilization
> > moved 16 / 5715 (0.279965%)
> > avg 52.9167
> > stddev 7.20998 -> 7.15325 (expected baseline 7.24063)
> > min osd.45 with 31 -> 31 pgs (0.585827 -> 0.585827 * mean)
> > max osd.22 with 70 -> 68 pgs (1.32283 -> 1.28504 * mean)
> >
> > oload 120
> > max_change 0.05
> > max_change_osds 4
> > average_utilization 0.6229
> > overload_utilization 0.7474
> > osd.22 weight 1. -> 0.9500
> > osd.23 weight 1. -> 0.9500
> > osd.53 weight 1. -> 0.9500
> > osd.78 weight 1. -> 0.9500
> > no change
> >
> > But I'd like to make sure I understand why the problem is occurring
> > first so I can rule out a configuration issue, since it feels like the
> > cluster shouldn't be getting into this state in the first place.
> >
> > I have some suspicions that the number of PGs may be a bit low on some
> > pools, but autoscale-status is set to "on" or "warn" for every pool, so
> > it's happy with the current numbers. Does it play nice with CephFS?
> >
> > Thanks for any advice.
> > Tim.
> >
> > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> > 22  hdd    3.63199  1.0       3.6 TiB  3.1 TiB  3.1 TiB  450 MiB  7.6 GiB  557 GiB  85.04  1.37  70   up
> > 23  hdd    3.63199  1.0       3.6 TiB  2.9 TiB  2.9 TiB  459 MiB  7.5 GiB  759 GiB  79.64  1.28  64   up
> > 53  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  703 MiB  8.0 GiB  823 GiB  77.91  1.25  66   up
> > 78  hdd    3.63799  1.0       3.6 TiB  2.8 TiB  2.8 TiB  187 MiB  5.9 GiB  851 GiB  77.15  1.24  61   up
> > 26  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  432 MiB  7.7 GiB  854 GiB  77.07  1.24  61   up
> > 39  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.8 TiB  503 MiB  7.2 GiB  874 GiB  76.55  1.23  65   up
> > 42  hdd    3.63199  1.0       3.6 TiB  2.8 TiB  2.7 TiB  439 MiB  6.6 GiB  909 GiB  75.59  1.21  60   up
> > 101 hdd    3.63820  1.0       3.6 TiB  2.7 TiB  2.7 TiB  306 MiB  7.1
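[Editor's note: Joseph's point about GB per PG can be eyeballed straight from the pool table above; this sketch just does the arithmetic (STORED in TiB converted to GiB, divided by the PG count):

```python
def gib_per_pg(stored_tib, pgs):
    """Rough stored data per PG, from the STORED and PGS columns above."""
    return stored_tib * 1024 / pgs

# Figures from the pool listing in this thread:
print(round(gib_per_pg(32, 128)), "GiB per PG in pool51")  # 256
print(round(gib_per_pg(1.9, 32)), "GiB per PG in pool43")  # 61
```

With PGs that large, each PG movement shifts a quarter-terabyte of data, which makes fine-grained balancing much harder than in the smaller pools.]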
[ceph-users] Re: telemetry.ceph.com certificate expired
Yup, us too. I filed a bug report earlier:

https://tracker.ceph.com/issues/45096

Hopefully it can be fixed soon!

Tim.

On Wed, Apr 15, 2020 at 02:27:51PM +0200, Eneko Lacunza wrote:
> Hi all,
>
> We're receiving a certificate error for the telemetry module:
>
> Module 'telemetry' has failed:
> HTTPSConnectionPool(host='telemetry.ceph.com', port=443): Max retries
> exceeded with url: /report (Caused by SSLError(SSLError("bad handshake:
> Error([('SSL routines', 'tls_process_server_certificate', 'certificate
> verify failed')],)",),));
>
> It seems the certificate expired yesterday (14th April).
>
> Cheers,
> Eneko

--
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x6C226B37FDF38D55