Hi, can you share the exact command you used to block the watcher? To get the lock list, run:

rbd lock list <pool>/<image>

Example output:

There is 1 exclusive lock on this image.
Locker          ID                    Address
client.1211875  auto 139643345791728  192.168.3.12:0/2259335316
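
The Address column is what you block. If you also want to see who is still watching the image header (the problem you describe below is about watchers rather than locks), here is a quick sketch, assuming you look up the image ID via rbd info first:

rbd info <pool>/<image>                       # note the id in block_name_prefix (rbd_data.<id>)
rados -p <pool> listwatchers rbd_header.<id>  # lists the watchers registered on the header object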

To block the client, add its address (the Address column from the lock list) to the OSD blocklist:

ceph osd blocklist add 192.168.3.12:0/2259335316

(On older releases the command is spelled "ceph osd blacklist add".)
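
To verify the entry and its expiry time afterwards, list the blocklist (the same command you used below):

ceph osd blocklist ls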

Or remove the lock with rbd directly; the command takes the lock ID and the locker from the list above:

rbd lock rm <pool>/<image> <lock-id> <locker>
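
With the example output above, and quoting the lock ID because it contains a space, that would be:

rbd lock rm <pool>/<image> "auto 139643345791728" client.1211875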

Hope that helps!

Quoting m...@netpunt.nl:

Hey all,

I recently had a k8s node failure in my homelab, and even though I powered it off (and it's done for, so it won't come back up), it still shows up as a watcher in rbd status.

```
root@node0:~# rbd status kubernetes/csi-vol-3e7af8ae-ceb6-4c94-8435-2f8dc29b313b
Watchers:
        watcher=10.0.0.103:0/1520114202 client.1697844 cookie=140289402510784
        watcher=10.0.0.103:0/39967552 client.1805496 cookie=140549449430704
root@node0:~# ceph osd blocklist ls
10.0.0.103:0/0 2023-04-15T13:15:39.061379+0200
listed 1 entries
```

Even though the node is down and I have blocked it multiple times, for hours at a time, the watcher won't disappear. As a result, ceph-csi-rbd claims the image is already mounted (binding it manually works fine, and I can cleanly unbind it as well, but I can't unbind it from a node that no longer exists).

Is there any way to force-kick an RBD client/watcher from Ceph (e.g. by switching the mgr/mon), or to see why it is not timing out?

I found some historical mails and issues (related to Rook, which I don't use) regarding a parameter `osd_client_watch_timeout`, but I can't figure out how it relates to RBD images.
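
In case it is relevant, the current value can be inspected like this (a sketch, assuming the cluster config database is in use; `osd.0` is just an example daemon name):

```
# value stored in the cluster configuration database
ceph config get osd osd_client_watch_timeout
# value a running daemon actually uses
ceph config show osd.0 osd_client_watch_timeout
```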

Cheers,
Max.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

