[ceph-users] Querying Cephfs Metadata
Hey,

I'm trying to find the last two objects on an old data pool for a CephFS cluster that seemingly can't be found via find && getfattr. More generally, though: is there a low-level tool or (C++ or Python) library for reading and analyzing CephFS metadata?

Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
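Not an official tool, but one approach that may help with the "last two objects" hunt: CephFS stores file data as RADOS objects named "<inode-hex>.<block-hex>", so you can compute the expected object names yourself and probe the old pool directly. A minimal sketch of the naming convention (assuming the default 4 MiB object size; adjust if the files had a custom layout):

```python
# Sketch: compute the RADOS object names CephFS uses for a file's data.
# CephFS names data objects "<inode-hex>.<block-hex>": the inode number in
# lowercase hex, then the block index as an 8-digit hex number, where the
# block index is the byte offset divided by the layout's object size.
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default file layout

def data_object_names(inode: int, file_size: int,
                      object_size: int = DEFAULT_OBJECT_SIZE) -> list[str]:
    """Return the object names a file of file_size bytes occupies."""
    nblocks = max(1, -(-file_size // object_size))  # ceil division
    return [f"{inode:x}.{block:08x}" for block in range(nblocks)]
```

With those names you can inspect the pool with the rados CLI (e.g. `rados -p <pool> stat <obj>`), and the first object's "parent" xattr holds the backtrace that maps an object back to a path (decodable with `ceph-dencoder type inode_backtrace_t`). For programmatic access, the librados and libcephfs Python bindings (the `rados` and `cephfs` modules) are the closest thing to a supported library.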
[ceph-users] Re: Phantom host
I don't remember how connected the dashboard is to the orchestrator in Pacific, but the only thing I can think to do here is to restart it (ceph mgr module disable dashboard, then ceph mgr module enable dashboard). You could also fail over the mgr entirely (ceph mgr fail), although that might change the URL you need for the dashboard by moving the active mgr.

On Fri, Jun 21, 2024 at 10:14 AM Tim Holloway wrote:
> Ceph Pacific
>
> Thanks to some misplaced thumbs-on-keyboard, I inadvertently managed to
> alias a non-Ceph system's IP as a Ceph host, and Ceph adopted it
> somehow.
>
> I fixed the fat-fingered IP and have gone through the usual motions to
> delete a host, but some parts of the Ceph ecosystem haven't caught up.
>
> The host no longer shows in "ceph orch host ls", but on the web control
> panel it's still there and thinks it has an OSD attached. Ditto for
> "ceph health detail". Moreover, the webapp shows not one but THREE OSDs
> associated with the phantom host in the dashboard/hosts detail
> expansion. It's claiming to own OSD daemons that are actually on other
> machines.
>
> Any assistance would be much appreciated!
>
> Tim
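For quick reference, the two options above as commands (standard mgr commands; run them from a node with an admin keyring):

```shell
# Option 1: restart just the dashboard module so it re-reads cluster state
ceph mgr module disable dashboard
ceph mgr module enable dashboard

# Option 2: fail over to a standby mgr entirely; note the dashboard URL
# may change because the active mgr moves to another host
ceph mgr fail
```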
[ceph-users] Phantom host
Ceph Pacific

Thanks to some misplaced thumbs-on-keyboard, I inadvertently managed to alias a non-Ceph system's IP as a Ceph host, and Ceph adopted it somehow.

I fixed the fat-fingered IP and have gone through the usual motions to delete a host, but some parts of the Ceph ecosystem haven't caught up.

The host no longer shows in "ceph orch host ls", but on the web control panel it's still there and thinks it has an OSD attached. Ditto for "ceph health detail". Moreover, the webapp shows not one but THREE OSDs associated with the phantom host in the dashboard/hosts detail expansion. It's claiming to own OSD daemons that are actually on other machines.

Any assistance would be much appreciated!

Tim
[ceph-users] Re: Cannot mount RBD on client
Hi Etienne,

Indeed, even ```rados ls --pool test``` hangs on the same instruction:

futex(0x7ffc2de0cb10, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=10215, tv_nsec=619004859}, FUTEX_BITSET_MATCH_ANY

Yes, with netcat I have checked from the client side, and all OSD ports succeeded.
[ceph-users] Re: Cannot mount RBD on client
Hi,

> By netcat I can see that OSD and MON ports are open.

Did you check from your client? You can use the rados CLI to ensure your client can actually use your Ceph cluster.

Étienne

From: service.pl...@ya.ru
Sent: Friday, 21 June 2024 11:39
To: ceph-users@ceph.io
Subject: [ceph-users] Cannot mount RBD on client

Hi everyone! I've encountered a situation I cannot even google. In a nutshell, `rbd map test/kek --id test` hangs forever on the ```futex(0x7ffdfa73d748, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY``` instruction in strace. Of course, I have all the keyrings and ceph.conf on site.

I would suspect a network-related problem, but there is neither a firewall nor any iptables filtering rules. Tcpdump shows that packets fly both ways (I don't know what is inside them). The trick is that when I bring the admin keyring onto the client, ```ceph -s``` works perfectly. There are no mentions of a connection attempt in the systemd services of the monitors. No mentions at all in any logs.

I've been fighting this for three days now with no result, so ANY advice is very appreciated and you will receive quanta of love from me personally :)

Ubuntu 22.04.4 LTS on both client and Ceph nodes, with kernel 5.15.0-112-generic. There is no firewall and no iptables filtering rules on the client, and the same (excluding rules added by Docker) on the cluster nodes. There is a ping between the client and any of the cluster nodes. By netcat I can see that OSD and MON ports are open.

I am totally lost here. Please give me a hint on what to check and where to look.
root@ceph1:/tmp/nfs# ceph -s
  cluster:
    id:     ceph-fsid
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph10,ceph6,ceph1 (age 46h)
    mgr: ceph1.rynror(active, since 46h), standbys: ceph2.nxpjmd
    osd: 109 osds: 108 up (since 44h), 108 in (since 43h)
         flags noautoscale

  data:
    pools:   6 pools, 6401 pgs
    objects: 38 objects, 65 MiB
    usage:   105 TiB used, 1.7 PiB / 1.8 PiB avail
    pgs:     6401 active+clean

Thanks in advance. This is Ceph v18.2.2.
[ceph-users] Cannot mount RBD on client
Hi everyone!

I've encountered a situation I cannot even google. In a nutshell, `rbd map test/kek --id test` hangs forever on the ```futex(0x7ffdfa73d748, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY``` instruction in strace. Of course, I have all the keyrings and ceph.conf on site.

I would suspect a network-related problem, but there is neither a firewall nor any iptables filtering rules. Tcpdump shows that packets fly both ways (I don't know what is inside them). The trick is that when I bring the admin keyring onto the client, ```ceph -s``` works perfectly. There are no mentions of a connection attempt in the systemd services of the monitors. No mentions at all in any logs.

I've been fighting this for three days now with no result, so ANY advice is very appreciated and you will receive quanta of love from me personally :)

Ubuntu 22.04.4 LTS on both client and Ceph nodes, with kernel 5.15.0-112-generic. There is no firewall and no iptables filtering rules on the client, and the same (excluding rules added by Docker) on the cluster nodes. There is a ping between the client and any of the cluster nodes. By netcat I can see that OSD and MON ports are open.

I am totally lost here. Please give me a hint on what to check and where to look.

root@ceph1:/tmp/nfs# ceph -s
  cluster:
    id:     ceph-fsid
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph10,ceph6,ceph1 (age 46h)
    mgr: ceph1.rynror(active, since 46h), standbys: ceph2.nxpjmd
    osd: 109 osds: 108 up (since 44h), 108 in (since 43h)
         flags noautoscale

  data:
    pools:   6 pools, 6401 pgs
    objects: 38 objects, 65 MiB
    usage:   105 TiB used, 1.7 PiB / 1.8 PiB avail
    pgs:     6401 active+clean

Thanks in advance. This is Ceph v18.2.2.
[ceph-users] Re: wrong public_ip after blackout / poweroutage
Hi,

This type of incident is often resolved by setting the public_network option at the "global" scope in the configuration:

ceph config set global public_network a:b:c:d::/64

On Fri, Jun 21, 2024 at 03:36, Eugen Block wrote:
> Hi,
>
> this is only a theory, not a proven answer or anything. But the
> orchestrator does automatically reconfigure daemons depending on the
> circumstances. So my theory is: some of the OSD nodes didn't respond
> via the public network anymore, so Ceph tried to use the cluster
> network as a fallback. The other way around is more common: if you
> don't have a cluster network configured at all, you see logs stating
> "falling back to public interface" (or similar). If the orchestrator
> did reconfigure the daemons, it would have been logged in the active
> mgr, and the result would be a different ceph.conf for the daemons in
> /var/lib/ceph/{FSID}/osd.{OSD_ID}/config. If you still have the mgr
> logs from after the outage, you might find some clues.
>
> Regards,
> Eugen
>
> Quoting mailing-lists:
>
> > Dear Cephers,
> >
> > after a succession of unfortunate events, we suffered a complete
> > datacenter blackout today.
> >
> > Ceph _nearly_ came back up perfectly. The health was OK and all
> > services were online, but we were having weird problems. Weird as in:
> > we could sometimes map RBDs and sometimes not, and sometimes we could
> > use the CephFS and sometimes we could not...
> >
> > It turns out some OSDs (I'd say 5%) came back with the cluster
> > network address as their public_ip and thus were not reachable.
> >
> > I do not see any pattern in why some OSDs are faulty and others are
> > not; the fault is spread over nearly all nodes. This is an example:
> >
> > osd.45 up in weight 1 up_from 184143 up_thru 184164 down_at 184142
> > last_clean_interval [182655,184103)
> > [v2:192.168.222.20:6834/1536394698,v1:192.168.222.20:6842/1536394698]
> > [v2:192.168.222.20:6848/1536394698,v1:192.168.222.20:6853/1536394698]
> > exists,up 002326c9
> >
> > This should have a public IP in the first brackets []. Our cluster
> > network is 192.168.222.0/24, which is of course only available on the
> > Ceph-internal switch.
> >
> > Simply restarting the affected OSDs solved the problem, so I am not
> > really asking for troubleshooting help; I would just like to
> > understand whether there is a reasonable explanation.
> >
> > My guess would be some kind of race condition when the interfaces
> > came up, but then again, why on ~5% of all OSDs? ... Anyway, I'm
> > tired; I hope this mail is somewhat understandable.
> >
> > We are running Ceph 17.2.7 with a cephadm Docker deployment.
> >
> > If you have any ideas about the cause of this, please let me know. I
> > have not seen this issue when gracefully rebooting the nodes.
> >
> > Best
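The quoted osdmap line can be checked mechanically across all OSDs. A small sketch (assuming IPv4 and the 192.168.222.0/24 cluster network from the message) that parses entity address strings like those above and flags ones whose advertised address falls inside the cluster network:

```python
# Flag OSD entity addresses that sit in the cluster network instead of
# the public one, as in the "ceph osd dump" line quoted above.
import ipaddress

CLUSTER_NET = ipaddress.ip_network("192.168.222.0/24")  # from the thread

def entity_ip(entity: str) -> ipaddress.IPv4Address:
    # "v2:192.168.222.20:6834/1536394698" -> 192.168.222.20
    # (IPv4 only; bracketed IPv6 addresses would need extra parsing)
    host = entity.split(":", 1)[1].rsplit(":", 1)[0]
    return ipaddress.ip_address(host)

def misplaced(entity: str) -> bool:
    """True if the advertised address is in the cluster network."""
    return entity_ip(entity) in CLUSTER_NET
```

Feeding every v1/v2 public address from `ceph osd dump` through `misplaced()` would list the affected ~5% of OSDs without eyeballing the map.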
[ceph-users] Re: wrong public_ip after blackout / poweroutage
Hi,

maybe not a direct answer to your question, but I would generally recommend that you verify your Ceph config with our Ceph Analyzer. Simply upload your ceph report there, and in a few seconds you will receive feedback on the configuration: https://analyzer.clyso.com/

*Joachim Kraftmayer*
a: Loristr. 8 | 80335 Munich | Germany | w: https://clyso.com | Utting a. A. | HR: Augsburg | HRB 25866 | USt. ID: DE275430677

On Fri, Jun 21, 2024 at 03:36, Eugen Block wrote:
> Hi,
>
> this is only a theory, not a proven answer or anything. But the
> orchestrator does automatically reconfigure daemons depending on the
> circumstances. So my theory is: some of the OSD nodes didn't respond
> via the public network anymore, so Ceph tried to use the cluster
> network as a fallback. The other way around is more common: if you
> don't have a cluster network configured at all, you see logs stating
> "falling back to public interface" (or similar). If the orchestrator
> did reconfigure the daemons, it would have been logged in the active
> mgr, and the result would be a different ceph.conf for the daemons in
> /var/lib/ceph/{FSID}/osd.{OSD_ID}/config. If you still have the mgr
> logs from after the outage, you might find some clues.
>
> Regards,
> Eugen
>
> Quoting mailing-lists:
>
> > Dear Cephers,
> >
> > after a succession of unfortunate events, we suffered a complete
> > datacenter blackout today.
> >
> > Ceph _nearly_ came back up perfectly. The health was OK and all
> > services were online, but we were having weird problems. Weird as in:
> > we could sometimes map RBDs and sometimes not, and sometimes we could
> > use the CephFS and sometimes we could not...
> >
> > It turns out some OSDs (I'd say 5%) came back with the cluster
> > network address as their public_ip and thus were not reachable.
> >
> > I do not see any pattern in why some OSDs are faulty and others are
> > not; the fault is spread over nearly all nodes. This is an example:
> >
> > osd.45 up in weight 1 up_from 184143 up_thru 184164 down_at 184142
> > last_clean_interval [182655,184103)
> > [v2:192.168.222.20:6834/1536394698,v1:192.168.222.20:6842/1536394698]
> > [v2:192.168.222.20:6848/1536394698,v1:192.168.222.20:6853/1536394698]
> > exists,up 002326c9
> >
> > This should have a public IP in the first brackets []. Our cluster
> > network is 192.168.222.0/24, which is of course only available on the
> > Ceph-internal switch.
> >
> > Simply restarting the affected OSDs solved the problem, so I am not
> > really asking for troubleshooting help; I would just like to
> > understand whether there is a reasonable explanation.
> >
> > My guess would be some kind of race condition when the interfaces
> > came up, but then again, why on ~5% of all OSDs? ... Anyway, I'm
> > tired; I hope this mail is somewhat understandable.
> >
> > We are running Ceph 17.2.7 with a cephadm Docker deployment.
> >
> > If you have any ideas about the cause of this, please let me know. I
> > have not seen this issue when gracefully rebooting the nodes.
> >
> > Best