[ceph-users] Querying Cephfs Metadata
Hey,

I'm trying to find the last two objects on an old data pool for a CephFS cluster that seemingly can't be found via find && getfattr. More generally, though: is there a low-level tool or (C++ or Python) library for reading and analyzing CephFS metadata?

Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
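Not an official tool, but one approach that may help with the "last two objects" hunt: CephFS stores file data as RADOS objects named "<inode-hex>.<block-hex>", so you can compute the expected object names yourself and probe the old pool directly. A minimal sketch of the naming convention (assuming the default 4 MiB object size; adjust if the files had a custom layout):

```python
# Sketch: compute the RADOS object names CephFS uses for a file's data.
# CephFS names data objects "<inode-hex>.<block-hex>": the inode number in
# lowercase hex, then the block index as an 8-digit hex number, where the
# block index is the byte offset divided by the layout's object size.
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default file layout

def data_object_names(inode: int, file_size: int,
                      object_size: int = DEFAULT_OBJECT_SIZE) -> list[str]:
    """Return the object names a file of file_size bytes occupies."""
    nblocks = max(1, -(-file_size // object_size))  # ceil division
    return [f"{inode:x}.{block:08x}" for block in range(nblocks)]
```

With those names you can inspect the pool with the rados CLI (e.g. `rados -p <pool> stat <obj>`), and the first object's "parent" xattr holds the backtrace that maps an object back to a path (decodable with `ceph-dencoder type inode_backtrace_t`). For programmatic access, the librados and libcephfs Python bindings (the `rados` and `cephfs` modules) are the closest thing to a supported library.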
[ceph-users] Re: Phantom host
I don't remember how connected the dashboard is to the orchestrator in Pacific, but the only thing I can think to do here is to restart it (ceph mgr module disable dashboard, then ceph mgr module enable dashboard). You could also fail over the mgr entirely (ceph mgr fail), although that might change the URL you need for the dashboard by moving the active mgr.

On Fri, Jun 21, 2024 at 10:14 AM Tim Holloway wrote:
> Ceph Pacific
>
> Thanks to some misplaced thumbs-on-keyboard, I inadvertently managed to
> alias a non-Ceph system's IP as a Ceph host, and Ceph adopted it
> somehow.
>
> I fixed the fat-fingered IP and have gone through the usual motions to
> delete a host, but some parts of the Ceph ecosystem haven't caught up.
>
> The host no longer shows in "ceph orch host ls", but on the web control
> panel it's still there and thinks it has an OSD attached. Ditto for
> "ceph health detail". Moreover, the webapp shows not one but THREE OSDs
> associated with the phantom host in the dashboard/hosts detail
> expansion. It's claiming to own OSD daemons that are actually on other
> machines.
>
> Any assistance would be much appreciated!
>
> Tim
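For quick reference, the two options above as commands (standard mgr commands; run them from a node with an admin keyring):

```shell
# Option 1: restart just the dashboard module so it re-reads cluster state
ceph mgr module disable dashboard
ceph mgr module enable dashboard

# Option 2: fail over to a standby mgr entirely; note the dashboard URL
# may change because the active mgr moves to another host
ceph mgr fail
```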
[ceph-users] Phantom host
Ceph Pacific

Thanks to some misplaced thumbs-on-keyboard, I inadvertently managed to alias a non-Ceph system's IP as a Ceph host, and Ceph adopted it somehow.

I fixed the fat-fingered IP and have gone through the usual motions to delete a host, but some parts of the Ceph ecosystem haven't caught up.

The host no longer shows in "ceph orch host ls", but on the web control panel it's still there and thinks it has an OSD attached. Ditto for "ceph health detail". Moreover, the webapp shows not one but THREE OSDs associated with the phantom host in the dashboard/hosts detail expansion. It's claiming to own OSD daemons that are actually on other machines.

Any assistance would be much appreciated!

Tim
[ceph-users] Re: Cannot mount RBD on client
Hi Etienne,

Indeed, even ```rados ls --pool test``` hangs on the same instruction:

futex(0x7ffc2de0cb10, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=10215, tv_nsec=619004859}, FUTEX_BITSET_MATCH_ANY

Yes, with netcat I have checked from the client side, and all OSD ports succeeded.
[ceph-users] Re: Cannot mount RBD on client
Hi,

> By netcat I can see that OSD and MON ports are open.

Did you check from your client? You can use the rados CLI to ensure your client can actually use your Ceph cluster.

Étienne

From: service.pl...@ya.ru
Sent: Friday, 21 June 2024 11:39
To: ceph-users@ceph.io
Subject: [ceph-users] Cannot mount RBD on client

Hi everyone! I've encountered a situation I cannot even google. In a nutshell, `rbd map test/kek --id test` hangs forever on the ```futex(0x7ffdfa73d748, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY``` instruction in strace. Of course, I have all the keyrings and ceph.conf on site.

I would suspect a network-related problem, but there is neither a firewall nor any iptables filtering rules. Tcpdump shows that packets fly both ways (I don't know what is inside them). The trick is that when I bring the admin keyring onto the client, ```ceph -s``` works perfectly. There are no mentions of a connection attempt in the systemd services of the monitors. No mentions at all in any logs.

I've been fighting this for three days now with no result, so ANY advice is very appreciated and you will receive quanta of love from me personally :)

Ubuntu 22.04.4 LTS on both client and Ceph nodes, with kernel 5.15.0-112-generic. There is no firewall and no iptables filtering rules on the client, and the same (excluding rules added by Docker) on the cluster nodes. There is a ping between the client and any of the cluster nodes. By netcat I can see that OSD and MON ports are open.

I am totally lost here. Please give me a hint on what to check and where to look.
root@ceph1:/tmp/nfs# ceph -s
  cluster:
    id:     ceph-fsid
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph10,ceph6,ceph1 (age 46h)
    mgr: ceph1.rynror(active, since 46h), standbys: ceph2.nxpjmd
    osd: 109 osds: 108 up (since 44h), 108 in (since 43h)
         flags noautoscale

  data:
    pools:   6 pools, 6401 pgs
    objects: 38 objects, 65 MiB
    usage:   105 TiB used, 1.7 PiB / 1.8 PiB avail
    pgs:     6401 active+clean

Thanks in advance. This is Ceph v18.2.2.
[ceph-users] Cannot mount RBD on client
Hi everyone!

I've encountered a situation I cannot even google. In a nutshell, `rbd map test/kek --id test` hangs forever on the ```futex(0x7ffdfa73d748, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY``` instruction in strace. Of course, I have all the keyrings and ceph.conf on site.

I would suspect a network-related problem, but there is neither a firewall nor any iptables filtering rules. Tcpdump shows that packets fly both ways (I don't know what is inside them). The trick is that when I bring the admin keyring onto the client, ```ceph -s``` works perfectly. There are no mentions of a connection attempt in the systemd services of the monitors. No mentions at all in any logs.

I've been fighting this for three days now with no result, so ANY advice is very appreciated and you will receive quanta of love from me personally :)

Ubuntu 22.04.4 LTS on both client and Ceph nodes, with kernel 5.15.0-112-generic. There is no firewall and no iptables filtering rules on the client, and the same (excluding rules added by Docker) on the cluster nodes. There is a ping between the client and any of the cluster nodes. By netcat I can see that OSD and MON ports are open.

I am totally lost here. Please give me a hint on what to check and where to look.

root@ceph1:/tmp/nfs# ceph -s
  cluster:
    id:     ceph-fsid
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph10,ceph6,ceph1 (age 46h)
    mgr: ceph1.rynror(active, since 46h), standbys: ceph2.nxpjmd
    osd: 109 osds: 108 up (since 44h), 108 in (since 43h)
         flags noautoscale

  data:
    pools:   6 pools, 6401 pgs
    objects: 38 objects, 65 MiB
    usage:   105 TiB used, 1.7 PiB / 1.8 PiB avail
    pgs:     6401 active+clean

Thanks in advance. This is Ceph v18.2.2.
[ceph-users] Re: wrong public_ip after blackout / poweroutage
Hi,

This type of incident is often resolved by setting the public_network option at the "global" scope in the configuration:

ceph config set global public_network a:b:c:d::/64

On Fri, Jun 21, 2024 at 03:36, Eugen Block wrote:
> Hi,
>
> this is only a theory, not a proven answer or anything. But the
> orchestrator does automatically reconfigure daemons depending on the
> circumstances. So my theory is: some of the OSD nodes didn't respond
> via the public network anymore, so Ceph tried to use the cluster
> network as a fallback. The other way around is more common: if you
> don't have a cluster network configured at all, you see logs stating
> "falling back to public interface" (or similar). If the orchestrator
> did reconfigure the daemons, it would have been logged in the active
> mgr, and the result would be a different ceph.conf for the daemons in
> /var/lib/ceph/{FSID}/osd.{OSD_ID}/config. If you still have the mgr
> logs from after the outage, you might find some clues.
>
> Regards,
> Eugen
>
> Quoting mailing-lists:
>
> > Dear Cephers,
> >
> > after a succession of unfortunate events, we suffered a complete
> > datacenter blackout today.
> >
> > Ceph _nearly_ came back up perfectly. The health was OK and all
> > services were online, but we were having weird problems. Weird as in:
> > we could sometimes map RBDs and sometimes not, and sometimes we could
> > use the CephFS and sometimes we could not...
> >
> > It turns out some OSDs (I'd say 5%) came back with the cluster
> > network address as their public_ip and thus were not reachable.
> >
> > I do not see any pattern in why some OSDs are faulty and others are
> > not; the fault is spread over nearly all nodes. This is an example:
> >
> > osd.45 up in weight 1 up_from 184143 up_thru 184164 down_at 184142
> > last_clean_interval [182655,184103)
> > [v2:192.168.222.20:6834/1536394698,v1:192.168.222.20:6842/1536394698]
> > [v2:192.168.222.20:6848/1536394698,v1:192.168.222.20:6853/1536394698]
> > exists,up 002326c9
> >
> > This should have a public IP in the first brackets []. Our cluster
> > network is 192.168.222.0/24, which is of course only available on the
> > Ceph-internal switch.
> >
> > Simply restarting the affected OSDs solved the problem, so I am not
> > really asking for troubleshooting help; I would just like to
> > understand whether there is a reasonable explanation.
> >
> > My guess would be some kind of race condition when the interfaces
> > came up, but then again, why on ~5% of all OSDs? ... Anyway, I'm
> > tired; I hope this mail is somewhat understandable.
> >
> > We are running Ceph 17.2.7 with a cephadm Docker deployment.
> >
> > If you have any ideas about the cause of this, please let me know. I
> > have not seen this issue when gracefully rebooting the nodes.
> >
> > Best
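The quoted osdmap line can be checked mechanically across all OSDs. A small sketch (assuming IPv4 and the 192.168.222.0/24 cluster network from the message) that parses entity address strings like those above and flags ones whose advertised address falls inside the cluster network:

```python
# Flag OSD entity addresses that sit in the cluster network instead of
# the public one, as in the "ceph osd dump" line quoted above.
import ipaddress

CLUSTER_NET = ipaddress.ip_network("192.168.222.0/24")  # from the thread

def entity_ip(entity: str) -> ipaddress.IPv4Address:
    # "v2:192.168.222.20:6834/1536394698" -> 192.168.222.20
    # (IPv4 only; bracketed IPv6 addresses would need extra parsing)
    host = entity.split(":", 1)[1].rsplit(":", 1)[0]
    return ipaddress.ip_address(host)

def misplaced(entity: str) -> bool:
    """True if the advertised address is in the cluster network."""
    return entity_ip(entity) in CLUSTER_NET
```

Feeding every v1/v2 public address from `ceph osd dump` through `misplaced()` would list the affected ~5% of OSDs without eyeballing the map.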
[ceph-users] Re: wrong public_ip after blackout / poweroutage
Hi,

maybe not a direct answer to your question, but I would generally recommend that you verify your Ceph config with our Ceph Analyzer. Simply upload your ceph report there, and in a few seconds you will receive feedback on the configuration: https://analyzer.clyso.com/

*Joachim Kraftmayer*
a: Loristr. 8 | 80335 Munich | Germany | w: https://clyso.com | Utting a. A. | HR: Augsburg | HRB 25866 | USt. ID: DE275430677

On Fri, Jun 21, 2024 at 03:36, Eugen Block wrote:
> Hi,
>
> this is only a theory, not a proven answer or anything. But the
> orchestrator does automatically reconfigure daemons depending on the
> circumstances. So my theory is: some of the OSD nodes didn't respond
> via the public network anymore, so Ceph tried to use the cluster
> network as a fallback. The other way around is more common: if you
> don't have a cluster network configured at all, you see logs stating
> "falling back to public interface" (or similar). If the orchestrator
> did reconfigure the daemons, it would have been logged in the active
> mgr, and the result would be a different ceph.conf for the daemons in
> /var/lib/ceph/{FSID}/osd.{OSD_ID}/config. If you still have the mgr
> logs from after the outage, you might find some clues.
>
> Regards,
> Eugen
>
> Quoting mailing-lists:
>
> > Dear Cephers,
> >
> > after a succession of unfortunate events, we suffered a complete
> > datacenter blackout today.
> >
> > Ceph _nearly_ came back up perfectly. The health was OK and all
> > services were online, but we were having weird problems. Weird as in:
> > we could sometimes map RBDs and sometimes not, and sometimes we could
> > use the CephFS and sometimes we could not...
> >
> > It turns out some OSDs (I'd say 5%) came back with the cluster
> > network address as their public_ip and thus were not reachable.
> >
> > I do not see any pattern in why some OSDs are faulty and others are
> > not; the fault is spread over nearly all nodes. This is an example:
> >
> > osd.45 up in weight 1 up_from 184143 up_thru 184164 down_at 184142
> > last_clean_interval [182655,184103)
> > [v2:192.168.222.20:6834/1536394698,v1:192.168.222.20:6842/1536394698]
> > [v2:192.168.222.20:6848/1536394698,v1:192.168.222.20:6853/1536394698]
> > exists,up 002326c9
> >
> > This should have a public IP in the first brackets []. Our cluster
> > network is 192.168.222.0/24, which is of course only available on the
> > Ceph-internal switch.
> >
> > Simply restarting the affected OSDs solved the problem, so I am not
> > really asking for troubleshooting help; I would just like to
> > understand whether there is a reasonable explanation.
> >
> > My guess would be some kind of race condition when the interfaces
> > came up, but then again, why on ~5% of all OSDs? ... Anyway, I'm
> > tired; I hope this mail is somewhat understandable.
> >
> > We are running Ceph 17.2.7 with a cephadm Docker deployment.
> >
> > If you have any ideas about the cause of this, please let me know. I
> > have not seen this issue when gracefully rebooting the nodes.
> >
> > Best