[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Sake Ceph
That wasn't really clear in the docs :(

> On 21-12-2023 17:26 CET, Patrick Donnelly wrote:
>
> On Thu, Dec 21, 2023 at 3:05 AM Sake Ceph  wrote:
> >
> > Hi David
> >
> > Reducing max_mds didn't work. So I executed a fs reset:
> > ceph fs set atlassian-prod allow_standby_replay false
> > ceph fs set atlassian-prod cluster_down true
> > ceph mds fail atlassian-prod.pwsoel13142.egsdfl
> > ceph mds fail atlassian-prod.pwsoel13143.qlvypn
> > ceph fs reset atlassian-prod
> > ceph fs reset atlassian-prod --yes-i-really-mean-it
> >
> > This brought the fs back online and the servers/applications are working 
> > again.
> 
> This was not the right thing to do. You can mark the rank repaired. See end 
> of:
> 
> https://docs.ceph.com/en/latest/cephfs/administration/#daemons
> 
> (ceph mds repaired )
> 
> I admit that is not easy to find. I will add a ticket to improve the
> documentation:
> 
> https://tracker.ceph.com/issues/63885
> 
> -- 
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Simon Oosthoek
Hi Wes,

Thanks, `ceph tell mon.* sessions` got me the answer very quickly :-)

Cheers

/Simon

On Thu, 21 Dec 2023 at 18:27, Wesley Dillingham 
wrote:

> You can ask the monitor to dump its sessions, which should expose the IPs
> and the release / features. You can then track down by IP the clients with
> the undesirable features/release:
>
> ceph daemon mon.`hostname -s` sessions
>
> This assumes your mon is named after the short hostname, and you may need
> to do this for every mon. Alternatively, use `ceph tell mon.* sessions` to
> hit every mon at once.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com

-- 
I'm using my gmail.com address, because the gmail.com dmarc policy is
"none", some mail servers will reject this (microsoft?) others will instead
allow this when I send mail to a mailing list which has not yet been
configured to send mail "on behalf of" the sender, but rather do a kind of
"forward". The latter situation causes dkim/dmarc failures and the dmarc
policy will be applied. see https://wiki.list.org/DEV/DMARC for more details


[ceph-users] Re: Building new cluster had a couple of questions

2023-12-21 Thread Robert Sander

Hi,

On 21.12.23 15:13, Nico Schottelius wrote:


> I would strongly recommend k8s+rook for new clusters; it also allows
> running Alpine Linux as the host OS.


Why would I want to learn Kubernetes before I can deploy a new Ceph 
cluster when I have no need for K8s at all?


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Mandatory disclosures per §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin


[ceph-users] Re: Building new cluster had a couple of questions

2023-12-21 Thread Robert Sander

Hi,

On 21.12.23 19:11, Albert Shih wrote:


> What is the advantage of podman vs docker ? (I mean not in general but for
> ceph).


Docker relies on the Docker daemon, which has to be restarted whenever it
is updated, and that restart also restarts all containers. For a storage
system that is not the best procedure.


Everything needed for the Ceph containers is provided by podman.

Regards
--
Robert Sander


[ceph-users] Re: Building new cluster had a couple of questions

2023-12-21 Thread Simon Ironside

On 21/12/2023 13:50, Drew Weaver wrote:

> Howdy,
>
> I am going to be replacing an old cluster pretty soon and I am looking for a
> few suggestions.
>
> #1 cephadm or ceph-ansible for management?
> #2 Since the whole... CentOS thing... what distro appears to be the most
> straightforward to use with Ceph?  I was going to try and deploy it on Rocky 9.


I'm in the same boat and have used cephadm on Rocky 9 and the standard 
podman packages that come with the distro. Installation went without a 
hitch, a breeze actually compared to the old ceph-deploy/Nautilus 
install it's going to replace.


Cheers,
Simon.


[ceph-users] Re: Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Wesley Dillingham
You can ask the monitor to dump its sessions, which should expose the IPs
and the release / features. You can then track down by IP the clients with
the undesirable features/release:

ceph daemon mon.`hostname -s` sessions

This assumes your mon is named after the short hostname, and you may need
to do this for every mon. Alternatively, use `ceph tell mon.* sessions` to
hit every mon at once.
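
Once the session dump is saved as JSON, grouping client IPs by reported release could look roughly like the sketch below. This is only an illustration: the key names `addr` and `release` and the sample entries are hypothetical, the real `sessions` output carries more fields and its key names may differ between Ceph releases, so adjust `ADDR_KEY` and `RELEASE_KEY` to whatever your monitors actually print.

```python
import json
from collections import defaultdict

# Hypothetical key names; check them against your own `sessions` output.
ADDR_KEY = "addr"
RELEASE_KEY = "release"

# Made-up sample data, not taken from a real cluster.
SAMPLE_SESSIONS = json.dumps([
    {"addr": "10.0.0.11:0/1234", "release": "luminous"},
    {"addr": "10.0.0.12:0/5678", "release": "jewel"},
    {"addr": "10.0.0.12:0/9012", "release": "jewel"},
])


def hosts_by_release(sessions_json):
    """Group client IPs by the release each session reports."""
    grouped = defaultdict(set)
    for session in json.loads(sessions_json):
        # Keep only the IP part of "ip:port/nonce" (IPv4-style addresses only).
        ip = session[ADDR_KEY].split(":", 1)[0]
        grouped[session[RELEASE_KEY]].add(ip)
    return {release: sorted(ips) for release, ips in grouped.items()}


print(hosts_by_release(SAMPLE_SESSIONS))
```

Any host listed under an older release than the one you want to require is a candidate to check first.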

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com




[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Patrick Donnelly
On Thu, Dec 21, 2023 at 3:05 AM Sake Ceph  wrote:
>
> Hi David
>
> Reducing max_mds didn't work. So I executed a fs reset:
> ceph fs set atlassian-prod allow_standby_replay false
> ceph fs set atlassian-prod cluster_down true
> ceph mds fail atlassian-prod.pwsoel13142.egsdfl
> ceph mds fail atlassian-prod.pwsoel13143.qlvypn
> ceph fs reset atlassian-prod
> ceph fs reset atlassian-prod --yes-i-really-mean-it
>
> This brought the fs back online and the servers/applications are working 
> again.

This was not the right thing to do. You can mark the rank repaired. See end of:

https://docs.ceph.com/en/latest/cephfs/administration/#daemons

(ceph mds repaired )

I admit that is not easy to find. I will add a ticket to improve the
documentation:

https://tracker.ceph.com/issues/63885

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Patrick Donnelly
On Thu, Dec 21, 2023 at 2:49 AM David C.  wrote:
> I would start by decrementing max_mds by 1:
> ceph fs set atlassian-prod max_mds 2

This will have no positive effect. The monitors will not alter the
number of ranks (i.e. stop a rank) if the cluster is degraded.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Patrick Donnelly
On Thu, Dec 21, 2023 at 2:11 AM Sake Ceph  wrote:
>
> Starting a new thread, forgot subject in the previous.
> So our FS down. Got the following error, what can I do?
>
> # ceph health detail
> HEALTH_ERR 1 filesystem is degraded; 1 mds daemon damaged
> [WRN] FS_DEGRADED: 1 filesystem is degraded
> fs atlassian/prod is degraded
> [ERR] MDS_DAMAGE: 1 mds daemon damaged
> fs atlassian-prod mds.1 is damaged

Identify what is damaged by reviewing the MDS logs. Increase mds
debugging and mark the rank repaired if there is insufficient
information (which assumes that whatever caused the MDS to become
damaged will reoccur when it restarts).
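
To know which rank to look at in the MDS logs, the damaged ranks can be pulled out of the `ceph health detail` text. The sketch below is a minimal example that assumes the plain-text line format shown in the quoted output ("fs NAME mds.RANK is damaged"); it does not call Ceph itself, and the regular expression is my own guess at a pattern that fits that one line.

```python
import re

# Health output as quoted in this thread.
SAMPLE = """\
HEALTH_ERR 1 filesystem is degraded; 1 mds daemon damaged
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs atlassian/prod is degraded
[ERR] MDS_DAMAGE: 1 mds daemon damaged
fs atlassian-prod mds.1 is damaged
"""

# Assumed line format: "fs <name> mds.<rank> is damaged".
DAMAGED = re.compile(r"fs (\S+) mds\.(\d+) is damaged")


def damaged_ranks(health_text):
    """Return (filesystem name, rank) pairs reported as damaged."""
    return [(m.group(1), int(m.group(2))) for m in DAMAGED.finditer(health_text)]


print(damaged_ranks(SAMPLE))
```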

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


[ceph-users] Re: Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Anthony D'Atri
[rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$ ceph features
{
    "mon": [
        {
            "features": "0x3f01cfbf7ffd",
            "release": "luminous",
            "num": 3
        }
    ],
    "osd": [
        {
            "features": "0x3f01cfbf7ffd",
            "release": "luminous",
            "num": 600
        }
    ],
    "client": [
        {
            "features": "0x2f018fb87aa4aafe",
            "release": "luminous",
            "num": 41
        },
        {
            "features": "0x3f01cfbf7ffd",
            "release": "luminous",
            "num": 147
        }
    ],
    "mgr": [
        {
            "features": "0x3f01cfbf7ffd",
            "release": "luminous",
            "num": 2
        }
    ]
}
[rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$

IIRC there are nuances: there are cases where a client can *look* like Jewel
but actually be okay.
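
On a cluster with many client groups, the same output can be summarized with a few lines of Python. This is a rough sketch that only assumes the JSON shape shown above (a list of `features` / `release` / `num` groups per daemon type); the sample is copied from that output and the function name is my own.

```python
import json

# Sample shaped like the `ceph features` output above.
SAMPLE = """
{
  "mon": [{"features": "0x3f01cfbf7ffd", "release": "luminous", "num": 3}],
  "osd": [{"features": "0x3f01cfbf7ffd", "release": "luminous", "num": 600}],
  "client": [
    {"features": "0x2f018fb87aa4aafe", "release": "luminous", "num": 41},
    {"features": "0x3f01cfbf7ffd", "release": "luminous", "num": 147}
  ],
  "mgr": [{"features": "0x3f01cfbf7ffd", "release": "luminous", "num": 2}]
}
"""


def count_by_release(features_json, daemon_type="client"):
    """Return {release: total number of daemons/clients} for one daemon type."""
    totals = {}
    for group in json.loads(features_json).get(daemon_type, []):
        totals[group["release"]] = totals.get(group["release"], 0) + group["num"]
    return totals


print(count_by_release(SAMPLE))
```

A release older than the one you want to require showing up in the client totals is the cue to go looking for those clients.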


> On Dec 21, 2023, at 10:41, Simon Oosthoek  wrote:
> 
> Hi,
> 
> Our cluster is currently running quincy, and I want to set the minimal
> client version to luminous, to enable upmap balancer, but when I tried to,
> I got this:
> 
> # ceph osd set-require-min-compat-client luminous
> Error EPERM: cannot set require_min_compat_client to luminous: 2 connected
> client(s) look like jewel (missing 0x800); add --yes-i-really-mean-it to do
> it anyway
> 
> I think I know the most likely candidate (and I've asked them), but is
> there a way to find out, the way ceph seems to know?
> 
> tnx
> 
> /Simon


[ceph-users] Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Simon Oosthoek
Hi,

Our cluster is currently running quincy, and I want to set the minimal
client version to luminous, to enable upmap balancer, but when I tried to,
I got this:

# ceph osd set-require-min-compat-client luminous
Error EPERM: cannot set require_min_compat_client to luminous: 2 connected
client(s) look like jewel (missing 0x800); add --yes-i-really-mean-it to do
it anyway

I think I know the most likely candidate (and I've asked them), but is
there a way to find out, the way ceph seems to know?
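
As far as I understand it, the "missing 0x..." part of that error is a bitmask: the feature bits required for the requested release that a client did not advertise. The sketch below only illustrates that arithmetic. The masks are toy values, not real Ceph feature bits, and the actual required mask lives in the Ceph source and is not reproduced here.

```python
def missing_features(client_features: int, required: int) -> int:
    """Return the required feature bits that the client does not advertise."""
    return required & ~client_features


def looks_older(client_features: int, required: int) -> bool:
    """True if the client lacks at least one required feature bit."""
    return missing_features(client_features, required) != 0


# Toy masks for illustration only.
client = 0b1011
required = 0b1110
print(hex(missing_features(client, required)))
print(looks_older(client, required))
```

Feature strings such as those printed by `ceph features` can be turned into integers with `int(value, 16)` before comparing them.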

tnx

/Simon


[ceph-users] Re: Building new cluster had a couple of questions

2023-12-21 Thread Nico Schottelius


Hey Drew,

Drew Weaver  writes:
> #1 cephadm or ceph-ansible for management?
> #2 Since the whole... CentOS thing... what distro appears to be the most 
> straightforward to use with Ceph?  I was going to try and deploy it on Rocky 
> 9.

I would strongly recommend k8s+rook for new clusters; it also allows
running Alpine Linux as the host OS.

BR,

Nico


--
Sustainable and modern Infrastructures by ungleich.ch


[ceph-users] Re: Building new cluster had a couple of questions

2023-12-21 Thread Robert Sander

Hi,

On 12/21/23 14:50, Drew Weaver wrote:

> #1 cephadm or ceph-ansible for management?


cephadm.

The ceph-ansible project writes in its README:

> NOTE: cephadm is the new official installer, you should consider
> migrating to cephadm.


https://github.com/ceph/ceph-ansible


> #2 Since the whole... CentOS thing... what distro appears to be the most
> straightforward to use with Ceph?  I was going to try and deploy it on Rocky 9.


Any distribution with a recent systemd, podman, LVM2 and time
synchronization is viable. I prefer Debian; others prefer RPM-based
distributions.
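
A quick way to check a candidate host against that list is to look for the relevant binaries on the PATH. The sketch below is only a rough preflight: the binary names (`systemctl`, `podman`, `lvm`, `chronyd` / `ntpd`) are my assumptions about how each requirement usually shows up on a host, and it does not check whether the versions are recent enough.

```python
import shutil

# Requirement -> binaries that would satisfy it (assumed names, adjust per distro).
REQUIREMENTS = {
    "systemd": ["systemctl"],
    "container runtime": ["podman"],
    "LVM2": ["lvm"],
    "time synchronization": ["chronyd", "ntpd"],
}


def missing_requirements(which=shutil.which):
    """Return the requirement names for which no candidate binary was found."""
    missing = []
    for name, candidates in REQUIREMENTS.items():
        if not any(which(binary) for binary in candidates):
            missing.append(name)
    return missing


print(missing_requirements() or "all listed requirements found")
```

The `which` parameter exists only so the lookup can be swapped out, for example when trying the function on a machine that is not the target host.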


Regards
--
Robert Sander


[ceph-users] Building new cluster had a couple of questions

2023-12-21 Thread Drew Weaver
Howdy,

I am going to be replacing an old cluster pretty soon and I am looking for a 
few suggestions.

#1 cephadm or ceph-ansible for management?
#2 Since the whole... CentOS thing... what distro appears to be the most 
straightforward to use with Ceph?  I was going to try and deploy it on Rocky 9.

That is all I have.

Thanks,
-Drew



[ceph-users] RGW requests piling up

2023-12-21 Thread Gauvain Pocentek
Hello Ceph users,

We've been having an issue with RGW for a couple of days and we would
appreciate some help, ideas, or guidance to figure out the issue.

We run a multi-site setup which has been working fine so far. We
don't actually have data replication enabled yet, only metadata
replication. On the master region we've started to see requests piling up
in the rgw process, leading to very slow operations and failures all over
the place (clients time out before getting responses from rgw). The
workaround for now is to restart the rgw containers regularly.

We made a mistake and forcefully deleted a bucket on a secondary zone;
this might be the trigger, but we are not sure.

Other symptoms include:

* Increased memory usage of the RGW processes (we bumped the container
limits from 4G to 48G to cater for that)
* Lots of read IOPS on the index pool (4 or 5 times more compared to what
we were seeing before)
* The prometheus ceph_rgw_qlen and ceph_rgw_qactive metrics (number of
active requests) seem to show that the number of concurrent requests
increases with time, although we don't see more requests coming in on the
load-balancer side.

The current thought is that the RGW process doesn't close the requests
properly, or that some requests just hang. After a restart of the process
things look OK but the situation turns bad fairly quickly (after 1 hour we
start to see many timeouts).
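
One way to put a number on "the number of concurrent requests increases with time" is to fit a straight line through samples of the `ceph_rgw_qactive` metric taken after a restart. The sketch below is a plain least-squares slope over (seconds since restart, value) pairs. The sample values are invented for illustration, and fetching the samples from Prometheus is left out.

```python
def slope_per_minute(samples):
    """Least-squares slope of (seconds, value) samples, in units per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("all samples share the same timestamp")
    return 60 * numerator / denominator


# Invented samples: active requests at 0, 10, 20 and 30 minutes after a restart.
qactive = [(0, 10), (600, 40), (1200, 70), (1800, 100)]
print(slope_per_minute(qactive))
```

A slope that stays clearly positive while the request rate at the load balancer is flat would support the idea that requests are hanging rather than arriving faster.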

The rados cluster seems completely healthy, it is also used for rbd
volumes, and we haven't seen any degradation there.

Has anyone experienced that kind of issue? Anything we should be looking at?

Thanks for your help!

Gauvain


[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Sake Ceph
Hi David

Reducing max_mds didn't work. So I executed a fs reset:
ceph fs set atlassian-prod allow_standby_replay false
ceph fs set atlassian-prod cluster_down true
ceph mds fail atlassian-prod.pwsoel13142.egsdfl
ceph mds fail atlassian-prod.pwsoel13143.qlvypn
ceph fs reset atlassian-prod
ceph fs reset atlassian-prod --yes-i-really-mean-it

This brought the fs back online and the servers/applications are working again. 

Question: can I increase max_mds again and re-enable standby_replay?

Will collect logs, maybe we can pinpoint the cause. 

Best regards, 
Sake