[ceph-users] Re: Upgrade Ceph cluster + radosgw from 14.2.18 to latest 15

2023-05-15 Thread Marc
why are you still not on 14.2.22?

> 
> Yes, the documents show an example of upgrading from Nautilus to
> Pacific. But I'm not really 100% trusting the Ceph documents, and I'm
> also afraid of what if Nautilus is not compatible with Pacific in some
> operations of monitor or osd =)


[ceph-users] CEPH Version choice

2023-05-15 Thread Tino Todino
Hi all,

I've been reading through this email list for a while now, but one thing that 
I'm curious about is why a lot of installations out there aren't upgraded to 
the latest version of CEPH (Quincy).

What are the main reasons for not upgrading to the latest and greatest?

Thanks.

Tino
This E-mail is intended solely for the person or organisation to which it is 
addressed. It may contain privileged or confidential information and, if you 
are not the intended recipient, you must not copy, distribute or take any 
action in reliance upon it. Any views or opinions presented are solely those of 
the author and do not necessarily represent those of Marlan Maritime 
Technologies Ltd. If you have received this E-mail in error, please notify us 
as soon as possible and delete it from your computer. Marlan Maritime 
Technologies Ltd Registered in England & Wales 323 Mariners House, Norfolk 
Street, Liverpool. L1 0BG Company No. 08492427.


[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Konstantin Shalygin
Hi,

> On 15 May 2023, at 11:37, Tino Todino  wrote:
> 
> What are the main reasons for not upgrading to the latest and greatest?

One of the main reasons is "just can't": your Ceph-based products will get worse real-world (not benchmark) performance, see [1]


[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/2E67NW6BEAVITL4WTAAU3DFLW7LJX477/


k


[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Marc
> 
> I've been reading through this email list for a while now, but one thing
> that I'm curious about is why a lot of installations out there aren't
> upgraded to the latest version of CEPH (Quincy).
> 
> What are the main reasons for not upgrading to the latest and greatest?

If you are starting with a fresh install, just go for the newest. Once you have it, don't install any update/upgrade blindly. Depending on your team size, it pays off to wait a bit and see what others are reporting.

cephadm is a container-based deployment and also still 'new'. If you are not proficient with containers, or you already have a different container environment, maybe stick to the basic packages. This still seems to be the easiest to maintain.




[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Frank Schilder
> What are the main reasons for not upgrading to the latest and greatest?

Because more often than not it isn't.

I guess when you write "latest and greatest" you talk about features. When we 
admins talk about "latest and greatest" we talk about stability. The times that 
one could jump with a production system onto a "stable" release with the ending 
.2 are long gone. Anyone who becomes an early adopter is more and more likely 
to experience serious issues. Which leads to more admins waiting with upgrades. 
Which in turn leads to more bugs discovered only at late releases. Which again 
makes more admins postpone an upgrade. A vicious cycle.

A long time ago there was a discussion about exactly this problem, and the admins were pretty much in favor of extending the release cycle to at least 4 years, if not longer. It's simply too many releases with too many serious bugs left unfixed, lately not even during their official lifetime. Octopus still has serious bugs but is EOL.

I'm not surprised that admins give up on upgrading entirely and stay on a 
version until their system dies.

To give you one example from my own experience: upgrading from latest mimic to latest octopus. This experience almost certainly applies to every upgrade that involves an OSD format change (the infamous "quick fix" that could take several days per OSD and crash entire clusters).

There is an OSD conversion involved in this upgrade and we found out that out 
of 2 possible upgrade paths, one leads to a heavily performance degraded 
cluster with no possibility to recover other than redeploying all OSDs step by 
step. Funnily enough, the problematic procedure is the one described in the documentation - it has not been updated to this day, despite users still getting caught in this trap.

To give you an idea of the amount of work now involved in an attempt to avoid such pitfalls, here is our path:

We set up a test cluster with a script producing realistic workload and started 
testing an upgrade under load. This took about a month (meaning repeating the 
upgrade with a cluster on mimic deployed and populated from scratch every time) 
to confirm that we managed to get onto a robust path avoiding a number of 
pitfalls along the way - mainly the serious performance degradation due to OSD conversion, but also an issue with stray entries, plus noise. A month! Once we were convinced that it would work - meaning we ran it a couple of times without any further issues being discovered - we started upgrading our production cluster.

It went smoothly until we started the OSD conversion of our FS meta data OSDs. They 
had a special performance optimized deployment resulting in a large number of 
100G OSDs with about 30-40% utilization. These OSDs started crashing with some 
weird corruption. Turns out - thanks Igor! - that while spill-over from fast to 
slow drive was handled, the other direction was not. Our OSDs crashed because 
Octopus apparently required substantially more space on the slow device and couldn't use the plentiful fast space that was actually available.

The whole thing ended in 3 days of complete downtime and me working 12 hour 
days on the weekend. We managed to recover from this only because we had a 
larger delivery of hardware already on-site and I could scavenge parts from 
there.

So, the story was that after 1 month of testing we still ran into 3 days of 
downtime, because there was another unannounced change that broke a config that 
was working fine for years on mimic.

To say the same thing with different words: major version upgrades have become 
very disruptive and require a lot of effort to get halfway right. And I'm not 
talking about the deployment system here.

Add to this list the still-open cases discussed on the list about MDS dentry corruption, snapshots disappearing/corrupting together with a lack of good built-in tools for detection and repair, performance degradation, etc. - all not even addressed in pacific. In this state the devs are pushing for pacific becoming EOL while at the same time the admins become ever more reluctant to upgrade.

In my specific case, I planned to upgrade at least to pacific this year, but my time budget simply doesn't allow for verifying the procedure and checking that all bugs relevant to us have been addressed. I gave up. Maybe next year. Maybe by then it's even a bit closer to rock solid.

So to get back to my starting point: we admins actually value rock solid over features. I know that this is boring for devs, but nothing is worse than nobody using your latest and greatest - which probably was the motivation for your question. If the upgrade paths were more solid and questions like "why does an OSD conversion not lead to an OSD that is identical to one deployed freshly" or "where does the performance go" were actually tracked down, we would be much less reluctant to upgrade.

And then, but only then, would the latest and greatest f

[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Marc
> 
> We set up a test cluster with a script producing realistic workload and
> started testing an upgrade under load. This took about a month (meaning
> repeating the upgrade with a cluster on mimic deployed and populated

Hi Frank, do you have such scripts online? On GitHub or so? I was thinking of compiling el9 rpms for Nautilus and running tests for a few days on a test cluster with mixed el7 and el9 hosts.

> 
> So to get back to my starting point, we admins actually value rock solid
> over features. I know that this is boring for devs, but nothing is worse
> than nobody using your latest and greatest - which probably was the
> motivation for your question. If the upgrade paths were more solid and
> things like the question "why does an OSD conversion not lead to an OSD
> that is identical to one deployed freshly" or "where does the
> performance go" would actually attempted to track down, we would be much
> less reluctant to upgrade.


> 
> I will bring it up here again: with the complexity that the code base
> reached now, the 2 year release cadence is way too fast, it doesn't
> provide sufficient maturity for upgrading fast as well. More and more
> admins will be several cycles behind and we are reaching the point where
> major bugs in so-called EOL versions will only be discovered before
> large clusters even reached this version. Which might become a
> fundamental blocker to upgrades entirely.

Indeed. 

> An alternative to increasing the release cadence would be to keep more
> cycles in the life-time loop instead of only the last 2 major releases.
> 4 years really is nothing when it comes to storage.
> 

I would like to see this change also.


[ceph-users] cephadm does not honor container_image default value

2023-05-15 Thread Daniel Krambrock

Hello.

I think I found a bug in cephadm/ceph orch: redeploying a container image (tested with alertmanager) after removing a custom `mgr/cephadm/container_image_alertmanager` value deploys the previous container image and not the default container image.


I'm running `cephadm` from ubuntu 22.04 pkg 17.2.5-0ubuntu0.22.04.3 and 
`ceph` version 17.2.6.


Here is an example. Node clrz20-08 is the node alertmanager is running on, clrz20-01 is the node I'm controlling Ceph from:

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
```

* Set alertmanager image
```
root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager
```

* Redeploy alertmanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:latest"
```

* Remove the alertmanager image setting, revert to default:
```
root@clrz20-01:~# ceph config rm mgr mgr/cephadm/container_image_alertmanager
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
```

* Redeploy alertmanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:latest"
```
-> `mgr/cephadm/container_image_alertmanager` is set to `quay.io/prometheus/alertmanager:v0.23.0`, but redeploy uses `quay.io/prometheus/alertmanager:latest`. This looks like a bug.

* Set the alertmanager image explicitly to the default value
```
root@clrz20-01:~# ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0
root@clrz20-01:~# ceph config get mgr mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
```

* Redeploy alertmanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
```
-> Setting `mgr/cephadm/container_image_alertmanager` explicitly to the default value fixes the issue.




Bests,
Daniel


[ceph-users] Dedicated radosgw gateways

2023-05-15 Thread Michal Strnad

Hi all,

at Cephalocon 2023, it was mentioned several times that for service 
tasks such as data deletion via garbage collection or data replication 
in S3 via zoning, it is good to do them on dedicated radosgw gateways 
and not mix them with gateways used by users. How can this be achieved? 
How can we isolate these tasks? Will using dedicated keyrings instead of 
admin keys be sufficient? How do you operate this in your environment?


Thx
Michal







[ceph-users] Re: Dedicated radosgw gateways

2023-05-15 Thread Konstantin Shalygin
Hi,

> On 15 May 2023, at 14:58, Michal Strnad  wrote:
> 
> at Cephalocon 2023, it was mentioned several times that for service tasks 
> such as data deletion via garbage collection or data replication in S3 via 
> zoning, it is good to do them on dedicated radosgw gateways and not mix them 
> with gateways used by users. How can this be achieved? How can we isolate 
> these tasks? Will using dedicated keyrings instead of admin keys be 
> sufficient? How do you operate this in your environment?

Just:

# don't send client traffic to the "dedicated radosgw gateways"
# disable lc/gc on the "gateways used by users" via `rgw_enable_lc_threads = false` & `rgw_enable_gc_threads = false`
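
A minimal sketch of how that can look in ceph.conf, assuming the user-facing gateways run as `client.rgw.users` and the dedicated one as `client.rgw.housekeeping` (both instance names are made up - use whatever your rgw daemons are actually named):

```
# ceph.conf sketch - the rgw instance names are hypothetical

# user-facing gateways (published via the load balancer / DNS):
[client.rgw.users]
    rgw_enable_lc_threads = false
    rgw_enable_gc_threads = false

# dedicated housekeeping gateway, receives no client traffic:
[client.rgw.housekeeping]
    rgw_enable_lc_threads = true
    rgw_enable_gc_threads = true
```

The same options can also be set per instance via the centralized config if that is how you manage settings; the key point is that only the dedicated gateway keeps the lc/gc threads enabled, and it is simply not published to clients.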


k


[ceph-users] Re: cephadm does not honor container_image default value

2023-05-15 Thread Adam King
I think with the `config set` commands there is logic to notify the
relevant mgr modules and update their values. That might not exist with
`config rm`, so it's still using the last set value. Looks like a real bug.
I'm curious what happens if the mgr restarts after the `config rm` - whether it goes back to the default image in that case or not. I might take a look later.
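
For anyone who wants to try that, a rough sketch of the check (the same commands as in Daniel's report, with a mgr failover forced in between):

```
ceph config rm mgr mgr/cephadm/container_image_alertmanager
ceph mgr fail        # force a mgr failover so the cephadm module reloads its options
ceph orch redeploy alertmanager
# afterwards, on the host running alertmanager:
cephadm ls | jq '.[] | select(.service_name == "alertmanager") | .container_image_name'
```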

On Mon, May 15, 2023 at 7:37 AM Daniel Krambrock <
krambr...@hrz.uni-marburg.de> wrote:

> Hello.
>
> I think i found a bug in cephadm/ceph orch:
> Redeploying a container image (tested with alertmanager) after removing
> a custom `mgr/cephadm/container_image_alertmanager` value, deploys the
> previous container image and not the default container image.
>
> I'm running `cephadm` from ubuntu 22.04 pkg 17.2.5-0ubuntu0.22.04.3 and
> `ceph` version 17.2.6.
>
> Here is an example. Node clrz20-08 is the node altermanager is running
> on, clrz20-01 the node I'm controlling ceph from:
>
> * Get alertmanager version
> ```
> root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
> "alertmanager")| .container_image_name'
> "quay.io/prometheus/alertmanager:v0.23.0"
> ```
>
> * Set alertmanager image
> ```
> root@clrz20-01:~# ceph config set mgr
> mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager
> root@clrz20-01:~# ceph config get mgr
> mgr/cephadm/container_image_alertmanager
> quay.io/prometheus/alertmanager
> ```
>
> * redeploy altermanager
> ```
> root@clrz20-01:~# ceph orch redeploy alertmanager
> Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
> ```
>
> * Get alertmanager version
> ```
> root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
> "alertmanager")| .container_image_name'
> "quay.io/prometheus/alertmanager:latest"
> ```
>
> * Remove alertmanager image setting, revert to default:
> ```
> root@clrz20-01:~# ceph config rm mgr
> mgr/cephadm/container_image_alertmanager
> root@clrz20-01:~# ceph config get mgr
> mgr/cephadm/container_image_alertmanager
> quay.io/prometheus/alertmanager:v0.23.0
> ```
>
> * redeploy altermanager
> ```
> root@clrz20-01:~# ceph orch redeploy alertmanager
> Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
> ```
>
> * Get alertmanager version
> ```
> root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
> "alertmanager")| .container_image_name'
> "quay.io/prometheus/alertmanager:latest"
> ```
> -> `mgr/cephadm/container_image_alertmanager` is set to
> `quay.io/prometheus/alertmanager:v0.23.0`
> , but redeploy uses
> `quay.io/prometheus/alertmanager:latest`
> . This looks like a bug.
>
> * Set alertmanager image explicitly to the default value
> ```
> root@clrz20-01:~# ceph config set mgr
> mgr/cephadm/container_image_alertmanager
> quay.io/prometheus/alertmanager:v0.23.0
> root@clrz20-01:~# ceph config get mgr
> mgr/cephadm/container_image_alertmanager
> quay.io/prometheus/alertmanager:v0.23.0
> ```
>
> * redeploy altermanager
> ```
> root@clrz20-01:~# ceph orch redeploy alertmanager
> Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
> ```
>
> * Get alertmanager version
> ```
> root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
> "alertmanager")| .container_image_name'
> "quay.io/prometheus/alertmanager:v0.23.0"
> ```
> -> Setting `mgr/cephadm/container_image_alertmanager` to the default
> setting fixes the issue.
>
>
>
> Bests,
> Daniel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>


[ceph-users] Re: MDS crashes to damaged metadata

2023-05-15 Thread Stefan Kooman

On 12/15/22 15:31, Stolte, Felix wrote:

Hi Patrick,

we used your script to repair the damaged objects on the weekend and it went 
smoothly. Thanks for your support.

We adjusted your script to scan for damaged files on a daily basis, runtime is 
about 6h. Until thursday last week, we had exactly the same 17 Files. On 
thursday at 13:05 a snapshot was created and our active mds crashed once at 
this time (snapshot was created):


Are you willing to share this script? I would like to use it to scan our 
CephFS before upgrading to 16.2.13. Do you run this script when the 
filesystem is online / active?


Thanks,

Gr. Stefan


[ceph-users] Re: Upgrade Ceph cluster + radosgw from 14.2.18 to latest 15

2023-05-15 Thread Wesley Dillingham
I have upgraded dozens of clusters 14 -> 16 using the methods described in the docs, and when followed precisely no issues have arisen. I would suggest moving to a release that is still receiving backports (Pacific or Quincy). The important aspects are to do only one system at a time; in the case of monitors, ensure each rejoins quorum after restarting on the new version before proceeding to the next mon; in the case of OSDs, wait for all PGs to be active+clean before proceeding to the next host.
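
As a condensed sketch of that flow for a package-based cluster (not the full procedure - see the upgrade notes in the documentation):

```
# monitors: one at a time, after installing the new packages on that host
systemctl restart ceph-mon.target
ceph mon stat                  # confirm the mon is back in quorum before the next one

# OSDs: one host at a time
ceph osd set noout
systemctl restart ceph-osd.target
ceph -s                        # wait for all PGs to be active+clean before the next host
ceph osd unset noout           # once every host is done

ceph versions                  # verify all daemons report the new release
```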

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, May 15, 2023 at 3:46 AM Marc  wrote:

> why are you still not on 14.2.22?
>
> >
> > Yes, the documents show an example of upgrading from Nautilus to
> > Pacific. But I'm not really 100% trusting the Ceph documents, and I'm
> > also afraid of what if Nautilus is not compatible with Pacific in some
> > operations of monitor or osd =)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: Octopus on Ubuntu 20.04.6 LTS with kernel 5

2023-05-15 Thread Szabo, Istvan (Agoda)
Hi,

Do Pacific and Quincy still support bare-metal deployed setups?

Istvan Szabo
Staff Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

-Original Message-
From: Ilya Dryomov 
Sent: Thursday, May 11, 2023 3:39 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: Re: [ceph-users] Re: Octopus on Ubuntu 20.04.6 LTS with kernel 5



On Thu, May 11, 2023 at 7:13 AM Szabo, Istvan (Agoda)  
wrote:
>
> I can answer my question, even in the official ubuntu repo they are using by 
> default the octopus version so for sure it works with kernel 5.
>
> https://packages.ubuntu.com/focal/allpackages
>
>
> -Original Message-
> From: Szabo, Istvan (Agoda) 
> Sent: Thursday, May 11, 2023 11:20 AM
> To: Ceph Users 
> Subject: [ceph-users] Octopus on Ubuntu 20.04.6 LTS with kernel 5
>
> Hi,
>
> In octopus documentation we can see kernel 4 as recommended, however we've 
> changed our test cluster yesterday from centos 7 / 8 to Ubuntu 20.04.6 LTS 
> with kernel 5.4.0-148 and seems working, I just want to make sure before I 
> move to prod there isn't any caveats.

Hi Istvan,

Note that on https://docs.ceph.com/en/octopus/start/os-recommendations/
it starts with:

> If you are using the kernel client to map RBD block devices or mount
> CephFS, the general advice is to use a “stable” or “longterm
> maintenance” kernel series provided by either http://kernel.org or
> your Linux distribution on any client hosts.

The recommendation for 4.x kernels follows that just as a precaution against 
folks opting to stick to something older.  If your distribution provides 5.x or 
6.x stable kernels, by all means use them!

A word of caution though: Octopus was EOLed last year.  Please consider 
upgrading your cluster to a supported release -- preferably Quincy since 
Pacific is scheduled to go EOL sometime this year too.

Thanks,

Ilya


This message is confidential and is for the sole use of the intended 
recipient(s). It may also be privileged or otherwise protected by copyright or 
other legal rules. If you have received it by mistake please let us know by 
reply email and delete it from your system. It is prohibited to copy this 
message or disclose its content to anyone. Any confidentiality or privilege is 
not waived or lost by any mistaken delivery or unauthorized disclosure of the 
message. All messages sent to and from Agoda may be monitored to ensure 
compliance with company policies, to protect the company's interests and to 
remove potential malware. Electronic messages may be intercepted, amended, lost 
or deleted, or contain viruses.


[ceph-users] Re: Quincy Ceph-orchestrator and multipath SAS

2023-05-15 Thread James Turner
CEPH configurations can be forced to use multipath but my experience is
that it is painful and manual at best. The orchestrator design criteria
supports low-cost/commodity hardware and multipath is a sophistication
not yet addressed. The orchestrator sees all of the available device paths
with no association and as a result it's not a good idea to use it for
device management in that environment. I've tried to construct a device
filter that looks like /dev/mpath* - but that doesn't work. You could try
to raise a feature request.

The good news is that once you have manually created a multipath OSD, the mainline OSD code recognizes and treats it appropriately - it knows the relationship between "dm" and "mpath" devices. Just make sure that you use a multipath device name when you create the device (LVM or otherwise) passed to ceph-volume.
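
A minimal sketch of that manual path, using one of the mpath devices from the listing below and made-up VG/LV names:

```
# create the LVM stack on the multipath device, not on the underlying sdX paths
pvcreate /dev/mapper/mpathh
vgcreate ceph-mpathh /dev/mapper/mpathh
lvcreate -l 100%FREE -n osd-block ceph-mpathh

# hand the LV to ceph-volume; the OSD code resolves the dm/mpath relationship itself
ceph-volume lvm create --data ceph-mpathh/osd-block
```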

On Fri, May 12, 2023 at 11:17 AM Deep Dish  wrote:

> Hello,
>
> I have a few hosts about to add into a cluster that have a multipath
> storage config for SAS devices.Is this supported on Quincy, and how
> would ceph-orchestrator and / or ceph-volume handle multipath storage?
>
> Here's a snip of lsblk output of a host in question:
>
> # lsblk
>
> NAME  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
>
> ...
>
> sdc 8:32   0   9.1T  0 disk
>
> └─mpathh  253:40   9.1T  0 mpath
>
> sdd 8:48   0   9.1T  0 disk
>
> └─mpathi  253:50   9.1T  0 mpath
>
> sde 8:64   0   7.3T  0 disk
>
> └─mpathj  253:60   7.3T  0 mpath
>
> sdf 8:80   0   7.3T  0 disk
>
> └─mpathl  253:70   7.3T  0 mpath
>
> sdg 8:96   0   7.2T  0 disk
>
> └─mpathk  253:80   7.2T  0 mpath
>
> sdh 8:112  0   7.3T  0 disk
>
> └─mpathe  253:90   7.3T  0 mpath
>
> sdi 8:128  0   7.3T  0 disk
>
> └─mpathg  253:10   0   7.3T  0 mpath
>
> sdj 8:144  0   7.3T  0 disk
>
> └─mpathf  253:11   0   7.3T  0 mpath
>
> sdk 8:160  0   7.3T  0 disk
>
> └─mpathc  253:12   0   7.3T  0 mpath
>
> sdl 8:176  0   7.3T  0 disk
>
> └─mpathd  253:13   0   7.3T  0 mpath
>
> ...
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Joachim Kraftmayer - ceph ambassador

Hi,


I know the problems that Frank has raised. However, it should also be 
mentioned that many critical bugs have been fixed in the major versions. 
We are working on the fixes ourselves.


We and others have written a lot of tools for ourselves in the last 10 
years to improve migration/update and upgrade paths/strategy.


From version to version, we also test for up to 6 months before putting 
them into production.


However, our goal is always to use Ceph versions that still get 
backports and on the other hand, only use the features we really need.
Our developers also always aim to bring bug fixes upstream and into the 
supported versions.


By the way, regarding performance I recommend the Cephalocon 
presentations by Adam and Mark. There you can learn what efforts are 
made to improve ceph performance for current and future versions.


Regards, Joachim


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 15.05.23 um 12:11 schrieb Frank Schilder:

What are the main reasons for not upgrading to the latest and greatest?

Because more often than not it isn't.

I guess when you write "latest and greatest" you talk about features. When we admins talk about 
"latest and greatest" we talk about stability. The times that one could jump with a production 
system onto a "stable" release with the ending .2 are long gone. Anyone who becomes an early 
adapter is more and more likely to experience serious issues. Which leads to more admins waiting with 
upgrades. Which in turn leads to more bugs discovered only at late releases. Which again makes more admins 
postpone an upgrade. A vicious cycle.

A long time ago there was a discussion about exactly this problem and the 
admins were pretty much in favor of increasing the release cadence to at least 
4 years if not longer. Its simply too many releases with too many serious bugs 
not fixed, lately not even during their official life time. Octopus still has 
serious bugs but is EOL.

I'm not surprised that admins give up on upgrading entirely and stay on a 
version until their system dies.

To give you one from my own experience, upgrading from mimic latest to octopus latest. 
This experience almost certainly applies to every upgrade that involves an OSD format 
change (the infamous "quick fix" that could take several days per OSD and crush 
entire clusters).

There is an OSD conversion involved in this upgrade and we found out that out 
of 2 possible upgrade paths, one leads to a heavily performance degraded 
cluster with no possibility to recover other than redeploying all OSDs step by 
step. Funnily enough, the problematic procedure is the one described in the 
documentation - it hasn't been updated until today despite users still getting 
caught in this trap.

To give you an idea of what amount of work is now involved in an attempt to 
avoid such pitfalls, here our path:

We set up a test cluster with a script producing realistic workload and started 
testing an upgrade under load. This took about a month (meaning repeating the 
upgrade with a cluster on mimic deployed and populated from scratch every time) 
to confirm that we managed to get onto a robust path avoiding a number of 
pitfalls along the way - mainly the serious performance degradation due to OSD 
conversion, but also an issue with stray entries plus noise. A month! Once we 
were convinced that it would work - meaning we did run it a couple of times 
without any further issues being discovered, we started upgrading our 
production cluster.

Went smooth until we started the OSD conversion of our FS meta data OSDs. They 
had a special performance optimized deployment resulting in a large number of 
100G OSDs with about 30-40% utilization. These OSDs started crashing with some 
weird corruption. Turns out - thanks Igor! - that while spill-over from fast to 
slow drive was handled, the other direction was not. Our OSDs crashed because 
Octopus apparently required substantially more space on the slow device and 
couldn't use the plenty of fast space that was actually available.

The whole thing ended in 3 days of complete downtime and me working 12 hour 
days on the weekend. We managed to recover from this only because we had a 
larger delivery of hardware already on-site and I could scavenge parts from 
there.

So, the story was that after 1 month of testing we still run into 3 days of 
downtime, because there was another unannounced change that broke a config that 
was working fine for years on mimic.

To say the same thing with different words: major version upgrades have become 
very disruptive and require a lot of effort to get halfway right. And I'm not 
talking about the deployment system here.

Add to this list the still open cases discussed on the list about MDS dentry 
corruption, snapshots disappearing/corrupting together with a lack of good 
built-in tools for detection and repair, perfo

[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Marc
> 
> By the way, regarding performance I recommend the Cephalocon
> presentations by Adam and Mark. There you can learn what efforts are
> made to improve ceph performance for current and future versions.
> 

Link?


[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Jens Galsgaard
https://www.youtube.com/playlist?list=PLrBUGiINAakPd9nuoorqeOuS9P9MTWos3


-Original Message-
From: Marc  
Sent: Monday, May 15, 2023 4:42 PM
To: Joachim Kraftmayer - ceph ambassador ; Frank 
Schilder ; Tino Todino 
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: CEPH Version choice

> 
> By the way, regarding performance I recommend the Cephalocon 
> presentations by Adam and Mark. There you can learn what efforts are 
> made to improve ceph performance for current and future versions.
> 

Link?


[ceph-users] Re: cephadm does not honor container_image default value

2023-05-15 Thread Joachim Kraftmayer - ceph ambassador
Don't know if it helps, but we have also experienced something similar with OSD images. We changed the image reference from a version tag to a sha digest and it did not happen again.
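
For illustration, pinning by digest instead of by tag might look like this (the digest is a placeholder, not a real one):

```
# pin the image by an immutable digest rather than by a mutable tag
ceph config set mgr mgr/cephadm/container_image_alertmanager \
    quay.io/prometheus/alertmanager@sha256:<digest>
```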


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 15.05.23 um 14:50 schrieb Adam King:

I think with the `config set` commands there is logic to notify the
relevant mgr modules and update their values. That might not exist with
`config rm`, so it's still using the last set value. Looks like a real bug.
Curious what happens if the mgr restarts after the `config rm`. Whether it
goes back to the default image in that case or not. Might take a look later.

On Mon, May 15, 2023 at 7:37 AM Daniel Krambrock <
krambr...@hrz.uni-marburg.de> wrote:


Hello.

I think i found a bug in cephadm/ceph orch:
Redeploying a container image (tested with alertmanager) after removing
a custom `mgr/cephadm/container_image_alertmanager` value, deploys the
previous container image and not the default container image.

I'm running `cephadm` from ubuntu 22.04 pkg 17.2.5-0ubuntu0.22.04.3 and
`ceph` version 17.2.6.

Here is an example. Node clrz20-08 is the node altermanager is running
on, clrz20-01 the node I'm controlling ceph from:

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
"alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
```

* Set alertmanager image
```
root@clrz20-01:~# ceph config set mgr
mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager
root@clrz20-01:~# ceph config get mgr
mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager
```

* redeploy altermanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
"alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:latest"
```

* Remove alertmanager image setting, revert to default:
```
root@clrz20-01:~# ceph config rm mgr
mgr/cephadm/container_image_alertmanager
root@clrz20-01:~# ceph config get mgr
mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
```

* redeploy altermanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
"alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:latest"
```
-> `mgr/cephadm/container_image_alertmanager` is set to
`quay.io/prometheus/alertmanager:v0.23.0`
, but redeploy uses
`quay.io/prometheus/alertmanager:latest`
. This looks like a bug.

* Set alertmanager image explicitly to the default value
```
root@clrz20-01:~# ceph config set mgr
mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
root@clrz20-01:~# ceph config get mgr
mgr/cephadm/container_image_alertmanager
quay.io/prometheus/alertmanager:v0.23.0
```

* redeploy altermanager
```
root@clrz20-01:~# ceph orch redeploy alertmanager
Scheduled to redeploy alertmanager.clrz20-08 on host 'clrz20-08'
```

* Get alertmanager version
```
root@clrz20-08:~# cephadm ls | jq '.[] | select(.service_name ==
"alertmanager")| .container_image_name'
"quay.io/prometheus/alertmanager:v0.23.0"
```
-> Setting `mgr/cephadm/container_image_alertmanager` to the default
setting fixes the issue.



Bests,
Daniel





[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Joachim Kraftmayer - ceph ambassador

Adam & Mark topics: bluestore and bluestore v2

https://youtu.be/FVUoGw6kY5k

https://youtu.be/7D5Bgd5TuYw


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 15.05.23 um 16:47 schrieb Jens Galsgaard:

https://www.youtube.com/playlist?list=PLrBUGiINAakPd9nuoorqeOuS9P9MTWos3


-Original Message-
From: Marc 
Sent: Monday, May 15, 2023 4:42 PM
To: Joachim Kraftmayer - ceph ambassador ; Frank Schilder 
; Tino Todino 
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: CEPH Version choice


By the way, regarding performance I recommend the Cephalocon
presentations by Adam and Mark. There you can learn what efforts are
made to improve ceph performance for current and future versions.


Link?


[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Frank Schilder
Hi Marc,

I planned to put it online. The hold-up is that the main test is un-taring a nasty archive, and this archive might contain personal information, so I can't just upload it as is. I can try to put together a similar archive from public sources. Please give me a bit of time. I'm also a bit under stress right now with our users being hit by an FS metadata corruption. That's also why I'm a bit trigger-happy.
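
Not the actual script, but a minimal sketch of that kind of load generator built only from public sources (the paths and the choice of tarball are assumptions):

```
#!/bin/bash
# crude CephFS workload: repeatedly unpack, read back and delete a large,
# small-file-heavy public tarball (e.g. a kernel source tree)
set -e
SRC=$HOME/linux-6.1.tar.xz      # any big archive with many small files
WORKDIR=/mnt/cephfs/loadtest    # a directory on the CephFS under test

mkdir -p "$WORKDIR"
while true; do
    d="$WORKDIR/run-$(date +%s)"
    mkdir "$d"
    tar -xJf "$SRC" -C "$d"     # many small file creates
    tar -cf /dev/null -C "$d" . # read everything back
    rm -rf "$d"                 # bulk deletes
done
```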

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Marc 
Sent: Monday, May 15, 2023 1:03 PM
To: Frank Schilder; Tino Todino
Cc: ceph-users@ceph.io; d...@ceph.io
Subject: RE: [ceph-users] Re: CEPH Version choice

>
> We set up a test cluster with a script producing realistic workload and
> started testing an upgrade under load. This took about a month (meaning
> repeating the upgrade with a cluster on mimic deployed and populated

Hi Frank, do you have such scripts online? On github or so? I was thinking of 
compiling el9 rpms for Nautilus and run tests for a few days on a test cluster 
with mixed el7 and el9 hosts.

>
> So to get back to my starting point, we admins actually value rock solid
> over features. I know that this is boring for devs, but nothing is worse
> than nobody using your latest and greatest - which probably was the
> motivation for your question. If the upgrade paths were more solid and
> things like the question "why does an OSD conversion not lead to an OSD
> that is identical to one deployed freshly" or "where does the
> performance go" would actually attempted to track down, we would be much
> less reluctant to upgrade.


>
> I will bring it up here again: with the complexity that the code base
> reached now, the 2 year release cadence is way too fast, it doesn't
> provide sufficient maturity for upgrading fast as well. More and more
> admins will be several cycles behind and we are reaching the point where
> major bugs in so-called EOL versions will only be discovered before
> large clusters even reached this version. Which might become a
> fundamental blocker to upgrades entirely.

Indeed.

> An alternative to increasing the release cadence would be to keep more
> cycles in the life-time loop instead of only the last 2 major releases.
> 4 years really is nothing when it comes to storage.
>

I would like to see this change also.


[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Thomas Widhalm

Hi,

I tried a lot of different approaches but I didn't have any success so far.

"ceph orch ps" still doesn't get refreshed.

Some examples:

mds.mds01.ceph06.huavsw  ceph06               starting      -       -    -     -
mds.mds01.ceph06.rrxmks  ceph06               error         4w ago  3M   -     -
mds.mds01.ceph07.omdisd  ceph07               error         4w ago  4M   -     -
mds.mds01.ceph07.vvqyma  ceph07               starting      -       -    -     -
mgr.ceph04.qaexpv        ceph04  *:8443,9283  running (4w)  4w ago  10M  551M  -      17.2.6  9cea3956c04b  33df84e346a0
mgr.ceph05.jcmkbb        ceph05  *:8443,9283  running (4w)  4w ago  4M   441M  -      17.2.6  9cea3956c04b  1ad485df4399
mgr.ceph06.xbduuf        ceph06  *:8443,9283  running (4w)  4w ago  4M   432M  -      17.2.6  9cea3956c04b  5ba5fd95dc48
mon.ceph04               ceph04               running (4w)  4w ago  4M   223M  2048M  17.2.6  9cea3956c04b  8b6116dd216f
mon.ceph05               ceph05               running (4w)  4w ago  4M   326M  2048M  17.2.6  9cea3956c04b  70520d737f29


Debug Log doesn't show anything that could help me, either.

2023-05-15T14:48:40.852088+ mgr.ceph05.jcmkbb (mgr.83897390) 1376 : cephadm [INF] Schedule start daemon mds.mds01.ceph04.hcmvae
2023-05-15T14:48:43.620700+ mgr.ceph05.jcmkbb (mgr.83897390) 1380 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.hcmvae
2023-05-15T14:48:45.124822+ mgr.ceph05.jcmkbb (mgr.83897390) 1392 : cephadm [INF] Schedule start daemon mds.mds01.ceph04.krxszj
2023-05-15T14:48:46.493902+ mgr.ceph05.jcmkbb (mgr.83897390) 1394 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.krxszj
2023-05-15T15:05:25.637079+ mgr.ceph05.jcmkbb (mgr.83897390) 2629 : cephadm [INF] Saving service mds.mds01 spec with placement count:2
2023-05-15T15:07:27.625773+ mgr.ceph05.jcmkbb (mgr.83897390) 2780 : cephadm [INF] Saving service mds.fs_name spec with placement count:3
2023-05-15T15:07:42.120912+ mgr.ceph05.jcmkbb (mgr.83897390) 2795 : cephadm [INF] Saving service mds.mds01 spec with placement count:3


I'm seeing all the commands I give but I don't get any more information 
on why it's not actually happening.


I tried to change different scheduling mechanisms. Host, Tag, unmanaged 
and back again. I turned off orchestration and resumed. I failed mgr. I 
even had full cluster stops (in the past). I made sure all daemons run 
the same version. (If you remember, upgrade failed underway).


So the only way I can get daemons started is manually. I added two more hosts and tagged them, but there isn't a single daemon started there.


Could you help me again with how to debug orchestration not working?


On 04.05.23 15:12, Thomas Widhalm wrote:

Thanks.

I set the log level to debug, try a few steps and then come back.

On 04.05.23 14:48, Eugen Block wrote:

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

This should provide more details what the mgr is trying and where it's 
failing, hopefully. Last week this helped to identify an issue between 
a lower pacific issue for me.
Do you see anything in the cephadm.log pointing to the mgr actually 
trying something?



Zitat von Thomas Widhalm :


Hi,

I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but 
the following problem existed when I was still everywhere on 17.2.5 .


I had a major issue in my cluster which could be solved with a lot of 
your help and even more trial and error. Right now it seems that most 
is already fixed but I can't rule out that there's still some problem 
hidden. The very issue I'm asking about started during the repair.


When I want to orchestrate the cluster, it logs the command but it 
doesn't do anything. No matter if I use ceph dashboard or "ceph orch" 
in "cephadm shell". I don't get any error message when I try to 
deploy new services, redeploy them etc. The log only says "scheduled" 
and that's it. Same when I change placement rules. Usually I use 
tags. But since they don't work anymore, too, I tried host and 
umanaged. No success. The only way I can actually start and stop 
containers is via systemctl from the host itself.


When I run "ceph orch ls" or "ceph orch ps" I see services I deployed 
for testing being deleted (for weeks now). Ans especially a lot of 
old MDS are listed as "error" or "starting". The list doesn't match 
reality at all because I had to start them by hand.


I tried "ceph mgr fail" and even a complete shutdown of the whole 
cluster with all nodes including all mgs, mds even osd - everything 
during a maintenance window. Didn't change anything.


Could you help me? To be honest I'm still rather new to Ceph and 
since I didn't find anything in the logs that caught my eye I would 
be thankful for hints how to debug.


Cheers,
Thomas
--
http://www.widhalm.or.at
GnuPG :

[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Frank Schilder
Hi all, to avoid a potentially wrong impression I would like to add some words.

Slightly out of order:

> By the way, regarding performance I recommend the Cephalocon
> presentations by Adam and Mark. There you can learn what efforts are
> made to improve ceph performance for current and future versions.

For me personally the ceph performance degradation due to removing the WAL 
re-use is not a problem as it is predictable and the reasons for removing it 
are solid. A bit more worrying is the degradation over time and I know that 
there is work spent on it and it is expensive to debug, because collecting data 
takes so long. I mentioned performance mainly because there was at least one 
other user who explicitly called this a show-stopper.

I appreciate the effort put into this and am not complaining about a lack of 
effort. What I am complaining about is that this effort is under unnecessary 
pressure due to the short release cadence. Two years is not much time for a system like Ceph to mature, and starting the clock at the .2 release seems a bit premature given recent experience.

> I know the problems that Frank has raised. However, it should also be
> mentioned that many critical bugs have been fixed in the major versions.
> We are working on the fixes ourselves.

Again, I know all this and I very much appreciate it. I still consider the 
voluntary ceph support as better than support we got for enterprise systems. I 
got a lot of invaluable help from Igor during my upgrade experience and I got 
some important stuff fixed by Xiubo recently. Just to repeat it though, I am 
convinced one could reach much higher maturity with less time pressure - and 
maybe less often forget this one critical PR that causes so much trouble for 
some users.

> However, our goal is always to use Ceph versions that still get
> backports and on the other hand, only use the features we really need.
> Our developers also always aim to bring bug fixes upstream and into the
> supported versions.

I would love to, but versions are counted up too fast and the problems with trying to keep up are a bit too much for a one-man army. After last time, I now have a hard time convincing users to take another possible hit.

If there was a bit more time to take a breath after the last upgrade, I would 
probably be able to do it. However, with my current experience I look at not 
being able to catch up for the time being. We might even fall further behind. 
Which is a shame because we operate a large installation and tend to discover 
relevant bugs that don't show up in smaller systems with less load.

My wish and hypothesis are simply that if we reduced the pace of major release cycles, a lot more operators would be able to follow, and the releases and upgrade procedures would be significantly more stable simply because more clusters would continuously be closer to latest.

For example, don't start counting the lifetime of a major version with its .2 release; start when 50% of the top-10 (25/50) sized clusters in telemetry are on or above that version, and declare a version EOL when 90% of these clusters are at a newer major release. This would give a much better indication of the perceived maturity of major releases by operators of significant installations. It would also give an incentive to submit data to telemetry, as well as helping the last few % over the upgrade hurdle so that a version can be declared EOL.

Kind of a community where the ones who fall behind get an extra helping hand so 
everyone can move on.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Joachim Kraftmayer - ceph ambassador 
Sent: Monday, May 15, 2023 4:34 PM
To: Frank Schilder; Tino Todino
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: CEPH Version choice

Hi,


I know the problems that Frank has raised. However, it should also be
mentioned that many critical bugs have been fixed in the major versions.
We are working on the fixes ourselves.

We and others have written a lot of tools for ourselves in the last 10
years to improve migration/update and upgrade paths/strategy.

 From version to version, we also test for up to 6 months before putting
them into production.

However, our goal is always to use Ceph versions that still get
backports and on the other hand, only use the features we really need.
Our developers also always aim to bring bug fixes upstream and into the
supported versions.

By the way, regarding performance I recommend the Cephalocon
presentations by Adam and Mark. There you can learn what efforts are
made to improve ceph performance for current and future versions.

Regards, Joachim


___
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

Am 15.05.23 um 12:11 schrieb Frank Schilder:
>> What are the main reasons for not upgra

[ceph-users] Re: mds dump inode crashes file system

2023-05-15 Thread Frank Schilder
Dear Xiubo,

I uploaded the cache dump, the MDS log and the dmesg log containing the 
snaptrace dump to

ceph-post-file: 763955a3-7d37-408a-bbe4-a95dc687cd3f

Sorry, I forgot to add user and description this time.

A question about troubleshooting: I'm pretty sure I know the path where the error is located. Would a "ceph tell mds.1 scrub start / recursive repair" be able to discover and fix broken snaptraces? If not, I'm awaiting further instructions.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Friday, May 12, 2023 3:44 PM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: mds dump inode crashes file system


On 5/12/23 20:27, Frank Schilder wrote:
> Dear Xiubo and others.
>
>>> I have never heard about that option until now. How do I check that and how 
>>> to I disable it if necessary?
>>> I'm in meetings pretty much all day and will try to send some more info 
>>> later.
>> $ mount|grep ceph
> I get
>
> MON-IPs:SRC on DST type ceph 
> (rw,relatime,name=con-fs2-rit-pfile,secret=,noshare,acl,mds_namespace=con-fs2,_netdev)
>
> so async dirop seems disabled.

Yeah.


>> Yeah, the kclient just received a corrupted snaptrace from MDS.
>> So the first thing is you need to fix the corrupted snaptrace issue in 
>> cephfs and then continue.
> Ooookaaa. I will take it as a compliment that you seem to assume I know 
> how to do that. The documentation gives 0 hits. Could you please provide me 
> with instructions of what to look for and/or what to do first?

There is no doc about this as far as I know.

>> If possible you can parse the above corrupted snap message to check what 
>> exactly corrupted.
>> I haven't get a chance to do that.
> Again, how would I do that? Is there some documentation and what should I 
> expect?

Currently there is no easy way to do this as far as I know; last time I parsed the corrupted binary data into the corresponding message manually.

And then we could know what exactly happened with the snaptrace.


>> You seems didn't enable the 'osd blocklist' cephx auth cap for mon:
> I can't find anything about an osd blocklist client auth cap in the 
> documentation. Is this something that came after octopus? Our caps are as 
> shown in the documentation for a ceph fs client 
> (https://docs.ceph.com/en/octopus/cephfs/client-auth/), the one for mon is 
> "allow r":
>
>  caps mds = "allow rw path=/shares"
>  caps mon = "allow r"
>  caps osd = "allow rw tag cephfs data=con-fs2"
Yeah, it seems the 'osd blocklist' cap was disabled. As I remember, if enabled it should be something like:

caps mon = "allow r, allow command \"osd blocklist\""
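
For reference, a sketch of how the cap could be added with `ceph auth caps` (the client name and the mds/osd caps are copied from Frank's mail above - adjust to your setup):

```
# note: 'ceph auth caps' replaces all caps for the entity, so restate the mds/osd caps too
ceph auth caps client.con-fs2-rit-pfile \
    mds 'allow rw path=/shares' \
    mon 'allow r, allow command "osd blocklist"' \
    osd 'allow rw tag cephfs data=con-fs2'
```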

>
>> I checked that but by reading the code I couldn't get what had cause the MDS 
>> crash.
>> There seems something wrong corrupt the metadata in cephfs.
> He wrote something about an invalid xattrib (empty value). It would be really 
> helpful to get a clue how to proceed. I managed to dump the MDS cache with 
> the critical inode in cache. Would this help with debugging? I also managed 
> to get debug logs with debug_mds=20 during a crash caused by an "mds dump 
> inode" command. Would this contain something interesting? I can also pull the 
> rados objects out and can upload all of these files.

Yeah, possibly. Where are the logs?


> I managed to track the problem down to a specific folder with a few files 
> (I'm not sure if this coincides with the snaptrace issue, we might have 2 
> issues here). I made a copy of the folder and checked that an "mds dump 
> inode" for the copy does not crash the MDS. I then moved the folders for 
> which this command causes a crash to a different location outside the mounts. 
> Do you think this will help? I'm wondering if after taking our daily snapshot 
> tomorrow we end up in the degraded situation again.
>
> I really need instructions for how to check what is broken without an MDS 
> crash and then how to fix it.

Firstly we need to know where the corrupted metadata is.

I think the mds debug logs and the above corrupted snaptrace could help. We need to parse that corrupted binary data.

Thanks


[ceph-users] NFS export of 2 disjoint sub-dir mounts

2023-05-15 Thread Frank Schilder
Hi all,

I have a problem with exporting 2 different sub-folder ceph-fs kernel mounts 
via nfsd to the same IP address. The top-level structure on the ceph fs is 
something like /A/S1 and /A/S2. On a file server I mount /A/S1 and /A/S2 as two 
different file systems under /mnt/S1 and /mnt/S2 using the ceph fs kernel 
client. Then, these 2 mounts are exported with lines like these in /etc/exports:

/mnt/S1 -options NET
/mnt/S2 -options IP

IP is an element of NET, meaning that the host at IP should be the only host 
being able to access /mnt/S1 and /mnt/S2. What we observe is that any attempt 
to mount the export /mnt/S1 on the host at IP results in /mnt/S2 being mounted 
instead.

My first guess was that we have a clash of fsids here: the ceph fs is simply reporting the same fsid to nfsd and, hence, nfsd thinks both mount points contain the same file system. So I modified the second export line to

/mnt/S2 -options,fsid=100 IP

to no avail. The two folders are completely disjoint, neither symlinks nor 
hard-links between them. So it should be safe to export these as 2 different 
file systems.

Exporting such constructs to non-overlapping networks/IPs works as expected - 
even when exporting subdirs of a dir (like exporting /A/B and /A/B/C from the 
same file server to strictly different IPs). It seems to be the same-IP config that breaks expectations.

Am I missing here a magic -yes-i-really-know-what-i-am-doing hack? The file 
server is on AlmaLinux release 8.7 (Stone Smilodon) and all ceph packages match 
the ceph version octopus latest of our cluster.
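
For concreteness, a minimal exports(5) sketch of the intended setup, with explicit and distinct fsids on both entries (NET/IP stand for the real network and host address, and the option list is abbreviated):

```
# /etc/exports - sketch only, real options abbreviated to rw,sync
/mnt/S1  NET(rw,sync,fsid=101)
/mnt/S2  IP(rw,sync,fsid=102)
```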

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Adam King
This is sort of similar to what I said in a previous email, but the only
way I've seen this happen in other setups is through hanging cephadm
commands. The debug process has been: do a mgr failover, wait a few
minutes, check in "ceph orch ps" and "ceph orch device ls" which hosts have
and have not been refreshed (the REFRESHED column should be some lower
value on the hosts where it refreshed), then go to the hosts where it did not
refresh and check "ps aux | grep cephadm", looking for long running (and
therefore most likely hung) processes. I would still expect that's the most
likely thing you're experiencing here. I haven't seen any other causes for
cephadm to not refresh unless the module crashed, but that would be
explicitly stated in the cluster health.
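
Condensed into commands, that check looks roughly like this:

```
ceph mgr fail            # force a mgr failover, then wait a few minutes
ceph orch ps             # compare the REFRESHED column per host
ceph orch device ls      # same - which hosts never refresh?

# on each host that is not refreshing:
ps aux | grep cephadm    # long-running entries usually mean a hung cephadm call
```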

On Mon, May 15, 2023 at 11:44 AM Thomas Widhalm 
wrote:

> Hi,
>
> I tried a lot of different approaches but I didn't have any success so far.
>
> "ceph orch ps" still doesn't get refreshed.
>
> Some examples:
>
> mds.mds01.ceph06.huavsw  ceph06               starting           -    -     -        -
> mds.mds01.ceph06.rrxmks  ceph06               error         4w ago   3M     -        -
> mds.mds01.ceph07.omdisd  ceph07               error         4w ago   4M     -        -
> mds.mds01.ceph07.vvqyma  ceph07               starting           -    -     -        -
> mgr.ceph04.qaexpv        ceph04  *:8443,9283  running (4w)  4w ago  10M  551M        -  17.2.6  9cea3956c04b  33df84e346a0
> mgr.ceph05.jcmkbb        ceph05  *:8443,9283  running (4w)  4w ago   4M  441M        -  17.2.6  9cea3956c04b  1ad485df4399
> mgr.ceph06.xbduuf        ceph06  *:8443,9283  running (4w)  4w ago   4M  432M        -  17.2.6  9cea3956c04b  5ba5fd95dc48
> mon.ceph04               ceph04               running (4w)  4w ago   4M  223M    2048M  17.2.6  9cea3956c04b  8b6116dd216f
> mon.ceph05               ceph05               running (4w)  4w ago   4M  326M    2048M  17.2.6  9cea3956c04b  70520d737f29
>
> Debug Log doesn't show anything that could help me, either.
>
> 2023-05-15T14:48:40.852088+ mgr.ceph05.jcmkbb (mgr.83897390) 1376 :
> cephadm [INF] Schedule start daemon mds.mds01.ceph04.hcmvae
> 2023-05-15T14:48:43.620700+ mgr.ceph05.jcmkbb (mgr.83897390) 1380 :
> cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.hcmvae
> 2023-05-15T14:48:45.124822+ mgr.ceph05.jcmkbb (mgr.83897390) 1392 :
> cephadm [INF] Schedule start daemon mds.mds01.ceph04.krxszj
> 2023-05-15T14:48:46.493902+ mgr.ceph05.jcmkbb (mgr.83897390) 1394 :
> cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.krxszj
> 2023-05-15T15:05:25.637079+ mgr.ceph05.jcmkbb (mgr.83897390) 2629 :
> cephadm [INF] Saving service mds.mds01 spec with placement count:2
> 2023-05-15T15:07:27.625773+ mgr.ceph05.jcmkbb (mgr.83897390) 2780 :
> cephadm [INF] Saving service mds.fs_name spec with placement count:3
> 2023-05-15T15:07:42.120912+ mgr.ceph05.jcmkbb (mgr.83897390) 2795 :
> cephadm [INF] Saving service mds.mds01 spec with placement count:3
>
> I'm seeing all the commands I give but I don't get any more information
> on why it's not actually happening.
>
> I tried to change different scheduling mechanisms. Host, Tag, unmanaged
> and back again. I turned off orchestration and resumed. I failed mgr. I
> even had full cluster stops (in the past). I made sure all daemons run
> the same version. (If you remember, upgrade failed underway).
>
> So my only way of getting daemons deployed is manually. I added two more
> hosts, tagged them. But there isn't a single daemon started there.
>
> Could you help me again with how to debug orchestration not working?
>
>
> On 04.05.23 15:12, Thomas Widhalm wrote:
> > Thanks.
> >
> > I set the log level to debug, try a few steps and then come back.
> >
> > On 04.05.23 14:48, Eugen Block wrote:
> >> Hi,
> >>
> >> try setting debug logs for the mgr:
> >>
> >> ceph config set mgr mgr/cephadm/log_level debug
> >>
> >> This should provide more details what the mgr is trying and where it's
> >> failing, hopefully. Last week this helped to identify an issue between
> >> a lower pacific issue for me.
> >> Do you see anything in the cephadm.log pointing to the mgr actually
> >> trying something?
> >>
> >>
> >> Zitat von Thomas Widhalm :
> >>
> >>> Hi,
> >>>
> >>> I'm in the process of upgrading my cluster from 17.2.5 to 17.2.6 but
> >>> the following problem existed when I was still everywhere on 17.2.5 .
> >>>
> >>> I had a major issue in my cluster which could be solved with a lot of
> >>> your help and even more trial and error. Right now it seems that most
> >>> is already fixed but I can't rule out that there's still some problem
> >>> hidden. The very issue I'm asking about started during the repair.
> >>>
> >>> When I want to orchestrate the cluster, it logs the command but it
> >>> doesn't do anything. No matter if I use ceph dashboard or "ceph orch"
> >>> in "cephadm shel

[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Daniel Baumann
On 5/15/23 12:11, Frank Schilder wrote:
> Because more often than not it isn't.

Sadly, I have to agree. We basically gave up after luminous, where every
update (on our test-ceph cluster) was a major pain. Until then, we
always updated one week after a new release.

To add one more point..

The current Ceph version (17.x) will not be included in the upcoming
Debian 12 release, due later this summer. This hasn't been a
problem in the past, because we just built our own backports and
everything was fine.

Nowadays, Ceph 17 doesn't even build on Debian unstable/testing because
some libraries (mostly fmtlib and others) are too new (sic!), so.. we'll
be staying with Ceph 16 on Debian 12 until we'll trash the hardware.

..or in other words: it would be nice if you could put more effort into
upgrade tests/QA as well as into releasing stuff that actually compiles on
non-RHEL/current Linux distributions.

Regards,
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-05-15 Thread Beaman, Joshua
Patrick,

Sorry for the delayed response.  This seems to be the limit of assistance I’m 
capable of providing.  My deployments are all ubuntu and bootstrapped (or 
upgraded) according to this starting doc:
https://docs.ceph.com/en/quincy/cephadm/install/#cephadm-deploying-new-cluster

It is very confusing to me that cephadm and ceph-volume are able to zap the 
device, but cephadm ceph-volume inventory shows nothing.  It’s even more 
perplexing to me, because on my systems even the OS disks are listed as not 
available.

Maybe someone else here has an idea of what’s going on.

One last difference that might be a place for you to investigate.  I’m using 
docker, so perhaps your podman installation is somehow limiting direct access 
to the disk devices?
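
If you want to rule that out, one quick check (a rough idea, untested on podman 
here) would be to compare what the host sees with what a cephadm container sees:

lsblk
cephadm shell -- lsblk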

Best of luck,
Josh Beaman

From: Patrick Begou 
Date: Saturday, May 13, 2023 at 3:33 AM
To: Beaman, Joshua , ceph-users 
Subject: Re: [EXTERNAL] [ceph-users] [Pacific] ceph orch device ls do not 
returns any HDD
Hi Joshua,

I've tried these commands but it looks like CEPH is unable to see and configure 
these HDDs.
[root@mostha1 ~]# cephadm ceph-volume inventory
Inferring fsid 4b7a6504-f0be-11ed-be1a-00266cf8869c
Using recent ceph image 
quay.io/ceph/ceph@sha256:e6919776f0ff8331a8e9c4b18d36c5e9eed31e1a80da62ae8454e42d10e95544

Device Path               Size         Device nodes    rotates  available  Model name
[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
[ceph: root@mostha1 /]# ceph orch device ls
[ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdb
--> Zapping: /dev/sdb
--> --destroy was not specified, but zapping a whole device will remove the 
partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.10039 s, 104 MB/s
--> Zapping successful for: 
I can check that /dev/sdb1 has been erased, so previous command is successful
[ceph: root@mostha1 ceph]# lsblk
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda8:01 232.9G  0 disk
|-sda1 8:11   3.9G  0 part /rootfs/boot
|-sda2 8:21  78.1G  0 part
| `-osvg-rootvol 253:00  48.8G  0 lvm  /rootfs
|-sda3 8:31   3.9G  0 part [SWAP]
`-sda4 8:41 146.9G  0 part
  |-secretvg-homevol 253:10   9.8G  0 lvm  /rootfs/home
  |-secretvg-tmpvol  253:20   9.8G  0 lvm  /rootfs/tmp
  `-secretvg-varvol  253:30   9.8G  0 lvm  /rootfs/var
sdb8:16   1 465.8G  0 disk
sdc8:32   1 232.9G  0 disk

But still no visible HDD:

[ceph: root@mostha1 ceph]# ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
[ceph: root@mostha1 ceph]# ceph orch device ls
[ceph: root@mostha1 ceph]#

Maybe I have done something bad at install time, because in the container I've 
unintentionally run:

dnf -y install
https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm

(an awful copy/paste that launched the command). Can this break the container? I 
do not know which ceph packages should be available in the container, so I am not 
sure how to remove this install properly (there is no dnf.log file in the container).

Patrick


Le 12/05/2023 à 21:38, Beaman, Joshua a écrit :

The most significant point I see there, is you have no OSD service spec to tell 
orchestrator how to deploy OSDs.  The easiest fix for that would be “ceph orch 
apply osd --all-available-devices”
This will create a simple spec that should work for a test environment.  Most 
likely it will collocate the block, block.db, and WAL all on the same device.  
Not ideal for prod environments, but fine for practice and testing.

The other command I should have had you try is “cephadm ceph-volume inventory”. 
 That should show you the devices available for OSD deployment, and hopefully 
matches up to what your “lsblk” shows.  If you need to zap HDDs and 
orchestrator is still not seeing them, you can try “cephadm ceph-volume lvm zap 
/dev/sdb”

Thank you,
Josh Beaman

From: Patrick Begou 

Date: Friday, May 12, 2023 at 2:22 PM
To: Beaman, Joshua 
, ceph-users 

Subject: Re: [EXTERNAL] [ceph-users] [Pacific] ceph orch device ls do not 
returns any HDD
Hi Joshua and thanks for this quick reply.

At this step I have only one node. I was checking what ceph was returning with 
different commands on this host before adding new hosts. Just to compare with 
my first Octopus install. As this hardware is for testing only, it remains easy 
for me to break everything and reinstall.

[ceph-users] Re: Dedicated radosgw gateways

2023-05-15 Thread Michal Strnad

Hi,


thank you for the response. That sounds like a reasonable solution.
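
If I understand it correctly, on the user-facing instances that boils down to 
something like the following (the section name is only a placeholder for however 
those gateways are grouped in our config):

ceph config set client.rgw.public rgw_enable_lc_threads false
ceph config set client.rgw.public rgw_enable_gc_threads false

or the equivalent settings in their [client.rgw.public] section of ceph.conf, 
while the dedicated gateways keep the defaults.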

Michal

On 5/15/23 14:15, Konstantin Shalygin wrote:

Hi,


On 15 May 2023, at 14:58, Michal Strnad  wrote:

at Cephalocon 2023, it was mentioned several times that for service 
tasks such as data deletion via garbage collection or data replication 
in S3 via zoning, it is good to do them on dedicated radosgw gateways 
and not mix them with gateways used by users. How can this be 
achieved? How can we isolate these tasks? Will using dedicated 
keyrings instead of admin keys be sufficient? How do you operate this 
in your environment?


Just:

# don't put client traffic to "dedicated radosgw gateways"
# disable lc/gc on "gateways used by users" via `rgw_enable_lc_threads = 
false` & `rgw_enable_gc_threads = false`



k


smime.p7s
Description: S/MIME Cryptographic Signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CEPH Version choice

2023-05-15 Thread Mark Nelson


On 5/15/23 13:03, Daniel Baumann wrote:

On 5/15/23 12:11, Frank Schilder wrote:

Because more often than not it isn't.

Sadly, I have to agree. We basically gave up after luminous, where every
update (on our test-ceph cluster) was a major pain. Until then, we
always updated after one week of a new release.


Any chance you could highlight the major pain points?  I've heard a 
couple of stories (especially related to swap and buffered IO), but it 
would be good to know what the others have been.





To add one more point..

The current Ceph version (17.x) will not be included in the upcoming
Debian 12 release to be released later this summer. This hasn't been a
problem in the past, because we just built our own backports and
everything was fine.

Nowadays, Ceph 17 doesn't even build on Debian unstable/testing because
some libraries (mostly fmtlib and others) are too new (sic!), so.. we'll
be staying with Ceph 16 on Debian 12 until we'll trash the hardware.



This is getting off-topic, but I wanted to make sure you got a quick 
reply because it's an important topic.  Sadly this one isn't our fault, 
isn't limited to Debian, and doesn't only affect Ceph. There are 
actually two separate problems affecting bookworm (and other very 
updated distros):



LibFMT:  Any OS using FMT 9.0+ is going to have issues due to a breaking 
change in the library:


https://www.spinics.net/lists/fedora-devel/msg303183.html

https://github.com/fmtlib/fmt/releases/tag/9.0.0

If you just need to compile ceph itself you can get around this by 
passing the flag to do_cmake.sh:


./do_cmake.sh -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_FMT_HEADER_ONLY:BOOL=ON

Snappy: This is an even more irritating problem imho. Snappy 1.1.9
breaks RTTI runtime support and it's also breaking multiple applications
including Ceph.

https://bugzilla.redhat.com/show_bug.cgi?id=1980614

https://github.com/google/snappy/pull/129

That second link includes a fix by a user and the response was:

"The project's CMake configuration reflects the way it's used in Google
Chrome. This is the only configuration we can maintain ourselves. To be
clear, this doesn't mean your changes are not valid -- I think you'll be
able to use Snappy with the build tweaks you posted here just fine. It's
just that we can't accept this change in the official repository."

I'm kind of in a ripping-out mood right now, but my inclination is to
remove snappy support if google can't test it outside of how it's used
in Chrome.

Mark


..or in other words: it would be nice if you could more efforts into
upgrade-tests/QA as well as on releasing stuff that actually compiles on
non-RHEL/current Linux distributions.

Regards,
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Thomas Widhalm
This is why I even tried a full cluster shutdown. All hosts were out, so 
there's no possibility that there's any process hanging. After I 
started the nodes, it's just the same as before. All refresh times show 
"4 weeks". Like it stopped simultaneously on all nodes.


Some time ago we had a small change in name resolution so I thought, 
maybe the orchestrator can't connect via ssh anymore. But I tried all 
the steps in 
https://docs.ceph.com/docs/master/cephadm/troubleshooting/#ssh-errors . 
The only thing that's slightly suspicious is that it said it added the 
host key to known hosts. But since I tried via "cephadm shell" I guess 
the known hosts are just not replicated to these containers. ssh works, 
too. (And I would have suspected that I get a warning if that failed)
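
For reference, the manual ssh check I did, roughly following the docs (file 
names and the hostname are placeholders):

ceph cephadm get-ssh-config > ssh_config
ceph config-key get mgr/cephadm/ssh_identity_key > cephadm_key
chmod 0600 cephadm_key
ssh -F ssh_config -i cephadm_key root@<host>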


I don't see any information about the orchestrator module having 
crashed. It's running as always.


From the prior problem I had some issues in my cephfs pools. So, 
maybe there's something broken in the .mgr pool? Could that be a reason 
for this behaviour? I googled a while but didn't find any way how to 
check that explicitly.
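
(The only generic pokes I could think of were along the lines of

rados -p .mgr ls | head
ceph pg ls-by-pool .mgr

but I don't know what healthy output is supposed to look like, so take that 
with a grain of salt.)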


On 15.05.23 19:15, Adam King wrote:

This is sort of similar to what I said in a previous email, but the only
way I've seen this happen in other setups is through hanging cephadm
commands. The debug process has been, do a mgr failover, wait a few
minutes, see in "ceph orch ps" and "ceph orch device ls" which hosts have
and have not been refreshed (the REFRESHED column should be some lower
value on the hosts where it refreshed), go to the hosts where it did not
refresh and check "ps aux | grep cephadm" looking for long running (and
therefore most likely hung) processes. I would still expect that's the most
likely thing you're experiencing here. I haven't seen any other causes for
cephadm to not refresh unless the module crashed, but that would be
explicitly stated in the cluster health.

On Mon, May 15, 2023 at 11:44 AM Thomas Widhalm 
wrote:


Hi,

I tried a lot of different approaches but I didn't have any success so far.

"ceph orch ps" still doesn't get refreshed.

Some examples:

mds.mds01.ceph06.huavsw  ceph06               starting           -    -     -        -
mds.mds01.ceph06.rrxmks  ceph06               error         4w ago   3M     -        -
mds.mds01.ceph07.omdisd  ceph07               error         4w ago   4M     -        -
mds.mds01.ceph07.vvqyma  ceph07               starting           -    -     -        -
mgr.ceph04.qaexpv        ceph04  *:8443,9283  running (4w)  4w ago  10M  551M        -  17.2.6  9cea3956c04b  33df84e346a0
mgr.ceph05.jcmkbb        ceph05  *:8443,9283  running (4w)  4w ago   4M  441M        -  17.2.6  9cea3956c04b  1ad485df4399
mgr.ceph06.xbduuf        ceph06  *:8443,9283  running (4w)  4w ago   4M  432M        -  17.2.6  9cea3956c04b  5ba5fd95dc48
mon.ceph04               ceph04               running (4w)  4w ago   4M  223M    2048M  17.2.6  9cea3956c04b  8b6116dd216f
mon.ceph05               ceph05               running (4w)  4w ago   4M  326M    2048M  17.2.6  9cea3956c04b  70520d737f29

Debug Log doesn't show anything that could help me, either.

2023-05-15T14:48:40.852088+ mgr.ceph05.jcmkbb (mgr.83897390) 1376 :
cephadm [INF] Schedule start daemon mds.mds01.ceph04.hcmvae
2023-05-15T14:48:43.620700+ mgr.ceph05.jcmkbb (mgr.83897390) 1380 :
cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.hcmvae
2023-05-15T14:48:45.124822+ mgr.ceph05.jcmkbb (mgr.83897390) 1392 :
cephadm [INF] Schedule start daemon mds.mds01.ceph04.krxszj
2023-05-15T14:48:46.493902+ mgr.ceph05.jcmkbb (mgr.83897390) 1394 :
cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.krxszj
2023-05-15T15:05:25.637079+ mgr.ceph05.jcmkbb (mgr.83897390) 2629 :
cephadm [INF] Saving service mds.mds01 spec with placement count:2
2023-05-15T15:07:27.625773+ mgr.ceph05.jcmkbb (mgr.83897390) 2780 :
cephadm [INF] Saving service mds.fs_name spec with placement count:3
2023-05-15T15:07:42.120912+ mgr.ceph05.jcmkbb (mgr.83897390) 2795 :
cephadm [INF] Saving service mds.mds01 spec with placement count:3

I'm seeing all the commands I give but I don't get any more information
on why it's not actually happening.

I tried to change different scheduling mechanisms. Host, Tag, unmanaged
and back again. I turned off orchestration and resumed. I failed mgr. I
even had full cluster stops (in the past). I made sure all daemons run
the same version. (If you remember, upgrade failed underway).

So my only way of getting daemons deployed is manually. I added two more
hosts, tagged them. But there isn't a single daemon started there.

Could you help me again with how to debug orchestration not working?


On 04.05.23 15:12, Thomas Widhalm wrote:

Thanks.

I set the log level to debug, try a few steps and then come back.

On 04.05.23 14:48, Eugen Block wrote:

Hi,

try setting debug logs for the mgr:

ceph config set mgr mgr/cephadm/log_level debug

[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-05-15 Thread Adam King
As you already seem to have figured out, "ceph orch device ls" is
populated with the results from "ceph-volume inventory". My best guess to
try and debug this would be to manually run "cephadm ceph-volume --
inventory" (the same as "cephadm ceph-volume inventory", I just like to
separate the ceph-volume command from cephadm itself with the " -- ") and
then check /var/log/ceph/<fsid>/ceph-volume.log from when you ran the
command onward to try and see why it isn't seeing your devices. For example
I can see a line  like

[2023-05-15 19:11:58,048][ceph_volume.main][INFO  ] Running command:
ceph-volume  inventory

in there. Then if I look onward from there I can see it ran things like

lsblk -P -o
NAME,KNAME,PKNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL

as part of getting my device list. So if I was having issues I would try
running that directly and see what I got. Will note that ceph-volume on
certain more recent versions (not sure about octopus) runs commands through
nsenter, so you'd have to look past that part in the log lines to the
underlying command being used, typically something with lsblk, blkid,
udevadm, lvs, or pvs.
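
Concretely, something like this (the fsid in the path is a placeholder, and take
the real lsblk/blkid/lvs invocation from your own log rather than this example):

cephadm ceph-volume -- inventory
tail -n 200 /var/log/ceph/<fsid>/ceph-volume.log
# then re-run the underlying command it logged, e.g.
lsblk -P -o NAME,KNAME,PKNAME,FSTYPE,MOUNTPOINT,SIZE,ROTA,TYPE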

Also, if you want to see if it's an issue with a certain version of
ceph-volume, you can use different versions by passing the image flag to
cephadm. E.g.

cephadm --image quay.io/ceph/ceph:v17.2.6 ceph-volume -- inventory

would use the 17.2.6 version of ceph-volume for the inventory. It works by
running ceph-volume through the container, so you don't have to
worry about installing different packages to try them and it should pull
the container image on its own if it isn't on the machine already (but note
that means the command will take longer as it pulls the image the first
time).



On Sat, May 13, 2023 at 4:34 AM Patrick Begou <
patrick.be...@univ-grenoble-alpes.fr> wrote:

> Hi Joshua,
>
> I've tried these commands but it looks like CEPH is unable to see and
> configure these HDDs.
> [root@mostha1 ~]# cephadm ceph-volume inventory
>
> Inferring fsid 4b7a6504-f0be-11ed-be1a-00266cf8869c
> Using recent ceph image
>
> quay.io/ceph/ceph@sha256:e6919776f0ff8331a8e9c4b18d36c5e9eed31e1a80da62ae8454e42d10e95544
>
> Device Path   Size Device nodesrotates
> available Model name
>
> [root@mostha1 ~]# cephadm shell
>
> [ceph: root@mostha1 /]# ceph orch apply osd --all-available-devices
>
> Scheduled osd.all-available-devices update...
>
> [ceph: root@mostha1 /]# ceph orch device ls[ceph: root@mostha1 /]#
> ceph-volume lvm zap /dev/sdb
>
> --> Zapping: /dev/sdb
> --> --destroy was not specified, but zapping a whole device will
> remove the partition table
> Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10
> conv=fsync
>   stderr: 10+0 records in
> 10+0 records out
> 10485760 bytes (10 MB, 10 MiB) copied, 0.10039 s, 104 MB/s
> --> Zapping successful for: 
>
> I can check that /dev/sdb1 has been erased, so previous command is
> successful
> [ceph: root@mostha1 ceph]# lsblk
> NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> sda8:01 232.9G  0 disk
> |-sda1 8:11   3.9G  0 part /rootfs/boot
> |-sda2 8:21  78.1G  0 part
> | `-osvg-rootvol 253:00  48.8G  0 lvm  /rootfs
> |-sda3 8:31   3.9G  0 part [SWAP]
> `-sda4 8:41 146.9G  0 part
>|-secretvg-homevol 253:10   9.8G  0 lvm  /rootfs/home
>|-secretvg-tmpvol  253:20   9.8G  0 lvm  /rootfs/tmp
>`-secretvg-varvol  253:30   9.8G  0 lvm  /rootfs/var
> sdb8:16   1 465.8G  0 disk
> sdc8:32   1 232.9G  0 disk
>
> But still no visible HDD:
>
> [ceph: root@mostha1 ceph]# ceph orch apply osd --all-available-devices
>
> Scheduled osd.all-available-devices update...
>
> [ceph: root@mostha1 ceph]# ceph orch device ls
> [ceph: root@mostha1 ceph]#
>
> May be I have done something bad at install time as in the container
> I've unintentionally run:
>
> dnf -y install
>
> https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
>
> (an awful copy/paste launching the command). Can this break The
> container ? I do not know what should be available as ceph packages in
> the container to remove properly this install (no dnf.log file in the
> container)
>
> Patrick
>
>
> Le 12/05/2023 à 21:38, Beaman, Joshua a écrit :
> > The most significant point I see there, is you have no OSD service
> > spec to tell orchestrator how to deploy OSDs.  The easiest fix for
> > that would be “cephorchapplyosd--all-available-devices”
> >
> > This will create a simple spec that should work for a test
> > environment.  Most likely it will collocate the block, block.db, and
> > WAL all on the same device.  Not ideal for prod environments, but fine

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Adam King
If it persisted through a full restart, it's possible the conditions that
caused the hang are still present after the fact. The two known causes I'm
aware of are lack of space in the root partition and hanging mount points.
Both would show up as processes in "ps aux | grep cephadm" though. The
latter could possibly be related to cephfs pool issues if you have
something mounted on one of the hosts. Still hard to say without
knowing what exactly got stuck. For clarity, without restarting or changing
anything else, can you verify  if "ps aux | grep cephadm" shows anything on
the nodes. I know I'm a bit of a broken record on mentioning the hanging
processes stuff, but outside of module crashes which don't appear to be
present here, 100% of other cases of this type of thing happening I've
looked at before have had those processes sitting around.
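
A quick way to check for both known causes, roughly (not exhaustive):

df -h /                                        # root partition filling up?
ps aux | grep cephadm                          # old/hung cephadm processes?
ps -eo pid,stat,wchan:20,cmd | awk '$2 ~ /D/'  # anything stuck in uninterruptible sleep, often a hung mount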

On Mon, May 15, 2023 at 3:10 PM Thomas Widhalm 
wrote:

> This is why I even tried a full cluster shutdown. All Hosts were out, so
> there's not a possibility that there's any process hanging. After I
> started the nodes, it's just the same as before. All refresh times show
> "4 weeks". Like it stopped simoultanously on all nodes.
>
> Some time ago we had a small change in name resolution so I thought,
> maybe the orchestrator can't connect via ssh anymore. But I tried all
> the steps in
> https://docs.ceph.com/docs/master/cephadm/troubleshooting/#ssh-errors .
> The only thing that's slightly suspicous is that, it said, it added the
> host key to known hosts. But since I tried via "cephadm shell" I guess,
> the known hosts are just not replicated to these containers. ssh works,
> too. (And I would have suspected that I get a warning if that failed)
>
> I don't see any information about the orchestrator module having
> crashed. It's running as always.
>
>  From the the prior problem I had some issues in my cephfs pools. So,
> maybe there's something broken in the .mgr pool? Could that be a reason
> for this behaviour? I googled a while but didn't find any way how to
> check that explicitly.
>
> On 15.05.23 19:15, Adam King wrote:
> > This is sort of similar to what I said in a previous email, but the only
> > way I've seen this happen in other setups is through hanging cephadm
> > commands. The debug process has been, do a mgr failover, wait a few
> > minutes, see in "ceph orch ps" and "ceph orch device ls" which hosts have
> > and have not been refreshed (the REFRESHED column should be some lower
> > value on the hosts where it refreshed), go to the hosts where it did not
> > refresh and check "ps aux | grep cephadm" looking for long running (and
> > therefore most likely hung) processes. I would still expect that's the
> most
> > likely thing you're experiencing here. I haven't seen any other causes
> for
> > cephadm to not refresh unless the module crashed, but that would be
> > explicitly stated in the cluster health.
> >
> > On Mon, May 15, 2023 at 11:44 AM Thomas Widhalm 
> > wrote:
> >
> >> Hi,
> >>
> >> I tried a lot of different approaches but I didn't have any success so
> far.
> >>
> >> "ceph orch ps" still doesn't get refreshed.
> >>
> >> Some examples:
> >>
> >> mds.mds01.ceph06.huavsw  ceph06               starting           -    -     -        -
> >> mds.mds01.ceph06.rrxmks  ceph06               error         4w ago   3M     -        -
> >> mds.mds01.ceph07.omdisd  ceph07               error         4w ago   4M     -        -
> >> mds.mds01.ceph07.vvqyma  ceph07               starting           -    -     -        -
> >> mgr.ceph04.qaexpv        ceph04  *:8443,9283  running (4w)  4w ago  10M  551M        -  17.2.6  9cea3956c04b  33df84e346a0
> >> mgr.ceph05.jcmkbb        ceph05  *:8443,9283  running (4w)  4w ago   4M  441M        -  17.2.6  9cea3956c04b  1ad485df4399
> >> mgr.ceph06.xbduuf        ceph06  *:8443,9283  running (4w)  4w ago   4M  432M        -  17.2.6  9cea3956c04b  5ba5fd95dc48
> >> mon.ceph04               ceph04               running (4w)  4w ago   4M  223M    2048M  17.2.6  9cea3956c04b  8b6116dd216f
> >> mon.ceph05               ceph05               running (4w)  4w ago   4M  326M    2048M  17.2.6  9cea3956c04b  70520d737f29
> >>
> >> Debug Log doesn't show anything that could help me, either.
> >>
> >> 2023-05-15T14:48:40.852088+ mgr.ceph05.jcmkbb (mgr.83897390) 1376 :
> >> cephadm [INF] Schedule start daemon mds.mds01.ceph04.hcmvae
> >> 2023-05-15T14:48:43.620700+ mgr.ceph05.jcmkbb (mgr.83897390) 1380 :
> >> cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.hcmvae
> >> 2023-05-15T14:48:45.124822+ mgr.ceph05.jcmkbb (mgr.83897390) 1392 :
> >> cephadm [INF] Schedule start daemon mds.mds01.ceph04.krxszj
> >> 2023-05-15T14:48:46.493902+ mgr.ceph05.jcmkbb (mgr.83897390) 1394 :
> >> cephadm [INF] Schedule redeploy daemon mds.mds01.ceph04.krxszj
> >> 2023-05-15T15:05:25.637079+ mgr.c

[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Thomas Widhalm
I just checked every single host. The only cephadm processes running 
were "cephadm shell" sessions from debugging. I closed all of them, so now I can 
verify, there's not a single cephadm process running on any of my ceph 
hosts. (and since I found the shell processes, I can verify I didn't 
have a typo ;-) )


Regarding the broken record: I'm extremely thankful for your support. And I 
should have checked that earlier. We all know that sometimes it's the 
least probable things that go sideways. So checking the things you're 
sure to be ok is always a good idea. Thanks for being adamant about 
that. But now we can be sure, at least.


On 15.05.23 21:27, Adam King wrote:
If it persisted through a full restart, it's possible the conditions 
that caused the hang are still present after the fact. The two known 
causes I'm aware of are lack of space in the root partition and hanging 
mount points. Both would show up as processes in "ps aux | grep cephadm" 
though. The latter could possibly be related to cephfs pool issues if 
you have something mounted on one of the host hosts. Still hard to say 
without knowing what exactly got stuck. For clarity, without restarting 
or changing anything else, can you verify  if "ps aux | grep cephadm" 
shows anything on the nodes. I know I'm a bit of a broken record on 
mentioning the hanging processes stuff, but outside of module crashes 
which don't appear to be present here, 100% of other cases of this type 
of thing happening I've looked at before have had those processes 
sitting around.


On Mon, May 15, 2023 at 3:10 PM Thomas Widhalm > wrote:


This is why I even tried a full cluster shutdown. All Hosts were
out, so
there's not a possibility that there's any process hanging. After I
started the nodes, it's just the same as before. All refresh times show
"4 weeks". Like it stopped simoultanously on all nodes.

Some time ago we had a small change in name resolution so I thought,
maybe the orchestrator can't connect via ssh anymore. But I tried all
the steps in
https://docs.ceph.com/docs/master/cephadm/troubleshooting/#ssh-errors 
 .
The only thing that's slightly suspicous is that, it said, it added the
host key to known hosts. But since I tried via "cephadm shell" I guess,
the known hosts are just not replicated to these containers. ssh works,
too. (And I would have suspected that I get a warning if that failed)

I don't see any information about the orchestrator module having
crashed. It's running as always.

  From the the prior problem I had some issues in my cephfs pools. So,
maybe there's something broken in the .mgr pool? Could that be a reason
for this behaviour? I googled a while but didn't find any way how to
check that explicitly.

On 15.05.23 19:15, Adam King wrote:
 > This is sort of similar to what I said in a previous email, but
the only
 > way I've seen this happen in other setups is through hanging cephadm
 > commands. The debug process has been, do a mgr failover, wait a few
 > minutes, see in "ceph orch ps" and "ceph orch device ls" which
hosts have
 > and have not been refreshed (the REFRESHED column should be some
lower
 > value on the hosts where it refreshed), go to the hosts where it
did not
 > refresh and check "ps aux | grep cephadm" looking for long
running (and
 > therefore most likely hung) processes. I would still expect
that's the most
 > likely thing you're experiencing here. I haven't seen any other
causes for
 > cephadm to not refresh unless the module crashed, but that would be
 > explicitly stated in the cluster health.
 >
 > On Mon, May 15, 2023 at 11:44 AM Thomas Widhalm
mailto:widha...@widhalm.or.at>>
 > wrote:
 >
 >> Hi,
 >>
 >> I tried a lot of different approaches but I didn't have any
success so far.
 >>
 >> "ceph orch ps" still doesn't get refreshed.
 >>
 >> Some examples:
 >>
 >> mds.mds01.ceph06.huavsw  ceph06               starting           -    -     -        -
 >> mds.mds01.ceph06.rrxmks  ceph06               error         4w ago   3M     -        -
 >> mds.mds01.ceph07.omdisd  ceph07               error         4w ago   4M     -        -
 >> mds.mds01.ceph07.vvqyma  ceph07               starting           -    -     -        -
 >> mgr.ceph04.qaexpv        ceph04  *:8443,9283  running (4w)  4w ago  10M  551M        -  17.2.6  9cea3956c04b  33df84e346a0
 >> mgr.ceph05.jcmkbb        ceph05  *:8443,9283  running (4w)  4w ago   4M  441M        -  17.2.6  9cea3956c04b  1ad485df4399
 >> mgr.ceph06.xbduuf        ceph06  *:8443,9283  running (4w)  4w ago

[ceph-users] cephadm and remoto package

2023-05-15 Thread Shashi Dahal
Hi,
I followed this documentation:

https://docs.ceph.com/en/pacific/cephadm/adoption/

This is the error I get when trying to enable cephadm.

ceph mgr module enable cephadm

Error ENOENT: module 'cephadm' reports that it cannot run on the active
manager daemon: loading remoto library:No module named 'remoto' (pass
--force to force enablement)

When I import remoto, it imports just fine.
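
(What I mean by that, concretely -- a check from the system python3, paths may 
differ on other setups:

python3 -c "import remoto; print(remoto.__file__)"

prints a valid path here.)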


OS is ubuntu 20.04 focal


-- 
Cheers,
Shashi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestration seems not to work

2023-05-15 Thread Adam King
Okay, thanks for verifying that bit, sorry to have gone on about it so long. I
guess we could look at connection issues next. I wrote a short python
script that  tries to connect to hosts using asyncssh closely to how
cephadm does it (
https://github.com/adk3798/testing_scripts/blob/main/asyncssh-connect.py).
Maybe if you try that within a shell it will give some insights on if these
connections, specifically through asyncssh using the keys/settings as
cephadm does, are working alright. I used it in a shell with --no-hosts from
the host with the active mgr to try and be as close to internal cephadm
operations as possible. connect.py is the same as the linked file on github
here.

[root@vm-00 ~]# cephadm shell --no-hosts --mount connect.py
Inferring fsid 9dcee730-f32f-11ed-ba89-52540033ae03
Using recent ceph image
quay.io/adk3798/ceph@sha256:7168a3334fa16cd9d091462327c73f20548f113f9f404a316322aa4779a7639c
[ceph: root@vm-00 /]#
[ceph: root@vm-00 /]#
[ceph: root@vm-00 /]# python3 -m pip install asyncssh
WARNING: Running pip install with root privileges is generally not a good
idea. Try `__main__.py install --user` instead.
Collecting asyncssh
  Using cached
https://files.pythonhosted.org/packages/1e/9f/ad61867b12823f6e2c0ef2b80a704273b9385de707ac4539afa445c0665d/asyncssh-2.13.1-py3-none-any.whl
Requirement already satisfied: cryptography>=3.1 in
/usr/lib64/python3.6/site-packages (from asyncssh)
Collecting typing-extensions>=3.6 (from asyncssh)
  Using cached
https://files.pythonhosted.org/packages/45/6b/44f7f8f1e110027cf88956b59f2fad776cca7e1704396d043f89effd3a0e/typing_extensions-4.1.1-py3-none-any.whl
Requirement already satisfied: six>=1.4.1 in
/usr/lib/python3.6/site-packages (from cryptography>=3.1->asyncssh)
Requirement already satisfied: cffi!=1.11.3,>=1.8 in
/usr/lib64/python3.6/site-packages (from cryptography>=3.1->asyncssh)
Requirement already satisfied: pycparser in
/usr/lib/python3.6/site-packages (from
cffi!=1.11.3,>=1.8->cryptography>=3.1->asyncssh)
Installing collected packages: typing-extensions, asyncssh
Successfully installed asyncssh-2.13.1 typing-extensions-4.1.1
[ceph: root@vm-00 /]# ceph config-key get mgr/cephadm/ssh_identity_key >
identity_key
[ceph: root@vm-00 /]# ceph config-key get mgr/cephadm/ssh_identity_pub >
identity_pub
[ceph: root@vm-00 /]# ceph cephadm get-ssh-config > ssh_config
[ceph: root@vm-00 /]# ceph orch host ls
HOST   ADDR LABELS  STATUS
vm-00  192.168.122.121  _admin
vm-01  192.168.122.209
vm-02  192.168.122.128
3 hosts in cluster
[ceph: root@vm-00 /]# python3 /mnt/connect.py connect --address
192.168.122.209 --priv-key-file identity_key --pub-key-file identity_pub
--ssh-config-file ssh_config
return code: 0

stdout:

stderr:

[ceph: root@vm-00 /]#


The output won't look nice in an email, but basically it was just
installing the asyncssh library, gathering the ssh keys/config from
cephadm, picking an IP from the "ceph orch host ls" output and then running
the script. By default it just runs "true" on the host, so in a success
case it's just return code 0 and no output, but that could obviously be
changed by modifying the python script to do something else for the cmd. If
it fails, you get a traceback

[ceph: root@vm-00 /]# python3 /mnt/connect.py connect --address
192.168.122.201 --priv-key-file identity_key --pub-key-file identity_pub
--ssh-config-file ssh_config
Traceback (most recent call last):
  File "/mnt/connect.py", line 100, in 
main()
  File "/mnt/connect.py", line 93, in main
r = args.func(args)
  File "/mnt/connect.py", line 8, in try_connection
async_run(_connect(args))
  File "/mnt/connect.py", line 14, in async_run
return loop.run_until_complete(coro)
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in
run_until_complete
return future.result()
  File "/mnt/connect.py", line 27, in _connect
preferred_auth=['publickey'], options=ssh_options)
  File "/usr/local/lib/python3.6/site-packages/asyncssh/connection.py",
line 8045, in connect
timeout=new_options.connect_timeout)
  File "/usr/lib64/python3.6/asyncio/tasks.py", line 358, in wait_for
return fut.result()
  File "/usr/local/lib/python3.6/site-packages/asyncssh/connection.py",
line 432, in _connect
flags=flags, local_addr=local_addr)
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 794, in
create_connection
raise exceptions[0]
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 781, in
create_connection
yield from self.sock_connect(sock, address)
  File "/usr/lib64/python3.6/asyncio/selector_events.py", line 439, in
sock_connect
return (yield from fut)
  File "/usr/lib64/python3.6/asyncio/selector_events.py", line 469, in
_sock_connect_cb
raise OSError(err, 'Connect call failed %s' % (address,))
OSError: [Errno 113] Connect call failed ('192.168.122.201', 22)


By trying this for connecting to each host in the cluster, given how close
it is to how cephadm is operating, it should help verify with relative
certainty whether those connections work.

[ceph-users] Re: Telemetry and Redmine sync

2023-05-15 Thread Venky Shankar
Hi Yaarit,

On Fri, May 12, 2023 at 7:23 PM Yaarit Hatuka  wrote:
>
> Hi everyone,
>
> Over this weekend we will run a sync between telemetry crashes and Redmine
> tracker issues.
> This might affect your inbox, depending on your Redmine email notification
> setup. You can set up filters for these emails to skip your inbox.

Some of the freshly created trackers look like duplicates (they already have
trackers created by the telemetry bot a while back). Is that expected?

>
> Thanks,
> Yaarit
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dedicated radosgw gateways

2023-05-15 Thread Jonas Nemeiksis
Hi,

Which video talks about that?

Thanks.

On Mon, May 15, 2023 at 9:51 PM Michal Strnad 
wrote:

> Hi,
>
>
> thank you for the response. That sounds like a reasonable solution.
>
> Michal
>
> On 5/15/23 14:15, Konstantin Shalygin wrote:
> > Hi,
> >
> >> On 15 May 2023, at 14:58, Michal Strnad 
> wrote:
> >>
> >> at Cephalocon 2023, it was mentioned several times that for service
> >> tasks such as data deletion via garbage collection or data replication
> >> in S3 via zoning, it is good to do them on dedicated radosgw gateways
> >> and not mix them with gateways used by users. How can this be
> >> achieved? How can we isolate these tasks? Will using dedicated
> >> keyrings instead of admin keys be sufficient? How do you operate this
> >> in your environment?
> >
> > Just:
> >
> > # don't put client traffic to "dedicated radosgw gateways"
> > # disable lc/gc on "gateways used by users" via `rgw_enable_lc_threads =
> > false` & `rgw_enable_gc_threads = false`
> >
> >
> > k
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jonas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io