[ceph-users] Re: ceph status not showing correct monitor services

2024-04-03 Thread Eugen Block
I have no idea what you did there ;-) I would remove that config
though and rather configure the ceph image globally; there have been
several issues when cephadm tries to launch daemons with different
ceph versions. Although in your case it looks like they are actually
the same images according to the digest (and also in the ceph orch ps
output), it might cause some trouble anyway, so I'd recommend removing
the individual config for mon.a001s016 and only using the global
config. Can you add these outputs (mask sensitive data)?


ceph config get mon container_image
ceph config get osd container_image
ceph config get mgr mgr/cephadm/container_image_base
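
If you decide to drop the per-daemon override, something along these
lines should do it (untested here; adjust the image reference to
whatever you want to use globally, and the redeploy is only needed if
the daemon should pick up the new image right away):

ceph config rm mon.a001s016 container_image
ceph config set global container_image <your-global-image>
ceph orch redeploy mon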

Zitat von "Adiga, Anantha" :


Hi Eugen,


Noticed this in the config dump: why is only "mon.a001s016" listed?
And this is the one that is not listed in "ceph -s".



mon            advanced  auth_allow_insecure_global_id_reclaim      false
mon            advanced  auth_expose_insecure_global_id_reclaim     false
mon            advanced  mon_compact_on_start                       true
mon.a001s016   basic     container_image                            docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586  *
mgr            advanced  mgr/cephadm/container_image_base           docker.io/ceph/daemon
mgr            advanced  mgr/cephadm/container_image_node_exporter  docker.io/prom/node-exporter:v0.17.0



  cluster:
id: 604d56db-2fab-45db-a9ea-c418f9a8cca8
health: HEALTH_OK

  services:
mon: 2 daemons, quorum a001s018,a001s017 (age 45h)
mgr: a001s016.ctmoay(active, since 28h), standbys: a001s017.bpygfm
mds: 1/1 daemons up, 2 standby
osd: 36 osds: 36 up (since 29h), 36 in (since 2y)
rgw: 3 daemons active (3 hosts, 1 zones)

var lib mon unit.image


a001s016:
# cat  
/var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s016/unit.image

docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586

a001s017:
# cat  
/var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s017/unit.image

docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586
a001s018:
# cat  
/var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s018/unit.image

docker.io/ceph/daemon:latest-pacific

ceph image tag, digest from docker inspect of:  ceph/daemon   
latest-pacific   6e73176320aa   2 years ago 1.27GB

==
a001s016:
"Id": "sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"RepoTags": [ "ceph/daemon:latest-pacific" ],
"RepoDigests": [ "ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586" ]

a001s017:
"Id": "sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"RepoTags": [ "ceph/daemon:latest-pacific" ],
"RepoDigests": [ "ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586" ]

a001s018:
"Id": "sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
"RepoTags": [ "ceph/daemon:latest-pacific" ],
"RepoDigests": [ "ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586" ]


-Original Message-
From: Adiga, Anantha
Sent: Tuesday, April 2, 2024 10:42 AM
To: Eugen Block 
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users] Re: ceph status not showing correct  
monitor services


Hi Eugen,

Currently there are only three nodes, but I can add a node to the
cluster and check it out. I will take a look at the mon logs.



Thank you,
Anantha

-Original Message-
From: Eugen Block 
Sent: Tuesday, April 2, 2024 12:19 AM
To: Adiga, Anantha 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph status not showing correct  
monitor services


You can add a mon manually to the monmap, but that requires a
downtime of the mons. Here's an example [1] of how to modify the monmap
(including a network change, which you don't need, of course). But that
would be my last resort; first I would try to find out why the MON
fails to join the quorum. What is mon.a001s016 logging, and
what are the other two logging?
Do you have another host where you could place a mon daemon to see
if that works?



[1]
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#example-procedure
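
For reference, the manual route from [1] boils down to roughly the
following (a rough sketch only; mon IDs, the address and the tmp path
are placeholders, and all mons have to be stopped while you do it):

# on a surviving mon, with the daemon stopped:
ceph-mon -i <mon-id> --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
monmaptool --add a001s016 <ip:port> /tmp/monmap
# inject the edited map into every mon, then start them again:
ceph-mon -i <mon-id> --inject-monmap /tmp/monmap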

Zitat von "Adiga, Anantha" :


# ceph mon stat
e6: 2 mons at {a001s017=[v2:10.45.128.27:3300/0,v1:10.45.128.27:6789/0],a001s018=[v2:10.45.128.28:3300/0,v1:10.45.128.28:6789/0]}, election epoch 162, leader 0 a001s018, quorum 0,1 a001s

[ceph-users] Re: put bucket notification configuration - access denied

2024-04-03 Thread Yuval Lifshitz
Hi GM,
sorry for the late reply. Anyway, you are right:
in "quincy" (v17) only the owner of the bucket was allowed to set a
notification on the bucket.
In "reef" (v18) we fixed that, so that we follow the permissions set on the
bucket.
You can use the "s3:PutBucketNotification" action in a bucket policy to give
other users permission to set notifications on the bucket.
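
As a rough sketch (user and bucket names here are made up, and the
exact Principal format depends on whether you use tenants), a bucket
policy along these lines should do it once the permission checks are
honored:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/notification-user"]},
    "Action": ["s3:PutBucketNotification", "s3:GetBucketNotification"],
    "Resource": ["arn:aws:s3:::mybucket"]
  }]
}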

Yuval

On Tue, Mar 26, 2024 at 4:14 AM Giada Malatesta <
giada.malate...@cnaf.infn.it> wrote:

> Hello everyone,
>
> we are facing a problem regarding the s3 operation put bucket
> notification configuration.
>
> We are using Ceph version 17.2.6. We are trying to configure buckets in
> our cluster so that a notification message is sent via the amqps protocol
> when the content of the bucket changes. To do so, we created a local rgw
> user with "special" capabilities and we wrote ad hoc policies for this
> user (list of all buckets, read access to all buckets and the ability to
> add, list and delete bucket notification configurations).
>
> The problem regards the configuration of all buckets except the one the
> user owns: when doing this put bucket notification configuration
> cross-account operation we get an access denied error.
>
> I suspect that this problem is related to the version we are
> using, because when we were doing tests on another cluster we were using
> version 18.2.1 and we did not face this problem. Can you confirm my
> hypothesis?
>
> Thanks,
>
> GM.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph and raid 1 replication

2024-04-03 Thread Roberto Maggi @ Debian

Hi every one,

I'm new to ceph and I'm still studying it.

In my company we decided to test ceph for possible further implementations.

Although I understood its capabilities, I'm still doubtful about how to
set up replication.


Once implemented in production I can accept a little lack of
performance in favor of stability and a good night's sleep. Hence, if in
the testing area I can introduce ceph as network storage, I'd like to
replicate some OSDs' drives as I'd do with RAID 1 once in production.


The goal would be hosting data for kubernetes storage classes.

so the questions are:

1) What do you think about this kind of solution?

2) How can I set up full replication between OSDs?


thanks in advance


Rob
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and raid 1 replication

2024-04-03 Thread Janne Johansson
> Hi every one,
> I'm new to ceph and I'm still studying it.
> In my company we decided to test ceph for possible further implementations.
>
> Although I  undestood its capabilities I'm still doubtful about how to
> setup replication.

Default settings in ceph will give you replication = 3, which is like
RAID-1 but with three drives having the same data.
It is just not done on a per-disk basis; all stored data will have
two extra copies on other drives (on separate hosts).

> Once implemented in production I can accept a little lacking of
> performance in
> favor of stability and night sleep, hence, if in testing area I can
> introduce ceph as network storage,
> I'd like to replicate some osds' drives as I'd do with raid 1 once in
> production.

As with zfs (and btrfs and other storage solutions) you are most often
best served by handing over all drives raw as they are, letting the
storage system handle the redundancy at a higher level rather than
building raid-1s and handing those over to the storage.

> The goal would be hosting data for kubernetes storage classes.
> so questions are:
>
> 1) what do you think about this kind of solution

Bad idea

> 2) How can I setup full replication between osds?

Not really needed. Go with the defaults, allow ceph to place 3 copies
of each piece of data spread out on three or more separate OSD hosts
and move on to the interesting parts of actually using the storage
instead of trying to "fix" something which isn't broken by default.

Ceph will not make full copies of whole OSDs; rather, pools will be
made up of many PGs and each PG will be replicated as needed to give
you three copies of each, just not on the same OSDs.
It will also auto-repair to other drives and hosts and auto-balance
data, which a raid-1 set would not do unless you have unused hot
spares waiting for disasters.
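
You can check (or change) what a pool is doing with the normal pool
commands, for example (<poolname> being whichever pool your kubernetes
storage class ends up using):

ceph osd pool get <poolname> size        # number of copies, 3 by default
ceph osd pool get <poolname> min_size    # copies required to keep serving IO
ceph osd pool set <poolname> size 3      # only if you need to change it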


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph and raid 1 replication

2024-04-03 Thread Roberto Maggi @ Debian

Thanks for the considerations.

On 4/3/24 13:08, Janne Johansson wrote:

Hi every one,
I'm new to ceph and I'm still studying it.
In my company we decided to test ceph for possible further implementations.

Although I  undestood its capabilities I'm still doubtful about how to
setup replication.

Default settings in ceph will give you replication = 3, which is like
RAID-1 but with three drives having the same data.
It is just not made on per-disk basis, but all stored data will have
two extra copies on other drives (on separate hosts)


Once implemented in production I can accept a little lacking of
performance in
favor of stability and night sleep, hence, if in testing area I can
introduce ceph as network storage,
I'd like to replicate some osds' drives as I'd do with raid 1 once in
production.

As with zfs (and btrfs and other storage solutions) you are most often
best served by handing over all drives raw as they are, and let the
storage system handle the redundancy on a higher level and not build
raid-1s and hand those over to the storage.


The goal would be hosting data for kubernetes storage classes.
so questions are:

1) what do you think about this kind of solution

Bad idea


2) How can I setup full replication between osds?

Not really needed. Go with the defaults, allow ceph to place 3 copies
of each piece of data spread out on three or more separate OSD hosts
and move on to the interesting parts of actually using the storage
instead of trying to "fix" something which isn't broken by default.

Ceph will not make full copies of whole OSDs, rather pools will be
made up of many PGs and each PG will be replicated as needed to give
you three copies of each, just not to the same OSDs.
It will also auto-repair to other drives and hosts and auto-balance
data, which a raid-1 set would not do unless you have unused hot
spares waiting for disasters.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph status not showing correct monitor services

2024-04-03 Thread Adiga, Anantha

Removed the config setting for mon.a001s016.

Here it is:
# ceph config get mon container_image
docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586
# ceph config get osd container_image
docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586
# ceph config get mgr mgr/cephadm/container_image_base
docker.io/ceph/daemon

Thank you,
Anantha

-Original Message-
From: Eugen Block  
Sent: Wednesday, April 3, 2024 12:27 AM
To: Adiga, Anantha 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph status not showing correct monitor services

I have no idea what you did there ;-) I would remove that config though and 
rather configure the ceph image globally, there have been several issues when 
cephadm tries to launch daemons with different ceph versions. Although in your 
case it looks like they are actually the same images according to the digest 
(and also in the ceph orch ps output). But it might cause some trouble anyway, 
so I'd recommend to remove the individual config for mon.a001s016 and only use 
the global config. Can you add these outputs (mask sensitive data)?

ceph config get mon container_image
ceph config get osd container_image
ceph config get mgr mgr/cephadm/container_image_base

Zitat von "Adiga, Anantha" :

> Hi Eugen,
>
>
> Noticed this in the config dump:  Why  only   "mon.a001s016 "  
> listed?And this is the one that is not listed in "ceph -s"
>
>
>   mon  advanced   
> auth_allow_insecure_global_id_reclaim  false
>   mon  advanced   
> auth_expose_insecure_global_id_reclaim false
>   mon  advanced   
> mon_compact_on_start   true
> mon.a001s016   basic container_image  
> 
> docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586
>
> *
>   mgr  advanced   
> mgr/cephadm/container_image_base   docker.io/ceph/daemon
>   mgr  advanced   
> mgr/cephadm/container_image_node_exporter   
> docker.io/prom/node-exporter:v0.17.0
>
>
>   cluster:
> id: 604d56db-2fab-45db-a9ea-c418f9a8cca8
> health: HEALTH_OK
>
>   services:
> mon: 2 daemons, quorum a001s018,a001s017 (age 45h)
> mgr: a001s016.ctmoay(active, since 28h), standbys: a001s017.bpygfm
> mds: 1/1 daemons up, 2 standby
> osd: 36 osds: 36 up (since 29h), 36 in (since 2y)
> rgw: 3 daemons active (3 hosts, 1 zones)
>
> var lib mon unit.image
> 
>
> a001s016:
> # cat
> /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s016/unit.i
> mage
> docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a
> 2cb7ff7668f776b61b9d586
>
> a001s017:
> # cat
> /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s017/unit.i
> mage
> docker.io/ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a
> 2cb7ff7668f776b61b9d586
> a001s018:
> # cat
> /var/lib/ceph/604d56db-2fab-45db-a9ea-c418f9a8cca8/mon.a001s018/unit.i
> mage
> docker.io/ceph/daemon:latest-pacific
>
> ceph image tag, digest from docker inspect of:  ceph/daemon   
> latest-pacific   6e73176320aa   2 years ago 1.27GB
> ==
> 
> a001s016:
> "Id":  
> "sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
> "RepoTags": [
> "ceph/daemon:latest-pacific"
> "RepoDigests": [
>  
> "ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"
>
> a001s017:
> "Id":  
> "sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
> "RepoTags": [
> "ceph/daemon:latest-pacific"
> "RepoDigests": [
>  
> "ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"
>
> a001s018:
> "Id":  
> "sha256:6e73176320aaccf3b3fb660b9945d0514222bd7a83e28b96e8440c630ba6891f",
> "RepoTags": [
> "ceph/daemon:latest-pacific"
> "RepoDigests": [
>  
> "ceph/daemon@sha256:261bbe628f4b438f5bf10de5a8ee05282f2697a5a2cb7ff7668f776b61b9d586"
>
> -Original Message-
> From: Adiga, Anantha
> Sent: Tuesday, April 2, 2024 10:42 AM
> To: Eugen Block 
> Cc: ceph-users@ceph.io
> Subject: RE: [ceph-users] Re: ceph status not showing correct monitor 
> services
>
> Hi Eugen,
>
> Currently there are only three nodes, but I can add  a node to the 
> cluster and check it out. I will take a look at the mon logs
>
>
> Thank you,
> Anantha
>
> -Original Message-
> From: Eugen Block 
> Sent: Tuesday, April 2, 2024 12:19 AM
> To: Adiga, Anantha 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ceph status not showin

[ceph-users] IO500 CFS ISC 2024

2024-04-03 Thread IO500 Committee

Call for Submission

Stabilization Period: Monday, April 1st - Friday, April 15th, 2024
Submission Deadline: Tuesday, May 3rd, 2024 AoE

The IO500 is now accepting and encouraging submissions for the upcoming 
14th semi-annual IO500 Production and Research lists, in conjunction 
with ISC24. Once again, we are also accepting submissions to both the 
Production and Research 10 Client Node Challenges to encourage the 
submission of small scale results. View the requirements for submitting 
to each list on the IO500 Webpage. The new ranked lists will be 
announced at the BoF [1]. We hope to see many new results.


Background
Following the success of the Top500 in collecting and analyzing 
historical trends in supercomputer technology and evolution, the IO500 
was created in 2017, published its first list at SC17, and has grown 
continually since then.  The benchmarks represent community accepted 
standards, including being used in Request for Proposals for new HPC 
platforms. The benchmarks showcase the IO access pattern extremes giving 
a full picture of storage system potential performance. The list is 
about much more than just the raw rank; all submissions help the 
community by collecting and publishing a wider corpus of data.


The multi-fold goals of the benchmark suite are as follows:
- Represent naive and optimized access patterns for the execution of a 
rich variety of HPC  applications, their achievable performance and the 
documentation of how the numbers are achieved.
- Support small to extreme-scale Research and Production HPC systems 
using flexible storage APIs
- Maximizing simplicity in running the benchmark suite while offering 
tunable parameters


Specifically, the benchmark suite includes a hero-run of both IOR and 
mdtest configured however possible to maximize performance and establish 
an upper-bound for performance. It also includes runs with highly 
prescribed parameters in an attempt to determine a lower performance 
bound. Finally, it includes a namespace search as this has been 
determined to be a highly sought-after feature in HPC storage systems 
that has historically not been well-measured. Supported Storage APIs are 
those that are part of IOR and mdtest. Extending these tools with a 
public pull request can be done to enable new storage APIs.


The goals of the community are also multi-fold:
1. Gather historical data for the sake of analysis and to aid 
predictions of storage futures
2. Collect tuning information to share valuable performance 
optimizations across the community
3. Encourage vendors and designers to optimize for workloads beyond 
"hero runs"
4. Establish bounded expectations for users, procurers, and 
administrators

5. Understand and be able to reproduce performance on storage systems

Using the IO500 Reproducibility guidelines, each submission is labeled 
according to the breadth of details provided and the access to the 
deployed storage software that enables the community to reproduce the 
results and study system design changes over time.


The IO500 follows a two-staged approach. First, there will be a two-week 
stabilization period during which we encourage the community to verify 
that the benchmark runs properly on a variety of storage systems. During 
this period the benchmark may be updated based upon feedback from the 
community. The final benchmark will then be released. We expect that 
runs compliant with the rules made during the stabilization period will 
be valid as a final submission unless a significant defect is found.


10 Client Node I/O Challenge
The 10 Client Node Challenge is conducted using the regular IO500 
benchmark, however, with the rule that exactly 10 client nodes must be 
used to run the benchmark. You may use any shared storage with any 
number of servers. We will announce the results in the Production and 
Research lists as well as in separate derived lists.


Birds-of-a-Feather
We encourage you to submit [2] to join our community, and to attend the 
ISC’24 BoF [1] on Tuesday, May 12, 2024 at 10:05am - 11:05am CEST, where 
we will announce the new IO500 Production and Research lists and their 
10 Client Node counterparts.


Be Part of the Community
Submissions of all sizes are welcome; the webpage has customizable 
sorting, so it is possible to submit on a small system and still get a 
very good per-client score, for example. We will also highlight new and 
interesting results with invited talk(s) at the BoF.


[1] https://io500.org/pages/bof-isc24
[2] https://io500.org/submission
[3] https://io500.org/rules-submission
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Lorenz Bausch

Hi everybody,

we upgraded our containerized Red Hat Pacific cluster to the latest 
Quincy release (Community Edition).
The upgrade itself went fine, the cluster is HEALTH_OK, all daemons run 
the upgraded version:


 %< 
$ ceph -s
  cluster:
id: 68675a58-cf09-4ebd-949c-b9fcc4f2264e
health: HEALTH_OK

  services:
mon: 5 daemons, quorum node02,node03,node04,node05,node01 (age 25h)
mgr: node03.ztlair(active, since 25h), standbys: node01.koymku, 
node04.uvxgvp, node02.znqnhg, node05.iifmpc

osd: 408 osds: 408 up (since 22h), 408 in (since 7d)
rgw: 19 daemons active (19 hosts, 1 zones)

  data:
pools:   11 pools, 8481 pgs
objects: 236.99M objects, 544 TiB
usage:   1.6 PiB used, 838 TiB / 2.4 PiB avail
pgs: 8385 active+clean
 79   active+clean+scrubbing+deep
 17   active+clean+scrubbing

  io:
client:   42 MiB/s rd, 439 MiB/s wr, 2.15k op/s rd, 1.64k op/s wr

---

$ ceph versions | jq .overall
{
  "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy 
(stable)": 437

}
 >% 

After all the daemons were upgraded we started noticing some RGW buckets 
which are inaccessible.

s3cmd failed with NoSuchKey:

 %< 
$ s3cmd la -l
ERROR: S3 error: 404 (NoSuchKey)
 >% 

The buckets still exist according to "radosgw-admin bucket list".
Out of the ~600 buckets, 13 buckets are inaccessible at the moment:

 %< 
$ radosgw-admin bucket radoslist --tenant xy --uid xy --bucket xy
2024-04-03T12:13:40.607+0200 7f0dbf4c4680  0 int 
RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*, 
RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const 
string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*, 
rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter): 
CLSRGWIssueBucketList for 
xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
2024-04-03T12:13:40.609+0200 7f0dbf4c4680  0 int 
RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*, 
RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const 
string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*, 
rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter): 
CLSRGWIssueBucketList for 
xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed

 >% 

The affected buckets are comparatively large, around 4 - 7 TB,
but not all buckets of that size are affected.

Using "rados -p rgw.buckets.data ls" it seems like all the objects are 
still there,
although "rados -p rgw.buckets.data get objectname -" only prints 
unusable (?) binary data,

even for objects of intact buckets.

Overall we're facing around 60 TB of customer data which is just gone
at the moment.
Is there a way to recover from this situation or to further narrow down
the root cause of the problem?


Kind regards,
Lorenz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
On Wed, Apr 3, 2024 at 11:58 AM Lorenz Bausch  wrote:
>
> Hi everybody,
>
> we upgraded our containerized Red Hat Pacific cluster to the latest
> Quincy release (Community Edition).

I'm afraid this is not an upgrade path that we try to test or support.
Red Hat makes its own decisions about what to backport into its
releases. My understanding is that Red Hat's pacific-based 5.3 release
includes all of the rgw multisite resharding changes, which were not
introduced upstream until the Reef release. This includes changes to
data formats that an upstream Quincy release would not understand. In
this case, you might have more luck upgrading to Reef?

> The upgrade itself went fine, the cluster is HEALTH_OK, all daemons run
> the upgraded version:
>
>  %< 
> $ ceph -s
>cluster:
>  id: 68675a58-cf09-4ebd-949c-b9fcc4f2264e
>  health: HEALTH_OK
>
>services:
>  mon: 5 daemons, quorum node02,node03,node04,node05,node01 (age 25h)
>  mgr: node03.ztlair(active, since 25h), standbys: node01.koymku,
> node04.uvxgvp, node02.znqnhg, node05.iifmpc
>  osd: 408 osds: 408 up (since 22h), 408 in (since 7d)
>  rgw: 19 daemons active (19 hosts, 1 zones)
>
>data:
>  pools:   11 pools, 8481 pgs
>  objects: 236.99M objects, 544 TiB
>  usage:   1.6 PiB used, 838 TiB / 2.4 PiB avail
>  pgs: 8385 active+clean
>   79   active+clean+scrubbing+deep
>   17   active+clean+scrubbing
>
>io:
>  client:   42 MiB/s rd, 439 MiB/s wr, 2.15k op/s rd, 1.64k op/s wr
>
> ---
>
> $ ceph versions | jq .overall
> {
>"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)": 437
> }
>  >% 
>
> After all the daemons were upgraded we started noticing some RGW buckets
> which are inaccessible.
> s3cmd failed with NoSuchKey:
>
>  %< 
> $ s3cmd la -l
> ERROR: S3 error: 404 (NoSuchKey)
>  >% 
>
> The buckets still exists according to "radosgw-admin bucket list".
> Out of the ~600 buckets, 13 buckets are unaccessible at the moment:
>
>  %< 
> $ radosgw-admin bucket radoslist --tenant xy --uid xy --bucket xy
> 2024-04-03T12:13:40.607+0200 7f0dbf4c4680  0 int
> RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> CLSRGWIssueBucketList for
> xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> 2024-04-03T12:13:40.609+0200 7f0dbf4c4680  0 int
> RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> CLSRGWIssueBucketList for
> xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
>  >% 
>
> The affected buckets are comparatively large, around 4 - 7 TB,
> but not all buckets of that size are affected.
>
> Using "rados -p rgw.buckets.data ls" it seems like all the objects are
> still there,
> although "rados -p rgw.buckets.data get objectname -" only prints
> unusable (?) binary data,
> even for objects of intact buckets.
>
> Overall we're facing around 60 TB of customer data which are just gone
> at the moment.
> Is there a way to recover from this situation or further narrowing down
> the root cause of the problem?
>
> Kind regards,
> Lorenz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs creation error

2024-04-03 Thread elite_stu
I have the same issue, can someone help me? Thanks in advance!
bash-4.4$ ceph fs new kingcephfs cephfs-king-metadata cephfs-king-data
new fs with metadata pool 7 and data pool 8
bash-4.4$
bash-4.4$ ceph -s
  cluster:
id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
health: HEALTH_ERR
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds

  services:
mon: 3 daemons, quorum a,b,c (age 2d)
mgr: a(active, since 2d), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 2d), 3 in (since 2d)
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 2/2 healthy
pools:   14 pools, 233 pgs
objects: 592 objects, 450 MiB
usage:   1.5 GiB used, 208 GiB / 210 GiB avail
pgs: 233 active+clean

  io:
client:   921 B/s rd, 1 op/s rd, 0 op/s wr






bash-4.4$ ceph fs status
myfs - 0 clients

RANK  STATEMDS   ACTIVITY DNSINOS   DIRS   CAPS
 0active  myfs-a  Reqs:0 /s10 13 12  0
0-s   standby-replay  myfs-b  Evts:0 /s 0  3  2  0
  POOL TYPE USED  AVAIL
 myfs-metadata   metadata   180k  65.9G
myfs-replicateddata   0   65.9G
kingcephfs - 0 clients
==
POOLTYPE USED  AVAIL
cephfs-king-metadata  metadata 0   65.9G
  cephfs-king-data  data   0   65.9G
MDS version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) 
quincy (stable)
bash-4.4$
bash-4.4$
bash-4.4$
bash-4.4$
bash-4.4$ ceph mds stat
myfs:1 kingcephfs:0 {myfs:0=myfs-a=up:active} 1 up:standby-replay
bash-4.4$
bash-4.4$ ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-replicated ]
name: kingcephfs, metadata pool: cephfs-king-metadata, data pools: 
[cephfs-king-data ]
bash-4.4$
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD: failed decoding part header ERRORS

2024-04-03 Thread Mark Selby
We have a ceph cluster of only nvme drives.

 

Very recently our overall OSD write latency increased pretty dramatically and
our overall throughput has really decreased.

 

One thing that seems to correlate with the start of this problem is the below
ERROR line from the logs. All our OSD nodes are creating these log lines now.

Can anyone tell me what this might be telling us? Any and all help is greatly
appreciated.

 

Mar 31 23:21:56 ceph1d03 
ceph-8797e570-96be-11ed-b022-506b4b7d76e1-osd-46[12898]: debug 
2024-04-01T03:21:56.953+ 7effbba51700  0  
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10/rpm/el8/BUILD/ceph-16.2.10/src/cls/fifo/cls_fifo.cc:112:
 ERROR: int 
rados::cls::fifo::{anonymous}::read_part_header(cls_method_context_t, 
rados::cls::fifo::part_header*): failed decoding part header

 

-- 

Mark Selby

Sr Linux Administrator, The Voleon Group

mse...@voleon.com 

 

 This email is subject to important conditions and disclosures that are listed 
on this web page: https://voleon.com/disclaimer/.

 



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Issue about execute "ceph fs new"

2024-04-03 Thread elite_stu
Everything goes fine except executing "ceph fs new kingcephfs
cephfs-king-metadata cephfs-king-data"; it shows "1 filesystem is
offline" and "1 filesystem is online with fewer MDS than max_mds".
But I see there is one mds service running. Please help me to fix the
issue, thanks a lot.

bash-4.4$
bash-4.4$ ceph fs new kingcephfs cephfs-king-metadata cephfs-king-data
new fs with metadata pool 7 and data pool 8
bash-4.4$
bash-4.4$ ceph -s
  cluster:
id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
health: HEALTH_ERR
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds

  services:
mon: 3 daemons, quorum a,b,c (age 2d)
mgr: a(active, since 2d), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 2d), 3 in (since 2d)
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 2/2 healthy
pools:   14 pools, 233 pgs
objects: 592 objects, 450 MiB
usage:   1.5 GiB used, 208 GiB / 210 GiB avail
pgs: 233 active+clean

  io:
client:   921 B/s rd, 1 op/s rd, 0 op/s wr

bash-4.4$ 


bash-4.4$ ceph fs status
myfs - 0 clients

RANK  STATEMDS   ACTIVITY DNSINOS   DIRS   CAPS
 0active  myfs-a  Reqs:0 /s10 13 12  0
0-s   standby-replay  myfs-b  Evts:0 /s 0  3  2  0
  POOL TYPE USED  AVAIL
 myfs-metadata   metadata   180k  65.9G
myfs-replicateddata   0   65.9G
kingcephfs - 0 clients
==
POOLTYPE USED  AVAIL
cephfs-king-metadata  metadata 0   65.9G
  cephfs-king-data  data   0   65.9G
MDS version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) 
quincy (stable)
bash-4.4$
bash-4.4$
bash-4.4$
bash-4.4$
bash-4.4$ ceph mds stat
myfs:1 kingcephfs:0 {myfs:0=myfs-a=up:active} 1 up:standby-replay
bash-4.4$
bash-4.4$ ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-replicated ]
name: kingcephfs, metadata pool: cephfs-king-metadata, data pools: 
[cephfs-king-data ]
bash-4.4$
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multi-MDS

2024-04-03 Thread quag...@bol.com.br
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Wesley Dillingham
I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
supports the RGW index pool causes crippling slow ops. If the OSD is marked
with primary-affinity of 0 prior to the OSD restart no slow ops are
observed. If the OSD has a primary affinity of 1 slow ops occur. The slow
ops only occur during the recovery period of the OMAP data and further only
occur when client activity is allowed to pass to the cluster. Luckily I am
able to test this during periods when I can disable all client activity at
the upstream proxy.

Given the behavior of the primary affinity changes preventing the slow ops
I think this may be a case of recovery being more detrimental than
backfill. I am thinking that causing a pg_temp acting set by forcing
backfill may be the right method to mitigate the issue. [1]

I believe that reducing the PG log entries for these OSDs would accomplish
that but I am also thinking a tuning of osd_async_recovery_min_cost [2] may
also accomplish something similar. Not sure the appropriate tuning for that
config at this point or if there may be a better approach. Seeking any
input here.

Further if this issue sounds familiar or sounds like another condition
within the OSD may be at hand I would be interested in hearing your input
or thoughts. Thanks!

[1] https://docs.ceph.com/en/latest/dev/peering/#concepts
[2]
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Joshua Baergen
We've had success using osd_async_recovery_min_cost=0 to drastically
reduce slow ops during index recovery.
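
If you want to try that, it can be set at runtime, for example
cluster-wide with:

ceph config set osd osd_async_recovery_min_cost 0

or scoped more narrowly with a config mask (device class, host) if you
only want it on the index-pool OSDs. Just a sketch, not a tested
recommendation for your particular cluster.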

Josh

On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham  
wrote:
>
> I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
> supports the RGW index pool causes crippling slow ops. If the OSD is marked
> with primary-affinity of 0 prior to the OSD restart no slow ops are
> observed. If the OSD has a primary affinity of 1 slow ops occur. The slow
> ops only occur during the recovery period of the OMAP data and further only
> occur when client activity is allowed to pass to the cluster. Luckily I am
> able to test this during periods when I can disable all client activity at
> the upstream proxy.
>
> Given the behavior of the primary affinity changes preventing the slow ops
> I think this may be a case of recovery being more detrimental than
> backfill. I am thinking that causing an pg_temp acting set by forcing
> backfill may be the right method to mitigate the issue. [1]
>
> I believe that reducing the PG log entries for these OSDs would accomplish
> that but I am also thinking a tuning of osd_async_recovery_min_cost [2] may
> also accomplish something similar. Not sure the appropriate tuning for that
> config at this point or if there may be a better approach. Seeking any
> input here.
>
> Further if this issue sounds familiar or sounds like another condition
> within the OSD may be at hand I would be interested in hearing your input
> or thoughts. Thanks!
>
> [1] https://docs.ceph.com/en/latest/dev/peering/#concepts
> [2]
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost
>
> Respectfully,
>
> *Wes Dillingham*
> LinkedIn 
> w...@wesdillingham.com
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
To expand on this diagnosis: with multisite resharding, we changed how
buckets name/locate their bucket index shard objects. Any buckets that
were resharded under this Red Hat pacific release would be using the
new object names. After upgrading to the Quincy release, rgw would
look at the wrong object names when trying to list those buckets. 404
NoSuchKey is the response I would expect in that case.
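
For anyone who wants to verify this on an affected bucket, the bucket
instance metadata and the actual index objects can be compared with
something like the following (bucket name, instance id and index pool
name are placeholders):

radosgw-admin metadata get bucket:<bucket>
radosgw-admin metadata get bucket.instance:<bucket>:<instance-id>
rados -p <zone>.rgw.buckets.index ls | grep <instance-id>

If the shard objects present in the index pool don't match what the
bucket instance metadata points at, this is likely what you're hitting.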

On Wed, Apr 3, 2024 at 12:20 PM Casey Bodley  wrote:
>
> On Wed, Apr 3, 2024 at 11:58 AM Lorenz Bausch  wrote:
> >
> > Hi everybody,
> >
> > we upgraded our containerized Red Hat Pacific cluster to the latest
> > Quincy release (Community Edition).
>
> i'm afraid this is not an upgrade path that we try to test or support.
> Red Hat makes its own decisions about what to backport into its
> releases. my understanding is that Red Hat's pacific-based 5.3 release
> includes all of the rgw multisite resharding changes which were not
> introduced upstream until the Reef release. this includes changes to
> data formats that an upstream Quincy release would not understand. in
> this case, you might have more luck upgrading to Reef?
>
> > The upgrade itself went fine, the cluster is HEALTH_OK, all daemons run
> > the upgraded version:
> >
> >  %< 
> > $ ceph -s
> >cluster:
> >  id: 68675a58-cf09-4ebd-949c-b9fcc4f2264e
> >  health: HEALTH_OK
> >
> >services:
> >  mon: 5 daemons, quorum node02,node03,node04,node05,node01 (age 25h)
> >  mgr: node03.ztlair(active, since 25h), standbys: node01.koymku,
> > node04.uvxgvp, node02.znqnhg, node05.iifmpc
> >  osd: 408 osds: 408 up (since 22h), 408 in (since 7d)
> >  rgw: 19 daemons active (19 hosts, 1 zones)
> >
> >data:
> >  pools:   11 pools, 8481 pgs
> >  objects: 236.99M objects, 544 TiB
> >  usage:   1.6 PiB used, 838 TiB / 2.4 PiB avail
> >  pgs: 8385 active+clean
> >   79   active+clean+scrubbing+deep
> >   17   active+clean+scrubbing
> >
> >io:
> >  client:   42 MiB/s rd, 439 MiB/s wr, 2.15k op/s rd, 1.64k op/s wr
> >
> > ---
> >
> > $ ceph versions | jq .overall
> > {
> >"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> > (stable)": 437
> > }
> >  >% 
> >
> > After all the daemons were upgraded we started noticing some RGW buckets
> > which are inaccessible.
> > s3cmd failed with NoSuchKey:
> >
> >  %< 
> > $ s3cmd la -l
> > ERROR: S3 error: 404 (NoSuchKey)
> >  >% 
> >
> > The buckets still exists according to "radosgw-admin bucket list".
> > Out of the ~600 buckets, 13 buckets are unaccessible at the moment:
> >
> >  %< 
> > $ radosgw-admin bucket radoslist --tenant xy --uid xy --bucket xy
> > 2024-04-03T12:13:40.607+0200 7f0dbf4c4680  0 int
> > RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> > RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> > string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> > rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> > CLSRGWIssueBucketList for
> > xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> > 2024-04-03T12:13:40.609+0200 7f0dbf4c4680  0 int
> > RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> > RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> > string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> > rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> > CLSRGWIssueBucketList for
> > xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> >  >% 
> >
> > The affected buckets are comparatively large, around 4 - 7 TB,
> > but not all buckets of that size are affected.
> >
> > Using "rados -p rgw.buckets.data ls" it seems like all the objects are
> > still there,
> > although "rados -p rgw.buckets.data get objectname -" only prints
> > unusable (?) binary data,
> > even for objects of intact buckets.
> >
> > Overall we're facing around 60 TB of customer data which are just gone
> > at the moment.
> > Is there a way to recover from this situation or further narrowing down
> > the root cause of the problem?
> >
> > Kind regards,
> > Lorenz
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue about execute "ceph fs new"

2024-04-03 Thread Eugen Block

Hi,

you need to deploy more daemons because your current active MDS is
responsible for the already existing CephFS. There are several ways to
do this; I like the yaml file approach and increasing the number of MDS
daemons. Just as an example from a test cluster with one CephFS, I
added the line "count_per_host: 2" to have two more daemons (one
active, one standby for the new FS):


cat mds.yml
service_type: mds
service_id: cephfs
placement:
  hosts:
  - host5
  - host6
  count_per_host: 2

Then apply:

ceph orch apply -i mds.yml

As soon as there are more daemons up, your second FS should become active.
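
If you prefer a one-liner over the yaml file, something like this
should work as well on a cephadm-managed cluster (the placement count
is just an example):

ceph orch apply mds kingcephfs --placement="2"

Either way you can verify with "ceph orch ls mds" and "ceph fs status"
that the new daemons come up and that kingcephfs gets an active MDS.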

Regards
Eugen


Zitat von elite_...@163.com:

Everything goes fine except execute "ceph fs new kingcephfs  
cephfs-king-metadata cephfs-king-data", its shows 1 filesystem is  
offline  1 filesystem is online with fewer MDS than max_mds.
But i see there is one mds services running, please help me to fix  
the issue, thanks a lot.


bash-4.4$
bash-4.4$ ceph fs new kingcephfs cephfs-king-metadata cephfs-king-data
new fs with metadata pool 7 and data pool 8
bash-4.4$
bash-4.4$ ceph -s
  cluster:
id: de9af3fe-d3b1-4a4b-bf61-929a990295f6
health: HEALTH_ERR
1 filesystem is offline
1 filesystem is online with fewer MDS than max_mds

  services:
mon: 3 daemons, quorum a,b,c (age 2d)
mgr: a(active, since 2d), standbys: b
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 2d), 3 in (since 2d)
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 2/2 healthy
pools:   14 pools, 233 pgs
objects: 592 objects, 450 MiB
usage:   1.5 GiB used, 208 GiB / 210 GiB avail
pgs: 233 active+clean

  io:
client:   921 B/s rd, 1 op/s rd, 0 op/s wr

bash-4.4$


bash-4.4$ ceph fs status
myfs - 0 clients

RANK  STATEMDS   ACTIVITY DNSINOS   DIRS   CAPS
 0active  myfs-a  Reqs:0 /s10 13 12  0
0-s   standby-replay  myfs-b  Evts:0 /s 0  3  2  0
  POOL TYPE USED  AVAIL
 myfs-metadata   metadata   180k  65.9G
myfs-replicateddata   0   65.9G
kingcephfs - 0 clients
==
POOLTYPE USED  AVAIL
cephfs-king-metadata  metadata 0   65.9G
  cephfs-king-data  data   0   65.9G
MDS version: ceph version 17.2.5  
(98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)

bash-4.4$
bash-4.4$
bash-4.4$
bash-4.4$
bash-4.4$ ceph mds stat
myfs:1 kingcephfs:0 {myfs:0=myfs-a=up:active} 1 up:standby-replay
bash-4.4$
bash-4.4$ ceph fs ls
name: myfs, metadata pool: myfs-metadata, data pools: [myfs-replicated ]
name: kingcephfs, metadata pool: cephfs-king-metadata, data pools:  
[cephfs-king-data ]

bash-4.4$
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Lorenz Bausch

Hi Casey,

thank you so much for the analysis! We tested the upgrade intensively,
but the buckets in our test environment were probably too small to get
dynamically resharded.



> after upgrading to the Quincy release, rgw would
> look at the wrong object names when trying to list those buckets.

As we're currently running Quincy, do you think objects/bucket indexes
might already be altered in a way which makes them also unusable for
Reef?


Kind regards,
Lorenz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orchestrator for osds

2024-04-03 Thread Eugen Block

Hi,

how many OSDs do you have in total? Can you share your osd tree, please?

You could check the unit.meta file on each OSD host to see which  
service it refers to and simply change it according to the service you  
intend to keep:


host1:~ # grep -r service_name  
/var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd*
/var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.14/unit.meta:
 "service_name": "osd.osd-hdd-ssd-mix",
/var/lib/ceph/543967bc-e586-32b8-bd2c-2d8b8b168f02/osd.15/unit.meta:
 "service_name": "osd.osd-hdd-ssd-mix",


host1:~# ceph orch ls osd
NAME PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.osd-hdd-ssd-mix  16  3m ago 2w
host1;host2;host3;host4;host5;host6;host7;host8


After you restart the OSDs they should show up correctly in the orch
ls output, and then you should be able to remove the unused specs. I'm
just not sure if that works the same way in Octopus; it's already EOL
and I don't have an Octopus cluster at hand to verify. :-)
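
Removing the leftover specs would then be something along these lines
(untested on Octopus, so please double-check against the docs for your
version first):

ceph orch rm osd.all-available-devices
ceph orch rm osd.HDD_drive_group

and if you want to be sure nothing gets deployed automatically in the
meantime:

ceph orch apply osd --all-available-devices --unmanaged=true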


Regards,
Eugen

Zitat von Jeffrey Turmelle :


Running on Octopus:

While attempting to install a bunch of new OSDs on multiple hosts, I  
ran some ceph orchestrator commands to install them, such as


ceph orch apply osd --all-available-devices
ceph orch apply osd -I HDD_drive_group.yaml

I assumed these were just helper processes, and they would be  
short-lived.  In fact, they didn’t actually work and I ended up  
installing each drive by hand like this:

ceph orch daemon add osd ceph4.iri.columbia.edu:/dev/sdag

However, now I have these services running:
# ceph orch ls --service-type=osd
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT                   IMAGE NAME               IMAGE ID
osd.HDD_drive_group        2/2      7m ago     6w   ceph[456].iri.columbia.edu  docker.io/ceph/ceph:v15  2cf504fded39
osd.None                   54/0     7m ago     -                                docker.io/ceph/ceph:v15  2cf504fded39
osd.all-available-devices  1/0      7m ago     -                                docker.io/ceph/ceph:v15  2cf504fded39


I’m certain none of these actually created any of my running OSD  
daemons, but I’m not sure if it’s ok to remove them.


For example:
ceph orch daemon rm osd.all-available-devices
ceph orch daemon rm osd.HDD_drive_group
ceph orch daemon rm osd.None

Does anyone have any insight into this? I can just leave them there;
they don’t seem to be doing anything, but on the other hand, I don’t
want any new devices to be automatically loaded or any other
unintended consequences from these.


Thanks for any guidance,


Jeff Turmelle
International Research Institute for Climate & Society  

The Climate School  at Columbia  
University 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy-> reef upgrade non-cephadm

2024-04-03 Thread Eugen Block

Hi,

1. I see no systemd units with the fsid in them, as described in the  
document above. Both before and after the upgrade, my mon and other  
units are:

ceph-mon@.serviceceph-osd@[N].service
etc
Should I be concerned?


I think this is expected because it's not containerized; no reason to
be concerned.


2. Does order matter? Based on past upgrades, I do not think so, but  
I wanted to be sure. For example, can I update:
mon/mds/radosgw/mgrs first, then afterwards update the osds? This is  
what i have done in previous updates and and all was well.


In a cephadm managed cluster the MGRs update themselves first, then  
continue with MONs, OSDs, MDS and all other services. In non-cephadm  
clusters the recommended order is MON/MGR (often or usually  
colocated), OSD and then the rest. We also never had issues upgrading  
several clusters over the years where for example MON servers were  
also OSD servers, it always worked out quite well for us. But I'd  
still recommend to try to keep the recommended order.
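
For a non-cephadm RPM cluster that usually boils down to something like
this per host after the packages have been updated (just a sketch,
adapt to your setup):

systemctl restart ceph-mon.target ceph-mgr.target
ceph mon versions      # confirm all mons report the new version first

and then, host by host:

systemctl restart ceph-osd.target
ceph osd versions

before finally restarting the MDS and RGW daemons.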



4. After upgrade of my mgr node I get:
"Module [several module names] has missing NOTIFY_TYPES member"
in ceph-mgr..log


I see this as well but haven't taken the time to look deeper into it
yet; tbh, it doesn't seem to cause any harm.


Regards,
Eugen

Zitat von Christopher Durham :


Hi,
I am upgrading my test cluster from 17.2.6 (quincy) to 18.2.2 (reef).
As it was an rpm install, I am following the directions here:
Reef — Ceph Documentation

The upgrade worked, but I have some observations and questions  
before I move to my production cluster:


1. I see no systemd units with the fsid in them, as described in the  
document above. Both before and after the upgrade, my mon and other  
units are:

ceph-mon@.serviceceph-osd@[N].service
etc
Should I be concerned?
2. Does order matter? Based on past upgrades, I do not think so, but  
I wanted to be sure. For example, can I update:
mon/mds/radosgw/mgrs first, then afterwards update the osds? This is  
what i have done in previous updates and and all was well.
3. Again on order: if a server serves, say, a mon and an mds, I can't
really easily update one without the other, based on shared
libraries and such.
It appears that that is ok, based on my test cluster, but I wanted to
be sure. Again, if an mds is one of the servers to update, I know I
have to update the remaining one after max_mds is set to 1 and the
others are stopped, first.


4. After upgrade of my mgr node I get:
"Module [several module names] has missing NOTIFY_TYPES member"
in ceph-mgr..log

But the mgr starts up eventually

The system is Rocky Linux 8.9
Thanks for any thoughts
-Chris

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ext] Re: cephadm auto disk preparation and OSD installation incomplete

2024-04-03 Thread Eugen Block
Hi and sorry for the delay, I was on vacation last week. :-) I just
read your responses. I have no idea how to modify the default timeout
for cephadm; maybe Adam or someone else can comment on that. But
every time I've been watching cephadm (ceph-volume) create new OSDs,
they were not created in parallel but sequentially. That would probably
explain why it takes that long and eventually runs into a timeout. I
can't really confirm it, but it would make sense to me if this were
the reason for the "offline" hosts during the operation. Does it
resolve after the process has finished?



 (Excerpt below. Is there any preferred method to provide bigger logs?).


It was just an example, I would focus on one OSD host and inspect the  
cephadm.log. The command I mentioned collects logs from all hosts, and  
it sometimes can be misleading because of missing timestamps.

Did you collect more infos in the meantime?

Thanks,
Eugen

Zitat von "Kuhring, Mathias" :

I'm afraid the parameter mgr/cephadm/default_cephadm_command_timeout
is buggy.
Once it is not on the default anymore, the MGR is preparing the
parameter a bit (e.g. subtracting 5 secs) and thereby making it a
float, but cephadm is not having it (not even if I try the default 900
myself):


[WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s):  
osd.all-available-devices
osd.all-available-devices: cephadm exited with an error code: 2,  
stderr:usage:  
cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b

   [-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR]
   [--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR]
   [--sysctl-dir SYSCTL_DIR] [--unit-dir UNIT_DIR] [--verbose]
   [--timeout TIMEOUT] [--retry RETRY] [--env ENV] [--no-container-init]
   [--no-cgroups-split]

{version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,zap-osds,unit,logs,bootstrap,deploy,check-host,prepare-host,add-repo,rm-repo,install,registry-login,gather-facts,host-maintenance,agent,disk-rescan}

   ...
cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b:  
error: argument --timeout: invalid int value: '895.0'


This also led to a status panic spiral reporting plenty of hosts and
services as missing or failing (I assume orch failing due to cephadm
complaining about the parameter).
I got it under control by removing the parameter again from the
config (ceph config rm mgr
mgr/cephadm/default_cephadm_command_timeout)
and then restarting all MGRs manually (systemctl restart ..., again
since orch was kinda useless at this stage).


Anyhow, is there any other way I can adapt this parameter?
Or should I maybe look into speeding up LV creation (if this is the bottleneck)?

Thanks a lot,
Mathias

-Original Message-
From: Kuhring, Mathias 
Sent: Friday, March 22, 2024 5:38 PM
To: Eugen Block ; ceph-users@ceph.io
Subject: [ceph-users] Re: [ext] Re: cephadm auto disk preparation  
and OSD installation incomplete


Hey Eugen,

Thank you for the quick reply.

The 5 missing disks on the one host were completely installed after  
I fully cleaned them up as I described.

So, it seems a smaller number of disks can make it.

Regarding the other host with 40 disks:
Failing the MGR didn't have any effect.
There are no errors in `/var/log/ceph/cephadm.log`.
But a bunch of repeating image listings like:
cephadm --image  
quay.io/ceph/ceph@sha256:1fb108217b110c01c480e32d0cfea0e19955733537af7bb8cbae165222496e09 --timeout 895  
ls


But `ceph log last 200 debug cephadm` gave me a bunch of interesting  
errors (Excerpt below. Is there any preferred method to provide  
bigger logs?).


So, there are some timeouts, which might play into the assumption  
that ceph-volume is a bit overwhelmed by the number of disks.
Shy assumption, but maybe LV creation is taking way too long (is  
cephadm waiting for all of them in bulk?) and times out with the  
default 900 secs.
However, LVs are created and cephadm will not consider them next  
round ("has a filesystem").


I'm testing this theory right now by bumping up the limit to 2 hours  
(and the restart with "fresh" disks again):

ceph config set mgr mgr/cephadm/default_cephadm_command_timeout 7200

However, there are also mentions of the host being not reachable:  
"Unable to reach remote host ceph-3-11"
But this seems to be limited to cephadm / ceph orch, so basically  
MGR but not the rest of the cluster (i.e. MONs, OSDs, etc. are  
communicating happily, as far as I can tell).


During my fresh run, I do notice more hosts being apparently down:
0|0[root@ceph-3-10 ~]# ceph orch host ls | grep Offline
ceph-3-7  172.16.62.38  rgw,osd,_admin Offline
ceph-3-10 172.16.62.41  rgw,osd,_admin,prometheus  Offline
ceph-3-11 172.16.62.43  rgw,osd,_admin Offline
osd-mirror-2  172.16.62.23  rgw,osd,_admin Offline
osd-mirror-3  172.16.62.24  rgw,osd,_admin Offline

But I wonder if this just a

[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
On Wed, Apr 3, 2024 at 3:09 PM Lorenz Bausch  wrote:
>
> Hi Casey,
>
> thank you so much for analysis! We tested the upgraded intensively, but
> the buckets in our test environment were probably too small to get
> dynamically resharded.
>
> > after upgrading to the Quincy release, rgw would
> > look at the wrong object names when trying to list those buckets.
> As we're currently running Quincy, do you think objects/bucket indexes
> might already be altered in a way which makes them also unusable for
> Reef?

for multisite resharding support, the bucket instance metadata now
stores an additional 'layout' structure which contains all of the
information necessary to locate its bucket index objects. on reshard,
the Red Hat pacific release would have stored that information with
the bucket. the upstream Reef release should be able to interpret that
layout data correctly

however, if the Quincy release overwrites that bucket instance
metadata (via an operation like PutBucketAcl, PutBucketPolicy, etc),
the corresponding layout information would be erased such that an
upgrade to Reef would not be able to find the real bucket index shard
objects

>
> Kind regards,
> Lorenz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Anthony D'Atri
We currently have in  src/common/options/global.yaml.in

- name: osd_async_recovery_min_cost
  type: uint
  level: advanced
  desc: A mixture measure of number of current log entries difference and historical
    missing objects,  above which we switch to use asynchronous recovery when appropriate
  default: 100
  flags:
  - runtime

I'd like to rephrase that description in a PR; might you be able to share
your insight into the dynamics so I can craft a better description? And do you
have any thoughts on the default value? Might appropriate values vary by pool
type and/or media?



> On Apr 3, 2024, at 13:38, Joshua Baergen  wrote:
> 
> We've had success using osd_async_recovery_min_cost=0 to drastically
> reduce slow ops during index recovery.
> 
> Josh
> 
> On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham  
> wrote:
>> 
>> I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
>> supports the RGW index pool causes crippling slow ops. If the OSD is marked
>> with primary-affinity of 0 prior to the OSD restart no slow ops are
>> observed. If the OSD has a primary affinity of 1 slow ops occur. The slow
>> ops only occur during the recovery period of the OMAP data and further only
>> occur when client activity is allowed to pass to the cluster. Luckily I am
>> able to test this during periods when I can disable all client activity at
>> the upstream proxy.
>> 
>> Given the behavior of the primary affinity changes preventing the slow ops
>> I think this may be a case of recovery being more detrimental than
>> backfill. I am thinking that causing an pg_temp acting set by forcing
>> backfill may be the right method to mitigate the issue. [1]
>> 
>> I believe that reducing the PG log entries for these OSDs would accomplish
>> that but I am also thinking a tuning of osd_async_recovery_min_cost [2] may
>> also accomplish something similar. Not sure the appropriate tuning for that
>> config at this point or if there may be a better approach. Seeking any
>> input here.
>> 
>> Further if this issue sounds familiar or sounds like another condition
>> within the OSD may be at hand I would be interested in hearing your input
>> or thoughts. Thanks!
>> 
>> [1] https://docs.ceph.com/en/latest/dev/peering/#concepts
>> [2]
>> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost
>> 
>> Respectfully,
>> 
>> *Wes Dillingham*
>> LinkedIn 
>> w...@wesdillingham.com
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD image metric

2024-04-03 Thread Anthony D'Atri
Depending on your Ceph release you might need to enable rbdstats.

Are you after provisioned, allocated, or both sizes?  Do you have object-map 
and fast-diff enabled?  They speed up `rbd du` massively.
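For example (a sketch; pool and image names are placeholders, and the prometheus option only matters if you scrape metrics from ceph-mgr):

ceph config set mgr mgr/prometheus/rbd_stats_pools "rbd"
rbd feature enable rbd/myimage object-map fast-diff
rbd object-map rebuild rbd/myimage
rbd du rbd/myimage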

> On Apr 3, 2024, at 00:26, Szabo, Istvan (Agoda)  
> wrote:
> 
> Hi,
> 
> Trying to pull some metrics out of Ceph about RBD image sizes, but I
> haven't found anything beyond pool-related metrics.
> 
> I wonder, is there any metric about images, or do I need to collect it
> myself with some third-party tool?
> 
> Thank you
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Joshua Baergen
Hey Anthony,

Like with many other options in Ceph, I think what's missing is the
user-visible effect of what's being altered. I believe the reason why
synchronous recovery is still used is that, assuming that per-object
recovery is quick, it's faster to complete than asynchronous recovery,
which has extra steps on either end of the recovery process. Of
course, as you know, synchronous recovery blocks I/O, so when
per-object recovery isn't quick, as in RGW index omap shards,
particularly large shards, IMO we're better off always doing async
recovery.

I don't know enough about the overheads involved here to evaluate
whether it's worth keeping synchronous recovery at all, but IMO RGW
index/usage(/log/gc?) pools are always better off using asynchronous
recovery.
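As a concrete sketch of applying that (e.g. cluster-wide via the config database; 0 effectively forces async recovery wherever it is eligible, so evaluate it against your own workload, and since the option is flagged runtime it should apply without OSD restarts):

ceph config set osd osd_async_recovery_min_cost 0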

Josh

On Wed, Apr 3, 2024 at 1:48 PM Anthony D'Atri  wrote:
>
> We currently have in  src/common/options/global.yaml.in
>
> - name: osd_async_recovery_min_cost
>   type: uint
>   level: advanced
>   desc: A mixture measure of number of current log entries difference and historical
>     missing objects,  above which we switch to use asynchronous recovery when appropriate
>   default: 100
>   flags:
>   - runtime
>
> I'd like to rephrase the description there in a PR, might you be able to 
> share your insight into the dynamics so I can craft a better description?  
> And do you have any thoughts on the default value?  Might appropriate values 
> vary by pool type and/or media?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Anthony D'Atri
Thanks.  I'll PR up some doc updates reflecting this and run them by the RGW / 
RADOS folks.

> On Apr 3, 2024, at 16:34, Joshua Baergen  wrote:
> 
> Hey Anthony,
> 
> Like with many other options in Ceph, I think what's missing is the
> user-visible effect of what's being altered. I believe the reason why
> synchronous recovery is still used is that, assuming that per-object
> recovery is quick, it's faster to complete than asynchronous recovery,
> which has extra steps on either end of the recovery process. Of
> course, as you know, synchronous recovery blocks I/O, so when
> per-object recovery isn't quick, as in RGW index omap shards,
> particularly large shards, IMO we're better off always doing async
> recovery.
> 
> I don't know enough about the overheads involved here to evaluate
> whether it's worth keeping synchronous recovery at all, but IMO RGW
> index/usage(/log/gc?) pools are always better off using asynchronous
> recovery.
> 
> Josh
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW services crashing randomly with same message

2024-04-03 Thread Reid Guyett
Hello,

We are currently experiencing a lot of rgw service crashes that all seem to
terminate with the same message. We have kept our RGW services at 17.2.5
but the rest of the cluster is 17.2.7 due to a bug introduced in 17.2.7.

terminate called after throwing an instance of
> 'ceph::buffer::v15_2_0::end_of_buffer'
>   what():  End of buffer
> *** Caught signal (Aborted) **
>  in thread 7fcfd1772700 thread_name:radosgw
>  ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy
> (stable)
>

 What would be the best way to figure out what is causing the error? The
rest of the crash dump in our logs is quite long, with about 10k lines of history.
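Would the recorded crash metadata plus a temporary debug bump be a reasonable starting point? Something like (the crash id is a placeholder; debug_rgw 20 is very verbose, so it should be reverted afterwards):

ceph crash ls
ceph crash info <crash-id>
ceph config set client.rgw debug_rgw 20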

Thanks,

Reid
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-04-03 Thread Zakhar Kirpichenko
Any comments regarding `osd noin`, please?

/Z

On Tue, 2 Apr 2024 at 16:09, Zakhar Kirpichenko  wrote:

> Hi,
>
> I'm adding a few OSDs to an existing cluster; the cluster is running with
> `osd noout,noin`:
>
>   cluster:
> id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86
> health: HEALTH_WARN
> noout,noin flag(s) set
>
> Specifically `noin` is documented as "prevents booting OSDs from being
> marked in". But freshly added OSDs were immediately marked `up` and `in`:
>
>   services:
> ...
> osd: 96 osds: 96 up (since 5m), 96 in (since 6m); 338 remapped pgs
>  flags noout,noin
>
> # ceph osd tree in | grep -E "osd.11|osd.12|osd.26"
>  11hdd9.38680  osd.11   up   1.0  1.0
>  12hdd9.38680  osd.12   up   1.0  1.0
>  26hdd9.38680  osd.26   up   1.0  1.0
>
> Is this expected behavior? Do I misunderstand the purpose of the `noin`
> option?
>
> Best regards,
> Zakhar
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io