[ceph-users] bluestore/bluefs: A large number of unfounded read bandwidth
Hi all and Igor,

I have a case: https://tracker.ceph.com/issues/61973. I'm not sure if it's related to this PR (https://github.com/ceph/ceph/pull/38902), but it looks very similar.
[ceph-users] Re: Per minor-version view on docs.ceph.com
Hi Anthony,

> The docs aren't necessarily structured that way, i.e. there isn't a 17.2.6
> docs site as such. We try to document changes in behavior in sync with code,
> but don't currently have a process to ensure that a given docs build
> corresponds exactly to a given dot release. In fact we sometimes go back and
> correct things for earlier releases.

I see.

> For your purposes I might suggest:
>
> * Peruse the minor-version release notes for docs PRs
> * Pull the release tree for a minor version from git and peruse the .rst
>   files directly

Thank you for the suggestion.

> Neither is what you're asking for, but it's what we have today. Zac might
> have additional thoughts.

Zac, do you have any thoughts?

Best,
Satoru
[ceph-users] Re: Reef release candidate - v18.1.2
Thanks for the report - this is being fixed in
https://github.com/ceph/ceph/pull/52343

On Wed, Jul 12, 2023 at 2:53 PM Stefan Kooman wrote:
> On 7/12/23 23:21, Yuri Weinstein wrote:
> > Can you elaborate on how you installed cephadm?
>
> Add ceph repo (mirror):
> cat /etc/apt/sources.list.d/ceph.list
> deb http://ceph.download.bit.nl/debian-18.1.2 focal main
>
> wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
> apt update
> apt install cephadm
>
> It installs cephadm 18.1.2.
>
> cephadm bootstrap --mon-ip $ip
>
> Then it pulls "quay.ceph.io/ceph-ci/ceph main" instead of the 18.1.2
> container image.
>
> Gr. Stefan
>
> [...]
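In the meantime, passing the image explicitly should work around it. A sketch, assuming your cephadm build supports the global --image option (present in recent releases):

  cephadm --image quay.io/ceph/ceph:v18.1.2 bootstrap --mon-ip $ip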
[ceph-users] Re: Reef release candidate - v18.1.2
On 7/12/23 23:21, Yuri Weinstein wrote:
> Can you elaborate on how you installed cephadm?

Add ceph repo (mirror):

cat /etc/apt/sources.list.d/ceph.list
deb http://ceph.download.bit.nl/debian-18.1.2 focal main

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
apt update
apt install cephadm

It installs cephadm 18.1.2.

cephadm bootstrap --mon-ip $ip

Then it pulls "quay.ceph.io/ceph-ci/ceph main" instead of the 18.1.2
container image.

Gr. Stefan

[...]
[ceph-users] Re: Reef release candidate - v18.1.2
Can you elaborate on how you installed cephadm?

When I pull from quay.io/ceph/ceph:v18.1.2, I see the version v18.1.2:

podman run -it quay.io/ceph/ceph:v18.1.2
Trying to pull quay.io/ceph/ceph:v18.1.2...
Getting image source signatures
Copying blob f3a0532868dc done
Copying blob 9ba8dbcf96c4 done
Copying config 3b66ad272b done
Writing manifest to image destination
Storing signatures
[root@66c274be11ab /]# ceph --version
ceph version 18.1.2 (a5c951305c2409669162c235d81981bdc60dd9e7) reef (rc)

On Wed, Jul 12, 2023 at 2:06 PM Stefan Kooman wrote:
>
> On 6/30/23 18:36, Yuri Weinstein wrote:
>
> > This RC has gone through partial testing due to issues we are
> > experiencing in the sepia lab.
> > Please try it out and report any issues you encounter. Happy testing!
>
> If I install cephadm from a package, 18.1.2 on Ubuntu focal in my case,
> cephadm uses the ceph-ci/ceph:main container image: "Pulling container
> image quay.ceph.io/ceph-ci/ceph:main". And these container images are
> out of date ("18.0.0-4869-g05e449f9
> (05e449f9a2a65c297f31628af8f01f63cf36f261) reef (dev)").
>
> AFAIK there is no way to tell cephadm bootstrap to use a specific
> version. Although the help mentions "--allow-mismatched-release", so it
> might be possible?
>
> Gr. Stefan
[ceph-users] Re: Reef release candidate - v18.1.2
On 6/30/23 18:36, Yuri Weinstein wrote:
> This RC has gone through partial testing due to issues we are
> experiencing in the sepia lab.
> Please try it out and report any issues you encounter. Happy testing!

If I install cephadm from a package, 18.1.2 on Ubuntu focal in my case,
cephadm uses the ceph-ci/ceph:main container image: "Pulling container
image quay.ceph.io/ceph-ci/ceph:main". And these container images are out
of date ("18.0.0-4869-g05e449f9
(05e449f9a2a65c297f31628af8f01f63cf36f261) reef (dev)").

AFAIK there is no way to tell cephadm bootstrap to use a specific version.
Although the help mentions "--allow-mismatched-release", so it might be
possible?

Gr. Stefan
[ceph-users] Re: Cluster down after network outage
On Wed, Jul 12, 2023 at 1:26 AM Frank Schilder wrote:
> Hi all,
>
> one problem solved, another coming up. For everyone ending up in the same
> situation, the trick seems to be to get all OSDs marked up and then allow
> recovery. Steps to take:
>
> - set noout, nodown, norebalance, norecover
> - wait patiently until all OSDs are shown as up
> - unset norebalance, norecover
> - wait wait wait, PGs will eventually become active as OSDs become
>   responsive
> - unset nodown, noout

Nice work bringing the cluster back up. Looking into an OSD log would give
more detail about why they were flapping. Are these HDDs? Are the block.dbs
on flash?

Generally, I've found that on clusters having OSDs which are slow to boot
and flapping up and down, "nodown" is sufficient to recover from such
issues.

Cheers, Dan

__
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com

> Now the new problem. I now have an ever growing list of OSDs listed as
> rebalancing, but nothing is actually rebalancing. How can I stop this
> growth and how can I get rid of this list:
>
> [...]
[ceph-users] Re: radosgw + keystone breaks when projects have - in their names
For the sake of the archive and future readers: I think we now have an
explanation for this issue. Our cloud is one of the few remaining OpenStack
deploys which predates the use of UUIDs for OpenStack tenant names; instead
our project ids are typically the same as project names.

Radosgw checks project ids and rejects any that contain characters other
than letters, numbers, and underscores [0]. So that check is actively
rejecting many of our projects, including all with - in their names.

IMO that check is wrong (see discussion of a similar issue at [1]) but in
the meantime we're exploring various terrible workarounds. On the off-chance
that anyone reading this has encountered and fixed this same issue, please
reach out!

-Andrew

[0] https://github.com/ceph/ceph/commit/d50ef542372f541ac9411f655cddd5fcab4dceac
[1] https://review.opendev.org/c/openstack/cinder/+/864585

On 7/10/23 2:59 PM, Andrew Bogott wrote:
> I'm in the process of adding the radosgw service to our OpenStack cloud
> and hoping to re-use keystone for discovery and auth. Things seem to work
> fine with many keystone tenants, but as soon as we try to do something in
> a project with a '-' in its name everything fails. Here's an example,
> using the openstack swift cli:
>
> root@cloudcontrol2001-dev:~# OS_PROJECT_ID="testlabs" openstack container create 'makethiscontainer'
> +---------------+-------------------+------------------------------------------------+
> | account       | container         | x-trans-id                                     |
> +---------------+-------------------+------------------------------------------------+
> | AUTH_testlabs | makethiscontainer | tx008c311dbda86c695-0064ac5fad-6927acd-default |
> +---------------+-------------------+------------------------------------------------+
>
> root@cloudcontrol2001-dev:~# OS_PROJECT_ID="service" openstack container create 'makethiscontainer'
> +--------------+-------------------+------------------------------------------------+
> | account      | container         | x-trans-id                                     |
> +--------------+-------------------+------------------------------------------------+
> | AUTH_service | makethiscontainer | tx0b341a22866f65e44-0064ac5fb7-6927acd-default |
> +--------------+-------------------+------------------------------------------------+
>
> root@cloudcontrol2001-dev:~# OS_PROJECT_ID="admin-monitoring" openstack container create 'makethiscontainer'
> Bad Request (HTTP 400) (Request-ID: tx0f7326bb541b4d2a9-0064ac5fc2-6927acd-default)
>
> Before I dive into the source code, is this a known issue and/or something
> I can configure? Dash-named projects work fine in keystone and seem to
> also work fine with standalone rados; I assume the issue is somewhere in
> the communication between the two.
>
> I suspected the implicit user creation code, but that seems to be working
> properly:
>
> # radosgw-admin user list
> [
>     "cloudvirt-canary$cloudvirt-canary",
>     "testlabs$testlabs",
>     "paws-dev$paws-dev",
>     "andrewtestproject$andrewtestproject",
>     "admin-monitoring$admin-monitoring",
>     "taavi-test-project$taavi-test-project",
>     "admin$admin",
>     "taavitestproject$taavitestproject",
>     "bastioninfra-codfw1dev$bastioninfra-codfw1dev",
> ]
>
> Here is the radosgw section of my ceph.conf:
>
> [client.radosgw]
> host = 10.192.20.9
> keyring = /etc/ceph/ceph.client.radosgw.keyring
> rgw frontends = "civetweb port=18080"
> rgw_keystone_verify_ssl = false
> rgw_keystone_api_version = 3
> rgw_keystone_url = https://openstack.codfw1dev.wikimediacloud.org:25000
> rgw_keystone_accepted_roles = 'reader, admin, member'
> rgw_keystone_implicit_tenants = true
> rgw_keystone_admin_domain = default
> rgw_keystone_admin_project = service
> rgw_keystone_admin_user = swift
> rgw_keystone_admin_password = (redacted)
> rgw_s3_auth_use_keystone = true
> rgw_swift_account_in_url = true
> rgw_user_default_quota_max_objects = 4096
> rgw_user_default_quota_max_size = 8589934592
>
> And here's a debug log of a failed transaction:
> https://phabricator.wikimedia.org/P49539
>
> Thanks in advance!
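For context, the failing check amounts to something like this - an illustrative shell approximation of the prose above, not the actual rgw code (which is in the commit at [0]):

  # only letters, digits and underscores pass the tenant-id check,
  # so any project with '-' in its name is rejected
  for p in testlabs service admin-monitoring; do
    echo "$p" | grep -Eq '^[A-Za-z0-9_]+$' && echo "$p: ok" || echo "$p: rejected"
  done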
[ceph-users] Re: Per minor-version view on docs.ceph.com
The docs aren't necessarily structured that way, i.e. there isn't a 17.2.6
docs site as such. We try to document changes in behavior in sync with code,
but don't currently have a process to ensure that a given docs build
corresponds exactly to a given dot release. In fact we sometimes go back and
correct things for earlier releases.

For your purposes I might suggest:

* Peruse the minor-version release notes for docs PRs
* Pull the release tree for a minor version from git and peruse the .rst
  files directly (see the sketch at the end of this message)

Neither is what you're asking for, but it's what we have today. Zac might
have additional thoughts.

> On Jul 11, 2023, at 23:44, Satoru Takeuchi wrote:
>
> Hi,
>
> I have a request about docs.ceph.com. Could you provide per-minor-version
> views on docs.ceph.com? Currently, we can select the Ceph version by using
> https://docs.ceph.com/en/<version>/. In this case, we can use the major
> version's code names (e.g., "quincy") or "latest". However, we can't use
> minor version numbers like "v17.2.6". It would be convenient for me (and I
> guess for many other users, too) to be able to select the document for the
> version which we actually use.
>
> In my recent case, I've read the mclock document for quincy because I use
> v17.2.6. However, the document has changed a lot from v17.2.6 to the
> latest quincy one because of the recent mclock rework.
>
> Thanks,
> Satoru
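For the second suggestion, a concrete sketch (tag and path per the ceph.git layout; the .rst sources live under doc/):

  git clone --depth 1 --branch v17.2.6 https://github.com/ceph/ceph.git
  ls ceph/doc    # the .rst sources exactly as of that tag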
[ceph-users] Re: Cluster down after network outage
Answering myself for posterity. The rebalancing list disappeared after
waiting even longer. Might just have been an MGR that needed to catch up.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Frank Schilder
Sent: Wednesday, July 12, 2023 10:25 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster down after network outage

[...]
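If the list ever sticks around, kicking the mgr should also clear it. A sketch: "progress clear" assumes a release whose progress module has it, and ceph-25 is the active mgr per the cluster status in the original message below:

  ceph progress clear    # drop stale progress events
  ceph mgr fail ceph-25  # or fail over to a standby mgr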
[ceph-users] Production random data not accessible (NoSuchKey)
Hi all,

I'm facing a strange problem where, from time to time, S3 objects become
inaccessible (NoSuchKey). I've found similar issues [1], [2], but our
clusters have already been upgraded to the latest Pacific version. I have
added RGW logs [3] to the bug report https://tracker.ceph.com/issues/61716.

Maybe someone has an idea what's wrong? Thanks.

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/WQ2F2GWI2WRDAGLVRDA7PAAGBJTNN4PI/
[2] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RS2272EWAGCVZ4NOD6JLJVGGUNQYE6FV/#Y243AY63HAMFM7DH3BJ7ZT2BGMD4G4PF
[3] https://pastebin.com/ZvBdNi5j

--
Jonas
[ceph-users] Re: Cluster down after network outage
Hi all,

one problem solved, another coming up. For everyone ending up in the same
situation, the trick seems to be to get all OSDs marked up and then allow
recovery. Steps to take (concrete commands in the sketch at the end of this
message):

- set noout, nodown, norebalance, norecover
- wait patiently until all OSDs are shown as up
- unset norebalance, norecover
- wait wait wait, PGs will eventually become active as OSDs become responsive
- unset nodown, noout

Now the new problem. I now have an ever growing list of OSDs listed as
rebalancing, but nothing is actually rebalancing. How can I stop this growth
and how can I get rid of this list:

[root@gnosis ~]# ceph status
  cluster:
    id:     XXX
    health: HEALTH_WARN
            noout flag(s) set
            Slow OSD heartbeats on back (longest 634775.858ms)
            Slow OSD heartbeats on front (longest 635210.412ms)
            1 pools nearfull

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 6m)
    mgr: ceph-25(active, since 57m), standbys: ceph-26, ceph-01, ceph-02, ceph-03
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1258 up (since 24m), 1258 in (since 45m)
         flags noout

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.97G objects, 3.5 PiB
    usage:   4.4 PiB used, 8.7 PiB / 13 PiB avail
    pgs:     25028 active+clean
             30    active+clean+scrubbing+deep
             7     active+clean+scrubbing

  io:
    client: 1.3 GiB/s rd, 718 MiB/s wr, 7.71k op/s rd, 2.54k op/s wr

  progress:
    Rebalancing after osd.135 marked in (1s)  [=...]
    Rebalancing after osd.69 marked in (2s)   []
    Rebalancing after osd.75 marked in (2s)   [===.]
    Rebalancing after osd.173 marked in (2s)  []
    Rebalancing after osd.42 marked in (1s)   [=...] (remaining: 2s)
    Rebalancing after osd.104 marked in (2s)  []
    Rebalancing after osd.82 marked in (2s)   []
    Rebalancing after osd.107 marked in (2s)  [===.]
    Rebalancing after osd.19 marked in (2s)   [===.]
    Rebalancing after osd.67 marked in (2s)   [=...]
    Rebalancing after osd.46 marked in (2s)   [===.] (remaining: 1s)
    Rebalancing after osd.123 marked in (2s)  [===.]
    Rebalancing after osd.66 marked in (2s)   []
    Rebalancing after osd.12 marked in (2s)   [==..] (remaining: 2s)
    Rebalancing after osd.95 marked in (2s)   [=...]
    Rebalancing after osd.134 marked in (2s)  [===.]
    Rebalancing after osd.14 marked in (1s)   [===.]
    Rebalancing after osd.56 marked in (2s)   [=...]
    Rebalancing after osd.143 marked in (1s)  []
    Rebalancing after osd.118 marked in (2s)  [===.]
    Rebalancing after osd.96 marked in (2s)   []
    Rebalancing after osd.105 marked in (2s)  [===.]
    Rebalancing after osd.44 marked in (1s)   [===.] (remaining: 5s)
    Rebalancing after osd.41 marked in (1s)   [==..] (remaining: 1s)
    Rebalancing after osd.9 marked in (2s)    [=...] (remaining: 37s)
    Rebalancing after osd.58 marked in (2s)   [==..] (remaining: 8s)
    Rebalancing after osd.140 marked in (1s)  [===.]
    Rebalancing after osd.132 marked in (2s)  []
    Rebalancing after osd.31 marked in (1s)   [=...]
    Rebalancing after osd.110 marked in (2s)  []
    Rebalancing after osd.21 marked in (2s)   [=...]
    Rebalancing after osd.114 marked in (2s)  [===.]
    Rebalancing after osd.83 marked in (2s)   [===.]
    Rebalancing after osd.23 marked in (1s)   [===.]
    Rebalancing after osd.25 marked in (1s)   [==..]
    Rebalancing after osd.147 marked in (2s)  []
    Rebalancing after osd.62 marked in (1s)   [==..]
    Rebalancing after osd.57 marked in (2s)   [==..]
    Rebalancing after osd.61 marked in (2s)   []
    Rebalancing after osd.71 marked in (2s)   [===.]
    Rebalancing after osd.80 marked in (2s)   [==..]
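For reference, the flag sequence above as concrete commands (standard ceph CLI):

  ceph osd set noout
  ceph osd set nodown
  ceph osd set norebalance
  ceph osd set norecover
  # ...wait until all OSDs are shown as up...
  ceph osd unset norebalance
  ceph osd unset norecover
  # ...wait until PGs become active...
  ceph osd unset nodown
  ceph osd unset noout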
[ceph-users] Re: Cluster down after network outage
On 7/12/23 09:53, Frank Schilder wrote:
> Hi all,
>
> we had a network outage tonight (power loss) and restored network in the
> morning. All OSDs were running during this period. After restoring network,
> peering hell broke loose and the cluster has a hard time coming back up
> again. OSDs get marked down all the time and come back later. Peering never
> stops. Below is the current status, I had all OSDs shown as up for a while,
> but many were not responsive.
>
> Are there some flags that help bringing things up in a sequence that causes
> less overload on the system?

osd_recovery_delay_start

We have that set to 60 seconds, so the OSD first gets some time to peer
before starting recovery. That might help in this case. Worth a shot. Maybe
increase it to 5 minutes or more just to get all OSDs stable before recovery
starts going?

Good luck!

Gr. Stefan
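For the archive, setting it cluster-wide looks like this (300 seconds as an example value):

  ceph config set osd osd_recovery_delay_start 300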
[ceph-users] Cluster down after network outage
Hi all,

we had a network outage tonight (power loss) and restored network in the
morning. All OSDs were running during this period. After restoring network,
peering hell broke loose and the cluster has a hard time coming back up
again. OSDs get marked down all the time and come back later. Peering never
stops. Below is the current status, I had all OSDs shown as up for a while,
but many were not responsive.

Are there some flags that help bringing things up in a sequence that causes
less overload on the system?

[root@gnosis ~]# ceph status
  cluster:
    id:     XXX
    health: HEALTH_WARN
            2 clients failing to respond to capability release
            6 MDSs report slow metadata IOs
            3 MDSs report slow requests
            nodown,noout,nobackfill,norecover flag(s) set
            176 osds down
            Slow OSD heartbeats on back (longest 551718.679ms)
            Slow OSD heartbeats on front (longest 549598.330ms)
            Reduced data availability: 8069 pgs inactive, 3786 pgs down,
              3161 pgs peering, 1341 pgs stale
            Degraded data redundancy: 1187354920/16402772667 objects
              degraded (7.239%), 6222 pgs degraded, 6231 pgs undersized
            1 pools nearfull
            17386 slow ops, oldest one blocked for 1811 sec, daemons
              [osd.1128,osd.1152,osd.1154,osd.12,osd.1227,osd.1244,osd.328,osd.354,osd.381,osd.4]...
              have slow ops.

  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 28m)
    mgr: ceph-25(active, since 30m), standbys: ceph-26, ceph-01, ceph-02, ceph-03
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1260 osds: 1082 up (since 6m), 1258 in (since 18m); 266 remapped pgs
         flags nodown,noout,nobackfill,norecover

  data:
    pools:   14 pools, 25065 pgs
    objects: 1.91G objects, 3.4 PiB
    usage:   3.1 PiB used, 6.0 PiB / 9.0 PiB avail
    pgs:     0.626% pgs unknown
             31.566% pgs not active
             1187354920/16402772667 objects degraded (7.239%)
             51/16402772667 objects misplaced (0.000%)
             11706 active+clean
             4752  active+undersized+degraded
             3286  down
             2702  peering
             799   undersized+degraded+peered
             464   stale+down
             418   stale+active+undersized+degraded
             214   remapped+peering
             157   unknown
             128   stale+peering
             117   stale+remapped+peering
             101   stale+undersized+degraded+peered
             57    stale+active+undersized+degraded+remapped+backfilling
             35    down+remapped
             26    stale+undersized+degraded+remapped+backfilling+peered
             23    undersized+degraded+remapped+backfilling+peered
             14    active+clean+scrubbing+deep
             9     stale+active+undersized+degraded+remapped+backfill_wait
             7     active+recovering+undersized+degraded
             7     stale+active+recovering+undersized+degraded
             6     active+undersized+degraded+remapped+backfilling
             6     active+undersized
             5     active+undersized+degraded+remapped+backfill_wait
             5     stale+remapped
             4     stale+activating+undersized+degraded
             3     active+undersized+remapped
             3     stale+undersized+degraded+remapped+backfill_wait+peered
             1     activating+undersized+degraded
             1     activating+undersized+degraded+remapped
             1     undersized+degraded+remapped+backfill_wait+peered
             1     stale+active+clean
             1     active+recovering
             1     stale+down+remapped
             1     undersized+peered
             1     active+undersized+degraded+remapped
             1     active+clean+scrubbing
             1     active+clean+remapped
             1     active+recovering+degraded

  io:
    client: 1.8 MiB/s rd, 18 MiB/s wr, 409 op/s rd, 796 op/s wr

Thanks for any hints!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
[ceph-users] Re: MON sync time depends on outage duration
My test with a single-host-cluster (virtual machine) finished after around
20 hours. I removed all purged_snap keys from the mon and it actually
started again (I wasn't sure if I could have expected that). Is that a valid
approach in order to reduce the mon store size? Or can it be dangerous? How
would that work in a real cluster with multiple MONs? If I stop the first
one, clean up its mon db, then start it again, wouldn't it just sync the
keys back from its peers? Not sure how that would work...

Zitat von Eugen Block:

It was installed with Octopus and hasn't been upgraded yet:
"require_osd_release": "octopus",

Zitat von Josh Baergen:

Out of curiosity, what is your require_osd_release set to? (ceph osd dump |
grep require_osd_release)

Josh

On Tue, Jul 11, 2023 at 5:11 AM Eugen Block wrote:

I'm not so sure anymore if that could really help here. The dump-keys
output from the mon contains 42 million osd_snap prefix entries, 39 million
of them are "purged_snap" keys. I also compared to other clusters as well;
those aren't tombstones but the expected "history" of purged snapshots. So
I don't think removing a couple of hundred trash snapshots will actually
reduce the number of osd_snap keys.

At least doubling the payload_size seems to have a positive impact. The
compaction during the sync has a negative impact, of course, same as not
having the mon store on SSDs.

I'm currently playing with a test cluster, removing all "purged_snap"
entries from the mon db (not finished yet) to see what that will do to the
mon and whether it will even start correctly. But has anyone done that,
removing keys from the mon store? Not sure what to expect yet...

Zitat von Dan van der Ster:

Oh yes, sounds like purging the rbd trash will be the real fix here! Good
luck!

__
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com

On Mon, Jul 10, 2023 at 6:10 AM Eugen Block wrote:

Hi,

I got a customer response with payload size 4096; that made things even
worse. The mon startup time was now around 40 minutes. My doubts wrt
decreasing the payload size seem confirmed. Then I read Dan's response
again, which also mentions that the default payload size could be too
small. So I asked them to double the default (2M instead of 1M) and am now
waiting for a new result.

I'm still wondering why this only happens when the mon is down for more
than 5 minutes. Does anyone have an explanation for that time factor?

Another thing they're going to do is to remove lots of snapshot tombstones
(rbd mirroring snapshots in the trash namespace); maybe that will reduce
the osd_snap keys in the mon db, which should then reduce the startup time.
We'll see...

Zitat von Eugen Block:

> Thanks, Dan!
>
>> Yes that sounds familiar from the luminous and mimic days.
>> The workaround for zillions of snapshot keys at that time was to use:
>> ceph config set mon mon_sync_max_payload_size 4096
>
> I actually did search for mon_sync_max_payload_keys, not bytes, so I
> missed your thread, it seems. Thanks for pointing that out. So the
> defaults seem to be these in Octopus:
>
> "mon_sync_max_payload_keys": "2000",
> "mon_sync_max_payload_size": "1048576",
>
>> So it could be in your case that the sync payload is just too small to
>> efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
>> you should be able to understand what is taking so long, and tune
>> mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.
>
> I'm confused: if the payload size is too small, why would decreasing it
> help? Or am I misunderstanding something? But it probably won't hurt to
> try it with 4096 and see if anything changes. If not we can still turn on
> debug logs and take a closer look.
>
>> And in addition to Dan's suggestion: HDDs are not a good choice for
>> RocksDB, which is most likely the reason for this thread. I think that
>> from the 3rd time on the database just goes into compaction maintenance.
>
> Believe me, I know... but there's not much they can currently do about
> it, quite a long story... But I have been telling them that for months
> now. Anyway, I will make some suggestions and report back if it worked in
> this case as well.
>
> Thanks!
> Eugen
>
> Zitat von Dan van der Ster:
>
>> Hi Eugen!
>>
>> Yes that sounds familiar from the luminous and mimic days.
>>
>> Check this old thread:
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F3W2HXMYNF52E7LPIQEJFUTAD3I7QE25/
>> (that thread is truncated but I can tell you that it worked for Frank).
>> Also the even older referenced thread:
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/
>>
>> The workaround for zillions of snapshot keys at that time was to use:
>> ceph config set mon mon_sync_max_payload_size 4096
>>
>> That said, that sync issue was supposed to be fixed by way of adding the
>> new option
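For anyone following along, two of the steps mentioned in this thread as concrete commands. Counting the osd_snap keys requires the mon to be stopped first; the store path here is an example for a mon with id "a", so adjust it for your deployment:

  ceph-monstore-tool /var/lib/ceph/mon/ceph-a dump-keys | grep osd_snap | wc -l

And doubling the sync payload from its 1M default, per the suggestion above:

  ceph config set mon mon_sync_max_payload_size 2097152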