[ceph-users] bluestore/bluefs: A large number of unfounded read bandwidth

2023-07-12 Thread yite gu
Hi, all and Igor
I have a case: https://tracker.ceph.com/issues/61973. I'm not sure whether
it's related to this PR (https://github.com/ceph/ceph/pull/38902), but it
looks very similar.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Per minor-version view on docs.ceph.com

2023-07-12 Thread Satoru Takeuchi
Hi Anthony,

> The docs aren't necessarily structured that way, i.e. there isn't a 17.2.6 
> docs site as such.  We try to document changes in behavior in sync with code, 
> but don't currently have a process to ensure that a given docs build 
> corresponds exactly to a given dot release.  In fact we sometimes go back and 
> correct things for earlier releases.

I see.

> For your purposes I might suggest:
>
> * Peruse the minor-version release notes for docs PRs
> * Pull the release tree for a minor version from git and peruse the .rst 
> files directly

Thank you for the suggestion.

> Neither is what you're asking for, but it's what we have today.  Zac might 
> have additional thoughts.

Zac, do you have any thoughts?

Best,
Satoru
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-12 Thread Josh Durgin
Thanks for the report - this is being fixed in
https://github.com/ceph/ceph/pull/52343

On Wed, Jul 12, 2023 at 2:53 PM Stefan Kooman  wrote:

> On 7/12/23 23:21, Yuri Weinstein wrote:
> > Can you elaborate on how you installed cephadm?
>
> Add ceph repo (mirror):
> cat /etc/apt/sources.list.d/ceph.list
> deb http://ceph.download.bit.nl/debian-18.1.2 focal main
>
> wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key
> add -
> apt update
> apt install cephadm
>
> It installs cephadm 18.1.2
>
> cephadm bootstrap --mon-ip $ip
>
> Then it pulls "quay.ceph.io/ceph-ci/ceph:main" instead of the 18.1.2 container image.
>
> Gr. Stefan
>
>
> > When I pull from quay.io/ceph/ceph:v18.1.2, I see the version v18.1.2
> >
> > podman run -it quay.io/ceph/ceph:v18.1.2
> > Trying to pull quay.io/ceph/ceph:v18.1.2...
> > Getting image source signatures
> > Copying blob f3a0532868dc done
> > Copying blob 9ba8dbcf96c4 done
> > Copying config 3b66ad272b done
> > Writing manifest to image destination
> > Storing signatures
> > [root@66c274be11ab /]# ceph --version
> > ceph version 18.1.2 (a5c951305c2409669162c235d81981bdc60dd9e7) reef (rc)
> >
> > On Wed, Jul 12, 2023 at 2:06 PM Stefan Kooman  wrote:
> >>
> >> On 6/30/23 18:36, Yuri Weinstein wrote:
> >>
> >>> This RC has gone thru partial testing due to issues we are
> >>> experiencing in the sepia lab.
> >>> Please try it out and report any issues you encounter. Happy testing!
> >>
> >> If I install cephadm from a package, 18.1.2 on Ubuntu Focal in my case,
> >> cephadm uses the ceph-ci/ceph:main container image: "Pulling container
> >> image quay.ceph.io/ceph-ci/ceph:main". And these container images are
> >> out of date (18.0.0-4869-g05e449f9
> >> (05e449f9a2a65c297f31628af8f01f63cf36f261) reef (dev)": 1).
> >>
> >> AFAIK there is no way to tell cephadm bootstrap to use a specific
> >> version, although the help mentions "--allow-mismatched-release", so it
> >> might be possible?
> >>
> >> Gr. Stefan
> >>
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-12 Thread Stefan Kooman

On 7/12/23 23:21, Yuri Weinstein wrote:

Can you elaborate on how you installed cephadm?


Add ceph repo (mirror):
cat /etc/apt/sources.list.d/ceph.list
deb http://ceph.download.bit.nl/debian-18.1.2 focal main

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key 
add -

apt update
apt install cephadm

It installs cephadm 18.1.2

cephadm bootstrap --mon-ip $ip

Then it pulls "quay.ceph.io/ceph-ci/ceph:main" instead of the 18.1.2 container image.

Gr. Stefan



When I pull from quay.io/ceph/ceph:v18.1.2, I see the version v18.1.2

podman run -it quay.io/ceph/ceph:v18.1.2
Trying to pull quay.io/ceph/ceph:v18.1.2...
Getting image source signatures
Copying blob f3a0532868dc done
Copying blob 9ba8dbcf96c4 done
Copying config 3b66ad272b done
Writing manifest to image destination
Storing signatures
[root@66c274be11ab /]# ceph --version
ceph version 18.1.2 (a5c951305c2409669162c235d81981bdc60dd9e7) reef (rc)

On Wed, Jul 12, 2023 at 2:06 PM Stefan Kooman  wrote:


On 6/30/23 18:36, Yuri Weinstein wrote:


This RC has gone thru partial testing due to issues we are
experiencing in the sepia lab.
Please try it out and report any issues you encounter. Happy testing!


If I install cephadm from a package, 18.1.2 on Ubuntu Focal in my case,
cephadm uses the ceph-ci/ceph:main container image: "Pulling container
image quay.ceph.io/ceph-ci/ceph:main". And these container images are
out of date (18.0.0-4869-g05e449f9
(05e449f9a2a65c297f31628af8f01f63cf36f261) reef (dev)": 1).

AFAIK there is no way to tell cephadm bootstrap to use a specific
version, although the help mentions "--allow-mismatched-release", so it
might be possible?

Gr. Stefan





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-12 Thread Yuri Weinstein
Can you elaborate on how you installed cephadm?
When I pull from quay.io/ceph/ceph:v18.1.2, I see the version v18.1.2

podman run -it quay.io/ceph/ceph:v18.1.2
Trying to pull quay.io/ceph/ceph:v18.1.2...
Getting image source signatures
Copying blob f3a0532868dc done
Copying blob 9ba8dbcf96c4 done
Copying config 3b66ad272b done
Writing manifest to image destination
Storing signatures
[root@66c274be11ab /]# ceph --version
ceph version 18.1.2 (a5c951305c2409669162c235d81981bdc60dd9e7) reef (rc)

On Wed, Jul 12, 2023 at 2:06 PM Stefan Kooman  wrote:
>
> On 6/30/23 18:36, Yuri Weinstein wrote:
>
> > This RC has gone thru partial testing due to issues we are
> > experiencing in the sepia lab.
> > Please try it out and report any issues you encounter. Happy testing!
>
> If I install cephadm from a package, 18.1.2 on Ubuntu Focal in my case,
> cephadm uses the ceph-ci/ceph:main container image: "Pulling container
> image quay.ceph.io/ceph-ci/ceph:main". And these container images are
> out of date (18.0.0-4869-g05e449f9
> (05e449f9a2a65c297f31628af8f01f63cf36f261) reef (dev)": 1).
>
> AFAIK there is no way to tell cephadm bootstrap to use a specific
> version, although the help mentions "--allow-mismatched-release", so it
> might be possible?
>
> Gr. Stefan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-12 Thread Stefan Kooman

On 6/30/23 18:36, Yuri Weinstein wrote:


This RC has gone thru partial testing due to issues we are
experiencing in the sepia lab.
Please try it out and report any issues you encounter. Happy testing!


If I install cephadm from a package, 18.1.2 on Ubuntu Focal in my case,
cephadm uses the ceph-ci/ceph:main container image: "Pulling container
image quay.ceph.io/ceph-ci/ceph:main". And these container images are
out of date (18.0.0-4869-g05e449f9
(05e449f9a2a65c297f31628af8f01f63cf36f261) reef (dev)": 1).

AFAIK there is no way to tell cephadm bootstrap to use a specific
version, although the help mentions "--allow-mismatched-release", so it
might be possible?
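
A possible workaround (untested here, and assuming the v18.1.2 tag is
published on quay.io) would be to pin the bootstrap image explicitly, e.g.:

  cephadm bootstrap --image quay.io/ceph/ceph:v18.1.2 --mon-ip $ip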


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Dan van der Ster
On Wed, Jul 12, 2023 at 1:26 AM Frank Schilder  wrote:

Hi all,
>
> one problem solved, another coming up. For everyone ending up in the same
> situation, the trick seems to be to get all OSDs marked up and then allow
> recovery. Steps to take:
>
> - set noout, nodown, norebalance, norecover
> - wait patiently until all OSDs are shown as up
> - unset norebalance, norecover
> - wait wait wait, PGs will eventually become active as OSDs become
> responsive
> - unset nodown, noout
>

Nice work bringing the cluster back up.
Looking into an OSD log would give more detail about why they were
flapping. Are these HDDs? Are the block.dbs on flash?

Generally, I've found that on clusters having OSDs which are slow to boot
and flapping up and down, "nodown" is sufficient to recover from such
issues.

Cheers, Dan

__ Clyso GmbH | Ceph
Support and Consulting | https://www.clyso.com





>
> Now the new problem. I now have an ever growing list of OSDs listed as
> rebalancing, but nothing is actually rebalancing. How can I stop this
> growth and how can I get rid of this list:
>
> [root@gnosis ~]# ceph status
>   cluster:
> id: XXX
> health: HEALTH_WARN
> noout flag(s) set
> Slow OSD heartbeats on back (longest 634775.858ms)
> Slow OSD heartbeats on front (longest 635210.412ms)
> 1 pools nearfull
>
>   services:
> mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 6m)
> mgr: ceph-25(active, since 57m), standbys: ceph-26, ceph-01, ceph-02,
> ceph-03
> mds: con-fs2:8 4 up:standby 8 up:active
> osd: 1260 osds: 1258 up (since 24m), 1258 in (since 45m)
>  flags noout
>
>   data:
> pools:   14 pools, 25065 pgs
> objects: 1.97G objects, 3.5 PiB
> usage:   4.4 PiB used, 8.7 PiB / 13 PiB avail
> pgs: 25028 active+clean
>  30 active+clean+scrubbing+deep
>  7 active+clean+scrubbing
>
>   io:
> client:   1.3 GiB/s rd, 718 MiB/s wr, 7.71k op/s rd, 2.54k op/s wr
>
>   progress:
> Rebalancing after osd.135 marked in (1s)
>   [=...]
> Rebalancing after osd.69 marked in (2s)
>   []
> Rebalancing after osd.75 marked in (2s)
>   [===.]
> Rebalancing after osd.173 marked in (2s)
>   []
> Rebalancing after osd.42 marked in (1s)
>   [=...] (remaining: 2s)
> Rebalancing after osd.104 marked in (2s)
>   []
> Rebalancing after osd.82 marked in (2s)
>   []
> Rebalancing after osd.107 marked in (2s)
>   [===.]
> Rebalancing after osd.19 marked in (2s)
>   [===.]
> Rebalancing after osd.67 marked in (2s)
>   [=...]
> Rebalancing after osd.46 marked in (2s)
>   [===.] (remaining: 1s)
> Rebalancing after osd.123 marked in (2s)
>   [===.]
> Rebalancing after osd.66 marked in (2s)
>   []
> Rebalancing after osd.12 marked in (2s)
>   [==..] (remaining: 2s)
> Rebalancing after osd.95 marked in (2s)
>   [=...]
> Rebalancing after osd.134 marked in (2s)
>   [===.]
> Rebalancing after osd.14 marked in (1s)
>   [===.]
> Rebalancing after osd.56 marked in (2s)
>   [=...]
> Rebalancing after osd.143 marked in (1s)
>   []
> Rebalancing after osd.118 marked in (2s)
>   [===.]
> Rebalancing after osd.96 marked in (2s)
>   []
> Rebalancing after osd.105 marked in (2s)
>   [===.]
> Rebalancing after osd.44 marked in (1s)
>   [===.] (remaining: 5s)
> Rebalancing after osd.41 marked in (1s)
>   [==..] (remaining: 1s)
> Rebalancing after osd.9 marked in (2s)
>   [=...] (remaining: 37s)
> Rebalancing after osd.58 marked in (2s)
>   [==..] (remaining: 8s)
> Rebalancing after osd.140 marked in (1s)
>   [===.]
> Rebalancing after osd.132 marked in (2s)
>   []
> Rebalancing after osd.31 marked in (1s)
>   [=...]
> Rebalancing after osd.110 marked in (2s)
>   []
> Rebalancing after osd.21 marked in (2s)
>   [=...]
> Rebalancing after osd.114 marked in (2s)
>   [===.]
> Rebalancing after osd.83 marked in (2s)
>   

[ceph-users] Re: radosgw + keystone breaks when projects have - in their names

2023-07-12 Thread Andrew Bogott
For the sake of the archive and future readers: I think we now have an 
explanation for this issue.


Our cloud is one of the few remaining OpenStack deployments that predates
the use of UUIDs for OpenStack tenant IDs; instead, our project IDs are
typically the same as the project names. Radosgw checks project IDs and
rejects any that contain characters other than letters, numbers, and
underscores [0]. So that check is actively rejecting many of our
projects, including all of those with a '-' in their names.
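
As an illustration only, the whitelist described above behaves roughly like
this shell approximation (a hypothetical sketch, not the actual radosgw code):

  # only letters, digits and underscores pass; anything with a '-' is rejected
  for id in testlabs service admin-monitoring; do
      echo "$id" | grep -Eq '^[A-Za-z0-9_]+$' && echo "$id: accepted" || echo "$id: rejected"
  done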


IMO that check is wrong (see discussion of a similar issue at [1]) but 
in the meantime we're exploring various terrible workarounds. On the 
off-chance that anyone reading this has encountered and fixed this same 
issue, please reach out!


-Andrew



[0] 
https://github.com/ceph/ceph/commit/d50ef542372f541ac9411f655cddd5fcab4dceac

[1] https://review.opendev.org/c/openstack/cinder/+/864585


On 7/10/23 2:59 PM, Andrew Bogott wrote:
I'm in the process of adding the radosgw service to our OpenStack 
cloud and hoping to re-use keystone for discovery and auth. Things 
seem to work fine with many keystone tenants, but as soon as we try to 
do something in a project with a '-' in its name everything fails.


Here's an example, using the openstack swift cli:

root@cloudcontrol2001-dev:~# OS_PROJECT_ID="testlabs" openstack 
container create 'makethiscontainer'
+---+---++ 

| account   | container | 
x-trans-id |
+---+---++ 

| AUTH_testlabs | makethiscontainer | 
tx008c311dbda86c695-0064ac5fad-6927acd-default |
+---+---++ 

root@cloudcontrol2001-dev:~# OS_PROJECT_ID="service" openstack 
container create 'makethiscontainer'
+--+---++ 

| account  | container | 
x-trans-id |
+--+---++ 

| AUTH_service | makethiscontainer | 
tx0b341a22866f65e44-0064ac5fb7-6927acd-default |
+--+---++ 

root@cloudcontrol2001-dev:~# OS_PROJECT_ID="admin-monitoring" 
openstack container create 'makethiscontainer'
Bad Request (HTTP 400) (Request-ID: 
tx0f7326bb541b4d2a9-0064ac5fc2-6927acd-default)



Before I dive into the source code, is this a known issue and/or
something I can configure? Dash-named projects work fine in keystone
and also seem to work fine with standalone rados; I assume the issue
is somewhere in the communication between the two. I suspected the
implicit user creation code, but that seems to be working properly:


# radosgw-admin user list
[
    "cloudvirt-canary$cloudvirt-canary",
    "testlabs$testlabs",
    "paws-dev$paws-dev",
    "andrewtestproject$andrewtestproject",
    "admin-monitoring$admin-monitoring",
    "taavi-test-project$taavi-test-project",
    "admin$admin",
    "taavitestproject$taavitestproject",
    "bastioninfra-codfw1dev$bastioninfra-codfw1dev",
]

Here is the radosgw section of my ceph.conf:

[client.radosgw]

    host = 10.192.20.9
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw frontends = "civetweb port=18080"
    rgw_keystone_verify_ssl = false
    rgw_keystone_api_version = 3
    rgw_keystone_url = 
https://openstack.codfw1dev.wikimediacloud.org:25000

    rgw_keystone_accepted_roles = 'reader, admin, member'
    rgw_keystone_implicit_tenants = true
    rgw_keystone_admin_domain = default
    rgw_keystone_admin_project = service
    rgw_keystone_admin_user = swift
    rgw_keystone_admin_password = (redacted)
    rgw_s3_auth_use_keystone = true
    rgw_swift_account_in_url = true

    rgw_user_default_quota_max_objects = 4096
    rgw_user_default_quota_max_size = 8589934592


And here's a debug log of a failed transaction:

    https://phabricator.wikimedia.org/P49539

Thanks in advance!


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Per minor-version view on docs.ceph.com

2023-07-12 Thread Anthony D'Atri
The docs aren't necessarily structured that way, i.e. there isn't a 17.2.6 docs 
site as such.  We try to document changes in behavior in sync with code, but 
don't currently have a process to ensure that a given docs build corresponds 
exactly to a given dot release.  In fact we sometimes go back and correct 
things for earlier releases.

For your purposes I might suggest:

* Peruse the minor-version release notes for docs PRs
* Pull the release tree for a minor version from git and peruse the .rst files 
directly
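
For the second option, a rough sketch (assuming a local checkout of the ceph
repository and that the release tag exists):

  git clone https://github.com/ceph/ceph.git
  cd ceph
  git checkout v17.2.6   # the minor release of interest
  ls doc/                # the .rst sources as of that exact release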

Neither is what you're asking for, but it's what we have today.  Zac might have 
additional thoughts.

> On Jul 11, 2023, at 23:44, Satoru Takeuchi  wrote:
> 
> Hi,
> 
> I have a request about docs.ceph.com. Could you provide per minor-version
> views on docs.ceph.com? Currently, we can select the Ceph version by using
> `https://docs.ceph.com/en/<version>/`. In this case, we can use the major
> version's code names (e.g., "quincy") or "latest". However, we can't use
> minor version numbers like "v17.2.6". It would be convenient for me (and I
> guess for many other users, too) to be able to select the documentation for
> the version we actually use.
> 
> In my recent case, I read the quincy mclock documentation because I use
> v17.2.6. However, that document has changed a lot between v17.2.6 and the
> latest quincy docs because of the recent mclock rework.
> 
> Thanks,
> Satoru
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Frank Schilder
Answering myself for posterity: the rebalancing list disappeared after
waiting even longer. It might just have been an MGR that needed to catch up.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Wednesday, July 12, 2023 10:25 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Cluster down after network outage

Hi all,

one problem solved, another coming up. For everyone ending up in the same 
situation, the trick seems to be to get all OSDs marked up and then allow 
recovery. Steps to take:

- set noout, nodown, norebalance, norecover
- wait patiently until all OSDs are shown as up
- unset norebalance, norecover
- wait wait wait, PGs will eventually become active as OSDs become responsive
- unset nodown, noout

Now the new problem. I now have an ever growing list of OSDs listed as 
rebalancing, but nothing is actually rebalancing. How can I stop this growth 
and how can I get rid of this list:

[root@gnosis ~]# ceph status
  cluster:
id: XXX
health: HEALTH_WARN
noout flag(s) set
Slow OSD heartbeats on back (longest 634775.858ms)
Slow OSD heartbeats on front (longest 635210.412ms)
1 pools nearfull

  services:
mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 6m)
mgr: ceph-25(active, since 57m), standbys: ceph-26, ceph-01, ceph-02, 
ceph-03
mds: con-fs2:8 4 up:standby 8 up:active
osd: 1260 osds: 1258 up (since 24m), 1258 in (since 45m)
 flags noout

  data:
pools:   14 pools, 25065 pgs
objects: 1.97G objects, 3.5 PiB
usage:   4.4 PiB used, 8.7 PiB / 13 PiB avail
pgs: 25028 active+clean
 30 active+clean+scrubbing+deep
 7 active+clean+scrubbing

  io:
client:   1.3 GiB/s rd, 718 MiB/s wr, 7.71k op/s rd, 2.54k op/s wr

  progress:
Rebalancing after osd.135 marked in (1s)
  [=...]
Rebalancing after osd.69 marked in (2s)
  []
Rebalancing after osd.75 marked in (2s)
  [===.]
Rebalancing after osd.173 marked in (2s)
  []
Rebalancing after osd.42 marked in (1s)
  [=...] (remaining: 2s)
Rebalancing after osd.104 marked in (2s)
  []
Rebalancing after osd.82 marked in (2s)
  []
Rebalancing after osd.107 marked in (2s)
  [===.]
Rebalancing after osd.19 marked in (2s)
  [===.]
Rebalancing after osd.67 marked in (2s)
  [=...]
Rebalancing after osd.46 marked in (2s)
  [===.] (remaining: 1s)
Rebalancing after osd.123 marked in (2s)
  [===.]
Rebalancing after osd.66 marked in (2s)
  []
Rebalancing after osd.12 marked in (2s)
  [==..] (remaining: 2s)
Rebalancing after osd.95 marked in (2s)
  [=...]
Rebalancing after osd.134 marked in (2s)
  [===.]
Rebalancing after osd.14 marked in (1s)
  [===.]
Rebalancing after osd.56 marked in (2s)
  [=...]
Rebalancing after osd.143 marked in (1s)
  []
Rebalancing after osd.118 marked in (2s)
  [===.]
Rebalancing after osd.96 marked in (2s)
  []
Rebalancing after osd.105 marked in (2s)
  [===.]
Rebalancing after osd.44 marked in (1s)
  [===.] (remaining: 5s)
Rebalancing after osd.41 marked in (1s)
  [==..] (remaining: 1s)
Rebalancing after osd.9 marked in (2s)
  [=...] (remaining: 37s)
Rebalancing after osd.58 marked in (2s)
  [==..] (remaining: 8s)
Rebalancing after osd.140 marked in (1s)
  [===.]
Rebalancing after osd.132 marked in (2s)
  []
Rebalancing after osd.31 marked in (1s)
  [=...]
Rebalancing after osd.110 marked in (2s)
  []
Rebalancing after osd.21 marked in (2s)
  [=...]
Rebalancing after osd.114 marked in (2s)
  [===.]
Rebalancing after osd.83 marked in (2s)
  [===.]
Rebalancing after osd.23 marked in (1s)
  [===.]
Rebalancing after osd.25 marked in (1s)
  [==..]
Rebalancing after osd.147 marked in (2s)
  []
Rebalancing after osd.62 

[ceph-users] Production random data not accessible(NoSuchKey)

2023-07-12 Thread Jonas Nemeiksis
Hi all,

I'm facing a strange problem where, from time to time, some S3 objects are
not accessible.

I've found similar issues [1], [2], but our clusters have already been
upgraded to the latest Pacific version.

I have noted this in the bug report https://tracker.ceph.com/issues/61716

RGW logs [3]

Maybe someone has an idea what's wrong?


Thanks.


[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/WQ2F2GWI2WRDAGLVRDA7PAAGBJTNN4PI/
[2]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/RS2272EWAGCVZ4NOD6JLJVGGUNQYE6FV/#Y243AY63HAMFM7DH3BJ7ZT2BGMD4G4PF
[3]
https://pastebin.com/ZvBdNi5j

-- 
Jonas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Frank Schilder
Hi all,

one problem solved, another coming up. For everyone ending up in the same 
situation, the trick seems to be to get all OSDs marked up and then allow 
recovery. Steps to take:

- set noout, nodown, norebalance, norecover
- wait patiently until all OSDs are shown as up
- unset norebalance, norecover
- wait wait wait, PGs will eventually become active as OSDs become responsive
- unset nodown, noout
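
For reference, those flags map to commands along these lines (a sketch;
adjust to your own situation):

  ceph osd set noout
  ceph osd set nodown
  ceph osd set norebalance
  ceph osd set norecover
  # wait until all OSDs are shown as up, then:
  ceph osd unset norebalance
  ceph osd unset norecover
  # wait for PGs to become active, then:
  ceph osd unset nodown
  ceph osd unset noout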

Now the new problem. I now have an ever growing list of OSDs listed as 
rebalancing, but nothing is actually rebalancing. How can I stop this growth 
and how can I get rid of this list:

[root@gnosis ~]# ceph status
  cluster:
id: XXX
health: HEALTH_WARN
noout flag(s) set
Slow OSD heartbeats on back (longest 634775.858ms)
Slow OSD heartbeats on front (longest 635210.412ms)
1 pools nearfull
 
  services:
mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 6m)
mgr: ceph-25(active, since 57m), standbys: ceph-26, ceph-01, ceph-02, 
ceph-03
mds: con-fs2:8 4 up:standby 8 up:active
osd: 1260 osds: 1258 up (since 24m), 1258 in (since 45m)
 flags noout
 
  data:
pools:   14 pools, 25065 pgs
objects: 1.97G objects, 3.5 PiB
usage:   4.4 PiB used, 8.7 PiB / 13 PiB avail
pgs: 25028 active+clean
 30 active+clean+scrubbing+deep
 7 active+clean+scrubbing
 
  io:
client:   1.3 GiB/s rd, 718 MiB/s wr, 7.71k op/s rd, 2.54k op/s wr
 
  progress:
Rebalancing after osd.135 marked in (1s)
  [=...] 
Rebalancing after osd.69 marked in (2s)
  [] 
Rebalancing after osd.75 marked in (2s)
  [===.] 
Rebalancing after osd.173 marked in (2s)
  [] 
Rebalancing after osd.42 marked in (1s)
  [=...] (remaining: 2s)
Rebalancing after osd.104 marked in (2s)
  [] 
Rebalancing after osd.82 marked in (2s)
  [] 
Rebalancing after osd.107 marked in (2s)
  [===.] 
Rebalancing after osd.19 marked in (2s)
  [===.] 
Rebalancing after osd.67 marked in (2s)
  [=...] 
Rebalancing after osd.46 marked in (2s)
  [===.] (remaining: 1s)
Rebalancing after osd.123 marked in (2s)
  [===.] 
Rebalancing after osd.66 marked in (2s)
  [] 
Rebalancing after osd.12 marked in (2s)
  [==..] (remaining: 2s)
Rebalancing after osd.95 marked in (2s)
  [=...] 
Rebalancing after osd.134 marked in (2s)
  [===.] 
Rebalancing after osd.14 marked in (1s)
  [===.] 
Rebalancing after osd.56 marked in (2s)
  [=...] 
Rebalancing after osd.143 marked in (1s)
  [] 
Rebalancing after osd.118 marked in (2s)
  [===.] 
Rebalancing after osd.96 marked in (2s)
  [] 
Rebalancing after osd.105 marked in (2s)
  [===.] 
Rebalancing after osd.44 marked in (1s)
  [===.] (remaining: 5s)
Rebalancing after osd.41 marked in (1s)
  [==..] (remaining: 1s)
Rebalancing after osd.9 marked in (2s)
  [=...] (remaining: 37s)
Rebalancing after osd.58 marked in (2s)
  [==..] (remaining: 8s)
Rebalancing after osd.140 marked in (1s)
  [===.] 
Rebalancing after osd.132 marked in (2s)
  [] 
Rebalancing after osd.31 marked in (1s)
  [=...] 
Rebalancing after osd.110 marked in (2s)
  [] 
Rebalancing after osd.21 marked in (2s)
  [=...] 
Rebalancing after osd.114 marked in (2s)
  [===.] 
Rebalancing after osd.83 marked in (2s)
  [===.] 
Rebalancing after osd.23 marked in (1s)
  [===.] 
Rebalancing after osd.25 marked in (1s)
  [==..] 
Rebalancing after osd.147 marked in (2s)
  [] 
Rebalancing after osd.62 marked in (1s)
  [==..] 
Rebalancing after osd.57 marked in (2s)
  [==..] 
Rebalancing after osd.61 marked in (2s)
  [] 
Rebalancing after osd.71 marked in (2s)
  [===.] 
Rebalancing after osd.80 marked in (2s)
  [==..] 

[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Stefan Kooman

On 7/12/23 09:53, Frank Schilder wrote:

Hi all,

we had a network outage tonight (power loss) and restored the network in the
morning. All OSDs were running during this period. After restoring the
network, peering hell broke loose and the cluster is having a hard time coming
back up. OSDs get marked down all the time and come back later. Peering never
stops.

Below is the current status. I had all OSDs shown as up for a while, but many
were not responsive. Are there any flags that would help bring things up in a
sequence that causes less overload on the system?


osd_recovery_delay_start

We have that set to 60 seconds, so the OSD first gets some time to peer
before starting recovery. That might help in this case and is worth a shot.
Maybe increase it to 5 minutes or more, just to get all OSDs stable
before recovery starts?
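
Something along these lines should do it (just a sketch; 300 seconds is an
arbitrary example value):

  ceph config set osd osd_recovery_delay_start 300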


Good luck!

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cluster down after network outage

2023-07-12 Thread Frank Schilder
Hi all,

we had a network outage tonight (power loss) and restored the network in the
morning. All OSDs were running during this period. After restoring the
network, peering hell broke loose and the cluster is having a hard time coming
back up. OSDs get marked down all the time and come back later. Peering never
stops.

Below is the current status. I had all OSDs shown as up for a while, but many
were not responsive. Are there any flags that would help bring things up in a
sequence that causes less overload on the system?

[root@gnosis ~]# ceph status
  cluster:
id: XXX
health: HEALTH_WARN
2 clients failing to respond to capability release
6 MDSs report slow metadata IOs
3 MDSs report slow requests
nodown,noout,nobackfill,norecover flag(s) set
176 osds down
Slow OSD heartbeats on back (longest 551718.679ms)
Slow OSD heartbeats on front (longest 549598.330ms)
Reduced data availability: 8069 pgs inactive, 3786 pgs down, 3161 
pgs peering, 1341 pgs stale
Degraded data redundancy: 1187354920/16402772667 objects degraded 
(7.239%), 6222 pgs degraded, 6231 pgs undersized
1 pools nearfull
17386 slow ops, oldest one blocked for 1811 sec, daemons 
[osd.1128,osd.1152,osd.1154,osd.12,osd.1227,osd.1244,osd.328,osd.354,osd.381,osd.4]...
 have slow ops.
 
  services:
mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 28m)
mgr: ceph-25(active, since 30m), standbys: ceph-26, ceph-01, ceph-02, 
ceph-03
mds: con-fs2:8 4 up:standby 8 up:active
osd: 1260 osds: 1082 up (since 6m), 1258 in (since 18m); 266 remapped pgs
 flags nodown,noout,nobackfill,norecover
 
  data:
pools:   14 pools, 25065 pgs
objects: 1.91G objects, 3.4 PiB
usage:   3.1 PiB used, 6.0 PiB / 9.0 PiB avail
pgs: 0.626% pgs unknown
 31.566% pgs not active
 1187354920/16402772667 objects degraded (7.239%)
 51/16402772667 objects misplaced (0.000%)
 11706 active+clean
 4752  active+undersized+degraded
 3286  down
 2702  peering
 799   undersized+degraded+peered
 464   stale+down
 418   stale+active+undersized+degraded
 214   remapped+peering
 157   unknown
 128   stale+peering
 117   stale+remapped+peering
 101   stale+undersized+degraded+peered
 57stale+active+undersized+degraded+remapped+backfilling
 35down+remapped
 26stale+undersized+degraded+remapped+backfilling+peered
 23undersized+degraded+remapped+backfilling+peered
 14active+clean+scrubbing+deep
 9 stale+active+undersized+degraded+remapped+backfill_wait
 7 active+recovering+undersized+degraded
 7 stale+active+recovering+undersized+degraded
 6 active+undersized+degraded+remapped+backfilling
 6 active+undersized
 5 active+undersized+degraded+remapped+backfill_wait
 5 stale+remapped
 4 stale+activating+undersized+degraded
 3 active+undersized+remapped
 3 stale+undersized+degraded+remapped+backfill_wait+peered
 1 activating+undersized+degraded
 1 activating+undersized+degraded+remapped
 1 undersized+degraded+remapped+backfill_wait+peered
 1 stale+active+clean
 1 active+recovering
 1 stale+down+remapped
 1 undersized+peered
 1 active+undersized+degraded+remapped
 1 active+clean+scrubbing
 1 active+clean+remapped
 1 active+recovering+degraded
 
  io:
client:   1.8 MiB/s rd, 18 MiB/s wr, 409 op/s rd, 796 op/s wr

Thanks for any hints!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MON sync time depends on outage duration

2023-07-12 Thread Eugen Block
My test with a single-host cluster (a virtual machine) finished after
around 20 hours. I removed all purged_snap keys from the mon and it
actually started again (I wasn't sure whether I could have expected that).
Is that a valid approach to reduce the mon store size, or can it
be dangerous? How would that work in a real cluster with multiple
MONs? If I stop the first one, clean up its mon db and then start it again,
wouldn't it just sync the keys back from its peers? Not sure how that would
work...
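
For reference, that kind of inspection/cleanup can be sketched roughly like
this (dangerous: only on a stopped mon and only with a backup of the store;
the paths, unit name and key name are examples, not exact values):

  systemctl stop ceph-mon@$(hostname -s)
  cp -a /var/lib/ceph/mon/ceph-$(hostname -s)/store.db /root/mon-store-backup
  # count keys under the osd_snap prefix (read-only)
  ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db list osd_snap | wc -l
  # remove a single key (placeholder key name; repeat per key)
  ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-$(hostname -s)/store.db rm osd_snap purged_snap_<key>
  systemctl start ceph-mon@$(hostname -s)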


Zitat von Eugen Block :


It was installed with Octopus and hasn't been upgraded yet:

"require_osd_release": "octopus",


Zitat von Josh Baergen :


Out of curiosity, what is your require_osd_release set to? (ceph osd
dump | grep require_osd_release)

Josh

On Tue, Jul 11, 2023 at 5:11 AM Eugen Block  wrote:


I'm not so sure anymore if that could really help here. The dump-keys
output from the mon contains 42 million osd_snap prefix entries, 39
million of them are "purged_snap" keys. I also compared to other
clusters as well, those aren't tombstones but expected "history" of
purged snapshots. So I don't think removing a couple of hundred trash
snapshots will actually reduce the number of osd_snap keys. At least
doubling the payload_size seems to have a positive impact. The
compaction during the sync has a negative impact, of course, same as
not having the mon store on SSDs.
I'm currently playing with a test cluster, removing all "purged_snap"
entries from the mon db (not finished yet) to see what that will do
with the mon and if it will even start correctly. But has anyone done
that, removing keys from the mon store? Not sure what to expect yet...

Zitat von Dan van der Ster :


Oh yes, sounds like purging the rbd trash will be the real fix here!
Good luck!

__
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com




On Mon, Jul 10, 2023 at 6:10 AM Eugen Block  wrote:


Hi,
I got a customer response with payload size 4096; that made things
even worse. The mon startup time was now around 40 minutes. My doubts
about decreasing the payload size seem confirmed. Then I read Dan's
response again, which also mentions that the default payload size could
be too small. So I asked them to double the default (2M instead of 1M)
and am now waiting for a new result. I'm still wondering why this only
happens when the mon is down for more than 5 minutes. Does anyone have
an explanation for that time factor?
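
Concretely, doubling those values would be something like this (a sketch;
the Octopus defaults are the ones quoted further down in this thread):

  ceph config set mon mon_sync_max_payload_size 2097152   # 2 MiB, default 1048576
  ceph config set mon mon_sync_max_payload_keys 4000      # default 2000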
Another thing they're going to do is to remove lots of snapshot
tombstones (rbd mirroring snapshots in the trash namespace); maybe
that will reduce the number of osd_snap keys in the mon db, which should
then improve the startup time. We'll see...

Zitat von Eugen Block :

> Thanks, Dan!
>
>> Yes that sounds familiar from the luminous and mimic days.
>> The workaround for zillions of snapshot keys at that time was to use:
>>   ceph config set mon mon_sync_max_payload_size 4096
>
> I actually did search for mon_sync_max_payload_keys, not bytes so I
> missed your thread, it seems. Thanks for pointing that out. So the
> defaults seem to be these in Octopus:
>
> "mon_sync_max_payload_keys": "2000",
> "mon_sync_max_payload_size": "1048576",
>
>> So it could be in your case that the sync payload is just too small to
>> efficiently move 42 million osd_snap keys? Using debug_paxos and debug_mon
>> you should be able to understand what is taking so long, and tune
>> mon_sync_max_payload_size and mon_sync_max_payload_keys accordingly.
>
> I'm confused: if the payload size is too small, why would decreasing
> it help? Or am I misunderstanding something? But it probably won't
> hurt to try it with 4096 and see if anything changes. If not, we can
> still turn on debug logs and take a closer look.
>
>> And in addition to Dan's suggestion: an HDD is not a good choice for
>> RocksDB, which is most likely the reason for this thread; I think
>> that from the 3rd time on the database just goes into compaction
>> maintenance
>
> Believe me, I know... but there's not much they can currently do
> about it, quite a long story... But I have been telling them that
> for months now. Anyway, I will make some suggestions and report back
> if it worked in this case as well.
>
> Thanks!
> Eugen
>
> Zitat von Dan van der Ster :
>
>> Hi Eugen!
>>
>> Yes that sounds familiar from the luminous and mimic days.
>>
>> Check this old thread:
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F3W2HXMYNF52E7LPIQEJFUTAD3I7QE25/
>> (that thread is truncated but I can tell you that it worked for Frank).
>> Also the even older referenced thread:
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/M5ZKF7PTEO2OGDDY5L74EV4QS5SDCZTH/
>>
>> The workaround for zillions of snapshot keys at that time was to use:
>>   ceph config set mon mon_sync_max_payload_size 4096
>>
>> That said, that sync issue was supposed to be fixed by way of adding the
>> new option