[ceph-users] delete s3 bucket too slow?

2024-06-18 Thread Simon Oosthoek
Hi

when deleting an S3 bucket via the dashboard, the operation took longer than
the dashboard's time-out, causing the delete to fail. Some of these buckets are
really large (100+ TB), and our cluster is also very busy replacing broken OSD
disks, but how can a bucket delete take so long that the dashboard times out
and gives up on it?

What is the right way to do this?

We are currently on Quincy (17.2.7) using packages for Ubuntu.
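
For what it's worth, the route I'm considering instead of the dashboard is the
CLI on one of the gateway nodes (a sketch; "big-bucket" is a made-up name):

# delete the bucket and everything in it, without the dashboard's time-out
radosgw-admin bucket rm --bucket=big-bucket --purge-objects
# (there is also a --bypass-gc option to skip the garbage collector for very
# large buckets)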

Cheers

/Simon

-- 
I'm using my gmail.com address, because the gmail.com dmarc policy is
"none"; some mail servers will reject this (Microsoft?), while others will
allow it when I send mail to a mailing list which has not yet been
configured to send mail "on behalf of" the sender, but rather does a kind of
"forward". The latter situation causes dkim/dmarc failures and the dmarc
policy will be applied. See https://wiki.list.org/DEV/DMARC for more details
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Simon Oosthoek
Hi Wes,

thanks, the `ceph tell mon.* sessions` command got me the answer very quickly :-)
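
For the record, roughly what I ran to narrow it down (a sketch; the exact field
names in the sessions output differ a bit per release, so check the raw JSON):

# on each mon host: dump the sessions and keep only entries mentioning jewel;
# the addresses in those entries point at the offending clients
ceph daemon mon.$(hostname -s) sessions | jq '.[] | select(tostring | test("jewel"))'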

Cheers

/Simon

On Thu, 21 Dec 2023 at 18:27, Wesley Dillingham wrote:

> You can ask the monitor to dump its sessions (which should expose the IPs
> and the release / features); you can then track down by IP those with the
> undesirable features/release.
>
> ceph daemon mon.`hostname -s` sessions
>
> Assuming your mon is named after the short hostname, you may need to do
> this for every mon.  Alternatively, use `ceph tell mon.* sessions` to
> hit every mon at once.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Thu, Dec 21, 2023 at 10:46 AM Anthony D'Atri 
> wrote:
>
>> [rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$ ceph features
>> {
>> "mon": [
>> {
>> "features": "0x3f01cfbf7ffd",
>> "release": "luminous",
>> "num": 3
>> }
>> ],
>> "osd": [
>> {
>> "features": "0x3f01cfbf7ffd",
>> "release": "luminous",
>> "num": 600
>> }
>> ],
>> "client": [
>> {
>> "features": "0x2f018fb87aa4aafe",
>> "release": "luminous",
>> "num": 41
>> },
>> {
>> "features": "0x3f01cfbf7ffd",
>> "release": "luminous",
>> "num": 147
>> }
>> ],
>> "mgr": [
>> {
>> "features": "0x3f01cfbf7ffd",
>> "release": "luminous",
>> "num": 2
>> }
>> ]
>> }
>> [rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$
>>
>> IIRC there are nuances; there are cases where a client can *look* like
>> Jewel but actually be okay.
>>
>>
>> > On Dec 21, 2023, at 10:41, Simon Oosthoek 
>> wrote:
>> >
>> > Hi,
>> >
>> > Our cluster is currently running quincy, and I want to set the minimal
>> > client version to luminous, to enable upmap balancer, but when I tried
>> to,
>> > I got this:
>> >
>> > # ceph osd set-require-min-compat-client luminous
>> > Error EPERM: cannot set require_min_compat_client to luminous: 2 connected
>> > client(s) look like jewel (missing 0x800); add --yes-i-really-mean-it to
>> > do it anyway
>> >
>> > I think I know the most likely candidate (and I've asked them), but is
>> > there a way to find out, the way ceph seems to know?
>> >
>> > tnx
>> >
>> > /Simon
>> > --
>> > I'm using my gmail.com address, because the gmail.com dmarc policy is
>> > "none", some mail servers will reject this (microsoft?) others will
>> instead
>> > allow this when I send mail to a mailling list which has not yet been
>> > configured to send mail "on behalf of" the sender, but rather do a kind
>> of
>> > "forward". The latter situation causes dkim/dmarc failures and the dmarc
>> > policy will be applied. see https://wiki.list.org/DEV/DMARC for more
>> details
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Simon Oosthoek
Hi,

Our cluster is currently running quincy, and I want to set the minimal
client version to luminous, to enable upmap balancer, but when I tried to,
I got this:

# ceph osd set-require-min-compat-client luminous
Error EPERM: cannot set require_min_compat_client to luminous: 2 connected
client(s) look like jewel (missing 0x800); add --yes-i-really-mean-it to do it
anyway

I think I know the most likely candidate (and I've asked them), but is
there a way to find out, the way ceph seems to know?

tnx

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] planning upgrade from pacific to quincy

2023-11-16 Thread Simon Oosthoek
Hi All
(apologies if you get this again; I suspect mails from my @science.ru.nl
account get dropped by most receiving mail servers, due to the strict
DMARC policy (p=reject) in place)

after a long while in HEALTH_ERR state (due to an unfound
object, which we eventually decided to "forget"), we are now planning
to upgrade our cluster, which is running Pacific (at least on the
mons/mdss/osds; the gateways are accidentally running Quincy already).
The installation is via packages from ceph.com, except for Quincy,
which comes from Ubuntu.

ceph versions:
"mon": {"ceph version 16.2.13
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 3},
"mgr": {"ceph version 16.2.13
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 3},
"osd": {"ceph version 16.2.13
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 252,
 "ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9)
pacific (stable)": 12 },
"mds": { "ceph version 16.2.13
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 2 },
"rgw": {"ceph version 17.2.6
(d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)": 8 },
"overall": {"ceph version 16.2.13
(5378749ba6be3a0868b51803968ee9cde4833a3e) pacific (stable)": 260,
"ceph version 16.2.14
(238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)": 12,
"ceph version 17.2.6
(d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)": 8 }

The OS on the mons and mdss are still ubuntu 18.04, the osds are a mix
of ubuntu 18 and ubuntu 20. The gateways are ubuntu 22.04, which is
why these are already on quincy.

The plan is to move to quincy and eventually cephadm/containerized ceph,
since that is apparently "the way to go", though I have my doubts.

We think the right order of steps is:
- reinstall the mons with ubuntu 22.04 + quincy
- reinstall the osds (same)
- reinstall the mdss (same)
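
In a bit more detail, the per-OSD-node procedure we have in mind is roughly
this (a sketch; it assumes we stay on packages rather than cephadm):

ceph osd set noout                # avoid rebalancing while the node is down
# reinstall the node with ubuntu 22.04, restore /etc/ceph/ceph.conf and the
# keyrings, add the ceph.com quincy repo, then:
apt install ceph-osd
ceph-volume lvm activate --all    # rediscover and start the existing OSDs
ceph osd unset noout              # once the node's OSDs are back up and in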

Once this is up and running, we want to investigate and migrate to
cephadm orchestration.

An alternative appears to be: move to orchestration first and then upgrade
ceph to quincy (possibly skipping the ubuntu upgrade?)

Another alternative could be to upgrade to quincy on ubuntu 18.04
using packages, but I haven't investigated the availability of quincy
packages for ubuntu 18.04 (which is out of free (LTS) support by
Canonical).

Cheers

/Simon


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] compounded problems interfering with recovery

2023-10-08 Thread Simon Oosthoek

Hi

we're still struggling to get our ceph cluster back to HEALTH_OK. We're 
having compounded issues interfering with recovery, as I understand it.


To summarize, we have a cluster of 22 osd nodes running ceph 16.2.x. 
About a month back one of the OSD nodes broke down (just the OS disk, 
but we didn't have a cold spare available, so it took a week to get it 
fixed). Since the failure of the node, ceph has been repairing the 
situation of course, but then it became a problem that our OSDs are 
really unevenly balanced (lowest below 50%, highest around 85%). So 
whenever a disk fails (and there have been 2 since then), the load spreads 
over the other OSDs and our fullest OSDs go over the 85% threshold, 
slowing down recovery, normal use and rebalancing.


We had issues with degraded PGs, but they weren't being repaired 
(because we had turned on scrubbing during recovery, since we got 
messages that lots of PGs weren't being scrubbed in time).


Now there's still one PG degraded because one object is 
unfound. The whole error state is taking far too long now and, as this is 
going on, I was wondering why the balancer wasn't doing its job. Turns 
out this is dependent on the cluster being OK, or at least not having any 
degraded things in it. The balancer hasn't done its job even though our 
cluster was OK for a long time before; we added some 8 nodes a few years 
ago and still the newer nodes have the lowest-used OSDs.


Our cluster has about 70-71% usage overall, but with the unbalanced 
situation we cannot grow any more. Between the single node issue (though now 
resolved) and ongoing disk failures (we are seeing a handful of OSDs 
with read-repaired messages), it looks like we can't get back to HEALTH_OK 
for a while.


I'm trying to mitigate this by reweighting the fullest OSDs, but the 
fuller OSDs keep going over the threshold, while the emptiest OSDs have 
plenty of space (just 55% full now).
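
(For reference, the reweighting I'm doing is along these lines; the osd id and
the numbers are just examples for our cluster, not recommendations:)

ceph osd reweight 42 0.90                     # nudge one of the fullest OSDs down
# or let ceph pick the OSDs furthest above the mean utilisation, dry-run first:
ceph osd test-reweight-by-utilization 110 0.05 20
ceph osd reweight-by-utilization 110 0.05 20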


If you read this far ;-) I'm wondering, can I force repair a PG around 
all the restrictions so it doesn't block auto rebalancing?


It seems to me, like that would help, but perhaps there are other things 
I can do as well?


(Budget wise, adding more OSD nodes is a bit difficult at the moment...)

Thanks for reading!

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Simon Oosthoek

Hi Wesley,

On 06/10/2023 17:48, Wesley Dillingham wrote:
A repair is just a type of scrub and it is also limited by 
osd_max_scrubs which in pacific is 1.


We've increased that to 4 (and temporarily to 8) since we have so many 
OSDs and are running behind on scrubbing.





If another scrub is occurring on any OSD in the PG it won't start.


that explains a lot.



do "ceph osd set noscrub" and "ceph osd set nodeep-scrub" wait for all 
scrubs to stop (a few seconds probably)


Then issue the pg repair command again. It may start.


The script Kai linked seems like a good idea to fix this when needed.



You also have pgs in backfilling state. Note that by default OSDs in 
backfill or backfill_wait also won't perform scrubs.


You can modify this behavior with `ceph config set osd 
osd_scrub_during_recovery true`


We've set this already



I would suggest only setting that after the noscub flags are set and the 
only scrub you want to get processed is your manual repair.


Then rm the scrub_during_recovery config item before unsetting the 
noscrub flags.


Thanks for the suggestion!

Cheers

/Simon





Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Oct 6, 2023 at 11:02 AM Simon Oosthoek <s.oosth...@science.ru.nl> wrote:


On 06/10/2023 16:09, Simon Oosthoek wrote:
 > Hi
 >
 > we're still in HEALTH_ERR state with our cluster, this is the top
of the
 > output of `ceph health detail`
 >
 > HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors;
 > Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent;
 > Degraded data redundancy: 6/7118781559 objects degraded (0.000%),
1 pg
 > degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657
pgs not
 > scrubbed in time
 > [WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
 >      pg 26.323 has 1 unfound objects
 > [ERR] OSD_SCRUB_ERRORS: 248 scrub errors
 > [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs
 > inconsistent
 >      pg 26.323 is active+recovery_unfound+degraded+remapped, acting
 > [92,109,116,70,158,128,243,189,256], 1 unfound
 >      pg 26.337 is active+clean+inconsistent, acting
 > [139,137,48,126,165,89,237,199,189]
 >      pg 26.3e2 is active+clean+inconsistent, acting
 > [12,27,24,234,195,173,98,32,35]
 > [WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects
 > degraded (0.000%), 1 pg degraded, 1 pg undersized
 >      pg 13.3a5 is stuck undersized for 4m, current state
 > active+undersized+remapped+backfilling, last acting
 > [2,45,32,62,2147483647,55,116,25,225,202,240]
 >      pg 26.323 is active+recovery_unfound+degraded+remapped, acting
 > [92,109,116,70,158,128,243,189,256], 1 unfound
 >
 >
 > For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc.,
 > however it fails to get resolved.
 >
 > The osd.116 is already marked out and is beginning to get empty.
I've
 > tried restarting the osd processes of the first osd listed for
each PG,
 > but that doesn't get it resolved either.
 >
 > I guess we should have enough redundancy to get the correct data
back,
 > but how can I tell ceph to fix it in order to get back to a
healthy state?

I guess this could be related to the number of scrubs going on, I read
somewhere that this may interfere with the repair request. I would
expect the repair would have priority over scrubs...

BTW, we're running pacific for now, we want to update when the cluster
is healthy again.

Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io
<mailto:ceph-users@ceph.io>
To unsubscribe send an email to ceph-users-le...@ceph.io
<mailto:ceph-users-le...@ceph.io>


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Simon Oosthoek

On 06/10/2023 16:09, Simon Oosthoek wrote:

Hi

we're still in HEALTH_ERR state with our cluster, this is the top of the 
output of `ceph health detail`


HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors; 
Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent; 
Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg 
degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not 
scrubbed in time

[WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
     pg 26.323 has 1 unfound objects
[ERR] OSD_SCRUB_ERRORS: 248 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs 
inconsistent
     pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound
     pg 26.337 is active+clean+inconsistent, acting 
[139,137,48,126,165,89,237,199,189]
     pg 26.3e2 is active+clean+inconsistent, acting 
[12,27,24,234,195,173,98,32,35]
[WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects 
degraded (0.000%), 1 pg degraded, 1 pg undersized
     pg 13.3a5 is stuck undersized for 4m, current state 
active+undersized+remapped+backfilling, last acting 
[2,45,32,62,2147483647,55,116,25,225,202,240]
     pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound



For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc., 
however it fails to get resolved.


The osd.116 is already marked out and is beginning to get empty. I've 
tried restarting the osd processes of the first osd listed for each PG, 
but that doesn't get it resolved either.


I guess we should have enough redundancy to get the correct data back, 
but how can I tell ceph to fix it in order to get back to a healthy state?


I guess this could be related to the number of scrubs going on, I read 
somewhere that this may interfere with the repair request. I would 
expect the repair would have priority over scrubs...


BTW, we're running pacific for now, we want to update when the cluster 
is healthy again.


Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cannot repair a handful of damaged pg's

2023-10-06 Thread Simon Oosthoek

Hi

we're still in HEALTH_ERR state with our cluster, this is the top of the 
output of `ceph health detail`


HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors; 
Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent; 
Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg 
degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not 
scrubbed in time

[WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
pg 26.323 has 1 unfound objects
[ERR] OSD_SCRUB_ERRORS: 248 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs 
inconsistent
pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound
pg 26.337 is active+clean+inconsistent, acting 
[139,137,48,126,165,89,237,199,189]
pg 26.3e2 is active+clean+inconsistent, acting 
[12,27,24,234,195,173,98,32,35]
[WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects 
degraded (0.000%), 1 pg degraded, 1 pg undersized
pg 13.3a5 is stuck undersized for 4m, current state 
active+undersized+remapped+backfilling, last acting 
[2,45,32,62,2147483647,55,116,25,225,202,240]
pg 26.323 is active+recovery_unfound+degraded+remapped, acting 
[92,109,116,70,158,128,243,189,256], 1 unfound



For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc., 
however it fails to get resolved.


The osd.116 is already marked out and is beginning to get empty. I've 
tried restarting the osd processes of the first osd listed for each PG, 
but that doesn't get it resolved either.


I guess we should have enough redundancy to get the correct data back, 
but how can I tell ceph to fix it in order to get back to a healthy state?


Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd down doesn't seem to work

2023-10-03 Thread Simon Oosthoek

Hi Josh,

thanks for the explanation, I want to mark it out, not down :-)

Most use of our cluster is in EC 8+3 or 5+4 pools, so one missing osd 
isn't bad, but if some of the blocks can still be read it may help to 
move them to safety. (This is how I imagine things anyway ;-)


I'll have to look into manually correcting those inconsistent PGs 
if they don't recover by ceph-magic alone...
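
So the plan for the failing disk becomes something like this (a sketch; 116 is
the OSD in question):

ceph osd out 116                  # mark it out but leave the daemon running (up+out)
ceph osd df tree | grep ' 116 '   # watch its data drain away
# once it is empty: stop the ceph-osd@116 service and replace the disk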


Cheers

/Simon

On 03/10/2023 18:21, Josh Baergen wrote:

Hi Simon,

If the OSD is actually up, using 'ceph osd down` will cause it to flap
but come back immediately. To prevent this, you would want to 'ceph
osd set noup'. However, I don't think this is what you actually want:


I'm thinking (but perhaps incorrectly?) that it would be good to keep the OSD 
down+in, to try to read from it as long as possible


In this case, you actually want it up+out ('ceph osd out XXX'), though
if it's replicated then marking it out will switch primaries around so
that it's not actually read from anymore. It doesn't look like you
have that much recovery backfill left, so hopefully you'll be in a
clean state soon, though you'll have to deal with those 'inconsistent'
and 'recovery_unfound' PGs.

Josh

On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek  wrote:


Hi

I'm trying to mark one OSD as down, so we can clean it out and replace
it. It keeps getting medium read errors, so it's bound to fail sooner
rather than later. When I command ceph from the mon to mark the osd
down, it doesn't actually do it. When the service on the osd stops, it
is also marked out and I'm thinking (but perhaps incorrectly?) that it
would be good to keep the OSD down+in, to try to read from it as long as
possible. Why doesn't it get marked down and stay that way when I
command it?

Context: Our cluster is in a bit of a less optimal state (see below),
this is after one of OSD nodes had failed and took a week to get back up
(long story). Due to a seriously unbalanced filling of our OSDs we kept
having to reweight OSDs to keep below the 85% threshold. Several disks
are starting to fail now (they're 4+ years old and failures are expected
to occur more frequently).

I'm open to suggestions to help get us back to health_ok more quickly,
but I think we'll get there eventually anyway...

Cheers

/Simon


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph osd down doesn't seem to work

2023-10-03 Thread Simon Oosthoek

Hi

I'm trying to mark one OSD as down, so we can clean it out and replace 
it. It keeps getting medium read errors, so it's bound to fail sooner 
rather than later. When I command ceph from the mon to mark the osd 
down, it doesn't actually do it. When the service on the osd stops, it 
is also marked out and I'm thinking (but perhaps incorrectly?) that it 
would be good to keep the OSD down+in, to try to read from it as long as 
possible. Why doesn't it get marked down and stay that way when I 
command it?


Context: Our cluster is in a bit of a less optimal state (see below), 
this is after one of OSD nodes had failed and took a week to get back up 
(long story). Due to a seriously unbalanced filling of our OSDs we kept 
having to reweight OSDs to keep below the 85% threshold. Several disks 
are starting to fail now (they're 4+ years old and failures are expected 
to occur more frequently).


I'm open to suggestions to help get us back to health_ok more quickly, 
but I think we'll get there eventually anyway...


Cheers

/Simon



# ceph -s
  cluster:
health: HEALTH_ERR
1 clients failing to respond to cache pressure
1/843763422 objects unfound (0.000%)
noout flag(s) set
14 scrub errors
Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
Degraded data redundancy: 13795525/7095598195 objects 
degraded (0.194%), 13 pgs degraded, 12 pgs undersized

70 pgs not deep-scrubbed in time
65 pgs not scrubbed in time

  services:
mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 11h)
mgr: cephmon3(active, since 35h), standbys: cephmon1
mds: 1/1 daemons up, 1 standby
osd: 264 osds: 264 up (since 2m), 264 in (since 75m); 227 remapped pgs
 flags noout
rgw: 8 daemons active (4 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   15 pools, 3681 pgs
objects: 843.76M objects, 1.2 PiB
usage:   2.0 PiB used, 847 TiB / 2.8 PiB avail
pgs: 13795525/7095598195 objects degraded (0.194%)
 54839263/7095598195 objects misplaced (0.773%)
 1/843763422 objects unfound (0.000%)
 3374 active+clean
 195  active+remapped+backfill_wait
 65   active+clean+scrubbing+deep
 20   active+remapped+backfilling
 11   active+clean+snaptrim
 10   active+undersized+degraded+remapped+backfill_wait
 2active+undersized+degraded+remapped+backfilling
 2active+clean+scrubbing
 1active+recovery_unfound+degraded
 1active+clean+inconsistent

  progress:
Global Recovery Event (8h)
  [==..] (remaining: 2h)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.12 Pacific (hot-fix) released

2023-04-24 Thread Simon Oosthoek

Dear List

we upgraded to 16.2.12 on April 17th; since then we've seen some 
unexplained downed osd services in our cluster (264 osds). Is there any 
risk of data loss? If so, would it be possible to downgrade, or is a fix 
expected soon? And if so, when? ;-)
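
For now we're triaging the downed OSDs roughly like this (a sketch; the crash
id and osd id are placeholders):

ceph crash ls                            # recent daemon crashes known to the cluster
ceph crash info <crash-id>               # details for one of them
journalctl -u ceph-osd@<osd-id> -n 200   # the unit log on the affected host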


FYI, we are running a cluster without cephadm, installed from packages.

Cheers

/Simon

On 23/04/2023 03:03, Yuri Weinstein wrote:

We are writing to inform you that Pacific v16.2.12, released on April
14th, has many more unintended commits in the changelog than listed in the
release notes [1].

As these extra commits are not fully tested, we request that all users
please refrain from upgrading to v16.2.12 at this time. The current
v16.2.12 will be QE validated and released as soon as possible.

v16.2.12 was a hotfix release meant to resolve several performance
flaws in ceph-volume, particularly during osd activation. The extra
commits target v16.2.13.

We apologize for the inconvenience. Please reach out to the mailing
list with any questions.

[1] https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/

On Fri, Apr 14, 2023 at 9:42 AM Yuri Weinstein  wrote:


We're happy to announce the 12th hot-fix release in the Pacific series.

https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/

Notable Changes
---
This is a hotfix release that resolves several performance flaws in ceph-volume,
particularly during osd activation (https://tracker.ceph.com/issues/57627).
Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] dashboard version of ceph versions shows N/A

2022-12-01 Thread Simon Oosthoek

Dear list

Yesterday we updated our ceph cluster from 15.2.17 to 16.2.10 using 
packages.


Our cluster is a mix of ubuntu 18 and ubuntu 20 with ceph coming from 
packages in the ceph.com repo. All went well and we now have all nodes 
running Pacific. However, there's something odd in the dashboard: 
when I look at '/#/hosts', the dashboard shows 'N/A' for every 
column starting from "Model", and the columns "Labels" and "Status" are empty.


We didn't change anything in the prometheus/grafana node, so I think 
this could be an issue, but I don't know if it's causing this particular 
problem.


NB, Ceph seems happy enough, it's just the dashboard not showing the 
versions anymore.


Does this ring a bell for anyone?

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to upgrade host os under ceph

2022-10-28 Thread Simon Oosthoek

Hi Anthony

On 27/10/2022 21:44, Anthony D'Atri wrote:

Another factor is “Do I *really* need to upgrade the OS?”


that's a good question, opinions vary on this I've noticed ;-)



If you have org-wide management/configuration that requires you to upgrade, 
that’s one thing, but presumably your hosts are not accessible from the 
outside, so do you have a compelling reason?  The “immutable infrastructure” 
folks may be on to something.  Upgrades always take a lot of engineer time and 
are when things tend to go wrong.



Obviously the ceph nodes are not publicly accessible, but we do like to 
keep the cluster as maintainable as possible by keeping things simple. 
Having an older, unsupported ubuntu version around is kind of a red 
flag, even though it could be fine to leave it as is. And of course there's 
the problem that we want to keep ceph not too far behind supported 
releases, and at some point (before the hardware expires) no new 
versions of ceph will be available for the older unsupported ubuntu.


Furthermore, waiting until this happens is a recipe for having to 
re-invent the wheel; I believe we should get practiced and comfortable 
doing this, so it's not such a looming issue. It's also useful to have in 
our fingers for when e.g. an OS disk fails for some reason.


So that would be my reason to still want to upgrade, even though there 
may not be an urgent reason...


Cheers

/Simon


On Oct 27, 2022, at 03:16, Simon Oosthoek  wrote:

Dear list

thanks for the answers, it looks like we have worried about this far too much 
;-)

Cheers

/Simon

On 26/10/2022 22:21, shubjero wrote:

We've done 14.04 -> 16.04 -> 18.04 -> 20.04 all at various stages of our ceph 
cluster life.
The latest 18.04 to 20.04 was painless and we ran:
apt update && apt dist-upgrade -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"
do-release-upgrade --allow-third-party -f DistUpgradeViewNonInteractive
On Wed, Oct 26, 2022 at 11:17 AM Reed Dier <reed.d...@focusvq.com> wrote:
You should be able to `do-release-upgrade` from bionic/18 to focal/20.
Octopus/15 is shipped for both dists from ceph.
Its been a while since I did this, the release upgrader might
disable the ceph repo, and uninstall the ceph* packages.
However, the OSDs should still be there, re-enable the ceph repo,
install ceph-osd, and then `ceph-volume lvm activate --all` should
find and start all of the OSDs.
Caveat, if you’re using cephadm, I’m sure the process is different.
And also, if you’re trying to go to jammy/22, thats a different
story, because ceph isn’t shipping packages for jammy yet for any
version of ceph.
I assume that they are going to ship quincy for jammy at some point,
which will give a stepping stone from focal to jammy with the quincy
release, because I don’t imagine that there will be a reef release
    for focal.
Reed
 > On Oct 26, 2022, at 9:14 AM, Simon Oosthoek <s.oosth...@science.ru.nl> wrote:
 >
 > Dear list,
 >
 > I'm looking for some guide or pointers to how people upgrade the
underlying host OS in a ceph cluster (if this is the right way to
proceed, I don't even know...)
 >
 > Our cluster is nearing the 4.5 years of age and now our ubuntu
18.04 is nearing the end of support date. We have a mixed cluster of
u18 and u20 nodes, all running octopus at the moment.
 >
 > We would like to upgrade the OS on the nodes, without changing
the ceph version for now (or per se).
 >
 > Is it as easy as installing a new OS version, installing the
ceph-osd package and a correct ceph.conf file and restoring the host
key?
 >
 > Or is more needed regarding the specifics of the OSD
disks/WAL/journal?
 >
 > Or is it necessary to drain a node of all data and re-add the
OSDs as new units? (This would be too much work, so I doubt it ;-)
 >
 > The problem with searching for information about this, is that it
seems undocumented in the ceph documentation, and search results are
flooded with ceph version upgrades.
 >
 > Cheers
 >
 > /Simon
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io
<mailto:ceph-users@ceph.io>
 > To unsubscribe send an email to ceph-users-le...@ceph.io
<mailto:ceph-users-le...@ceph.io>
___
ceph-users mailing list -- ceph-users@ceph.io
<mailto:ceph-users@ceph.io>
To unsubscribe send an email to ceph-users-le...@ceph.io
<mailto:ceph-users-le...@ceph.io>


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: how to upgrade host os under ceph

2022-10-27 Thread Simon Oosthoek

Dear list

thanks for the answers, it looks like we have worried about this far too 
much ;-)


Cheers

/Simon

On 26/10/2022 22:21, shubjero wrote:
We've done 14.04 -> 16.04 -> 18.04 -> 20.04 all at various stages of our 
ceph cluster life.


The latest 18.04 to 20.04 was painless and we ran:
apt update && apt dist-upgrade -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"

do-release-upgrade --allow-third-party -f DistUpgradeViewNonInteractive

On Wed, Oct 26, 2022 at 11:17 AM Reed Dier <reed.d...@focusvq.com> wrote:


You should be able to `do-release-upgrade` from bionic/18 to focal/20.

Octopus/15 is shipped for both dists from ceph.
Its been a while since I did this, the release upgrader might
disable the ceph repo, and uninstall the ceph* packages.
However, the OSDs should still be there, re-enable the ceph repo,
install ceph-osd, and then `ceph-volume lvm activate --all` should
find and start all of the OSDs.

Caveat, if you’re using cephadm, I’m sure the process is different.
And also, if you’re trying to go to jammy/22, thats a different
story, because ceph isn’t shipping packages for jammy yet for any
version of ceph.
I assume that they are going to ship quincy for jammy at some point,
which will give a stepping stone from focal to jammy with the quincy
release, because I don’t imagine that there will be a reef release
for focal.

    Reed

 > On Oct 26, 2022, at 9:14 AM, Simon Oosthoek <s.oosth...@science.ru.nl> wrote:
 >
 > Dear list,
 >
 > I'm looking for some guide or pointers to how people upgrade the
underlying host OS in a ceph cluster (if this is the right way to
proceed, I don't even know...)
 >
 > Our cluster is nearing the 4.5 years of age and now our ubuntu
18.04 is nearing the end of support date. We have a mixed cluster of
u18 and u20 nodes, all running octopus at the moment.
 >
 > We would like to upgrade the OS on the nodes, without changing
the ceph version for now (or per se).
 >
 > Is it as easy as installing a new OS version, installing the
ceph-osd package and a correct ceph.conf file and restoring the host
key?
 >
 > Or is more needed regarding the specifics of the OSD
disks/WAL/journal?
 >
 > Or is it necessary to drain a node of all data and re-add the
OSDs as new units? (This would be too much work, so I doubt it ;-)
 >
 > The problem with searching for information about this, is that it
seems undocumented in the ceph documentation, and search results are
flooded with ceph version upgrades.
 >
 > Cheers
 >
 > /Simon
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io
<mailto:ceph-users@ceph.io>
 > To unsubscribe send an email to ceph-users-le...@ceph.io
<mailto:ceph-users-le...@ceph.io>

___
ceph-users mailing list -- ceph-users@ceph.io
<mailto:ceph-users@ceph.io>
To unsubscribe send an email to ceph-users-le...@ceph.io
<mailto:ceph-users-le...@ceph.io>



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] how to upgrade host os under ceph

2022-10-26 Thread Simon Oosthoek

Dear list,

I'm looking for some guide or pointers to how people upgrade the 
underlying host OS in a ceph cluster (if this is the right way to 
proceed, I don't even know...)


Our cluster is nearing 4.5 years of age and our ubuntu 18.04 is 
nearing its end-of-support date. We have a mixed cluster of u18 and u20 
nodes, all running octopus at the moment.


We would like to upgrade the OS on the nodes, without changing the ceph 
version for now (or per se).


Is it as easy as installing a new OS version, installing the ceph-osd 
package and a correct ceph.conf file and restoring the host key?


Or is more needed regarding the specifics of the OSD disks/WAL/journal?

Or is it necessary to drain a node of all data and re-add the OSDs as 
new units? (This would be too much work, so I doubt it ;-)


The problem with searching for information about this, is that it seems 
undocumented in the ceph documentation, and search results are flooded 
with ceph version upgrades.


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: post-mortem of a ceph disruption

2022-10-26 Thread Simon Oosthoek

On 26/10/2022 10:57, Stefan Kooman wrote:

On 10/25/22 17:08, Simon Oosthoek wrote:



At this point, one of us noticed that a strange IP address was mentioned; 
169.254.0.2, it turns out that a recently added package (openmanage) 
and some configuration had added this interface and address to 
hardware nodes from Dell. For us, our single interface assumption is 
now out the window and 0.0.0.0/0 is a bad idea in /etc/ceph/ceph.conf 
for public and cluster network (though it's the same network for us).


Our 3 datacenters are on three different subnets so it becomes a bit 
difficult to make it more specific. The nodes are all under the same 
/16, so we can choose that, but it is starting to look like a weird 
network setup.
I've always thought that this configuration was kind of non-intuitive 
and I still do. And now it has bitten us :-(



Thanks for reading and if you have any suggestions on how to 
fix/prevent this kind of error, we'll be glad to hear it!


We don't have the public_network specified in our cluster(s). AFAIK It's 
not needed (anymore). There is no default network address range 
configured. So I would just get rid of it. Same for cluster_network if 
you have that configured. There I fixed it! ;-).


Hi Stefan

thanks for the suggestions!

I've removed the cluster_network definition, but retained the 
public_network definition in a more specific way (list of the subnets 
that we are using for ceph nodes). In the code it isn't entirely clear 
to us what happens when public_network is undefined...




If you don't use IPv6, I would explicitly turn it off:

ms_bind_ipv6 = false


I just added this, it seems like a no brainer.
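
For reference, the relevant bit of our /etc/ceph/ceph.conf now looks roughly
like this (the subnets here are placeholders, not our real ones):

[global]
public_network = 192.168.10.0/24, 192.168.20.0/24, 192.168.30.0/24
ms_bind_ipv6 = false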

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] post-mortem of a ceph disruption

2022-10-25 Thread Simon Oosthoek

Dear list,

recently we experienced a short outage of our ceph storage. It had a 
surprising cause, which probably indicates a subtle misconfiguration on 
our part; I'm hoping for a useful suggestion ;-)


We are running a 3PB cluster with 21 osd nodes (spread across 3 
datacenters), 3 mon/mgrs and 2mds nodes. Currently we are on octopus 
15.2.16 (will upgrade to .17 soon).
The cluster has a single network interface (most are a bond) with 
25Gbit/s. The physical nodes are all Dell AMD EPYC hardware.


The "cluster network" and "public network" configurations in 
/etc/ceph/ceph.conf were all set to 0.0.0.0/0 since we only have a 
single interface for all Ceph nodes (or so we thought...)


Our nodes are managed using cfengine3 (community), though we avoid 
package upgrades during normal operation. New packages are installed 
though, if commanded by cfengine.


Last Sunday at around 23:05 (local time) we experienced a short network 
glitch (an MLAG link lost one sublink for 4 seconds); our logs show 
that it should have been relatively painless, since the peer-link took 
over and after 4s the MLAG went back to FULL mode. However, it seems a 
lot of ceph-osd services restarted or re-connected to the network and 
failed to find the other ceph OSDs. They consequently shut themselves 
down. Shortly after this happened, the ceph services became unavailable 
due to not enough osd nodes, so services of ours depending on ceph 
became unavailable as well.


At this point I was able to start trying to fix it; I tried rebooting a 
ceph osd machine and also tried restarting just the osd services on the 
nodes. Both approaches seemed to work and I could soon turn in when all was well again.


When trying to understand what had happened, we obviously suspected all 
kinds of unrelated things (the ceph logs are way too noisy to quickly 
get to the point), but one thing "osd.54 662927 set_numa_affinity unable 
to identify public interface '' numa node: (2) No such file or 
directory" turned out to be more important than we first thought after 
some googling. 
(https://forum.proxmox.com/threads/ceph-set_numa_affinity-unable-to-identify-public-interface.58239/)


We couldn't understand why the network glitch could cause such a massive 
die-off of ceph-osd services.
On the assumption that sooner or later we were going to need some help 
with this, it seemed a good idea to first get busy updating the 
nodes to the latest, and then to supported, releases of ceph, so we started the 
upgrade to 15.2.17 today.


The upgrade of the 2 virtual and 1 physical mon went OK, also the first 
osd node was fine. But on the second osd node, the osd services would 
not keep running after the upgrade+reboot.


Again we noticed this numa message, but now 6 times in a row and then 
the nice: "_committed_osd_maps marked down 6 > osd_max_markdown_count 5 
in last 600.00 seconds, shutting down"

and
"received  signal: Interrupt from Kernel"

At this point, one of us noticed that a strange IP address was mentioned: 
169.254.0.2. It turns out that a recently added package (openmanage) and 
some configuration had added this interface and address to hardware 
nodes from Dell. For us, our single-interface assumption is now out the 
window, and 0.0.0.0/0 is a bad idea in /etc/ceph/ceph.conf for public and 
cluster network (though it's the same network for us).


Our 3 datacenters are on three different subnets so it becomes a bit 
difficult to make it more specific. The nodes are all under the same 
/16, so we can choose that, but it is starting to look like a weird 
network setup.
I've always thought that this configuration was kind of non-intuitive 
and I still do. And now it has bitten us :-(



Thanks for reading and if you have any suggestions on how to fix/prevent 
this kind of error, we'll be glad to hear it!


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] crush rule for 4 copy over 3 failure domains?

2021-12-17 Thread Simon Oosthoek

Dear ceph users,

Since recently we have 3 locations with ceph osd nodes. For 3-copy 
pools, it is trivial to create a crush rule that uses all 3 datacenters 
for each object, but 4-copy is harder. Our current "replicated" rule is this:


rule replicated_rule {
id 0
type replicated
min_size 2
max_size 4
step take default
step choose firstn 2 type datacenter
step chooseleaf firstn 2 type host
step emit
}

For 3 copy, the rule would be

rule replicated_rule_3copy {
id 5
type replicated
min_size 2
max_size 3
step take default
step choose firstn 3 type datacenter
step chooseleaf firstn 1 type host
step emit
}

But 4 copy requires an additional OSD, so how do we tell the crush 
algorithm to first take one from each datacenter and then take one more 
from any datacenter?


I'd be interested to know if this is possible and if so, how...
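
Whatever rule we end up with, we can at least check the resulting spread
offline with crushtool, the same way we test our other rules:

ceph osd getcrushmap -o crushmap.bin   # or use a locally edited/compiled map
crushtool -i crushmap.bin --test --num-rep 4 --show-mappings --rule 6 | tail
# (replace 6 with whatever id the candidate 4-copy rule gets)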

Having said that, I don't think there's much additional value for a 4 
copy pool, compared to a 3copy pool with 3 separate locations. Or is 
there (apart from the 1 more copy thing in general)?


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: crushtool -i; more info from output?

2021-12-09 Thread Simon Oosthoek
In case anyone is interested: I hacked up some more perl code to parse 
the tree output of crushtool, to use the actual info from the new 
crushmap instead of the production info from ceph itself.


See: https://gist.github.com/pooh22/53960df4744efd9d7e0261ff92e7e8f4

Cheers

/Simon

On 02/12/2021 13:23, Simon Oosthoek wrote:

On 02/12/2021 10:20, Simon Oosthoek wrote:

Dear ceph-users,

We want to optimise our crush rules further and to test adjustments 
without impact to the cluster, we use crushtool to show the mappings.


eg:
crushtool -i crushmap.16  --test --num-rep 4 --show-mappings --rule 
0|tail -n 10

CRUSH rule 0 x 1014 [121,125,195,197]
CRUSH rule 0 x 1015 [20,1,40,151]
CRUSH rule 0 x 1016 [194,244,158,3]
CRUSH rule 0 x 1017 [39,113,242,179]
CRUSH rule 0 x 1018 [131,113,199,179]
CRUSH rule 0 x 1019 [64,63,221,181]
CRUSH rule 0 x 1020 [26,111,188,179]
CRUSH rule 0 x 1021 [125,78,247,214]
CRUSH rule 0 x 1022 [48,125,246,258]
CRUSH rule 0 x 1023 [0,88,237,211]

The osd numbers in brackets are not the full story, of course...

It would be nice to see more info about the location hierarchy that is 
in the crushmap, because we want to make sure the redundancy is spread 
optimally accross our datacenters and racks/hosts.
In the current output, this requires lookups to find out the locations 
for the osds before we can be sure.


Since the info is already known in the crushmap, I was wondering if 
someone has already hacked up some wrapper script that looks up the 
locations of the osds, or if work is ongoing to add an option to 
crushtool to output the locations with the osd numbers?


If not, I might write a wrapper myself...



Dear list,

I created a very rudimentary parser; just pipe the output of the 
crushtool -i command to this script.


In the script you can either uncomment the full location tree info, or 
just the top level location.


The script is here:
https://gist.github.com/pooh22/5065d7c8777e6f07b0801d0b30c027d2

Please use as you like, I welcome comments and improvements of course...

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: crushtool -i; more info from output?

2021-12-02 Thread Simon Oosthoek

On 02/12/2021 10:20, Simon Oosthoek wrote:

Dear ceph-users,

We want to optimise our crush rules further and to test adjustments 
without impact to the cluster, we use crushtool to show the mappings.


eg:
crushtool -i crushmap.16  --test --num-rep 4 --show-mappings --rule 
0|tail -n 10

CRUSH rule 0 x 1014 [121,125,195,197]
CRUSH rule 0 x 1015 [20,1,40,151]
CRUSH rule 0 x 1016 [194,244,158,3]
CRUSH rule 0 x 1017 [39,113,242,179]
CRUSH rule 0 x 1018 [131,113,199,179]
CRUSH rule 0 x 1019 [64,63,221,181]
CRUSH rule 0 x 1020 [26,111,188,179]
CRUSH rule 0 x 1021 [125,78,247,214]
CRUSH rule 0 x 1022 [48,125,246,258]
CRUSH rule 0 x 1023 [0,88,237,211]

The osd numbers in brackets are not the full story, of course...

It would be nice to see more info about the location hierarchy that is 
in the crushmap, because we want to make sure the redundancy is spread 
optimally accross our datacenters and racks/hosts.
In the current output, this requires lookups to find out the locations 
for the osds before we can be sure.


Since the info is already known in the crushmap, I was wondering if 
someone has already hacked up some wrapper script that looks up the 
locations of the osds, or if work is ongoing to add an option to 
crushtool to output the locations with the osd numbers?


If not, I might write a wrapper myself...



Dear list,

I created a very rudimentary parser; just pipe the output of the 
crushtool -i command to this script.


In the script you can either uncomment the full location tree info, or 
just the top level location.


The script is here:
https://gist.github.com/pooh22/5065d7c8777e6f07b0801d0b30c027d2

Please use as you like, I welcome comments and improvements of course...

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] crushtool -i; more info from output?

2021-12-02 Thread Simon Oosthoek

Dear ceph-users,

We want to optimise our crush rules further and to test adjustments 
without impact to the cluster, we use crushtool to show the mappings.


eg:
crushtool -i crushmap.16  --test --num-rep 4 --show-mappings --rule 
0|tail -n 10

CRUSH rule 0 x 1014 [121,125,195,197]
CRUSH rule 0 x 1015 [20,1,40,151]
CRUSH rule 0 x 1016 [194,244,158,3]
CRUSH rule 0 x 1017 [39,113,242,179]
CRUSH rule 0 x 1018 [131,113,199,179]
CRUSH rule 0 x 1019 [64,63,221,181]
CRUSH rule 0 x 1020 [26,111,188,179]
CRUSH rule 0 x 1021 [125,78,247,214]
CRUSH rule 0 x 1022 [48,125,246,258]
CRUSH rule 0 x 1023 [0,88,237,211]

The osd numbers in brackets are not the full story, of course...

It would be nice to see more info about the location hierarchy that is 
in the crushmap, because we want to make sure the redundancy is spread 
optimally across our datacenters and racks/hosts.
In the current output, this requires lookups to find out the locations 
of the osds before we can be sure.


Since the info is already known in the crushmap, I was wondering if 
someone has already hacked up some wrapper script that looks up the 
locations of the osds, or if work is ongoing to add an option to 
crushtool to output the locations with the osd numbers?


If not, I might write a wrapper myself...
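
(For a quick one-off lookup against the live cluster, `ceph osd find` already
prints the crush location of a single OSD, e.g.:

ceph osd find 121   # shows the host and crush_location of osd.121 as JSON

but for testing a new map offline that doesn't help, hence the wrapper idea.)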

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-ansible and crush location

2021-11-09 Thread Simon Oosthoek
On 03/11/2021 16:03, Simon Oosthoek wrote:
> On 03/11/2021 15:48, Stefan Kooman wrote:
>> On 11/3/21 15:35, Simon Oosthoek wrote:
>>> Dear list,
>>>
>>> I've recently found it is possible to supply ceph-ansible with
>>> information about a crush location, however I fail to understand how
>>> this is actually used. It doesn't seem to have any effect when create
>>> a cluster from scratch (I'm testing on a bunch of vm's generated by
>>> vagrant and cloud-init and some custom ansible playbooks).
>>>

It turns out (I think) that to be able to use this, you need both
"crush_rule_config: true" and "create_crush_tree: true"

Then it works as expected.
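
Concretely, in group_vars (all.yml/osds.yml) that was something along these
lines for us (from memory; the exact keys are in ceph-ansible's group_vars
samples, so treat this as a sketch):

crush_rule_config: true
create_crush_tree: true
crush_rule_replicated:
  name: replicated_rule
  root: default
  type: host
  default: true
crush_rules:
  - "{{ crush_rule_replicated }}"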

The unknown for now is what happens with existing crush rules, and, if
they would be removed, how we could define them in the osds.yml for
ceph-ansible...

Eventually I hope this can be a useful thing to enable, but perhaps not
for now.

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-ansible and crush location

2021-11-03 Thread Simon Oosthoek

On 03/11/2021 15:48, Stefan Kooman wrote:

On 11/3/21 15:35, Simon Oosthoek wrote:

Dear list,

I've recently found it is possible to supply ceph-ansible with 
information about a crush location, however I fail to understand how 
this is actually used. It doesn't seem to have any effect when create 
a cluster from scratch (I'm testing on a bunch of vm's generated by 
vagrant and cloud-init and some custom ansible playbooks).


Then I thought I may need to add the locations to the crushmap by hand 
and then rerun the site.yml, but this also doesn't update the crushmap.


Then I was looking at the documentation here:
https://docs.ceph.com/en/octopus/rados/operations/crush-map/#crush-location 



And it seems ceph is able to update the osd location upon startup, if 
configured to do so... I don't think this is being used in a cluster 
generated by ceph-ansible though...


osd_crush_update_on_start is true by default. So you would have to 
disable it explicitly.


OK, so this isn't happening, because there's no configuration for it in 
our nodes' /etc/ceph/ceph.conf files...
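
For anyone wanting to double-check this on a running cluster, something
like the following should show the effective value (osd.0 is just an
example id):

  ceph config get osd osd_crush_update_on_start            # cluster-wide view, defaults to true
  ceph daemon osd.0 config get osd_crush_update_on_start   # what one running OSD actually uses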






Would it be possible/wise to modify ceph-ansible to e.g. generate 
files like /etc/ceph/crushlocation and fill that with information from 
the inventory, like


Possible: yes. Wise: not sure. If you mess this up for whatever reason 
and buckets / OSDs get reshuffled, this might lead to massive data 
movement and, possibly even worse, availability issues, i.e. when all 
your OSDs are moved to buckets that do not match any CRUSH rule.


Indeed, getting this wrong is a major PITA, but not having the OSDs in 
the correct location is also undesirable.


I prefer to document/configure everything in one place, so there aren't 
any contradicting data. In this light, I would say that ceph-ansible is 
the right way to set this up. (Now to figure out how and where ;-)


And of course, it's bothersome to maintain a patch on top of the stock 
ceph-ansible, so it would be really nice if this kind of change could be 
added upstream to ceph-ansible.


Cheers

/Simon

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-ansible and crush location

2021-11-03 Thread Simon Oosthoek

Dear list,

I've recently found it is possible to supply ceph-ansible with 
information about a crush location, however I fail to understand how 
this is actually used. It doesn't seem to have any effect when creating 
a cluster from scratch (I'm testing on a bunch of vm's generated by 
vagrant and cloud-init and some custom ansible playbooks).


Then I thought I may need to add the locations to the crushmap by hand 
and then rerun the site.yml, but this also doesn't update the crushmap.


Then I was looking at the documentation here:
https://docs.ceph.com/en/octopus/rados/operations/crush-map/#crush-location

And it seems ceph is able to update the osd location upon startup, if 
configured to do so... I don't think this is being used in a cluster 
generated by ceph-ansible though...


Would it be possible/wise to modify ceph-ansible to e.g. generate files 
like /etc/ceph/crushlocation and fill that with information from the 
inventory, like


---
root=default
datacenter=d1
rack=r1
---

And place a small shell script to interpret this file and return the 
output like


---
#!/bin/sh
. /etc/ceph/crushlocation
echo "host=$(hostname -s) datacenter=$datacenter rack=$rack root=$root"
---

And configure /etc/ceph/ceph.conf to contain a configuration line (under 
which heading?) to do this?


---
crush location hook = /path/to/customized-ceph-crush-location
---

And finally, are locations automatically defined in the crushmap when 
they are invoked?


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-ansible stable-5.0 repository must be quincy?

2021-10-20 Thread Simon Oosthoek

Hi

we're trying to get ceph-ansible working again for our current version 
of ceph (octopus), in order to be able to add some osd nodes to our 
cluster. (Obviously there's a longer story here, but just a quick 
question for now...)


When we add in all.yml
ceph_origin: repository
ceph_repository: community
# Enabled when ceph_repository == 'community'
#
ceph_mirror: https://eu.ceph.com
ceph_stable_key: https://eu.ceph.com/keys/release.asc
ceph_stable_release: octopus
ceph_stable_repo: "{{ ceph_mirror }}/debian-{{ ceph_stable_release }}"

This fails with a message originating from

- name: validate ceph_repository_community
  fail:
    msg: "ceph_stable_release must be 'quincy'"
  when:
    - ceph_origin == 'repository'
    - ceph_repository == 'community'
    - ceph_stable_release not in ['quincy']

in: ceph-ansible/roles/ceph-validate/tasks/main.yml

This is from the "Stable-5.0" branch of ceph-ansible, which is 
specifically for Octopus, as I understand it...


Is this a bug in ceph-ansible in the stable-5.0 branch, or is this our 
problem in understanding what to put in all.yml to get the octopus 
repository for ubuntu 20.04?
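
A quick sanity check before digging further (just a diagnostic sketch):

  cd ceph-ansible
  git rev-parse --abbrev-ref HEAD     # should print stable-5.0 for an Octopus deployment
  grep -n "ceph_stable_release" roles/ceph-validate/tasks/main.yml

If that validation task really mentions quincy on that checkout, the
working tree is probably not on the branch we think it is.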


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-04-09 Thread Simon Oosthoek
On 25/03/2021 21:08, Simon Oosthoek wrote:
> 
> I'll wait a bit before upgrading the remaining nodes. I hope 14.2.19
> will be available quickly.
> 

Hi Dan,

Just FYI, I upgraded the cluster this week to 14.2.19 and all systems
are good now. I've removed the workaround configuration in the
/etc/ceph/ceph.conf again.

Thanks for the quick help at the time!

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-03-25 Thread Simon Oosthoek

On 25/03/2021 20:56, Stefan Kooman wrote:

On 3/25/21 8:47 PM, Simon Oosthoek wrote:

On 25/03/2021 20:42, Dan van der Ster wrote:

netstat -anp | grep LISTEN | grep mgr


# netstat -anp | grep LISTEN | grep mgr
tcp    0  0 127.0.0.1:6801   0.0.0.0:*   LISTEN   1310/ceph-mgr
tcp    0  0 127.0.0.1:6800   0.0.0.0:*   LISTEN   1310/ceph-mgr
tcp6   0  0 :::8443          :::*        LISTEN   1310/ceph-mgr
tcp6   0  0 :::9283          :::*        LISTEN   1310/ceph-mgr
unix   2  [ ACC ] STREAM LISTENING  26205  1564/master    private/tlsmgr
unix   2  [ ACC ] STREAM LISTENING  26410  1310/ceph-mgr  /var/run/ceph/ceph-mgr.cephmon1.asok



Looks like it :-(


Ok, but that is easily fixable:

ceph config set osd.$id public_addr your_ip_here

Or you can put that in the ceph.conf for the OSDs on each storage server.

Do you have a cluster network as well? If so you should set that IP too.

Only when you run IPv6 only and have not yet set ms_bind_ipv4=false you 
should not do this. In that case you first have to make sure you set 
ms_bind_ipv4=false.


As soon as your OSDs are bound to their correct IP again they can peer 
with each other and it will fix itself.



@Ceph devs: a 14.2.19 with a fix for this issue would avoid other people 
running into this issue.


Gr. Stefan



Hoi Stefan

tnx, I only have one network (25Gbit should be enough). After fixing the 
mon/mgr nodes and the one OSD node that I upgraded, the cluster seems to 
be recovering.


At first I understood Dan's fix to put the mgr's address in all nodes' 
configs, but after watching the errors, I changed it to the node's own 
address on each node...
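
For the archives, the per-node workaround ended up looking roughly like
this (the address is an example; each node gets its own public IP), added
to /etc/ceph/ceph.conf on the storage nodes and followed by a restart of
the OSDs on that node (systemctl restart ceph-osd.target); the mon/mgr
hosts got the equivalent for their own daemons:

---
[osd]
public_addr = 192.0.2.21
---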


I'll wait a bit before upgrading the remaining nodes. I hope 14.2.19 
will be available quickly.


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-03-25 Thread Simon Oosthoek

On 25/03/2021 20:42, Dan van der Ster wrote:

netstat -anp | grep LISTEN | grep mgr

has it bound to 127.0.0.1 ?

(also check the other daemons).

If so this is another case of https://tracker.ceph.com/issues/49938


Do you have any idea for a workaround (or should I downgrade?). I'm 
running ceph on ubuntu 18.04 LTS


this seems to be happening on the mons/mgrs and osds

Cheers

/Simon


-- dan

On Thu, Mar 25, 2021 at 8:34 PM Simon Oosthoek  wrote:


Hi

I'm in a bit of a panic :-(

Recently we started attempting to configure a radosgw for our ceph
cluster, which was until now only doing cephfs (and rbd was working as
well). We were messing about with ceph-ansible, as this was how we
originally installed the cluster. Anyway, it installed nautilus 14.2.18
on the radosgw and I thought it would be good to pull up the rest of the
cluster to that level as well using our tried and tested ceph upgrade
script (it basically does an update of all ceph nodes one by one and
checks whether ceph is ok again before doing the next)

After the 3rd mon/mgr was done, all pg's were unavailable :-(
obviously, the script is not continuing, but ceph is also broken now...

The message deceptively is: HEALTH_WARN Reduced data availability: 5568
pgs inactive

That's all PGs!

I tried as a desperate measure to upgrade one ceph OSD node, but that
broke as well, the osd service on that node gets an interrupt from the
kernel

the versions are now like:
20:29 [root@cephmon1 ~]# ceph versions
{
  "mon": {
  "ceph version 14.2.18
(befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
  },
  "mgr": {
  "ceph version 14.2.18
(befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
  },
  "osd": {
  "ceph version 14.2.15
(afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
  },
  "mds": {
  "ceph version 14.2.15
(afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
  },
  "overall": {
  "ceph version 14.2.15
(afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
  "ceph version 14.2.18
(befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
  }
}


12 OSDs are down

# ceph -s
cluster:
  id: b489547c-ba50-4745-a914-23eb78e0e5dc
  health: HEALTH_WARN
  Reduced data availability: 5568 pgs inactive

services:
  mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
  mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
  mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
  osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722
remapped pgs

data:
  pools:   12 pools, 5568 pgs
  objects: 0 objects, 0 B
  usage:   0 B used, 0 B / 0 B avail
  pgs: 100.000% pgs unknown
   5568 unknown

progress:
  Rebalancing after osd.103 marked in
[..]


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-03-25 Thread Simon Oosthoek

On 25/03/2021 20:42, Dan van der Ster wrote:

netstat -anp | grep LISTEN | grep mgr


# netstat -anp | grep LISTEN | grep mgr
tcp    0  0 127.0.0.1:6801   0.0.0.0:*   LISTEN   1310/ceph-mgr
tcp    0  0 127.0.0.1:6800   0.0.0.0:*   LISTEN   1310/ceph-mgr
tcp6   0  0 :::8443          :::*        LISTEN   1310/ceph-mgr
tcp6   0  0 :::9283          :::*        LISTEN   1310/ceph-mgr
unix   2  [ ACC ] STREAM LISTENING  26205  1564/master    private/tlsmgr
unix   2  [ ACC ] STREAM LISTENING  26410  1310/ceph-mgr  /var/run/ceph/ceph-mgr.cephmon1.asok



Looks like it :-(

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-03-25 Thread Simon Oosthoek

Hi

I'm in a bit of a panic :-(

Recently we started attempting to configure a radosgw for our ceph 
cluster, which was until now only doing cephfs (and rbd was working as 
well). We were messing about with ceph-ansible, as this was how we 
originally installed the cluster. Anyway, it installed nautilus 14.2.18 
on the radosgw and I thought it would be good to pull up the rest of the 
cluster to that level as well using our tried and tested ceph upgrade 
script (it basically does an update of all ceph nodes one by one and 
checks whether ceph is ok again before doing the next).


After the 3rd mon/mgr was done, all pg's were unavailable :-(
obviously, the script is not continuing, but ceph is also broken now...

The message deceptively is: HEALTH_WARN Reduced data availability: 5568 
pgs inactive


That's all PGs!

I tried as a desperate measure to upgrade one ceph OSD node, but that 
broke as well; the osd service on that node gets an interrupt from the 
kernel.


the versions are now like:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}


12 OSDs are down

# ceph -s
  cluster:
id: b489547c-ba50-4745-a914-23eb78e0e5dc
health: HEALTH_WARN
Reduced data availability: 5568 pgs inactive

  services:
mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 
remapped pgs


  data:
pools:   12 pools, 5568 pgs
objects: 0 objects, 0 B
usage:   0 B used, 0 B / 0 B avail
pgs: 100.000% pgs unknown
 5568 unknown

  progress:
Rebalancing after osd.103 marked in
  [..]


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

2021-02-25 Thread Simon Oosthoek
On 25/02/2021 11:19, Dylan McCulloch wrote:
> Simon Oosthoek wrote:
>> On 24/02/2021 22:28, Patrick Donnelly wrote:
>> >   Hello Simon,
>> >  
>> >  On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek
> <s.oosthoek(a)science.ru.nl> wrote:
>> >  
>> >  On 24/02/2021 12:40, Simon Oosthoek wrote:
>> >   Hi
>> >
>> >  we've been running our Ceph cluster for nearly 2 years now (Nautilus)
>> >  and recently, due to a temporary situation the cluster is at 80% full.
>> >
>> >  We are only using CephFS on the cluster.
>> >
>> >  Normally, I realize we should be adding OSD nodes, but this is a
>> >  temporary situation, and I expect the cluster to go to <60% full
> quite soon.
>> >
>> >  Anyway, we are noticing some really problematic slowdowns. There are
>> >  some things that could be related but we are unsure...
>> >
>> >  - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
>> >  but are not using more than 2GB, this looks either very inefficient, or
>> >  wrong ;-)
>> >  After looking at our monitoring history, it seems the mds cache is
>> >  actually used more fully, but most of our servers are getting a weekly
>> >  reboot by default. This clears the mds cache obviously. I wonder if
>> >  that's a smart idea for an MDS node...? ;-)  
>> >  No, it's not. Can you also check that you do not have mds_cache_size
>> >  configured, perhaps on the MDS local ceph.conf?
>> >  
>> Hi Patrick,
>>
>> I've already changed the reboot period to 1 month.
>>
>> The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
>> file, so I guess it's just the weekly reboot that cleared the memory of
>> cache data...
>>
>> I'm starting to think that a full ceph cluster could probably be the
>> only cause of performance problems. Though I don't know why that would be.
> 
> Did the performance issue only arise when OSDs in the cluster reached
> 80% usage? What is your osd nearfull_ratio?
> $ ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85


> Is the cluster in HEALTH_WARN with nearfull OSDs?

]# ceph -s
  cluster:
id: b489547c-ba50-4745-a914-23eb78e0e5dc
health: HEALTH_WARN
2 pgs not deep-scrubbed in time
957 pgs not scrubbed in time

  services:
mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 7d)
mgr: cephmon3(active, since 2M), standbys: cephmon1, cephmon2
mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
osd: 168 osds: 168 up (since 11w), 168 in (since 9M); 43 remapped pgs

  task status:
scrub status:
mds.cephmds2: idle

  data:
pools:   10 pools, 5280 pgs
objects: 587.71M objects, 804 TiB
usage:   1.4 PiB used, 396 TiB / 1.8 PiB avail
pgs: 9634168/5101965463 objects misplaced (0.189%)
 5232 active+clean
 29   active+remapped+backfill_wait
 14   active+remapped+backfilling
 5active+clean+scrubbing+deep+repair

  io:
client:   136 MiB/s rd, 600 MiB/s wr, 386 op/s rd, 359 op/s wr
recovery: 328 MiB/s, 169 objects/s

> We noticed recently when one of our clusters had nearfull OSDs that
> cephfs client performance was heavily impacted.
> Our cluster is nautilus 14.2.15. Clients are kernel 4.19.154.
> We determined that it was most likely due to the ceph client forcing
> sync file writes when nearfull flag is present.
> https://github.com/ceph/ceph-client/commit/7614209736fbc4927584d4387faade4f31444fce
> Increasing and decreasing the nearfull ratio confirmed that performance
> was only impacted while the nearfull flag was present.
> Not sure if that's relevant for your case.

I think this could be very similar in our cluster, thanks for sharing
your insights!
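
If anyone wants to test the same hypothesis, the threshold can be moved
temporarily and put back afterwards; a rough sketch:

  ceph osd dump | grep ratio              # current full/backfillfull/nearfull ratios
  ceph health detail | grep -i nearfull   # which OSDs are currently over the threshold
  ceph osd set-nearfull-ratio 0.87        # raise briefly to see if client writes speed up
  ceph osd set-nearfull-ratio 0.85        # and put it back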

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

2021-02-25 Thread Simon Oosthoek
On 24/02/2021 22:28, Patrick Donnelly wrote:
> Hello Simon,
> 
> On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek  
> wrote:
>>
>> On 24/02/2021 12:40, Simon Oosthoek wrote:
>>> Hi
>>>
>>> we've been running our Ceph cluster for nearly 2 years now (Nautilus)
>>> and recently, due to a temporary situation the cluster is at 80% full.
>>>
>>> We are only using CephFS on the cluster.
>>>
>>> Normally, I realize we should be adding OSD nodes, but this is a
>>> temporary situation, and I expect the cluster to go to <60% full quite soon.
>>>
>>> Anyway, we are noticing some really problematic slowdowns. There are
>>> some things that could be related but we are unsure...
>>>
>>> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
>>> but are not using more than 2GB, this looks either very inefficient, or
>>> wrong ;-)
>>
>> After looking at our monitoring history, it seems the mds cache is
>> actually used more fully, but most of our servers are getting a weekly
>> reboot by default. This clears the mds cache obviously. I wonder if
>> that's a smart idea for an MDS node...? ;-)
> 
> No, it's not. Can you also check that you do not have mds_cache_size
> configured, perhaps on the MDS local ceph.conf?
> 

Hi Patrick,

I've already changed the reboot period to 1 month.

The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
file, so I guess it's just the weekly reboot that cleared the memory of
cache data...
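
To see what the running MDS actually has configured (and how much cache
it really holds), something like this on the active MDS host should tell
us (daemon name as in our cluster; untested from here):

  ceph daemon mds.cephmds2 config get mds_cache_memory_limit
  ceph daemon mds.cephmds2 cache status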

I'm starting to think that a full ceph cluster could probably be the
only cause of performance problems. Though I don't know why that would be.

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph slow at 80% full, mds nodes lots of unused memory

2021-02-24 Thread Simon Oosthoek
On 24/02/2021 12:40, Simon Oosthoek wrote:
> Hi
> 
> we've been running our Ceph cluster for nearly 2 years now (Nautilus)
> and recently, due to a temporary situation the cluster is at 80% full.
> 
> We are only using CephFS on the cluster.
> 
> Normally, I realize we should be adding OSD nodes, but this is a
> temporary situation, and I expect the cluster to go to <60% full quite soon.
> 
> Anyway, we are noticing some really problematic slowdowns. There are
> some things that could be related but we are unsure...
> 
> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
> but are not using more than 2GB, this looks either very inefficient, or
> wrong ;-)

After looking at our monitoring history, it seems the mds cache is
actually used more fully, but most of our servers are getting a weekly
reboot by default. This clears the mds cache obviously. I wonder if
that's a smart idea for an MDS node...? ;-)

> 
> "ceph config dump |grep mds":
>   mds  basic     mds_cache_memory_limit         107374182400
>   mds  advanced  mds_max_scrub_ops_in_progress  10
> 
> Perhaps we require more or different settings to properly use the MDS
> memory?
> 
> - On all our OSD nodes, the memory line is red in "atop", though no swap
> is in use, it seems the memory on the OSD nodes is taking quite a
> beating, is this normal, or can we tweak settings to make it less stressed?
> 
> This is the first time we are having performance issues like this, I
> think, I'd like to learn some commands to help me analyse this...
> 
> I hope this will ring a bell with someone...
> 
> Cheers
> 
> /Simon
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph slow at 80% full, mds nodes lots of unused memory

2021-02-24 Thread Simon Oosthoek
Hi

we've been running our Ceph cluster for nearly 2 years now (Nautilus)
and recently, due to a temporary situation the cluster is at 80% full.

We are only using CephFS on the cluster.

Normally, I realize we should be adding OSD nodes, but this is a
temporary situation, and I expect the cluster to go to <60% full quite soon.

Anyway, we are noticing some really problematic slowdowns. There are
some things that could be related but we are unsure...

- Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
but are not using more than 2GB, this looks either very inefficient, or
wrong ;-)

"ceph config dump |grep mds":
  mds  basic     mds_cache_memory_limit         107374182400
  mds  advanced  mds_max_scrub_ops_in_progress  10

Perhaps we require more or different settings to properly use the MDS
memory?

- On all our OSD nodes, the memory line is red in "atop"; though no swap
is in use, it seems the memory on the OSD nodes is taking quite a
beating. Is this normal, or can we tweak settings to make it less stressed?
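
One OSD-side setting that is probably worth checking here is the
bluestore memory target, which bounds how much each OSD tries to use for
its caches (osd.0 is just an example id):

  ceph config get osd osd_memory_target            # default is 4 GiB per OSD
  ceph daemon osd.0 config get osd_memory_target   # per-daemon view on an OSD host
  ceph daemon osd.0 dump_mempools                  # breakdown of what the OSD holds in RAM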

This is the first time we are having performance issues like this, I
think, I'd like to learn some commands to help me analyse this...

I hope this will ring a bell with someone...

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: BlueFS spillover detected, why, what?

2020-08-20 Thread Simon Oosthoek

Hi Michael,

thanks for the pointers! This is our first production ceph cluster and 
we have to learn as we go... Small files are always a problem for all 
(networked) filesystems; usually they just trash performance, but in 
this case they have another unfortunate side effect on the rocksdb :-(


Cheers

/Simon

On 20/08/2020 11:06, Michael Bisig wrote:

Hi Simon

Unfortunately, the other NVME space is wasted or at least, this is the 
information we gathered during our research. This fact is due to the RocksDB 
level management which is explained here 
(https://github.com/facebook/rocksdb/wiki/Leveled-Compaction). I don't think 
it's a hard limit but it will be something above these values. Also consult 
this thread 
(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-February/033286.html).
 It's probably better to go a bit over these limits to be on the safe side.

Exactly, reality is always different. We also struggle with small files which 
lead to further problems. Accordingly, the right initial setting is pretty 
important and depends on your individual usecase.

Regards,
Michael

On 20.08.20, 10:40, "Simon Oosthoek"  wrote:

 Hi Michael,

 thanks for the explanation! So if I understand correctly, we waste 93 GB
 per OSD on unused NVME space, because only 30GB is actually used...?

 And to improve the space for rocksdb, we need to plan for 300GB per
 rocksdb partition in order to benefit from this advantage

 Reducing the number of small files is something we always ask of our
 users, but reality is what it is ;-)

 I'll have to look into how I can get an informative view on these
 metrics... It's pretty overwhelming the amount of information coming out
 of the ceph cluster, even when you look only superficially...

 Cheers,

 /Simon

 On 20/08/2020 10:16, Michael Bisig wrote:
 > Hi Simon
 >
 > As far as I know, RocksDB only uses "leveled" space on the NVME 
partition. The values are set to be 300MB, 3GB, 30GB and 300GB. Every DB space above such a 
limit will automatically end up on slow devices.
 > In your setup where you have 123GB per OSD that means you only use 30GB 
of fast device. The DB which spills over this limit will be offloaded to the HDD 
and accordingly, it slows down requests and compactions.
 >
 > You can proof what your OSD currently consumes with:
 >ceph daemon osd.X perf dump
 >
 > Informative values are `db_total_bytes`, `db_used_bytes` and 
`slow_used_bytes`. This changes regularly because of the ongoing compactions but 
Prometheus mgr module exports these values such that you can track it.
 >
 > Small files generally leads to bigger RocksDB, especially when you use 
EC, but this depends on the actual amount and file sizes.
 >
 > I hope this helps.
 > Regards,
 > Michael
 >
 > On 20.08.20, 09:10, "Simon Oosthoek"  wrote:
 >
 >  Hi
 >
 >  Recently our ceph cluster (nautilus) is experiencing bluefs 
spillovers,
 >  just 2 osd's and I disabled the warning for these osds.
 >  (ceph config set osd.125 bluestore_warn_on_bluefs_spillover false)
 >
 >  I'm wondering what causes this and how this can be prevented.
 >
 >  As I understand it the rocksdb for the OSD needs to store more than 
fits
 >  on the NVME logical volume (123G for 12T OSD). A way to fix it 
could be
 >  to increase the logical volume on the nvme (if there was space on 
the
 >  nvme, which there isn't at the moment).
 >
 >  This is the current size of the cluster and how much is free:
 >
 >  [root@cephmon1 ~]# ceph df
 >  RAW STORAGE:
 >   CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
 >   hdd    1.8 PiB  842 TiB  974 TiB   974 TiB      53.63
 >   TOTAL  1.8 PiB  842 TiB  974 TiB   974 TiB      53.63
 >
 >  POOLS:
 >   POOL               ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
 >   cephfs_data         1  572 MiB  121.26M  2.4 GiB      0    167 TiB
 >   cephfs_metadata     2   56 GiB    5.15M   57 GiB      0    167 TiB
 >   cephfs_data_3copy   8  201 GiB   51.68k  602 GiB   0.09    222 TiB
 >   cephfs_data_ec83   13  643 TiB  279.75M  953 TiB  58.86    485 TiB
 >   rbd                14   21 GiB    5.66k   64 GiB      0    222 TiB
 >   .rgw.root          15  1.2 KiB        4

[ceph-users] Re: BlueFS spillover detected, why, what?

2020-08-20 Thread Simon Oosthoek

Hi Michael,

thanks for the explanation! So if I understand correctly, we waste 93 GB 
per OSD on unused NVME space, because only 30GB is actually used...?


And to improve the space for rocksdb, we need to plan for 300GB per 
rocksdb partition in order to benefit from this advantage


Reducing the number of small files is something we always ask of our 
users, but reality is what it is ;-)


I'll have to look into how I can get an informative view on these 
metrics... It's pretty overwhelming the amount of information coming out 
of the ceph cluster, even when you look only superficially...
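
For my own notes, pulling just those three counters out of the perf dump
seems doable with something like this on the OSD host (osd id is an
example, and it assumes jq is installed):

  ceph daemon osd.125 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'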


Cheers,

/Simon

On 20/08/2020 10:16, Michael Bisig wrote:

Hi Simon

As far as I know, RocksDB only uses "leveled" space on the NVME partition. The 
values are set to be 300MB, 3GB, 30GB and 300GB. Every DB space above such a limit will 
automatically end up on slow devices.
In your setup where you have 123GB per OSD that means you only use 30GB of fast 
device. The DB which spills over this limit will be offloaded to the HDD and 
accordingly, it slows down requests and compactions.

You can proof what your OSD currently consumes with:
   ceph daemon osd.X perf dump

Informative values are `db_total_bytes`, `db_used_bytes` and `slow_used_bytes`. 
This changes regularly because of the ongoing compactions but Prometheus mgr 
module exports these values such that you can track it.

Small files generally leads to bigger RocksDB, especially when you use EC, but 
this depends on the actual amount and file sizes.

I hope this helps.
Regards,
Michael

On 20.08.20, 09:10, "Simon Oosthoek"  wrote:

 Hi

 Recently our ceph cluster (nautilus) is experiencing bluefs spillovers,
 just 2 osd's and I disabled the warning for these osds.
 (ceph config set osd.125 bluestore_warn_on_bluefs_spillover false)

 I'm wondering what causes this and how this can be prevented.

 As I understand it the rocksdb for the OSD needs to store more than fits
 on the NVME logical volume (123G for 12T OSD). A way to fix it could be
 to increase the logical volume on the nvme (if there was space on the
 nvme, which there isn't at the moment).

 This is the current size of the cluster and how much is free:

 [root@cephmon1 ~]# ceph df
 RAW STORAGE:
  CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
  hdd    1.8 PiB  842 TiB  974 TiB   974 TiB      53.63
  TOTAL  1.8 PiB  842 TiB  974 TiB   974 TiB      53.63

 POOLS:
  POOL                 ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
  cephfs_data           1  572 MiB  121.26M  2.4 GiB      0    167 TiB
  cephfs_metadata       2   56 GiB    5.15M   57 GiB      0    167 TiB
  cephfs_data_3copy     8  201 GiB   51.68k  602 GiB   0.09    222 TiB
  cephfs_data_ec83     13  643 TiB  279.75M  953 TiB  58.86    485 TiB
  rbd                  14   21 GiB    5.66k   64 GiB      0    222 TiB
  .rgw.root            15  1.2 KiB        4    1 MiB      0    167 TiB
  default.rgw.control  16      0 B        8      0 B      0    167 TiB
  default.rgw.meta     17    765 B        4    1 MiB      0    167 TiB
  default.rgw.log      18      0 B      207      0 B      0    167 TiB
  cephfs_data_ec57     20  433 MiB      230  1.2 GiB      0    278 TiB

 The amount used can still grow a bit before we need to add nodes, but
 apparently we are running into the limits of our rocksdb partitions.

 Did we choose a parameter (e.g. minimal object size) too small, so we
 have too many objects on these spillover OSDs? Or is it that too many
 small files are stored on the cephfs filesystems?

 When we expand the cluster, we can choose larger nvme devices to allow
 larger rocksdb partitions, but is that the right way to deal with this,
 or should we adjust some parameters on the cluster that will reduce the
 rocksdb size?

 Cheers

 /Simon
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] BlueFS spillover detected, why, what?

2020-08-20 Thread Simon Oosthoek

Hi

Recently our ceph cluster (nautilus) has been experiencing bluefs spillovers 
on just 2 OSDs, and I disabled the warning for those OSDs.

(ceph config set osd.125 bluestore_warn_on_bluefs_spillover false)

I'm wondering what causes this and how this can be prevented.

As I understand it the rocksdb for the OSD needs to store more than fits 
on the NVME logical volume (123G for 12T OSD). A way to fix it could be 
to increase the logical volume on the nvme (if there was space on the 
nvme, which there isn't at the moment).
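
If there were free space, my understanding is that the expansion would go 
roughly like this (VG/LV names invented for the example; the OSD has to 
be stopped first):

  systemctl stop ceph-osd@125
  lvextend -L 300G /dev/ceph-db-vg/db-125          # grow the DB LV
  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-125
  systemctl start ceph-osd@125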


This is the current size of the cluster and how much is free:

[root@cephmon1 ~]# ceph df
RAW STORAGE:
    CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
    hdd    1.8 PiB  842 TiB  974 TiB   974 TiB      53.63
    TOTAL  1.8 PiB  842 TiB  974 TiB   974 TiB      53.63

POOLS:
    POOL                 ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
    cephfs_data           1  572 MiB  121.26M  2.4 GiB      0    167 TiB
    cephfs_metadata       2   56 GiB    5.15M   57 GiB      0    167 TiB
    cephfs_data_3copy     8  201 GiB   51.68k  602 GiB   0.09    222 TiB
    cephfs_data_ec83     13  643 TiB  279.75M  953 TiB  58.86    485 TiB
    rbd                  14   21 GiB    5.66k   64 GiB      0    222 TiB
    .rgw.root            15  1.2 KiB        4    1 MiB      0    167 TiB
    default.rgw.control  16      0 B        8      0 B      0    167 TiB
    default.rgw.meta     17    765 B        4    1 MiB      0    167 TiB
    default.rgw.log      18      0 B      207      0 B      0    167 TiB
    cephfs_data_ec57     20  433 MiB      230  1.2 GiB      0    278 TiB


The amount used can still grow a bit before we need to add nodes, but 
apparently we are running into the limits of our rocksdb partitions.


Did we choose a parameter (e.g. minimal object size) too small, so we 
have too many objects on these spillover OSDs? Or is it that too many 
small files are stored on the cephfs filesystems?


When we expand the cluster, we can choose larger nvme devices to allow 
larger rocksdb partitions, but is that the right way to deal with this, 
or should we adjust some parameters on the cluster that will reduce the 
rocksdb size?


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Combining erasure coding and replication?

2020-03-27 Thread Simon Oosthoek
On 27/03/2020 09:56, Eugen Block wrote:
> Hi,
> 
>> I guess what you are suggesting is something like k+m with m>=k+2, for
>> example k=4, m=6. Then, one can distribute 5 shards per DC and sustain
>> the loss of an entire DC while still having full access to redundant
>> storage.
> 
> that's exactly what I mean, yes.

We have an EC pool of 5+7, which works that way. Currently we have no
demand for it, but it should do the job.
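
For reference, roughly how such a setup can be expressed (sketch only,
names invented; with k=5/m=7 each of the two datacenters holds 6 shards,
so losing a whole DC still leaves enough shards to read the data):

  ceph osd erasure-code-profile set ec-5-7 k=5 m=7 crush-failure-domain=host

---
# crush rule in decompiled crushmap syntax: pick 2 datacenters,
# then 6 independent hosts in each
rule cephfs_ec57_dc {
        id 57
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 2 type datacenter
        step chooseleaf indep 6 type host
        step emit
}
---

The pool created from the profile can then be pointed at this rule with
"ceph osd pool set <pool> crush_rule cephfs_ec57_dc".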

Cheers

/Simon

> 
>> Now, a long time ago I was in a lecture about error-correcting codes
>> (Reed-Solomon codes). From what I remember, the computational
>> complexity of these codes explodes at least exponentially with m. Out
>> of curiosity, how does m>3 perform in practice? What's the CPU
>> requirement per OSD?
> 
> Such a setup usually would be considered for archiving purposes so the
> performance requirements aren't very high, but so far we haven't heard
> any complaints performance-wise.
> I don't have details on CPU requirements at hand right now.
> 
> Regards,
> Eugen
> 
> 
> Zitat von Frank Schilder :
> 
>> Dear Eugen,
>>
>> I guess what you are suggesting is something like k+m with m>=k+2, for
>> example k=4, m=6. Then, one can distribute 5 shards per DC and sustain
>> the loss of an entire DC while still having full access to redundant
>> storage.
>>
>> Now, a long time ago I was in a lecture about error-correcting codes
>> (Reed-Solomon codes). From what I remember, the computational
>> complexity of these codes explodes at least exponentially with m. Out
>> of curiosity, how does m>3 perform in practice? What's the CPU
>> requirement per OSD?
>>
>> Best regards,
>>
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Eugen Block 
>> Sent: 27 March 2020 08:33:45
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: Combining erasure coding and replication?
>>
>> Hi Brett,
>>
>>> Our concern with Ceph is the cost of having three replicas. Storage
>>> may be cheap but I’d rather not buy ANOTHER 5pb for a third replica
>>> if there are ways to do this more efficiently. Site-level redundancy
>>> is important to us so we can’t simply create an erasure-coded volume
>>> across two buildings – if we lose power to a building, the entire
>>> array would become unavailable.
>>
>> can you elaborate on that? Why is EC not an option? We have installed
>> several clusters with two datacenters resilient to losing a whole dc
>> (and additional disks if required). So it's basically the choice of
>> the right EC profile. Or did I misunderstand something?
>>
>>
>> Zitat von Brett Randall :
>>
>>> Hi all
>>>
>>> Had a fun time trying to join this list, hopefully you don’t get
>>> this message 3 times!
>>>
>>> On to Ceph… We are looking at setting up our first ever Ceph cluster
>>> to replace Gluster as our media asset storage and production system.
>>> The Ceph cluster will have 5pb of usable storage. Whether we use it
>>> as object-storage, or put CephFS in front of it, is still TBD.
>>>
>>> Obviously we’re keen to protect this data well. Our current Gluster
>>> setup utilises RAID-6 on each of the nodes and then we have a single
>>> replica of each brick. The Gluster bricks are split between
>>> buildings so that the replica is guaranteed to be in another
>>> premises. By doing it this way, we guarantee that we can have a
>>> decent number of disk or node failures (even an entire building)
>>> before we lose both connectivity and data.
>>>
>>> Our concern with Ceph is the cost of having three replicas. Storage
>>> may be cheap but I’d rather not buy ANOTHER 5pb for a third replica
>>> if there are ways to do this more efficiently. Site-level redundancy
>>> is important to us so we can’t simply create an erasure-coded volume
>>> across two buildings – if we lose power to a building, the entire
>>> array would become unavailable. Likewise, we can’t simply have a
>>> single replica – our fault tolerance would drop way down on what it
>>> is right now.
>>>
>>> Is there a way to use both erasure coding AND replication at the
>>> same time in Ceph to mimic the architecture we currently have in
>>> Gluster? I know we COULD just create RAID6 volumes on each node and
>>> use the entire volume as a single OSD, but that this is not the
>>> recommended way to use Ceph. So is there some other way?
>>>
>>> Apologies if this is a nonsensical question, I’m still trying to
>>> wrap my head around Ceph, CRUSH maps, placement rules, volume types,
>>> etc etc!
>>>
>>> TIA
>>>
>>> Brett
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: v15.2.0 Octopus released

2020-03-25 Thread Simon Oosthoek
On 25/03/2020 10:10, konstantin.ilya...@mediascope.net wrote:
> That is why i am asking that question about upgrade instruction.
> I really don`t understand, how to upgrade/reinstall CentOS 7 to 8 without 
> affecting the work of cluster.
> As i know, this process is easier on Debian, but we deployed our cluster 
> Nautilus on CentOS because there weren`t any packages for 14.x for Debian 
> Stretch (9) or Buster(10).
> P.s.: if this is even possible, i would like to know how to upgrade servers 
> with CentOs7 + ceph 14.2.8 to Debian 10 with ceph 15.2.0 (we have servers 
> with OSD only and 3 servers with Mon/Mgr/Mds)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

I guess you could upgrade each node one by one. So upgrade/reinstall the
OS, install Ceph 15 and re-initialise the OSDs if necessary. Though it
would be nice if there was a way to re-integrate the OSDs from the
previous installation...
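
Thinking about it a bit more: as long as the OSD LVs and the cluster's
ceph.conf/keyrings survive the reinstall, ceph-volume can usually pick
the existing OSDs back up; a rough, untested sketch:

  ceph-volume lvm list            # shows the existing bluestore LVs and their osd ids/fsids
  ceph-volume lvm activate --all  # recreates the systemd units and tmpfs mounts, OSDs rejoin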

Personally, I'm planning to wait a while before upgrading to Ceph 15, not
least because it's not convenient to do stuff like OS upgrades from home ;-)

Currently we're running ubuntu 18.04 on the ceph nodes, I'd like to
upgrade to ubuntu 20.04 and then to ceph 15.

Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io