[ceph-users] Re: [EXT] mclock scheduler kills clients IOs

2024-09-18 Thread Kai Stian Olstad

On Tue, Sep 17, 2024 at 08:48:11PM -0400, Anthony D'Atri wrote:

Were all three in the same failure domain?


No, they were all in different failure domains.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXT] mclock scheduler kills clients IOs

2024-09-17 Thread Kai Stian Olstad

On Tue, Sep 17, 2024 at 04:22:40PM +0200, Denis Polom wrote:

Hi,

yes, the mclock scheduler doesn't look stable and ready for a 
production Ceph cluster. I just switched back to wpq and everything 
goes smoothly.


In our cluster all I/O stopped when I set 3 OSDs out while running Mclock.
After switching to WPQ and running deep-scrub on all PGs, the result was 698
corrupted objects that Ceph could not fix.

So no, I would not say Mclock is production ready.
We have set all our clusters to WPQ.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to specify id on newly created OSD with Ceph Orchestrator

2024-07-29 Thread Kai Stian Olstad

On Fri, Jul 26, 2024 at 04:18:05PM +0200, Iztok Gregori wrote:

On 26/07/24 12:35, Kai Stian Olstad wrote:

On Tue, Jul 23, 2024 at 08:24:21AM +0200, Iztok Gregori wrote:
Am I missing something obvious, or is there no way with the Ceph orchestrator 
to specify an id during OSD creation?


You can use osd_id_claims.


I tried the osd_id_claims in a yaml file like this:


service_type: osd
placement:
  hosts:
    - 
data_devices:
  paths:
    - /dev/
osd_id_claims:
  : ['']


And then applied it, but the created OSD didn't have the id I 
specified. It could be that the syntax of my yaml is wrong, but it gave 
me no errors when I applied it. I didn't try to directly specify the 
osd_id_claims on the command line. The command should be something 
like this:


# ceph orch daemon add osd :,osd_id_claims=


According to the documentation[1] you can use osd_id_claim.

I use:
ceph orch daemon add osd :data_devices=,osd_id_claims=

The difference is "data_devices="; whether you need it or not I don't know.


I don't know if it matters, but I've deleted/removed (not replaced) 
the OSD (the OSD id wasn't present in the crush map anymore, not even 
as "destroyed").


It might. I don't think I have tried without --replace, since I use a script to
replace devices in Ceph so I never forget to add the --replace.

[1] 
https://docs.ceph.com/en/reef/cephadm/services/osd/?highlight=osd_id_claims#ceph.deployment.drive_group.DriveGroupSpec.osd_id_claims


--
Kai Stian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to specify id on newly created OSD with Ceph Orchestrator

2024-07-26 Thread Kai Stian Olstad

On Tue, Jul 23, 2024 at 08:24:21AM +0200, Iztok Gregori wrote:
Am I missing something obvious, or is there no way with the Ceph orchestrator 
to specify an id during OSD creation?


You can use osd_id_claims.

This command is for replacing an HDD in the hybrid osd.344 and reusing the block.db 
device on the SSD.

ceph orch daemon add osd 
:data_devices=/dev/sdX,db_devices=/dev/ceph-/osd-block-,osd_id_claims=344

--
Kai Stian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm rgw ssl certificate config

2024-07-18 Thread Kai Stian Olstad

On Thu, Jul 18, 2024 at 10:49:02AM +, Eugen Block wrote:
And after restarting the daemon, it seems to work. So my question is, 
how do you deal with per-host certificates and rgw? Any comments are 
appreciated.


By not dealing with it, sort of.
Since we run our own CA, I create one certificate with the names of all the
rgw hosts, including their IP addresses, in the certificate Subject Alternative
Names (SAN).

--
Kai Stian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lot of spams on the list

2024-06-24 Thread Kai Stian Olstad

On 24.06.2024 19:15, Anthony D'Atri wrote:

* Subscription is now moderated
* The three worst spammers (you know who they are) have been removed
* I’ve deleted tens of thousands of crufty mail messages from the queue

The list should work normally now.  Working on the backlog of held 
messages.  99% are bogus, but I want to be careful wrt baby and 
bathwater.


Will the archive[1] also be cleaned up?

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lousy recovery for mclock and reef

2024-05-24 Thread Kai Stian Olstad

On 24.05.2024 21:07, Mazzystr wrote:
I did the obnoxious task of updating ceph.conf and restarting all my osds.

ceph --admin-daemon /var/run/ceph/ceph-osd.*.asok config get osd_op_queue
{
    "osd_op_queue": "wpq"
}

I have some spare memory on my target host/osd and increased the target
memory of that OSD to 10 Gb and restarted.  No effect observed.  In fact
mem usage on the host is stable so I don't think the change took effect
even with updating ceph.conf, restart and a direct asok config set.  The
target memory value is confirmed to be set via asok config get.

Nothing has helped.  I still cannot break the 21 MiB/s barrier.

Does anyone have any more ideas?


For recovery you can adjust the following.

osd_max_backfills: default is 1; in my system I get the best performance 
with 3 and wpq.


The following I have not adjusted myself, but you can try.
osd_recovery_max_active: defaults to 3.
osd_recovery_op_priority: defaults to 3; a higher number gives recovery ops 
higher priority.


All of them can be runtime adjusted.
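
For example, a runtime change on all OSDs can be done with something like this
(untested as written here, 3 is just the value that worked for me):

  ceph config set osd osd_max_backfills 3

or, if you prefer, with

  ceph tell 'osd.*' injectargs '--osd-max-backfills 3'

The first one persists the setting; the injectargs variant only lasts until the
OSDs are restarted.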


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting S3 bucket policies with multi-tenants

2024-04-15 Thread Kai Stian Olstad

On 12.04.2024 20:54, Wesley Dillingham wrote:
Did you actually get this working? I am trying to replicate your steps but
am not being successful doing this with multi-tenant.


This is what we are using; the second statement is so that the bucket owner 
will have access to the objects that the user is uploading.


s3-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
{
  "Effect": "Allow",
  "Principal": {
"AWS": [
  "arn:aws:iamuser/"
]
  },
  "Action": [
"s3:ListBucket",
"s3:GetObject",
"s3:PutObject"
  ],
  "Resource": [
"arn:aws:s3:::/*",
"arn:aws:s3:::"
  ]
},
{
  "Sid": "owner_full_access",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
  "arn:aws:iamuser/"
]
  },
  "Action": "s3:*",
  "Resource": "arn:aws:s3:::*"
}
  ]
}

And then run
s3cmd setpolicy s3-policy.json s3://


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Kai Stian Olstad

On Mon, Mar 25, 2024 at 10:58:24PM +0100, Kai Stian Olstad wrote:

On Mon, Mar 25, 2024 at 09:28:01PM +0100, Torkil Svensgaard wrote:
My tally came to 412 out of 539 OSDs showing up in a blocked_by list 
and that is about every OSD with data prior to adding ~100 empty 
OSDs. How 400 read targets and 100 write targets can only equal ~60 
backfills with osd_max_backfills set at 3 just makes no sense to me 
but alas.


It seems I can just increase osd_max_backfill even further to get 
the numbers I want so that will do. Thank you all for taking the 
time to look at this.


It's a huge change and 42% of your data needs to be moved.
And this move is not only to the new OSDs but also between the existing OSDs, but
they are busy with backfilling so they have no free backfill reservation.

I do recommend this document by Joshua Baergen at Digital Ocean that explains
backfilling, the problem with it and their solution, a tool called 
pgremapper.


Forgot the link
https://ceph.io/assets/pdfs/user_dev_meeting_2023_10_19_joshua_baergen.pdf

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-25 Thread Kai Stian Olstad

On Mon, Mar 25, 2024 at 09:28:01PM +0100, Torkil Svensgaard wrote:
My tally came to 412 out of 539 OSDs showing up in a blocked_by list 
and that is about every OSD with data prior to adding ~100 empty OSDs. 
How 400 read targets and 100 write targets can only equal ~60 
backfills with osd_max_backfills set at 3 just makes no sense to me but 
alas.


It seems I can just increase osd_max_backfill even further to get the 
numbers I want so that will do. Thank you all for taking the time to 
look at this.


It's a huge change and 42% of your data needs to be moved.
And this move is not only to the new OSDs but also between the existing OSDs, but
they are busy with backfilling so they have no free backfill reservation.

I do recommend this document by Joshua Baergen at Digital Ocean that explains
backfilling, the problem with it and their solution, a tool called 
pgremapper.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large number of misplaced PGs but little backfill going on

2024-03-23 Thread Kai Stian Olstad

On Sat, Mar 23, 2024 at 12:09:29PM +0100, Torkil Svensgaard wrote:


The other output is too big for pastebin and I'm not familiar with 
paste services, any suggestion for a preferred way to share such 
output?


You can attach files to the mail here on the list.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Kai Stian Olstad

On Fri, Mar 22, 2024 at 06:51:44PM +0100, Frédéric Nass wrote:



The OSD runs a bench and updates osd_mclock_max_capacity_iops_{hdd,ssd} every time 
the OSD is started.
If you check the OSD log you'll see it doing the bench.

 
Are you sure about the update on every start? Does the update happen only if the 
benchmark result is < 500 iops?
 
Looks like the OSD does not remove any set configuration when the benchmark result 
is > 500 iops. Otherwise, the extremely low value that Michel reported earlier 
(less than 1 iops) would have been updated over time.
I guess.


I'm not completely sure; it's a couple of months since I used mclock, as I have
switched back to wpq because of a nasty bug in mclock that can freeze cluster I/O.

It could be because I was testing osd_mclock_force_run_benchmark_on_init.
The OSD had DB on SSD and data on HDD, so it measured about 1700 IOPS and the
value was ignored because of the 500 limit.
So only the SSD got osd_mclock_max_capacity_iops_ssd set.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Kai Stian Olstad

On Fri, Mar 22, 2024 at 04:29:21PM +0100, Frédéric Nass wrote:

A/ these incredibly low values were calculated a while back with an unmature 
version of the code or under some specific hardware conditions and you can hope 
this won't happen again


The OSD runs a bench and updates osd_mclock_max_capacity_iops_{hdd,ssd} every time 
the OSD is started.
If you check the OSD log you'll see it doing the bench.
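
If you want to see which OSDs have gotten a capacity value set, and clear one so
it is measured again on the next restart, something like this should work
(osd.N is just a placeholder):

  ceph config dump | grep osd_mclock_max_capacity_iops
  ceph config rm osd.N osd_mclock_max_capacity_iops_hdd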

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

2024-03-06 Thread Kai Stian Olstad

Hi Eugen, thank you for the reply.

The OSD was drained over the weekend, so OSD 223 and 269 have only the 
problematic PG 404.bc.


I don't think moving the PG would help since I don't have any empty OSD 
to move it to, and a move would not fix the hash mismatch.
The reason I just want to have the problematic PG on the OSDs is to 
reduce recovery time.
I would need to set min_size to 4 in an EC 4+2, and stop them both at 
the same time to force a rebuild of the corrupted part of PG that is on 
osd 223 and 269, since repair doesn't fix it.


I'm debating with myself if I should
1. Stop both OSD 223 and 269,
2. Just one of them.

Stopping them both, I'm guaranteed that the part of the PG on 223 and 269 is 
rebuilt from the 4 others, 297, 276, 136 and 197, which don't have any 
errors.


OSD 223 is the master in the EC, pg 404.bc acting 
[223,297,269,276,136,197]
So maybe just stop that one, wait for recovery and then run deep-scrub to 
check if things look better.

But would it then use corrupted data on osd 269 to rebuild?
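
A rough sketch of that single-OSD variant, assuming cephadm (untested, and
min_size only needs to be lowered if both OSDs are stopped at the same time):

  ceph orch daemon stop osd.223
  # wait for osd.223 to be marked out and the PG to finish recovery
  ceph pg deep-scrub 404.bc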


-
Kai Stian Olstad



On 26.02.2024 10:19, Eugen Block wrote:

Hi,

I think your approach makes sense. But I'm wondering if moving only  
the problematic PGs to different OSDs could have an effect as well. I  
assume that moving the 2 PGs is much quicker than moving all BUT those  
2 PGs. If that doesn't work you could still fall back to draining the  
entire OSDs (except for the problematic PG).


Regards,
Eugen

Zitat von Kai Stian Olstad :


Hi,

No one has any comment at all?
I'm not picky, so any speculation, guessing, "I would", "I wouldn't",  
"should work" and so on would be highly appreciated.



Since 4 out of 6 in EC 4+2 is OK and ceph pg repair doesn't solve it  
I think the following might work.


pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on OSD 223 and 269 except 404.bc to  
other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data 
min_size 4

- Stop osd 223 and 269

What I hope will happen is that Ceph then recreates 404.bc shards  
s0(osd.223) and s2(osd.269), since they are now down, from the  
remaining shards s1(osd.297), s3(osd.276), s4(osd.136) and s5(osd.197)


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:

Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report hash mismatch for 698 
objects.
Ceph pg repair doesn't fix it, because if you run deep-scrub on the  
PG after repair is finished, it still reports scrub errors.


Why can't ceph pg repair fix this? With 4 out of 6 it should be  
able to reconstruct the corrupted shards.
Is there a way to fix this? Like deleting shards s0 and s2 so it's  
forced to recreate them?



Long detailed summary

A short backstory.
* This is aftermath of problems with mclock, post "17.2.7:  
Backfilling deadlock / stall / stuck / standstill" [1].

 - 4 OSDs had a few bad sectors, set all 4 out and cluster stopped.
 - Solution was to swap from mclock to wpq and restart all OSDs.
 - When all backfilling was finished all 4 OSDs were replaced.
 - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after osd.223 and osd.269 were replaced, deep-scrub was run  
and reported errors

   ceph status
   ---
   HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg 
inconsistent

   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
   pg 404.bc is active+clean+inconsistent, acting  
[223,297,269,276,136,197]


I then run repair
   ceph pg repair 404.bc

And ceph status showed this
   ceph status
   ---
   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 698 reads repaired
   osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks and the disks have no SMART  
errors or any I/O errors in the OS logs.

So I tried to run deep-scrub again on the PG.
   ceph pg deep-scrub 404.bc

And got this result.

   ceph status
   ---
   HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs;  
Possible data damage: 1 pg inconsistent

   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 698 reads repaired
   osd.269 had 698 reads repaired
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
   pg 404.bc is  active+clean+scrubbing+deep+inconsistent+repair, 
acting  [223,297,269,276,136,197]


698 + 698 = 1396 so the same amount of errors.

Run repair again on 404.bc and ceph status is

   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 1396 reads repaired
   osd.269 had 1396 reads repaired

So even when repair finish it doesn't fix the problem since the

[ceph-users] Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

2024-02-27 Thread Kai Stian Olstad

Hi Eugen, thank you for the reply.

The OSD was drained over the weekend, so OSD 223 and 269 have only the 
problematic PG 404.bc.


I don't think moving the PG would help since I don't have any empty OSD 
to move it to, and a move would not fix the hash mismatch.
The reason I just want to have the problematic PG on the OSDs is to 
reduce recovery time.
I would need to set min_size to 4 in an EC 4+2, and stop them both at 
the same time to force a rebuild of the corrupted part of PG that is on 
osd 223 and 269, since repair doesn't fix it.


I'm debating with myself if I should
1. Stop both OSD 223 and 269,
2. Just one of them.

Stopping them both, I'm guaranteed that the part of the PG on 223 and 269 is 
rebuilt from the 4 others, 297, 276, 136 and 197, which don't have any 
errors.


OSD 223 is the master in the EC, pg 404.bc acting 
[223,297,269,276,136,197]
So maybe just stop that one, wait for recovery and then run deep-scrub to 
check if things look better.

But would it then use corrupted data on osd 269 to rebuild?


-
Kai Stian Olstad



On 26.02.2024 10:19, Eugen Block wrote:

Hi,

I think your approach makes sense. But I'm wondering if moving only  
the problematic PGs to different OSDs could have an effect as well. I  
assume that moving the 2 PGs is much quicker than moving all BUT those  
2 PGs. If that doesn't work you could still fall back to draining the  
entire OSDs (except for the problematic PG).


Regards,
Eugen

Zitat von Kai Stian Olstad :


Hi,

No one has any comment at all?
I'm not picky, so any speculation, guessing, "I would", "I wouldn't",  
"should work" and so on would be highly appreciated.



Since 4 out of 6 in EC 4+2 is OK and ceph pg repair doesn't solve it  
I think the following might work.


pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on OSD 223 and 269 except 404.bc to  
other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data 
min_size 4

- Stop osd 223 and 269

What I hope will happen is that Ceph then recreates 404.bc shards  
s0(osd.223) and s2(osd.269), since they are now down, from the  
remaining shards s1(osd.297), s3(osd.276), s4(osd.136) and s5(osd.197)


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:

Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report hash mismatch for 698 
objects.
Ceph pg repair doesn't fix it, because if you run deep-scrub on the  
PG after repair is finished, it still reports scrub errors.


Why can't ceph pg repair fix this? With 4 out of 6 it should be  
able to reconstruct the corrupted shards.
Is there a way to fix this? Like deleting shards s0 and s2 so it's  
forced to recreate them?



Long detailed summary

A short backstory.
* This is aftermath of problems with mclock, post "17.2.7:  
Backfilling deadlock / stall / stuck / standstill" [1].

 - 4 OSDs had a few bad sectors, set all 4 out and cluster stopped.
 - Solution was to swap from mclock to wpq and restart all OSDs.
 - When all backfilling was finished all 4 OSDs were replaced.
 - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after osd.223 and osd.269 were replaced, deep-scrub was run  
and reported errors

   ceph status
   ---
   HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg 
inconsistent

   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
   pg 404.bc is active+clean+inconsistent, acting  
[223,297,269,276,136,197]


I then run repair
   ceph pg repair 404.bc

And ceph status showed this
   ceph status
   ---
   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 698 reads repaired
   osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks and the disks have no SMART  
errors or any I/O errors in the OS logs.

So I tried to run deep-scrub again on the PG.
   ceph pg deep-scrub 404.bc

And got this result.

   ceph status
   ---
   HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs;  
Possible data damage: 1 pg inconsistent

   [ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 698 reads repaired
   osd.269 had 698 reads repaired
   [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
   pg 404.bc is  active+clean+scrubbing+deep+inconsistent+repair, 
acting  [223,297,269,276,136,197]


698 + 698 = 1396 so the same amount of errors.

Run repair again on 404.bc and ceph status is

   HEALTH_WARN Too many repaired reads on 2 OSDs
   [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
   osd.223 had 1396 reads repaired
   osd.269 had 1396 reads repaired

So even when repair finish it doesn't fix the problem since the

[ceph-users] Re: pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

2024-02-23 Thread Kai Stian Olstad

Hi,

No one has any comment at all?
I'm not picky, so any speculation, guessing, "I would", "I wouldn't", 
"should work" and so on would be highly appreciated.



Since 4 out of 6 in EC 4+2 is OK and ceph pg repair doesn't solve it I 
think the following might work.


pg 404.bc acting [223,297,269,276,136,197]

- Use pgremapper to move all PGs on OSD 223 and 269 except 404.bc to 
other OSDs.
- Set min_size to 4: ceph osd pool set default.rgw.buckets.data 
min_size 4

- Stop osd 223 and 269

What I hope will happen is that Ceph then recreates 404.bc shards 
s0(osd.223) and s2(osd.269), since they are now down, from the remaining 
shards s1(osd.297), s3(osd.276), s4(osd.136) and s5(osd.197)


_Any_ comment is highly appreciated.

-
Kai Stian Olstad


On 21.02.2024 13:27, Kai Stian Olstad wrote:

Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report hash mismatch for 698 
objects.
Ceph pg repair doesn't fix it, because if you run deep-scrub on the PG 
after repair is finished, it still reports scrub errors.


Why can't ceph pg repair fix this? With 4 out of 6 it should be able 
to reconstruct the corrupted shards.
Is there a way to fix this? Like deleting shards s0 and s2 so it's forced 
to recreate them?



Long detailed summary

A short backstory.
* This is aftermath of problems with mclock, post "17.2.7: Backfilling 
deadlock / stall / stuck / standstill" [1].

  - 4 OSDs had a few bad sectors, set all 4 out and cluster stopped.
  - Solution was to swap from mclock to wpq and restart all OSDs.
  - When all backfilling was finished all 4 OSDs were replaced.
  - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after osd.223 and osd.269 were replaced, deep-scrub was run and 
reported errors

ceph status
---
HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg 
inconsistent

[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+inconsistent, acting 
[223,297,269,276,136,197]


I then run repair
ceph pg repair 404.bc

And ceph status showed this
ceph status
---
HEALTH_WARN Too many repaired reads on 2 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks and the disks have no SMART errors 
or any I/O errors in the OS logs.

So I tried to run deep-scrub again on the PG.
ceph pg deep-scrub 404.bc

And got this result.

ceph status
---
HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs; 
Possible data damage: 1 pg inconsistent

[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+scrubbing+deep+inconsistent+repair, 
acting [223,297,269,276,136,197]


698 + 698 = 1396 so the same amount of errors.

Run repair again on 404.bc and ceph status is

HEALTH_WARN Too many repaired reads on 2 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 1396 reads repaired
osd.269 had 1396 reads repaired

So even when repair finishes it doesn't fix the problem, since they 
reappear again after a deep-scrub.


The logs for osd.223 and osd.269 contain "got incorrect hash on read" 
and "candidate had an ec hash mismatch" for 698 unique objects.
But I only show the logs for 1 of the 698 objects; the log is the same 
for the other 697 objects.


osd.223 log (only showing 1 of 698 object named 
2021-11-08T19%3a43%3a50,145489260+00%3a00)

---
Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch: 
231235 pg[404.bcs0( v 231235'1636919 (231078'1632435,231235'1636919] 
local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 
les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) 
r=0 lpr=226263 crt=231235'1636919 lcod 231235'1636918 mlcod 
231235'1636918 active+clean+scrubbing+deep+inconsistent+repair [ 
404.bcs0:  REQ_SCRUB ]  MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned 
REQ_SCRUB] _scan_list  
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
got incorrect hash on read 0xc5d1dd1b !=  expected 0x7c2f86d7
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) 
log [ERR] : 404.bc shard 223(0) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
: candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-osd[366

[ceph-users] Re: Some questions about cephadm

2024-02-21 Thread Kai Stian Olstad

On 21.02.2024 17:07, wodel youchi wrote:
   - The documentation of ceph does not indicate what versions of grafana,
     prometheus, etc. should be used with a certain version.
   - I am trying to deploy Quincy, I did a bootstrap to see what
     containers were downloaded and their version.
   - I am asking because I need to use a local registry to deploy those
     images.


You need to check the cephadm source for the version you would like to 
use

https://github.com/ceph/ceph/blob/v17.2.7/src/cephadm/cephadm#L46

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

2024-02-21 Thread Kai Stian Olstad
r) log 
[ERR] : 404.bc shard 269(2) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
: candidate had an ec hash mismatch


osd.269 log (only showing 1 of 698 object named 
2021-11-08T19%3a43%3a50,145489260+00%3a00)

---
Feb 20 10:31:00 ceph-hd-001 ceph-osd[3656897]: osd.269 pg_epoch: 
231235 pg[404.bcs2( v 231235'1636919 (231078'1632435,231235'1636919] 
local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 
les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=2 
lpr=226263 luod=0'0 crt=231235'1636919 mlcod 231235'1636919 active 
mbc={}] _scan_list  
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
got incorrect hash on read 0x7c0871dc !=  expected 0xcf6f4c58


The log for the other osd in the PG osd.297, osd.276, osd.136 and 
osd.197 doesn't show any error.


If I try to get the object it fails
$ s3cmd s3://benchfiles/2021-11-08T19:43:50,145489260+00:00
download: 's3://benchfiles/2021-11-08T19:43:50,145489260+00:00' -> 
'./2021-11-08T19:43:50,145489260+00:00'  [1 of 1]
ERROR: Download of './2021-11-08T19:43:50,145489260+00:00' failed 
(Reason: 500 (UnknownError))

ERROR: S3 error: 500 (UnknownError)

And the RGW log show this
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: == starting new 
request req=0x7f94b744d660 =
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: WARNING: set_req_state_err 
err_no=5 resorting to 500
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: == starting new 
request req=0x7f94b6e41660 =
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: == req done 
req=0x7f94b744d660 op status=-5 http_status=500 latency=0.02568s 
==
Feb 21 08:27:06 ceph-mon-1 radosgw[1747]: beast: 0x7f94b744d660: 
110.2.0.46 - test1 [21/Feb/2024:08:27:06.021 +] "GET 
/benchfiles/2021-11-08T19%3A43%3A50%2C145489260%2B00%3A00 HTTP/1.1" 500 
226 - - - latency=0.020000568s


[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG stuck at recovery

2024-02-19 Thread Kai Stian Olstad

On 19.02.2024 23:23, Anthony D'Atri wrote:
After wrangling with this myself, both with 17.2.7 and to an extent 
with 17.2.5, I'd like to follow up here and ask:


Those who have experienced this, were the affected PGs

* Part of an EC pool?
* Part of an HDD pool?
* Both?


Both in my case, EC is 4+2 jerasure blaum_roth and the HDD is hybrid 
where DB is on SSD shared by 5 HDD.

And in your cases?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Installing ceph s3.

2024-02-12 Thread Kai Stian Olstad

On 12.02.2024 18:15, Albert Shih wrote:
I couldn't find a documentation about how to install a S3/Swift API (as 
I

understand it's RadosGW) on quincy.


It depends on how you have installed Ceph.
If you are using Cephadm the docs are here 
https://docs.ceph.com/en/reef/cephadm/services/rgw/




I can find some documentation on octupus
(https://docs.ceph.com/en/octopus/install/ceph-deploy/install-ceph-gateway/)


ceph-deploy is deprecated
https://docs.ceph.com/en/reef/install/

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG stuck at recovery

2024-02-07 Thread Kai Stian Olstad

You don't say anything about the Ceph version you are running.
I had a similar issue with 17.2.7, and it seems to be an issue with mclock;
when I switched to wpq everything worked again.

You can read more about it here
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#OE3QXLAJIY6NU7PNMGHP47UK2CBZJPUG
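
If you want to try the same, the switch is something like this (the OSDs need a
restart for it to take effect):

  ceph config set osd osd_op_queue wpq

and then restart the OSDs, for example one by one with
"ceph orch daemon restart osd.<id>".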

- 
Kai Stian Olstad



On Tue, Feb 06, 2024 at 06:35:26AM -, LeonGao  wrote:

Hi community

We have a new Ceph cluster deployment with 100 nodes. When we are draining an 
OSD host from the cluster, we see a small amount of PGs that cannot make any 
progress to the end. From the logs and metrics, it seems like the recovery 
progress is stuck (0 recovery ops for several days). Would like to get some 
ideas on this. Re-peering and OSD restart do resolve to mitigate the issue but 
we want to get to the root cause of it as draining and recovery happen 
frequently.

I have put some debugging information below. Any help is appreciated, thanks!

ceph -s
   pgs: 4210926/7380034104 objects misplaced (0.057%)
41198 active+clean
71active+remapped+backfilling
12active+recovering

One of the stuck PG:
6.38f1   active+remapped+backfilling [313,643,727] 313  
   [313,643,717] 313

PG query result:

ceph pg 6.38f1 query
{
   "snap_trimq": "[]",
   "snap_trimq_len": 0,
   "state": "active+remapped+backfilling",
   "epoch": 246856,
   "up": [
   313,
   643,
   727
   ],
   "acting": [
   313,
   643,
   717
   ],
   "backfill_targets": [
   "727"
   ],
   "acting_recovery_backfill": [
   "313",
   "643",
   "717",
   "727"
   ],
   "info": {
   "pgid": "6.38f1",
   "last_update": "212333'38916",
   "last_complete": "212333'38916",
   "log_tail": "80608'37589",
   "last_user_version": 38833,
   "last_backfill": "MAX",
   "purged_snaps": [],
   "history": {
   "epoch_created": 3726,
   "epoch_pool_created": 3279,
   "last_epoch_started": 243987,
   "last_interval_started": 243986,
   "last_epoch_clean": 220174,
   "last_interval_clean": 220173,
   "last_epoch_split": 3726,
   "last_epoch_marked_full": 0,
   "same_up_since": 238347,
   "same_interval_since": 243986,
   "same_primary_since": 3728,
   "last_scrub": "212333'38916",
   "last_scrub_stamp": "2024-01-29T13:43:10.654709+",
   "last_deep_scrub": "212333'38916",
   "last_deep_scrub_stamp": "2024-01-28T07:43:45.920198+",
   "last_clean_scrub_stamp": "2024-01-29T13:43:10.654709+",
   "prior_readable_until_ub": 0
   },
   "stats": {
   "version": "212333'38916",
   "reported_seq": 413425,
   "reported_epoch": 246856,
   "state": "active+remapped+backfilling",
   "last_fresh": "2024-02-05T21:14:40.838785+",
   "last_change": "2024-02-03T22:33:43.052272+",
   "last_active": "2024-02-05T21:14:40.838785+",
   "last_peered": "2024-02-05T21:14:40.838785+",
   "last_clean": "2024-02-03T04:26:35.168232+",
   "last_became_active": "2024-02-03T22:31:16.037823+",
   "last_became_peered": "2024-02-03T22:31:16.037823+",
   "last_unstale": "2024-02-05T21:14:40.838785+",
   "last_undegraded": "2024-02-05T21:14:40.838785+",
   "last_fullsized": "2024-02-05T21:14:40.838785+",
   "mapping_epoch": 243986,
   "log_start": "80608'37589",
   "ondisk_log_start": "80608'37589",
   "created": 3726,
   "last_epoch_clean": 220174,
   "parent": "0.0",
   "parent_split_bits": 14,
   "last_scrub": "212333'38916",
   "last_scrub_stamp": "2024-01-29T13:43:10.654709+",
   "last_deep_scrub": "212333'38916"

[ceph-users] Re: how can install latest dev release?

2024-01-31 Thread Kai Stian Olstad

On 31.01.2024 09:38, garcetto wrote:

good morning,
 how can i install latest dev release using cephadm?


Have you looked at this page?
https://docs.ceph.com/en/latest/install/containers/#development-builds

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-28 Thread Kai Stian Olstad

On 26.01.2024 23:09, Mark Nelson wrote:
For what it's worth, we saw this last week at Clyso on two separate 
customer clusters on 17.2.7 and also solved it by moving back to wpq.  
We've been traveling this week so haven't created an upstream tracker 
for it yet, but we're back to recommending wpq to our customers for all 
production cluster deployments until we figure out what's going on.


Thank you for confirming, switching to wpq solved my problem too,
and I have switched all production clusters to wpq.

I guess all my logs are gone by now, but I will try to recreate the situation 
in the test cluster.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-28 Thread Kai Stian Olstad

On 26.01.2024 22:08, Wesley Dillingham wrote:
I faced a similar issue. The PG just would never finish recovery. 
Changing
all OSDs in the PG to "osd_op_queue wpq" and then restarting them 
serially
ultimately allowed the PG to recover. Seemed to be some issue with 
mclock.


Thank you Wes, switching to wpq and restarting the OSDs fixed it for me 
too.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-26 Thread Kai Stian Olstad

Hi,

This is a cluster running 17.2.7 upgraded from 16.2.6 on the 15 January 
2024.


On Monday 22 January we had 4 HDDs, all on different servers, with I/O errors 
because of some damaged sectors. The OSDs are hybrid so the DB is on SSD, 5 
HDDs share 1 SSD.
I set the OSDs out, ceph osd out 223 269 290 318, and all hell broke 
loose.


It took only minutes before the users complained about Ceph not working.
Ceph status reported slow ops on the OSDs that were set out, and “ceph 
tell osd. dump_ops_in_flight” against the out OSDs just hung; 
after 30 minutes I stopped the dump command.
Long story short, I ended up running “ceph osd set nobackfill” until the slow 
ops were gone and then unset it when the slow ops message disappeared.
I needed to do that all the time so the cluster didn’t come to a halt, 
so this one-liner loop was used
“while true; do ceph -s | grep -qE "oldest one blocked for [0-9]{2,}" && 
(date; ceph osd set nobackfill; sleep 15; ceph osd unset nobackfill); 
sleep 10; done”



But now, 4 days later, the backfilling has stopped progressing completely 
and the number of misplaced objects is increasing.
Some PGs have 0 misplaced objects but still have backfilling state, and have been 
in this state for over 24 hours now.


I have a hunch that it’s because PG 404.6e7 is in state 
“active+recovering+degraded+remapped”; it’s been in this state for over 
48 hours.
It has possibly 2 missing objects, but since they are not unfound I 
can’t delete them with “ceph pg 404.6e7 mark_unfound_lost delete”


Could someone please help to solve this?
Down below is some output of ceph commands, I’ll also attach them.


ceph status (only removed information about no running scrub and 
deep_scrub)

---
  cluster:
id: b321e76e-da3a-11eb-b75c-4f948441dcd0
health: HEALTH_WARN
Degraded data redundancy: 2/6294904971 objects degraded 
(0.000%), 1 pg degraded


  services:
mon: 3 daemons, quorum ceph-mon-1,ceph-mon-2,ceph-mon-3 (age 11d)
mgr: ceph-mon-1.ptrsea(active, since 11d), standbys: 
ceph-mon-2.mfdanx

mds: 1/1 daemons up, 1 standby
osd: 355 osds: 355 up (since 22h), 351 in (since 4d); 18 remapped 
pgs

rgw: 7 daemons active (7 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   14 pools, 3945 pgs
objects: 1.14G objects, 1.1 PiB
usage:   1.8 PiB used, 1.2 PiB / 3.0 PiB avail
pgs: 2/6294904971 objects degraded (0.000%)
 2980455/6294904971 objects misplaced (0.047%)
 3901 active+clean
 22   active+clean+scrubbing+deep
 17   active+remapped+backfilling
 4active+clean+scrubbing
 1active+recovering+degraded+remapped

  io:
client:   167 MiB/s rd, 13 MiB/s wr, 6.02k op/s rd, 2.35k op/s wr


ceph health detail (only removed information about no running scrub and 
deep_scrub)

---
HEALTH_WARN Degraded data redundancy: 2/6294902067 objects degraded 
(0.000%), 1 pg degraded
[WRN] PG_DEGRADED: Degraded data redundancy: 2/6294902067 objects 
degraded (0.000%), 1 pg degraded
pg 404.6e7 is active+recovering+degraded+remapped, acting 
[223,274,243,290,286,283]



ceph pg 202.6e7 list_unfound
---
{
"num_missing": 2,
"num_unfound": 0,
"objects": [],
"state": "Active",
"available_might_have_unfound": true,
"might_have_unfound": [],
"more": false
}

ceph pg 404.6e7 query | jq .recovery_state
---
[
  {
"name": "Started/Primary/Active",
"enter_time": "2024-01-26T09:08:41.918637+",
"might_have_unfound": [
  {
"osd": "243(2)",
"status": "already probed"
  },
  {
"osd": "274(1)",
"status": "already probed"
  },
  {
"osd": "275(0)",
"status": "already probed"
  },
  {
"osd": "283(5)",
"status": "already probed"
  },
  {
"osd": "286(4)",
"status": "already probed"
  },
  {
"osd": "290(3)",
"status": "already probed"
  },
  {
"osd": "335(3)",
"status": "already probed"
  }
],
"recovery_progress": {
  "backfill_targets": [
"275(0)",
"335(3)"
  ],
  "waiting_on_backfill": [],
  "last_backfill_started": 
"404:e76011a9:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.18_56463c71-286c-4399-8d5d-0c278b7c97fd:head",

  "backfill_info": {
"begin": "MIN",
"end": "MIN",
"objects": []
  },
  "peer_backfill_info": [],
  "backfills_in_flight": [],
  "recovering": [],
  "pg_backend": {
"recovery_ops": [],
"read_ops": []
  }
}
  },
  {
"name": "Started",
"enter_time": "2024-01-26T09:08:40.909151+"
  }
]


ceph pg ls recovering backfilling
---
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES 
OMAP_BYTES*  OMAP_KEYS*  LOGLOG_DUPS  STATE  
  SINCE  VERSION  REPORTED  UP   
  ACTING
404.bc287986 0

[ceph-users] Re: podman / docker issues

2024-01-25 Thread Kai Stian Olstad

On 25.01.2024 18:19, Marc wrote:
More and more I am annoyed with the 'dumb' design decisions of redhat. 
Just now I have an issue on an 'air gapped' vm that I am unable to 
start a docker/podman container because it tries to contact the 
repository to update the image and instead of using the on disk image 
it just fails. (Not to mention the %$#$%#$ that design containers to 
download stuff from the internet on startup)


I was wondering if this is also an issue with ceph-admin. Is there an 
issue with starting containers when container image repositories are 
not available or when there is no internet connection.


Of course cephadm will fail if the container registry is not available 
and the image isn't pulled locally.


But you don't need to use the official registry, so running it air-gapped 
is not a problem.
Just download the images you need to your local registry and specify it, 
some details are here

https://docs.ceph.com/en/reef/cephadm/install/#deployment-in-an-isolated-environment

The containers themselves don't need to download anything at start.
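
A minimal sketch of how that can look, registry.example.local being a
placeholder for your own registry:

  cephadm --image registry.example.local/ceph/ceph:v17.2.7 bootstrap --mon-ip <ip>

and on a running cluster you can point it at the local image with

  ceph config set global container_image registry.example.local/ceph/ceph:v17.2.7

The monitoring images (prometheus, grafana, etc.) have their own
container_image_* settings under mgr/cephadm, see the link above.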


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm orchestrator and special label _admin in 17.2.7

2024-01-24 Thread Kai Stian Olstad

On 23.01.2024 18:19, Albert Shih wrote:
Just like to know if it's a very bad idea to do a rsync of /etc/ceph from
the «_admin» server to the other ceph cluster servers.

I in fact add something like

for host in `cat /usr/local/etc/ceph_list_noeuds.txt`
do
  /usr/bin/rsync -av /etc/ceph/ceph* $host:/etc/ceph/
done

in a cronjob


Why not just add the _admin label to the host and let Ceph do the job?
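
For example (the hostname is a placeholder):

  ceph orch host label add <hostname> _admin

and cephadm will then maintain ceph.conf and the admin keyring on that host.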

You can also run this to get the ceph.conf copied to all hosts
ceph config set mgr/cephadm/manage_etc_ceph_ceph_conf true

Anyway, I don't see any problem with rsyncing it, it's just ceph.conf and 
the admin key.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: About lost disk with erasure code

2023-12-28 Thread Kai Stian Olstad

On 27.12.2023 04:54, Phong Tran Thanh wrote:

Thank you for your knowledge. I have a question. Which pool is affected
when the PG is down, and how can I show it?
When a PG is down, is only one pool affected or are multiple pools 
affected?


If only 1 PG is down only 1 pool is affected.
The name of a PG is {pool-num}.{pg-id}, and the pool numbers you find 
with "ceph osd lspools".


ceph health detail
will show which PG is down and all other issues.

ceph pg ls
will show you all PGs, their status and the OSDs they are running on.
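
For example, if "ceph health detail" says a PG named 6.1a is down (a made-up
PG id), the pool number is 6, and "ceph osd lspools" will tell you which pool
that is.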

Some useful links
https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/#monitoring-pg-states
https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/
https://docs.ceph.com/en/latest/dev/placement-group/#user-visible-pg-states


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-03 Thread Kai Stian Olstad

On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:

One of our 16.2.14 cluster OSDs crashed again because of the dreaded
https://tracker.ceph.com/issues/53906 bug.





It would be good to understand what has triggered this condition and how it
can be resolved without rebooting the whole host. I would very much
appreciate any suggestions.


If you look closely at 53906 you'll see it's a duplicate of
https://tracker.ceph.com/issues/53907

In there you have the fix and a workaround until next minor is released.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd dump_historic_ops

2023-12-01 Thread Kai Stian Olstad

On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:

I have a problem with my osd, i want to show dump_historic_ops of osd
I follow the guide:
https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
But when i run command

ceph daemon osd.8 dump_historic_ops show the error, the command run on node
with osd.8
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running ceph cluster reef version by cephadmin install

What should I do?


The easiest is to use tell, then you can run it on any node that has access to 
ceph.

ceph tell osd.8 dump_historic_ops


ceph tell osd.8 help
will give you all you can do with tell.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to speed up rgw lifecycle

2023-11-28 Thread Kai Stian Olstad

On Tue, Nov 28, 2023 at 02:55:56PM +0700, VÔ VI wrote:

My ceph cluster is using s3 with three pools at approximately 4.5k
obj/s, and the rgw lifecycle delete per pool is only 60-70 objects/s.

How can I speed up the rgw lc process? 60-70 objects/s is too slow.


It is explained in the documentation, have you tried that?
https://docs.ceph.com/en/reef/radosgw/config-ref/#lifecycle-settings
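
A sketch of the kind of tuning described there (the values are just examples,
check the link for the defaults and what they mean):

  ceph config set client.rgw rgw_lc_max_worker 5
  ceph config set client.rgw rgw_lc_max_wp_worker 5

I would restart the RGWs afterwards, I don't think the lifecycle threads pick
this up at runtime.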

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.x excessive logging, how to reduce?

2023-10-09 Thread Kai Stian Olstad

On 09.10.2023 10:05, Zakhar Kirpichenko wrote:

I did try to play with various debug settings. The issue is that mons
produce logs of all commands issued by clients, not just mgr. For 
example,

an Openstack Cinder node asking for space it can use:

Oct  9 07:59:01 ceph03 bash[4019]: debug 2023-10-09T07:59:01.303+


This log says that it's bash with PID 4019 that is creating the log 
entry.
Maybe start there; check what other things you are running on the 
server that create these messages.


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Kai Stian Olstad

On 06.10.2023 17:48, Wesley Dillingham wrote:
A repair is just a type of scrub and it is also limited by 
osd_max_scrubs

which in pacific is 1.

If another scrub is occurring on any OSD in the PG it won't start.

do "ceph osd set noscrub" and "ceph osd set nodeep-scrub" wait for all
scrubs to stop (a few seconds probably)

Then issue the pg repair command again. It may start.

You also have pgs in backfilling state. Note that by default OSDs in
backfill or backfill_wait also won't perform scrubs.

You can modify this behavior with `ceph config set osd osd_scrub_during_recovery true`

I would suggest only setting that after the noscrub flags are set and the
only scrub you want to get processed is your manual repair.

Then rm the scrub_during_recovery config item before unsetting the noscrub
flags.


Hi Simon

Just to add to Wes's answer, CERN has made a nice script that does the 
steps Wes explained above

https://github.com/cernceph/ceph-scripts/blob/master/tools/scrubbing/autorepair.sh
that you might want to take a look at.
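
For reference, the sequence Wes describes is roughly this (untested as written
here, <pgid> is the inconsistent PG):

  ceph osd set noscrub
  ceph osd set nodeep-scrub
  ceph config set osd osd_scrub_during_recovery true
  ceph pg repair <pgid>
  # when the repair has finished
  ceph config rm osd osd_scrub_during_recovery
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub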

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about PG auto-scaling and node addition

2023-09-14 Thread Kai Stian Olstad

On Wed, Sep 13, 2023 at 04:33:32PM +0200, Christophe BAILLON wrote:

We have a cluster with 21 nodes, each having 12 x 18TB, and 2 NVMe for db/wal.
We need to add more nodes.
The last time we did this, the PGs remained at 1024, so the number of PGs per 
OSD decreased.
Currently, we are at 43 PGs per OSD.

Does auto-scaling work correctly in Ceph version 17.2.5?


I would believe so, it's working as designed; by default the auto-scaler increases
the number of PGs based on how much data is stored.
So when you add OSDs, data usage is the same and therefore no scaling is done.



Should we increase the number of PGs before adding nodes?


Adding nodes/OSDs and changing the number of PGs involves a lot of data being
copied around.
So if those two could be combined you would only need to copy the data once instead
of twice.
But whether that is smart or possible I'm not sure.



Should we keep PG auto-scaling active?

If we disable auto-scaling, should we increase the number of PGs to reach 100 
PGs per OSD?


If you know how much of the data is going to be stored in a pool the best way
is to set the number of PGs up front.
Because every time the auto-scaler changes the number of PGs you will have a
huge amount of data being copied around to other OSDs.

You can set the target size or target ratio[1] and the auto-scaler will set the
appropriate number of PGs on the pool.

But if you know how much data is going to be stored in a pool you can turn it
off and just set it manually.
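
For example (pool name and the numbers are placeholders):

  ceph osd pool set <pool> target_size_ratio 0.2

or, to turn the auto-scaler off for a pool and set the PGs manually:

  ceph osd pool set <pool> pg_autoscale_mode off
  ceph osd pool set <pool> pg_num 2048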

100 is a rule of thumb, but with such large disks you could, or maybe should,
consider having a higher number of PGs per OSD.


[1] 
https://docs.ceph.com/en/quincy/rados/operations/placement-groups/#viewing-pg-scaling-recommendations

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: precise/best way to check ssd usage

2023-07-29 Thread Kai Stian Olstad

On Fri, Jul 28, 2023 at 07:13:33PM +, Marc wrote:

I have a use % between 48% and 57%, and assume that with a node failure 1/3 
(only using 3x repl.) of this 57% needs to be able to migrate and added to a 
different node.


If by this you mean you have 3 nodes with 3x replica and failure domain set to
host, it's my understanding that no data will be migrated/backfilled when a node
fails.

The reason is that there is nowhere to copy the data to in order to fulfill the
crush rule of one copy on 3 different hosts.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] How to change RGW certificate in Cephadm?

2023-06-16 Thread Kai Stian Olstad

On Thu, Jun 15, 2023 at 03:58:40PM +, Beaman, Joshua wrote:

We resolved our HAProxy woes by creating a custom jinja2 template and deploying 
as:
ceph config-key set mgr/cephadm/services/ingress/haproxy.cfg -i 
/tmp/haproxy.cfg.j2


Thanks, wish I knew that a few months ago before I threw out ingress.



But we redeploy new certs the same way you described, and then:
ceph orch reconfig ingress.rgw.default.default
ceph orch restart rgw.default.default

This is all done in the same ansible playbook we use to do initial deployment, 
but I don’t see anything else in there that looks like it would be needed to 
update the certs.


After testing this I will claim this is a bug.

The first time "ceph orch apply -i /etc/ceph/rgw.yml" is run it creates two keys
  mgr/cephadm/spec.rgw.pech
and
  rgw/cert/rgw.pech

But later when the spec file is updated and apply is run again only
  mgr/cephadm/spec.rgw.pech
is updated.

When the RGW start the log says it using the certificate in
  rgw/cert/rgw.pech

So, if I read out the certificate from
  mgr/cephadm/spec.rgw.pech
and add that in
  rgw/cert/rgw.pech
and then restart the RGW it picks up the new certificate.

The command to do this
  ceph config-key get mgr/cephadm/spec.rgw.pech | jq -r .spec.spec.rgw_frontend_ssl_certificate | ceph config-key set rgw/cert/rgw.pech -
  ceph orch restart rgw.pech

My claim is that Ceph should update "rgw/cert/rgw.pech" when 
"mgr/cephadm/spec.rgw.pech" is updated.


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bottleneck between loadbalancer and rgws

2023-06-15 Thread Kai Stian Olstad

On Wed, Jun 14, 2023 at 02:19:14PM +, Szabo, Istvan (Agoda) wrote:

I'll try to increase in my small cluster, let's see is there any improvement 
there, thank you.

Any reason if has memory enough to not increase?


I tried to find where I read it but with no luck.
I think it said it's more beneficial to run more RGWs on the same host than to
increase rgw_max_concurrent_requests, without any explanation.

In my search for where I read it I did find this

https://ceph.io/en/news/blog/2022/three-large-scale-clusters/
which links to
https://tracker.ceph.com/issues/54124

And here they set rgw_max_concurrent_requests to 10240
https://www.seagate.com/content/dam/seagate/migrated-assets/www-content/solutions/partners/red-hat/_shared/files/st-seagate-rhcs5-detail-f29951wg-202110-en.pdf

So I think the only way to find out is to increase it and see what happens.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] How to change RGW certificate in Cephadm?

2023-06-15 Thread Kai Stian Olstad

On Wed, Jun 14, 2023 at 03:43:17PM +, Beaman, Joshua wrote:

Do you have an ingress service for HAProxy/keepalived?  If so, that’s the 
service that you will need to have orch redeploy/restart.  If not, maybe try 
`ceph orch redeploy pech` ?


No ingress, but we did have it running at one time with spec file

  service_type: ingress
  service_id: rgw.pech

This was removed a while ago with

  ceph orch rm ingress.rgw.pech

because haproxy did not have sane values for our environment, the timeout was
too low and it was hard coded.

We then applied the spec file in my previous mail. So we are only running
multiple RGWs with SSL. Load balancing and HA is done with PowerDNS with
LUA-records.


ceph orch redeploy pech only gives me an error

  pech is not a valid daemon name


We have a service named rgw.pech

  ceph orch ls --service_name=rgw.pech
  NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
  rgw.pech  ?:443  7/7  4m ago 22h  label:cog

But running

  ceph orch redeploy rgw.pech

will redeploy all 7 RGW, and would be the same as

  ceph orch daemon redeploy rgw.pech.pech-mon-3.upnvrd

but only redeploy one of them.


From: Kai Stian Olstad 
The certificate is about to expire so I would like to update it.
I updated rgw.yml spec with the new certificate and run
  ceph orch apply -i /etc/ceph/rgw.yml

But nothing happened, so I tried to redeploy one of them with
  ceph orch daemon redeploy rgw.pech.pech-mon-3.upnvrd

It redeployed the RGW, but still uses the old certificate.


  ceph config-key list | grep rgw
gives me two keys of interest mgr/cephadm/spec.rgw.pech and rgw/cert/rgw.pech

The content of mgr/cephadm/spec.rgw.pech is the new spec file with the updated
certificates, but the rgw/cert/rgw.pech only contains certificate and private
key, but the certificate is the old ones about to expire.


When I run

  ceph orch daemon redeploy rgw.pech.pech-mon-3.upnvrd

The log says it using rgw/cert/rgw.pech witch contains the old certificate.

  0 framework: beast
  0 framework conf key: ssl_port, val: 443
  0 framework conf key: ssl_certificate, val: config://rgw/cert/rgw.pech

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bottleneck between loadbalancer and rgws

2023-06-14 Thread Kai Stian Olstad

On Wed, Jun 14, 2023 at 01:44:40PM +, Szabo, Istvan (Agoda) wrote:

I have a dedicated loadbalancer pairs separated on 2x baremetal servers and 
behind the haproxy balancers I have 3 mon/mgr/rgw nodes.
Each rgw node has 2rgw on it so in the cluster altogether 6, (now I just added 
one more so currently 9).

Today I see pretty high GET latency in the cluster (3-4s) and seems like the 
limitations are the gateways:
https://i.ibb.co/ypXFL34/1.png
In this netstat seems like maxed out the established connections around 2-3k. 
When I've added one more gateway it increased.

Seems like the gateway node or the gateway instance has some limitation. What 
is the value which is around 1000,I haven't really found it and affect GET and 
limit the connections on linux?


It could be rgw_max_concurrent_requests[1], which defaults to 1024.
I read somewhere that it should not be increased, but that it could be 
increased to 2048.
But the recommended action was to add more gateways instead.


[1] 
https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_max_concurrent_requests
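
To get a rough idea of how close you are to the limit, you could count
established connections on a gateway node, for example (the port depends
on your frontend config):

  ss -tn state established '( sport = :443 )' | wc -l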

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to change RGW certificate in Cephadm?

2023-06-14 Thread Kai Stian Olstad

When I enabled RGW in cephadm I used this spec file rgw.yml

  service_type: rgw
  service_id: pech
  placement:
label: cog
  spec:
ssl: true
rgw_frontend_ssl_certificate: |
  -BEGIN CERTIFICATE-
  
  -END CERTIFICATE-
  -BEGIN CERTIFICATE-
  
  -END CERTIFICATE-
  -BEGIN CERTIFICATE-
  
  -END CERTIFICATE-
  -BEGIN RSA PRIVATE KEY-
  
  -END RSA PRIVATE KEY-

And enabled it with
  ceph orch apply -i /etc/ceph/rgw.yml


The certificate is about to expire so I would like to update it.
I updated rgw.yml spec with the new certificate and run
  ceph orch apply -i /etc/ceph/rgw.yml

But nothing happened, so I tried to redeploy one of them with
  ceph orch daemon redeploy rgw.pech.pech-mon-3.upnvrd

It redeployed the RGW, but still uses the old certificate.


  ceph config-key list | grep rgw
gives me two keys of interest mgr/cephadm/spec.rgw.pech and rgw/cert/rgw.pech

The content of mgr/cephadm/spec.rgw.pech is the new spec file with the updated
certificates, but the rgw/cert/rgw.pech only contains certificate and private
key, but the certificate is the old ones about to expire.


I have looked in the documentation and can't find how to update the certificate
for RGW.

Can anyone shed some light on how to replace the certificate?


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-03-03 Thread Kai Stian Olstad

On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
We're actually writing this for RGW right now.  It'll be a bit before 
it's productized, but it's in the works.


Just curious, what are the use cases for this feature?
S3 against CephFS?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 pg recovery_unfound after multiple crash of an OSD

2023-01-09 Thread Kai Stian Olstad

Hi

Just a follow-up, the issue was solved by running the command

  ceph pg 404.1ff mark_unfound_lost delete
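
Before deleting I checked what was lost with

  ceph pg 404.1ff list_unfound

mark_unfound_lost also has a revert mode, but as far as I know revert is
not supported on erasure coded pools, so delete was the only option here.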

-
Kai Stian Olstad



On 04.01.2023 13:00, Kai Stian Olstad wrote:

Hi

We are running Ceph 16.2.6 deployed with Cephadm.

Around Christmas OSD 245 and 327 had about 20 read errors, so I set them 
to out.


Around new year another OSD, 313, more or less died since it became so
slow that it triggered the Linux default I/O timeout of 30 seconds.
In this period the OSD crashed 8 times and was restarted by systemd
and we ended up with

  [WRN] OBJECT_UNFOUND: 1/416287126 objects unfound (0.000%)
 pg 404.1ff has 1 unfound objects
  [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
 pg 404.1ff is active+recovery_unfound+degraded+remapped, acting
[208,220,269,175,313,329], 1 unfound
  [WRN] PG_DEGRADED: Degraded data redundancy: 5/2364745884 objects
degraded (0.000%), 1 pg degraded
 pg 404.1ff is active+recovery_unfound+degraded+remapped, acting
[208,220,269,175,313,329], 1 unfound

The pool 404 is "default.rgw.buckets.data" and pool 404 is erasure 
encoding 4+2.


I have searched for a solution but with no luck; what I have tried is

  - Restarted all 6 OSD for the PG one by one
  - Running repair of 404.1ff

Output of following command
  - ceph -s
  - ceph health detail
  - ceph pg ls | grep -e PG -e ^404.1ff
  - ceph osd pool ls detail | grep 404
  - ceph osd tree out
  - ceph crash ls | grep -e ID -e osd.313
  - ceph pg 404.1ff list_unfound
  - ceph pg 404.1ff

It is appended below, and can also be read here 
https://gitlab.com/-/snippets/2479624

or cloned with "git clone https://gitlab.com/-/snippets/2479624";

Does anyone have any idea on how to resolve the problem?
Any help is much appreciated.

-
Kai Stian Olstad



::
ceph-s.txt
::
ceph -s
---
  cluster:
id: d13c6b81-51ee-4d22-84e9-456f9307296c
health: HEALTH_ERR
1/416287125 objects unfound (0.000%)
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 5/2364745860 objects degraded
(0.000%), 1 pg degraded

  services:
mon: 3 daemons, quorum ceph-mon-1,ceph-mon-2,ceph-mon-3 (age 2M)
mgr: ceph-mon-2.mfdanx(active, since 3w), standbys: 
ceph-mon-1.ptrsea

mds: 1/1 daemons up, 1 standby
osd: 355 osds: 355 up (since 20h), 352 in (since 2d); 1 remapped 
pgs

rgw: 4 daemons active (4 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   14 pools, 2505 pgs
objects: 416.29M objects, 540 TiB
usage:   939 TiB used, 2.1 PiB / 3.0 PiB avail
pgs: 5/2364745860 objects degraded (0.000%)
 137931/2364745860 objects misplaced (0.006%)
 1/416287125 objects unfound (0.000%)
 2489 active+clean
 14   active+clean+scrubbing+deep
 1active+recovery_unfound+degraded+remapped
 1active+clean+scrubbing

  io:
client:   38 MiB/s rd, 23 MiB/s wr, 2.58k op/s rd, 326 op/s wr

  progress:
Global Recovery Event (6d)
  [===.] (remaining: 3m)


::
ceph_health_detail.txt
::
ceph health detail
--
HEALTH_ERR 1/416287126 objects unfound (0.000%); Possible data damage:
1 pg recovery_unfound; Degraded data redundancy: 5/2364745884 objects
degraded (0.000%), 1 pg degraded
[WRN] OBJECT_UNFOUND: 1/416287126 objects unfound (0.000%)
pg 404.1ff has 1 unfound objects
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
pg 404.1ff is active+recovery_unfound+degraded+remapped, acting
[208,220,269,175,313,329], 1 unfound
[WRN] PG_DEGRADED: Degraded data redundancy: 5/2364745884 objects
degraded (0.000%), 1 pg degraded
pg 404.1ff is active+recovery_unfound+degraded+remapped, acting
[208,220,269,175,313,329], 1 unfound


::
ceph_pg_ls.txt
::
ceph pg ls | grep -e PG -e ^404.1ff
---
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES
OMAP_BYTES*  OMAP_KEYS*  LOGSTATE
SINCE  VERSION  REPORTED   UP
   ACTING SCRUB_STAMP
DEEP_SCRUB_STAMP
404.1ff   137912 5 1379081  282417561722
 0   0   5528  active+recovery_unfound+degraded+remapped
19h141748'724163 141748:3558203  [208,220,269,175,343,329]p208
 [208,220,269,175,313,329]p208  2022-12-31T19:27:10.993286+
2022-12-31T19:27:10.993286+


::
ceph_osd_pool_ls_detail.txt
::
ceph osd pool ls detail | grep 404
--
pool 404 'default.rgw.buckets.data' erasure profile
ec42-jerasure-blaum_roth-hdd size 6 min_size 5 crush_rule 2
object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode on
last_change 124077 lfor 0/52091/108555 flags hashpspool stripe_width
229376 target_size_bytes 1099511627776000 application rgw


::
ceph_osd_tree_out.txt
::
ceph osd tree out

[ceph-users] 1 pg recovery_unfound after multiple crash of an OSD

2023-01-04 Thread Kai Stian Olstad

Hi

We are running Ceph 16.2.6 deployed with Cephadm.

Around Christmas OSD 245 and 327 had about 20 read errors, so I set them 
to out.


Around new year another OSD, 313, more or less died since it became so 
slow that it triggered the Linux default I/O timeout of 30 seconds.
In this period the OSD crashed 8 times and was restarted by systemd and 
we ended up with


  [WRN] OBJECT_UNFOUND: 1/416287126 objects unfound (0.000%)
 pg 404.1ff has 1 unfound objects
  [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
 pg 404.1ff is active+recovery_unfound+degraded+remapped, acting 
[208,220,269,175,313,329], 1 unfound
  [WRN] PG_DEGRADED: Degraded data redundancy: 5/2364745884 objects 
degraded (0.000%), 1 pg degraded
 pg 404.1ff is active+recovery_unfound+degraded+remapped, acting 
[208,220,269,175,313,329], 1 unfound


The pool 404 is "default.rgw.buckets.data" and pool 404 is erasure 
encoding 4+2.


I have searched for a solution but with no luck; what I have tried is

  - Restarted all 6 OSD for the PG one by one
  - Running repair of 404.1ff
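
For completeness, those two steps were roughly the following, with the
OSD ids taken from the acting set above:

  ceph orch daemon restart osd.208   # repeated for 220, 269, 175, 313, 329
  ceph pg repair 404.1ff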

Output of following command
  - ceph -s
  - ceph health detail
  - ceph pg ls | grep -e PG -e ^404.1ff
  - ceph osd pool ls detail | grep 404
  - ceph osd tree out
  - ceph crash ls | grep -e ID -e osd.313
  - ceph pg 404.1ff list_unfound
  - ceph pg 404.1ff

It is appended below, and can also be read here 
https://gitlab.com/-/snippets/2479624

or cloned with "git clone https://gitlab.com/-/snippets/2479624";

Does anyone have any idea on how to resolve the problem?
Any help is much appreciated.

-
Kai Stian Olstad



::
ceph-s.txt
::
ceph -s
---
  cluster:
id: d13c6b81-51ee-4d22-84e9-456f9307296c
health: HEALTH_ERR
1/416287125 objects unfound (0.000%)
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 5/2364745860 objects degraded 
(0.000%), 1 pg degraded


  services:
mon: 3 daemons, quorum ceph-mon-1,ceph-mon-2,ceph-mon-3 (age 2M)
mgr: ceph-mon-2.mfdanx(active, since 3w), standbys: 
ceph-mon-1.ptrsea

mds: 1/1 daemons up, 1 standby
osd: 355 osds: 355 up (since 20h), 352 in (since 2d); 1 remapped pgs
rgw: 4 daemons active (4 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   14 pools, 2505 pgs
objects: 416.29M objects, 540 TiB
usage:   939 TiB used, 2.1 PiB / 3.0 PiB avail
pgs: 5/2364745860 objects degraded (0.000%)
 137931/2364745860 objects misplaced (0.006%)
 1/416287125 objects unfound (0.000%)
 2489 active+clean
 14   active+clean+scrubbing+deep
 1active+recovery_unfound+degraded+remapped
 1active+clean+scrubbing

  io:
client:   38 MiB/s rd, 23 MiB/s wr, 2.58k op/s rd, 326 op/s wr

  progress:
Global Recovery Event (6d)
  [===.] (remaining: 3m)


::
ceph_health_detail.txt
::
ceph health detail
--
HEALTH_ERR 1/416287126 objects unfound (0.000%); Possible data damage: 1 
pg recovery_unfound; Degraded data redundancy: 5/2364745884 objects 
degraded (0.000%), 1 pg degraded

[WRN] OBJECT_UNFOUND: 1/416287126 objects unfound (0.000%)
pg 404.1ff has 1 unfound objects
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound
pg 404.1ff is active+recovery_unfound+degraded+remapped, acting 
[208,220,269,175,313,329], 1 unfound
[WRN] PG_DEGRADED: Degraded data redundancy: 5/2364745884 objects 
degraded (0.000%), 1 pg degraded
pg 404.1ff is active+recovery_unfound+degraded+remapped, acting 
[208,220,269,175,313,329], 1 unfound



::
ceph_pg_ls.txt
::
ceph pg ls | grep -e PG -e ^404.1ff
---
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES 
OMAP_BYTES*  OMAP_KEYS*  LOGSTATE
  SINCE  VERSION  REPORTED   UP  
   ACTING SCRUB_STAMP  
DEEP_SCRUB_STAMP
404.1ff   137912 5 1379081  282417561722
0   0   5528  active+recovery_unfound+degraded+remapped19h   
 141748'724163 141748:3558203  [208,220,269,175,343,329]p208  
[208,220,269,175,313,329]p208  2022-12-31T19:27:10.993286+  
2022-12-31T19:27:10.993286+



::
ceph_osd_pool_ls_detail.txt
::
ceph osd pool ls detail | grep 404
--
pool 404 'default.rgw.buckets.data' erasure profile 
ec42-jerasure-blaum_roth-hdd size 6 min_size 5 crush_rule 2 object_hash 
rjenkins pg_num 2048 pgp_num 2048 autoscale_mode on last_change 124077 
lfor 0/52091/108555 flags hashpspool stripe_width 229376 
target_size_bytes 1099511627776000 application rgw



::
ceph_osd_tree_out.txt
::
ceph osd tree out
-
ID   CLASS  WEIGHT  TYPE NAME STATUS  REWEI

[ceph-users] Re: CephFS: Isolating folders for different users

2022-12-23 Thread Kai Stian Olstad

On 22.12.2022 15:47, Jonas Schwab wrote:

Now the question: Since I established this setup more or less through
trial and error, I was wondering if there is a more elegant/better
approach than what is outlined above?


You can use namespaces so you don't need separate pools.
Unfortunately the documentation is sparse on the subject; I use it with 
subvolumes like this



# Create a subvolume

ceph fs subvolume create <vol_name> <subvol_name> \
  --pool_layout <data_pool_name> --namespace-isolated


The subvolume is created with namespace fsvolumens_<subvol_name>.
You can also find the name with

ceph fs subvolume info <vol_name> <subvol_name> | jq -r .pool_namespace



# Create a user with access to the subvolume and the namespace

## First find the path to the subvolume

ceph fs subvolume getpath <vol_name> <subvol_name>

## Create the user

ceph auth get-or-create client.<user> mon 'allow r' mds 'allow 
rw path=<subvolume path>' osd 'allow rw pool=<data pool> 
namespace=fsvolumens_<subvol_name>'



I have found this by looking at how Openstack does it and some trial and 
error.
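
As a concrete sketch with made-up names (volume cephfs, subvolume
project1, data pool cephfs_data):

  ceph fs subvolume create cephfs project1 --namespace-isolated
  ceph fs subvolume getpath cephfs project1
  ceph auth get-or-create client.project1 mon 'allow r' \
    mds 'allow rw path=<path returned by getpath>' \
    osd 'allow rw pool=cephfs_data namespace=fsvolumens_project1'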



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mails not getting through?

2022-11-16 Thread Kai Stian Olstad

On 16.11.2022 13:21, E Taka wrote:

gmail marks too many messages on this mailing list as spam.


You can fix that by creating a filter in Gmail for ceph-users@ceph.io 
and check the "Never send it to Spam".



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mails not getting through?

2022-11-16 Thread Kai Stian Olstad

On 16.11.2022 00:25, Daniel Brunner wrote:

are my mails not getting through?

is anyone receiving my emails?


You can check this yourself by checking the archives 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/

If you see your mail there, they are getting through.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitoring drives

2022-10-18 Thread Kai Stian Olstad

On 17.10.2022 12:52, Ernesto Puerta wrote:

   - Ceph already exposes SMART-based health-checks, metrics and alerts
   from the devicehealth/diskprediction modules

<https://docs.ceph.com/en/latest/rados/operations/devices/#enabling-monitoring>.
   I find this kind of high-level monitoring more digestible to 
operators than

   low-level SMART metrics.


Marc that started this thread was asking about SAS disks.
smartctl doesn't show many SMART attributes on SAS disks, and some drives 
only have an error counter log like this


Error counter log:
   Errors Corrected by   Total   Correction 
GigabytesTotal
   ECC  rereads/errors   algorithm  
processeduncorrected
   fast | delayed   rewrites  corrected  invocations   [10^9 
bytes]  errors
read:  00 0 0 376907  93335.728  
 0
write: 02 0 22113307  17978.600  
 0
verify:00 0 0848  0.002  
 0



But for the drives I have, it looks like they all have SMART Health Status.

"SMART Health Status: OK"


Ceph doesn't support SMART or any status on SAS disks today; I only get 
the message "No SMART data available".
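
For reference, the commands I use to check are along these lines (the
device id comes from ceph device ls):

  smartctl -x --json=vo /dev/sdX
  ceph device ls
  ceph device get-health-metrics <devid>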



I have gathered "smartctl -x --json=vo" logs for the 6 types of SAS disks 
I have in my possession.

You can find them here if interested [1]


[1] https://gitlab.com/-/snippets/2431089

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't setup Basic Ceph Client

2022-07-19 Thread Kai Stian Olstad

On 08.07.2022 16:18, Jean-Marc FONTANA wrote:

We're planning to use rbd too and get block device for a linux server.
In order to do that, we installed ceph-common packages
and created ceph.conf and ceph.keyring as explained at Basic Ceph
Client Setup — Ceph Documentation
<https://docs.ceph.com/en/pacific/cephadm/client-setup/>
(https://docs.ceph.com/en/pacific/cephadm/client-setup/)

This does not work.

Ceph seems to be installed

$ dpkg -l | grep ceph-common
ii  ceph-common   16.2.9-1~bpo11+1 amd64    common
utilities to mount and interact with a ceph storage cluster
ii  python3-ceph-common   16.2.9-1~bpo11+1 all  Python
3 utility libraries for Ceph

$ ceph -v
ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific 
(stable)


But, when using commands that interact with the cluster, we get this 
message


$ ceph -s
2022-07-08T15:51:24.965+0200 7f773b7fe700 -1 monclient(hunting):
handle_auth_bad_method server allowed_methods [2] but i only support 
[2,1]

[errno 13] RADOS permission denied (error connecting to the cluster)


The default user for ceph is admin/client.admin; do you have that key 
in your keyring?

And is the keyring file readable for the user running the ceph commands?
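
A quick way to check, assuming the default paths and the admin user:

  ls -l /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring
  ceph -n client.admin -s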

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm host maintenance

2022-07-14 Thread Kai Stian Olstad

On 14.07.2022 11:01, Steven Goodliff wrote:

If i get anywhere with
detecting the instance is the active manager handling that in Ansible
i will reply back here.


I use this

- command: ceph mgr stat
  register: r

- debug: msg={{ (r.stdout | from_json).active_name.split(".")[0] }}


This works because the first part of the instance name is the hostname.
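
The same thing as a one-liner on the command line, if jq is available:

  ceph mgr stat | jq -r .active_name | cut -d. -f1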

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2022-04-19 Thread Kai Stian Olstad

On 18.04.2022 21:35, Wesley Dillingham wrote:
If you mark an osd "out" but not down / you dont stop the daemon do the 
PGs

go remapped or do they go degraded then as well?


First I made sure the balancer was active, then I marked one osd "out" 
with "ceph osd out 34" and checked the status every 2 seconds for 2 
minutes; no degraded messages.
The only new messages in ceph -s were 12 remapped pgs, "11 
active-remapped+backfilling" and "1 active+remapped+backfill_wait"


Previously I had to set all OSDs (15 disks) on a host to out, and there was 
no issue with PGs in degraded state.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2022-04-14 Thread Kai Stian Olstad

On 29.03.2022 14:56, Sandor Zeestraten wrote:

I was wondering if you ever found out anything more about this issue.


Unfortunately no, so I turned it off.


I am running into similar degradation issues while running rados bench 
on a

new 16.2.6 cluster.
In our case it's with a replicated pool, but the degradation problems 
also

go away when we turn off the balancer.


So this goes a long way towards confirming there is something wrong with the 
balancer, since we now see it on two different installations.
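
For reference, turning it off is just

  ceph balancer off
  ceph balancer status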



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph namespace access control

2022-03-25 Thread Kai Stian Olstad
On Wed, Mar 23, 2022 at 07:14:22AM +0200, Budai Laszlo wrote:
> Hello all,
> 
> what capabilities a ceph user should have in order to be able to create rbd 
> images in one namespace only?
> 
> I have tried the following:
> 
> [root@ceph1 ~]# rbd namespace ls --format=json
> [{"name":"user1"},{"name":"user2"}]
> 
> [root@ceph1 ~]# ceph auth get-or-create client.user2 mon 'profile rbd' osd 
> 'allow rwx pool=rbd namespace=user2' -o /etc/ceph/client.user2.keyring

Instead of using allow, use profile on the osd caps too and it will set the correct 
permissions.
# ceph auth get-or-create client.user2 mon 'profile rbd' osd 'profile rbd 
pool=rbd namespace=user2' -o /etc/ceph/client.user2.keyring
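
A quick way to verify the caps, with a made-up image name:

  rbd --id user2 create --size 1G --pool rbd --namespace user2 testimage
  rbd --id user2 ls --pool rbd --namespace user2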

-- 
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RadosGW S3 range on a 0 byte object gives 416 Range Not Satisfiable

2022-03-22 Thread Kai Stian Olstad

On 22.03.2022 09:40, Ulrich Klein wrote:

Yup, completely agree. I find the 416 also a bit surprising, whether
in Ceph/RGW or plain HTTP.


Consistency with other widely used software would be nice.



Just to make sure: I am not at all involved in Ceph development, so
don’t send a feature request to me :)


Of course, I would never refer someone to send a feature request to a 
person even if you were a Ceph developer, I would consider that rude; 
the tracker exists for that :-)



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RadosGW S3 range on a 0 byte object gives 416 Range Not Satisfiable

2022-03-22 Thread Kai Stian Olstad

On 21.03.2022 15:35, Ulrich Klein wrote:

RFC 7233

4.4 <https://datatracker.ietf.org/doc/html/rfc7233#section-4.4>.  416
Range Not Satisfiable

   The 416 (Range Not Satisfiable) status code indicates that none of
   the ranges in the request's Range header field (Section 3.1
<https://datatracker.ietf.org/doc/html/rfc7233#section-3.1>) overlap


Section 3.1 says "A server MAY ignore the Range header field."



   For example:

 HTTP/1.1 416 Range Not Satisfiable
 Date: Fri, 20 Jan 2012 15:41:54 GMT
 Content-Range: bytes */47022

  Note: Because servers are free to ignore Range, many
  implementations will simply respond with the entire selected
  representation in a 200 (OK) response.  That is partly because


This is what Nginx and Apache do: if you specify a range when the file has 
0 bytes they will return 200.
So they ignore range for 0-byte files, but not when the size is 
greater than 0.



On 21. 03 2022, at 15:11, Ulrich Klein  
wrote:


With a bit of HTTP background I’d say:
bytes=0-100 means: First byte to to 100nd byte. First byte is 
byte #0
On an empty object there is no first byte, i.e. not satisfiable ==> 
416


Should be the same as on a single byte object and
bytes=1-100

200 OK should only be correct, if the server or a proxy in between 
doesn’t support range requests.


After reading your text and links I do concur that returning 416 for a 
0-byte object with range bytes=0-100 is not wrong, 
but I also believe that it would be correct to return 200 OK as Nginx 
and Apache do, since range can be ignored.


I think our user of Ceph is used to how Nginx and Apache work, and that 
is the reason they wondered if something was wrong with Ceph.


So I think the answer to them will be: it's according to spec, but you 
can always put in a feature request.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RadosGW S3 range on a 0 byte object gives 416 Range Not Satisfiable

2022-03-21 Thread Kai Stian Olstad

Hi

Ceph v16.2.6.

Using GET with Range: bytes=0-100 it fails with 416 if the object is 
0 bytes.
I tried reading the http specification[1] on the subject but did not get 
any wiser unfortunately.


I did a test with curl and range against a 0 byte file on Nginx and it 
returned 200 OK.
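
The test was along these lines (endpoint made up):

  curl -s -o /dev/null -w '%{http_code}\n' \
    -H 'Range: bytes=0-100' https://server.example.com/bucket/empty-object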


Does anyone know if it's correct to return 416 on a 0-byte object with range, 
or should this be considered a bug in Ceph?



[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replace HDD with cephadm

2022-03-16 Thread Kai Stian Olstad

On 15.03.2022 10:10, Jimmy Spets wrote:

Thanks for your reply.
I have two things that I am unsure of:
- Is the OSD UUID the same for all OSD:s or should it be unique for 
each?


It's unique and generated when you run ceph-volume lvm prepare or add an 
OSD.


You can find the OSD UUID/FSID for an existing OSD in 
/var/lib/ceph/<cluster FSID>/osd.<OSD id>/fsid


- Have I understood correctly that in your example the OSD is not 
encrypted?


Yes, it's not encrypted.

--
Kai Stian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd namespace create - operation not supported

2022-03-15 Thread Kai Stian Olstad

On 11.03.2022 14:04, Ilya Dryomov wrote:
On Fri, Mar 11, 2022 at 8:04 AM Kai Stian Olstad  
wrote:


Isn't namespace supported with erasure encoded pools?


RBD images can't be created in EC pools, so attempting to create RBD
namespaces there is pointless.  The way to store RBD image data in
an EC pool is to create an image in a replicated pool (possibly in
a custom namespace) and specify --data-pool:

  $ rbd namespace create --pool rep3 --namespace testspace
  $ rbd create --size 10G --pool rep3 --namespace testspace --data-pool 
ec42 --image testimage


This worked like a charm.



The image metadata (header object, etc) would be stored in rep3
(replicated pool), while the data objects would go to ec42 (EC pool).


I see the meta pool is using OMAP so I guess that's the reason it needs 
to be a replicated pool; makes sense.


Thank you for the help Ilya.

--
Kai Stian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replace HDD with cephadm

2022-03-11 Thread Kai Stian Olstad

On 10.03.2022 14:48, Jimmy Spets wrote:

I have a Ceph Pacific cluster managed by cephadm.

The nodes have six HDD:s and one NVME that is shared between the six
HDD:s.

The OSD spec file looks like this:

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
  size: '800G:1200G'
db_slots: 6
encrypted: true

I need to replace one of the HDD:s that is broken.

How do I replace the HDD in the OSD connecting it to the old HDD:s
db_slot?


Last time I tried, cephadm could not replace a disk where the db was on 
a separate drive.

It would just add it as a new OSD without the db on a separate disk.
So to avoid this, remove all the active OSD specs so the disk won't be 
added automatically by cephadm.

Then you need to manual add the disk.
This is unfortunately not described anywhere, but the procedure I follow 
is this and the osd is osd.152



Find the VG and LV of the block db for the OSD.
  root@osd-host:~# ls -l /var/lib/ceph/*/osd.152/block.db
  lrwxrwxrwx 1 167 167 90 Dec  1 12:58 
/var/lib/ceph/b321e76e-da3a-11eb-b75c-4f948441dcd0/osd.152/block.db -> 
/dev/ceph-10215920-77ea-4d50-b153-162477116b4c/osd-db-25762869-20d5-49b1-9ff4-378af8f679c4


  VG = ceph-10215920-77ea-4d50-b153-162477116b4c
  LV = osd-db-25762869-20d5-49b1-9ff4-378af8f679c4

If you have already removed it, you'll find it in 
/var/lib/ceph/*/removed/



Then you remove the OSD.
  root@admin:~# ceph orch osd rm 152 --replace
  Scheduled OSD(s) for removal

When the disk is removed from Ceph you can replace it with a new one.
Look in dmesg what the new disk is named, in my case it's /dev/sdt


Prepare the new disk

  root@osd-host:~# cephadm shell

  root@osd-host:/# ceph auth get client.bootstrap-osd 
>/var/lib/ceph/bootstrap-osd/ceph.keyring

  exported keyring for client.bootstrap-osd

  # Here you need to use the VG/LV you found above so you can reuse the 
db volume.
  root@osd-host:~# ceph-volume lvm prepare --bluestore --no-systemd 
--osd-id 152 --data /dev/sdt --block.db 
ceph-10215920-77ea-4d50-b153-162477116b4c/osd-db-25762869-20d5-49b1-9ff4-378af8f679c4

  < removed some output >
  Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore 
bluestore --mkfs -i 152 --monmap 
/var/lib/ceph/osd/ceph-152/activate.monmap --keyfile - 
--bluestore-block-db-path 
/dev/ceph-10215920-77ea-4d50-b153-162477116b4c/osd-db-25762869-20d5-49b1-9ff4-378af8f679c4 
--osd-data /var/lib/ceph/osd/ceph-152/ --osd-uuid 
517213f3-0715-4d23-8103-6a34b1f8ef08 --setuser ceph --setgroup ceph
 stderr: 2021-12-01T11:50:33.613+ 7ff013614080 -1 
bluestore(/var/lib/ceph/osd/ceph-152/) _read_fsid unparsable uuid

  --> ceph-volume lvm prepare successful for: /dev/sdt

Here you need the --osd-uuid which is 
517213f3-0715-4d23-8103-6a34b1f8ef08



Then you need a json file containing ceph info and osd authentication, 
this file can be created like this
  root@admin:~# printf '{\n"config": "%s",\n"keyring": "%s"\n}\n' 
"$(ceph config generate-minimal-conf | sed -e ':a;N;$!ba;s/\n/\\n/g' -e 
's/\t/\\t/g' -e 's/$/\\n/')" "$(ceph auth get osd.152 | head -n 2 | sed 
-e ':a;N;$!ba;s/\n/\\n/g' -e 's/\t/\\t/g' -e 's/$/\\n/')" 
>config-osd.152.json
You might need to copy the json file to the OSD-host depending on where 
you run the command.


The --osd-uuid above is the same as --osd-fsid in this command, thank 
you for consistent naming.
  root@osd-host:~# cephadm deploy --fsid <cluster fsid> --name osd.152 
--config-json config-osd.152.json --osd-fsid 
517213f3-0715-4d23-8103-6a34b1f8ef08


And then the OSD should be back up and running.

This is the way I have found to do OSD replacement; there might be an 
easier way of doing it, but I have not found it.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd namespace create - operation not supported

2022-03-10 Thread Kai Stian Olstad

Hi

I'm trying to create a namespace in an rbd pool, but get "Operation not 
supported".

This is on a 16.2.6 Cephadm installed on Ubuntu 20.04.3.

The pool is erasure encoded and the commands I run was the following.

cephadm shell

ceph osd pool create rbd 32 32 erasure ec42-jerasure-blaum_roth-hdd 
--autoscale-mode=warn

ceph osd pool set rbd allow_ec_overwrites true
rbd pool init --pool rbd

rbd namespace create --pool rbd --namespace testspace
rbd: failed to created namespace: (95) Operation not supported
2022-03-11T06:13:30.570+ 7f4a9426e2c0 -1 librbd::api::Namespace: 
create: failed to add namespace: (95) Operation not supported



Isn't namespace supported with erasure encoded pools?


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-24 Thread Kai Stian Olstad
On Wed, Feb 23, 2022 at 12:02:53PM +, Adam Huffman wrote:
> On Wed, 23 Feb 2022 at 11:25, Eugen Block  wrote:
> 
> > How exactly did you determine that there was actual WAL data on the HDDs?
> >
> I couldn't say exactly what it was, but 7 or so TBs was in use, even with
> no user data at all.

When you have the DB on a separate disk, the DB size counts towards the total size of the
OSD. But this DB space is considered used, so you will see a lot of used space.
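
You can see it per OSD with for example

  ceph osd df tree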

-- 
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The Return of Ceph Planet

2022-02-05 Thread Kai Stian Olstad

On 04.02.2022 00:00, Mike Perez wrote:

If you have a Ceph category feed you would like added; please email me
your RSS feed URL.


While you are mentioning RSS, any reason why the RSS feed on the ceph.com 
blog/news was removed?


It used to be https://ceph.com/community/blog/feed/ but after the change 
I can't find the feed URL.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: airgap install

2021-12-21 Thread Kai Stian Olstad

On 21.12.2021 09:41, Marc wrote:

I have also an 'airgapped install' but with rpm's, simply cloning the
necessary repositories. Why go through all these efforts trying to get
this to work via containers?


For me, being completely new to Ceph, I started with the documentation[1] 
where the recommended method is Cephadm or Rook, so I chose Cephadm.
Unfortunately I do regret it, not because of the container mirroring 
since that is the easy part, but because of lacking documentation, 
lacking features like replacing a disk (where the DB is on a shared SSD), 
bugs and other quirks.


Cephadm is not what I would consider stable and ready for production, so 
if I had to choose today it would not be Cephadm, but more likely manual 
install from deb with my own Ansible code or ceph-ansible.

Because then I would have a lot of documentation on how to solve things.


[1] https://docs.ceph.com/en/pacific/install/index.html


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: airgap install

2021-12-20 Thread Kai Stian Olstad

On 17.12.2021 11:06, Zoran Bošnjak wrote:

Kai, thank you for your answer. It looks like the "ceph config set
mgr..." commands are the key part, to specify my local registry.
However, I haven't got that far with the installation. I have tried
various options, but I have problems already with the bootstrap step.

I have documented the procedure (and the errors) here:
https://github.com/zoranbosnjak/ceph-install#readme

Would you please have a look and suggest corrections.


I have looked it over and checked the cephadm source code.



Ideally, I would like to run administrative commands from a dedicated
(admin) node... or alternatively to setup mon nodes to be able to run
administrative commands...


The bootstrap command you need to run on one of the nodes, i.e. the 
node you want the monitor to run on.
After that you can install cephadm or ceph-common to use your admin node 
for the rest.


So the error you get is this
  Non-zero exit code 22 from /usr/bin/docker run --rm --ipc=host 
--net=host --entrypoint /usr/bin/ceph -e 
CONTAINER_IMAGE=admin:5000/ceph/ceph:v16 -e NODE_NAME=node01 -v 
/var/log/ceph/da017daa-5f18-11ec-a05c-37b574681fc7:/var/log/ceph:z -v 
/tmp/ceph-tmph9jxliaz:/etc/ceph/ceph.client.admin.keyring:z -v 
/tmp/ceph-tmpsecch5kc:/etc/ceph/ceph.conf:z admin:5000/ceph/ceph:v16 
orch host add node01
  /usr/bin/ceph: stderr Error EINVAL: Can not automatically resolve ip 
address of host where active mgr is running. Please explicitly provide 
the address.


It is trying to find the IP address for node01 but fails to do so.
So you need to look into your DNS settings so it is possible to determine 
the IP for the hostname.
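
A quick check and two possible workarounds (IP address made up):

  getent hosts node01                   # should return the host's IP
  echo "192.0.2.11 node01" >> /etc/hosts
  ceph orch host add node01 192.0.2.11  # or pass the address explicitly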


Checking for the IP is a recent change (16.2.6 or .7), 
https://github.com/ceph/ceph/pull/42772, to close this issue 
https://tracker.ceph.com/issues/51667



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: airgap install

2021-12-16 Thread Kai Stian Olstad
On Mon, Dec 13, 2021 at 06:18:55PM +, Zoran Bošnjak wrote:
> I am using "ubuntu 20.04" and I am trying to install "ceph pacific" version 
> with "cephadm".
> 
> Are there any instructions available about using "cephadm bootstrap" and 
> other related commands in an airgap environment (that is: on the local 
> network, without internet access)?

Unfortunately they say cephadm is stable but I would call it beta because of
lacking features, bugs and missing documentation.

I can give you some pointers.

The best source to find the images you need is in cephadm code and for 16.2.7
you find it here [1].

cephadm bootstrap has the --image option to specify what image to use.
I also run the bootstrap with --skip-monitoring-stack, otherwise it fails since it
can't find the images.
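
A rough sketch of the bootstrap against a local registry (registry name
taken from your error output, monitor IP made up):

  cephadm --image admin:5000/ceph/ceph:v16.2.7 bootstrap \
    --mon-ip 10.0.0.1 --skip-monitoring-stack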

After that you can point the monitoring containers to your registry.
cephadm shell
ceph config set mgr mgr/cephadm/container_image_prometheus <image>
ceph config set mgr mgr/cephadm/container_image_node_exporter <image>
ceph config set mgr mgr/cephadm/container_image_grafana <image>
ceph config set mgr mgr/cephadm/container_image_alertmanager <image>

Check the result with
ceph config get mgr

To deploy the monitoring
ceph mgr module enable prometheus
ceph orch apply node-exporter '*'
ceph orch apply alertmanager --placement ...
ceph orch apply prometheus --placement ...
ceph orch apply grafana --placement ...


This should be what you need to get Ceph running in an isolated network.

[1] https://github.com/ceph/ceph/blob/v16.2.7/src/cephadm/cephadm#L50-L61

-- 
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-22 Thread Kai Stian Olstad

On 21.09.2021 09:11, Kobi Ginon wrote:

for sure the balancer affects the status


Of course, but setting several PGs to degraded is something else.



i doubt that your customers will be writing so many objects in the same
rate of the Test.


I only need 2 hosts running rados bench to get several PGs in degraded 
state.




maybe you need to play with the balancer configuration a bit.


Maybe, but a balancer should not set the cluster health to warning with 
several PGs in degraded state.
It should be possible to do this cleanly: copy the data and delete the 
source when the copy is OK.




Could start with this
The balancer mode can be changed to crush-compat mode, which is 
backward

compatible with older clients, and will make small changes to the data
distribution over time to ensure that OSDs are equally utilized.
https://docs.ceph.com/en/latest/rados/operations/balancer/


I will probably just turn it off before I set the cluster in production.



side note: i m using indeed an old version of ceph ( nautilus)+ blancer
configured
and runs rado benchmarks , but did not saw such a problem.
on the other hand i m not using pg_autoscaler
i set the pools PG number in advanced according to assumption of the
percentage each pool will be using
Could be that you do use this Mode and the combination of auto scaler 
and

balancer is what reveals this issue


If you look at my initial post you will see that the pool is created with 
--autoscale-mode=off
The cluster is running 16.2.5 and is empty except for one pool with one 
PG created by Cephadm.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-20 Thread Kai Stian Olstad

On 17.09.2021 16:10, Eugen Block wrote:
Since I'm trying to test different erasure encoding plugin and  
technique I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my surprise 
I did not get any degraded messages at all, and the cluster  was in 
HEALTH_OK the whole time.


Interesting, maybe the balancer works differently now? Or it works
differently under heavy load?


It would be strange if the balancer's normal operation is to put the 
cluster in degraded mode.




The only suspicious lines I see are these:

 Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug
2021-09-17T06:30:01.402+ 7f66b0329700  1 heartbeat_map
reset_timeout 'Monitor::cpu_tp thread 0x7f66b0329700' had timed out
after 0.0s

But I'm not sure if this is related. The out OSDs shouldn't have any
impact on this test.

Did you monitor the network saturation during these tests with iftop
or something similar?


I did not, so I reran the test this morning.

All the servers have 2x25Gbit/s NIC in bonding with LACP 802.3ad 
layer3+4.


The peak on the active monitor was 27 Mbit/s and less on the other 2 
monitors.
I also checked the CPU (Xeon 5222 3.8 GHz) and none of the cores were 
saturated,

and network statistics show no errors or drops.


So perhaps there is a bug in the balancer code?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-17 Thread Kai Stian Olstad

On 16.09.2021 15:51, Josh Baergen wrote:

I assume it's the balancer module. If you write lots of data quickly
into the cluster the distribution can vary and the balancer will try
to even out the placement.


The balancer won't cause degradation, only misplaced objects.


Since I'm trying to test different erasure encoding plugins and techniques 
I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my surprise I 
did not get any degraded messages at all, and the cluster was in 
HEALTH_OK the whole time.




Degraded data redundancy: 260/11856050 objects degraded
(0.014%), 1 pg degraded


That status definitely indicates that something is wrong. Check your
cluster logs on your mons (/var/log/ceph/ceph.log) for the cause; my
guess is that you have OSDs flapping (rapidly going down and up again)
due to either overload (disk or network) or some sort of
misconfiguration.


So I enabled the balancer and ran the rados bench again, and the degraded 
messages are back.


I guess the equivalent log to /var/log/ceph/ceph.log in Cephadm is
  journalctl -u 
ceph-b321e76e-da3a-11eb-b75c-4f948441...@mon.pech-mon-1.service


There are no messages about osd being marked down, so I don't understand 
why this is happening.

I probably need to raise some verbose value.

I have attached the log from journalctl; it starts at 06:30:00 when I 
started the rados bench and includes a few lines after the first degraded 
message at 06:31:06.
Just be aware that 15 OSDs are set to out, since I have a problem with 
an HBA on one host; all tests have been done with those 15 OSDs in 
status out.


--
Kai Stian Olstad

Sep 17 06:30:00 pech-mon-1 conmon[1337]: debug 2021-09-17T06:29:59.994+ 
7f66b232d700  0 log_channel(cluster) log [INF] : overall HEALTH_OK
Sep 17 06:30:00 pech-mon-1 conmon[1337]: cluster 
2021-09-17T06:29:59.317530+ mgr.pech-mon-1.ptrsea
Sep 17 06:30:00 pech-mon-1 conmon[1337]:  (mgr.245802) 345745 : cluster [DBG] 
pgmap v347889: 1025 pgs: 1025 active+clean; 0 B data, 73 TiB used, 2.8 PiB / 
2.9 PiB avail
Sep 17 06:30:00 pech-mon-1 conmon[1337]: cluster 
2021-09-17T06:30:00.000143+ mon.pech-mon-1 (mon.0) 1166236 : 
Sep 17 06:30:00 pech-mon-1 conmon[1337]: cluster [INF] overall HEALTH_OK
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": "12.6d", 
"id": [293, 327]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.6d", "id": [293, 327]}]: dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.144", "id": [307, 351]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.318+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.144", "id": [307, 351]}]: 
dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.17d", "id": [144, 136]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.17d", "id": [144, 136]}]: 
dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.1a2", "id": [199, 189]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.1a2", "id": [199, 189]}]: 
dispatch
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 mon.pech-mon-1@0(leader) e7 handle_command 
mon_command({"prefix": "osd pg-upmap-items", "format": "json", "pgid": 
"12.1e1", "id": [289, 344]} v 0) v1
Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug 2021-09-17T06:30:01.322+ 
7f66afb28700  0 log_channel(audit) log [INF] : from='mgr.245802 
10.0.1.10:0/136830414' entity='mgr.pech-mon-1.ptrsea' cmd=[{"prefix": "osd 
pg-upmap-items", "format": "json", "pgid": "12.1e1", "id": [289, 344]}]: 
dispatch
Sep 17 06:30:01 

[ceph-users] Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-16 Thread Kai Stian Olstad

Hi

I'm testing a Ceph cluster with "rados bench", it's an empty Cephadm 
install that only has one pool device_health_metrics.


Create a pool with 1024 PGs on the HDD devices (15 servers have HDDs and 
13 have SSDs)
ceph osd pool create pool-ec32-isa-reed_sol_van-hdd 1024 1024 erasure 
ec32-isa-reed_sol_van-hdd --autoscale-mode=off


I then run "rados bench" from the 13 SSD hosts at the same time.
rados bench -p pool-ec32-isa-reed_sol_van-hdd 600 write --no-cleanup

After just a few seconds "ceph -s" starts to report degraded data 
redundancy


Here is some examples during the 10 minutes testing period
Degraded data redundancy: 260/11856050 objects degraded (0.014%), 1 
pg degraded
Degraded data redundancy: 260/1856050 objects degraded (0.014%), 1 
pg degraded

Degraded data redundancy: 1 pg undersized
Degraded data redundancy: 1688/3316225 objects degraded (0.051%), 3 
pgs degraded
Degraded data redundancy: 5457/7005845 objects degraded (0.078%), 3 
pgs degraded, 9 pgs undersized

Degraded data redundancy: 1 pg undersized
Degraded data redundancy: 4161/7005845 objects degraded (0.059%), 3 
pgs degraded
Degraded data redundancy: 4315/7005845 objects degraded (0.062%), 2 
pgs degraded, 4 pgs undersized



So my question is, is it normal that Ceph reports degraded under normal use,
or do I have a problem somewhere that I need to investigate?


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MTU mismatch error in Ceph dashboard

2021-08-06 Thread Kai Stian Olstad

On 04.08.2021 20:31, Ernesto Puerta wrote:

Could you please go to the Prometheus UI and share the output of the
following query "node_network_mtu_bytes"? That'd be useful to 
understand

the issue. If you can open a tracker issue here:
https://tracker.ceph.com/projects/dashboard/issues/new ?


Found a issue reported under MGR
https://tracker.ceph.com/issues/52028 - mgr/dashboard: Incorrect MTU 
mismatch warning


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MTU mismatch error in Ceph dashboard

2021-08-04 Thread Kai Stian Olstad

On 04.08.2021 22:06, Paul Giralt (pgiralt) wrote:


I did notice that docker0 has an MTU of 1500 as do the eno1 and eno2
interfaces which I’m not using. I’m not sure if that’s related to the
error. I’ve been meaning to try changing the MTU on the eno interfaces
just to see if that makes a difference but haven’t gotten around to
it.


If you look at the message it says which interface it is.

It does check and report on all the interfaces, even those that are in 
DOWN state, which it shouldn't.
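
A quick way to list interface and MTU, including interfaces that are DOWN:

  ip -o link show | awk '{print $2, $5}'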



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm and multipath.

2021-07-29 Thread Kai Stian Olstad

Hi Peter

Please remember to include the list address in your reply.
I will not trim so people on the list can read you answer.


On 29.07.2021 12:43, Peter Childs wrote:
On Thu, 29 Jul 2021 at 10:37, Kai Stian Olstad  
wrote:



A little disclaimer, I have never used multipath with Ceph.

On 28.07.2021 20:19, Peter Childs wrote:
> I have a number of disk trays, with 25 ssd's in them, these are
> attached to
> my servers via a pair of sas cables, so that multipath is used to join
> the
> together again and maximize speed etc.
>
> Using cephadm how can I create the osd's?

You can use the commands in the documentation [1] "ceph orch daemon add
osd <host>:<device path>"
But you need to configure the LVM correctly to make this work.



That was my thought, but it was not working, but now it is

vgcreate test /dev/mapper/mpatha
lvcreate -l 190776 -n testlv test
ceph orch daemon add osd dampwood18:test/testlv
  Created osd(s) 1361 on host 'dampwood18'

I think I can live with that. I think there is room for improvement here,
but I'm happy with creating the VGs and LVs before I use the disks.


If you could not run
  cephadm shell ceph orch daemon add osd dampwood18:/dev/mapper/mpatha
I would consider that a bug.



> It looks like it should be possible to use ceph-volume but I've not
> really
> worked out yet how to access ceph-volume within cephadm. Even if I've
> got
> to format them with lvm first. (The docs are slightly confusing here)
>
> It looks like the ceph disk inventory system can't cope with multipath?

If by "ceph disk inventory system" you mean OSD service 
specification[2]

then yes, I don't think it's possible to use it with multipath.


When you add a disk to Ceph with cephadm it will use LVM to create a
Physical Volume (PV) of that device, create a Volume Group (VG) on the
disk and then create a Logical Volume (LV) that uses the whole VG.
And the configuration in Ceph references the VG/LV, so Ceph should not
have a problem with multipath.

But since you have multipath, LVM might have a problem with that if 
not

configured correctly.
LVM will scan disks for LVM signatures and try to create the devices for
the LVs it finds.

So you need to make sure that LVM only scans the multipath device
paths and not the individual disks the OS sees.
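
A minimal sketch of an lvm.conf filter for that, assuming the multipath
devices show up as /dev/mapper/mpath* and the OS disk is /dev/sda (adjust
the patterns; newer LVM versions can also handle this with
multipath_component_detection):

  # /etc/lvm/lvm.conf
  devices {
      filter = [ "a|^/dev/mapper/mpath.*|", "a|^/dev/sda.*|", "r|.*|" ]
  }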




Hmm I think we might have "room for improvement" in this area,

Either the osd spec needs to include all the options for weird disks 
that

people might come up with, and allocating them to classes as well,


There are a lot of limitations in the OSD service spec and drive handling 
in Cephadm.
Just try to replace an HDD with the DB on an SSD; that is a pain at 
the moment.




or all the options available to ceph-volume need to be exposed to
orchestration which would also working, currently it feels like some of 
the
complex options in ceph are not available to cephadm yet and you need 
to

work out how to do it.


You have "cephadm ceph-volume" or you could run "cephadm shell" and then 
run all the ceph commands.




I'm new to ceph and I like the theory having come from a Spectrum Scale
background, and I'm still trying to get to grips with how things work.

My Ceph cluster has got 3 types of drive, these multipathed 800G ssds,
Disks on nodes with lots of memory (256G between 30Disks) and Disks on
nodes with very little memory (48G between 60Disks) hence why I was
trying to get disk specs to work. I've actually got it working with 
a
little kernel tuning and must get around to writing it up so I can 
share

where I've got to..


As mentioned, the OSD service spec has a lot of limitations.

The default memory size for an OSD is 4GB, so your 48 GB/60 disks would 
need some configuration and I'm not sure if it's feasible to run them 
with so little memory.




Thanks

Peter



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm and multipath.

2021-07-29 Thread Kai Stian Olstad

A little disclaimer, I have never used multipath with Ceph.

On 28.07.2021 20:19, Peter Childs wrote:
I have a number of disk trays, with 25 ssd's in them, these are 
attached to
my servers via a pair of sas cables, so that multipath is used to join 
the

together again and maximize speed etc.

Using cephadm how can I create the osd's?


You can use the commands in the documentation [1] "ceph orch daemon add 
osd <host>:<device path>"

But you need to configure the LVM correctly to make this work.


It looks like it should be possible to use ceph-volume but I've not 
really
worked out yet how to access ceph-volume within cephadm. Even if I've 
got

to format them with lvm first. (The docs are slightly confusing here)

It looks like the ceph disk inventory system can't cope with multipath?


If by "ceph disk inventory system" you mean OSD service specification[2] 
then yes, I don't think it's possible to use it with multipath.



When you add a disk to Ceph with cephadm it will use LVM to create a 
Physical Volume (PV) of that device, create a Volume Group (VG) on the 
disk and then create a Logical Volume (LV) that uses the whole VG.
And the configuration in Ceph references the VG/LV, so Ceph should not 
have a problem with multipath.


But since you have multipath, LVM might have a problem with that if not 
configured correctly.
LVM will scan disks for LVM signatures and try to create the devices for 
the LVs it finds.

So you need to make sure that LVM only scans the multipath device 
paths and not the individual disks the OS sees.
paths and not the individual disk the OS sees.



[1] https://docs.ceph.com/en/latest/cephadm/osd/#creating-new-osds
[2] 
https://docs.ceph.com/en/latest/cephadm/osd/#advanced-osd-service-specifications



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: How to remove a stray daemon ghost

2021-07-23 Thread Kai Stian Olstad

On 22.07.2021 13:56, Kai Stian Olstad wrote:

Hi

I have a warning that says
"1 stray daemon(s) not managed by cephadm"

What I did is the following.
I have 3 nodes that the mons should run on, but because of a bug in
16.2.4 I couldn't run them on those nodes since they are in different subnets.
But this was fixed in 16.2.5 so I upgraded without issues.

but i got a health warning
root@pech-mon-1:~# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon mon.pech-mds-1 on host pech-cog-1 not managed by 
cephadm


I think this relates to this issue
https://tracker.ceph.com/issues/50272

I restarted the active mgr and the other mgr became active, and the stray 
message went away.
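
For reference, forcing the failover and checking is just

  ceph mgr fail
  ceph health detail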


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm: How to remove a stray daemon ghost

2021-07-22 Thread Kai Stian Olstad

Hi

I have a warning that says
"1 stray daemon(s) not managed by cephadm"

What I did is the following.
I have 3 nodes that the mons should run on, but because of a bug in 
16.2.4 I couldn't run them on those nodes since they are in different subnets.

But this was fixed in 16.2.5 so I upgraded without issues.

Before I started it looked like this

root@pech-mon-1:~# ceph orch ps | grep ^mon
NAME HOSTPORTS  STATUS REFRESHED  AGE  MEM 
USE  MEM LIM  VERSION  IMAGE ID  CONTAINER ID
mon.pech-cog-1   pech-cog-1 running (23h) 9m ago   3w
1182M2048M  16.2.5   6933c2a0b7dd  b226c1714777
mon.pech-mds-1   pech-mds-1 running (23h) 7m ago   3w
1147M2048M  16.2.5   6933c2a0b7dd  40f8e268afca
mon.pech-mon-1   pech-mon-1 running (23h) 2m ago   3w
1161M2048M  16.2.5   6933c2a0b7dd  b358057dcb3a



To place the daemon on correct hosts I run this
root@pech-mon-1:~# ceph orch apply mon pech-mon-1,pech-mon-2,pech-mon-3
Scheduled mon update...


And that worked fine.
root@pech-mon-1:~# ceph orch ps |grep ^mon
NAME HOSTPORTS  STATUS REFRESHED  AGE  MEM 
USE  MEM LIM  VERSION  IMAGE ID  CONTAINER ID
mon.pech-mon-1   pech-mon-1 running (23h) 6s ago   3w
1360M2048M  16.2.5   6933c2a0b7dd  b358057dcb3a
mon.pech-mon-2   pech-mon-2 running (13s) 6s ago  13s 
287M2048M  16.2.5   6933c2a0b7dd  25a68933c119
mon.pech-mon-3   pech-mon-3 running (11s) 6s ago  11s 
241M2048M  16.2.5   6933c2a0b7dd  be0c6e5a5fdf



but i got a health warning
root@pech-mon-1:~# ceph health detail
HEALTH_WARN 1 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
stray daemon mon.pech-mds-1 on host pech-cog-1 not managed by 
cephadm


The strange thing is daemon mon.pech-mds-1 has never run on pech-cog-1.
And the problem is that I cannot find this supposedly stray daemon.


With Ansible I ran "podman ps" on all nodes and removed the osd, node 
and crash daemons from the output


$ ansible pech -u root -m shell -a "podman ps" | grep ceph | awk '{ 
print $NF }' | egrep -v "osd|node|crash" | sort

ceph--alertmanager.pech-mds-1
ceph--grafana.pech-cog-2
ceph--mgr.pech-mon-1.ptrsea
ceph--mgr.pech-mon-2.mfdanx
ceph--mon.pech-mon-1
ceph--mon.pech-mon-2
ceph--mon.pech-mon-3
ceph--prometheus.pech-mds-1

No stray daemon here


Also with Ansible I ran "cephadm ls" on all of them and removed the osd, 
node and crash daemons from the output


$ ansible pech -u root -m shell -a "cephadm ls | jq .[].name" | grep 
'^"' | egrep -v "osd|node|crash" | sort

"alertmanager.pech-mds-1"
"grafana.pech-cog-2"
"mgr.pech-mon-1.ptrsea"
"mgr.pech-mon-2.mfdanx"
"mon.pech-mon-1"
"mon.pech-mon-2"
"mon.pech-mon-3"
"prometheus.pech-mds-1"

No stray daemon here either.

Does anyone know how to find this supposedly stray daemon?


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Having issues to start more than 24 OSDs per host

2021-07-01 Thread Kai Stian Olstad

On 22.06.2021 17:27, David Orman wrote:

https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62

If you're brave (YMMV, test first non-prod), we pushed an image with
the issue we encountered fixed as per above here:
https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 that
you can use to install with.


Thank you, David.
I could not add 1 host with 15 HDDs and 3 SSDs without it hanging forever.
With your patch I created a new container image and could add 15 hosts, each
with 15 HDDs and 3 SSDs, without any issue.
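
For anyone wanting to do the same: as far as I understand, pointing cephadm
at a custom image is just a config setting before adding the hosts and
applying the OSD spec. A rough sketch, with the image path as a placeholder:

# ceph config set global container_image <registry>/ceph:<patched-tag>
# ceph orch host add pech-hd-1
# ceph orch apply osd -i hdd.yml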



(I'm a little confused why a breaking install/upgrade issue like this 
has been allowed to sit)


You and me both.


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad

On 27.05.2021 11:53, Eugen Block wrote:

This test was on ceph version 15.2.8.

On Pacific (ceph version 16.2.4) this also works for me for initial
deployment of an entire host:

+-+-+--+--+--+-+
|SERVICE  |NAME |HOST  |DATA  |DB|WAL  |
+-+-+--+--+--+-+
|osd  |ssd-hdd-mix  |pacific1  |/dev/vdb  |/dev/vdd  |-|
|osd  |ssd-hdd-mix  |pacific1  |/dev/vdc  |/dev/vdd  |-|
+-+-+--+--+--+-+

But it doesn't work if I remove one OSD, just like you describe. This
is what ceph-volume reports:

---snip---
[ceph: root@pacific1 /]# ceph-volume lvm batch --report /dev/vdc
--db-devices /dev/vdd --block-db-size 3G
--> passed data devices: 1 physical, 0 LVM
--> relative data size: 1.0
--> passed block_db devices: 1 physical, 0 LVM
--> 1 fast devices were passed, but none are available

Total OSDs: 0

  Type  Path  LV Size  % of device
---snip---

I know that this has already worked in Octopus, I did test it
successfully not long ago.


Thank you for trying, so it looks like a bug.
Searching through the issue tracker I found a few issues related to
replacing OSDs, but it doesn't look like they get much attention.



I tried to find a way to add the disk manually. I did not find any
documentation about it, but by looking at the source code, some issues in the
tracker, and some trial and error I ended up with this.


Since the LV was deleted, I recreated it with the same name.

# lvcreate -l 91570 -n osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69 
ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b


In "cephadm shell"
# cephadm shell
# ceph auth get client.bootstrap-osd 
>/var/lib/ceph/bootstrap-osd/ceph.keyring
# ceph-volume lvm prepare --bluestore --no-systemd --data /dev/sdt 
--block.db 
ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69



A JSON file is needed for "cephadm deploy":
# printf '{\n"config": "%s",\n"keyring": "%s"\n}\n' "$(ceph config 
generate-minimal-conf | sed -e ':a;N;$!ba;s/\n/\\n/g' -e 's/\t/\\t/g' -e 
's/$/\\n/')" "$(ceph auth get osd.178 | head -n 2 | sed -e 
':a;N;$!ba;s/\n/\\n/g' -e 's/\t/\\t/g' -e 's/$/\\n/')" 
>config-osd.178.json



Exit the cephadm shell and run:
# cephadm --image ceph:v15.2.9 deploy --fsid 
3614abcc-201c-11eb-995a-2794bcc75ae0 --config-json 
/var/lib/ceph/3614abcc-201c-11eb-995a-2794bcc75ae0/home/config-osd.178.json 
--osd-fsid 9227e8ae-92eb-429e-9c7f-d4a2b75afb8e
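
Whether the OSD actually came back can be checked with the usual commands,
for example:

# ceph orch ps | grep osd.178
# ceph osd tree | grep osd.178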



And the OSD is back, but the VG name on the HDD is missing "block" in its
name; just a cosmetic thing, so I leave it as is.


  LV                                              VG                                               Attr    LSize
  osd-block-9227e8ae-92eb-429e-9c7f-d4a2b75afb8e  ceph-46f42262-d3dc-4dc3-8952-eec3e4a2c178        -wi-ao  12.47t
  osd-block-2da790bc-a74c-41da-8772-3b8aac77001c  ceph-block-1b5ad7e7-2e24-4315-8a05-7439ab782b45  -wi-ao  12.47t


The first one is the new OSD and the second one is one that cephadm
itself created.



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad

On 27.05.2021 11:17, Eugen Block wrote:

That's not how it's supposed to work. I tried the same on an Octopus
cluster and removed all filters except:

data_devices:
  rotational: 1
db_devices:
  rotational: 0

My Octopus test osd nodes have two HDDs and one SSD, I removed all
OSDs and redeployed on one node. This spec file results in three
standalone OSDs! Without the other filters this won't work as
expected, it seems. I'll try again on Pacific with the same test and
see where that goes.


This spec did work for me when I initially deployed with Octopus
15.2.5.


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad

On 27.05.2021 10:46, Eugen Block wrote:

Hi,

The VG has 357.74GB of free space of a total 5.24TB, so I did actually
try different values like "30G:", "30G", "300G:", "300G", "357G".
I also tried some crazy high numbers and some ranges, but don't
remember the values. But none of them worked.


the size parameter is filtering the disk size, not the size you want
the db to have (that's block_db_size). Your SSD disk size is 1.8 TB so
 your specs could look something like this:

block_db_size: 360G
data_devices:
  size: "12T:"
  rotational: 1
db_devices:
  size: ":2T"
  rotational: 0
filter_logic: AND
...
But I was under the impression that this all should of course work
with just the rotational flags, I'm confused that it doesn't. Can you
try with these specs to see if you get the OSD deployed?


I tried this one

hdd-test-from-eugen.yml
---
service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
block_db_size: 360G
data_devices:
  size: "12T:"
  rotational: 1
db_devices:
  size: ":2T"
  rotational: 0
filter_logic: AND

But it doesn't find any disk.
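
The spec can be checked without applying it, with the same kind of dry run
as used elsewhere in this thread:

# ceph orch apply osd -i hdd-test-from-eugen.yml --dry-run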

I also tried this, but with the same result.

service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
block_db_size: 360G
data_devices:
  rotational: 1
db_devices:
  rotational: 0
filter_logic: AND



I'll try again with Octopus to see if I see similar behaviour.


Very much appreciated, thanks.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad

On 26.05.2021 22:14, David Orman wrote:

We've found that after doing the osd rm, you can use: "ceph-volume lvm
zap --osd-id 178 --destroy" on the server with that OSD as per:
https://docs.ceph.com/en/latest/ceph-volume/lvm/zap/#removing-devices
and it will clean things up so they work as expected.


With the help of Eugen I ran "cephadm ceph-volume lvm zap --destroy
" and the LV is gone.
I think that is the same result that "ceph-volume lvm zap --osd-id 178
--destroy" would give me?


I now have 357GB of free space on the VG, but Cephadm doesn't find and use
this space.

Below is the result of the zap command, and it shows the LV was deleted.

$ sudo cephadm ceph-volume lvm zap --destroy 
/dev/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69

INFO:cephadm:Inferring fsid 3614abcc-201c-11eb-995a-2794bcc75ae0
INFO:cephadm:Using recent ceph image ceph:v15.2.9
INFO:cephadm:/usr/bin/podman:stderr --> Zapping: 
/dev/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69
INFO:cephadm:/usr/bin/podman:stderr Running command: /usr/bin/dd 
if=/dev/zero 
of=/dev/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69 
bs=1M count=10 conv=fsync

INFO:cephadm:/usr/bin/podman:stderr  stderr: 10+0 records in
INFO:cephadm:/usr/bin/podman:stderr 10+0 records out
INFO:cephadm:/usr/bin/podman:stderr  stderr: 10485760 bytes (10 MB, 10 
MiB) copied, 0.0195532 s, 536 MB/s
INFO:cephadm:/usr/bin/podman:stderr --> More than 1 LV left in VG, will 
proceed to destroy LV only
INFO:cephadm:/usr/bin/podman:stderr --> Removing LV because --destroy 
was given: 
/dev/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69
INFO:cephadm:/usr/bin/podman:stderr Running command: /usr/sbin/lvremove 
-v -f 
/dev/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69
INFO:cephadm:/usr/bin/podman:stderr  stdout: Logical volume 
"osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69" successfully removed
INFO:cephadm:/usr/bin/podman:stderr  stderr: Removing 
ceph--block--dbs--563432b7--f52d--4cfe--b952--11542594843b-osd--block--db--449bd001--eb32--46de--ab80--a1cbcd293d69 
(253:3)
INFO:cephadm:/usr/bin/podman:stderr  stderr: Archiving volume group 
"ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b" metadata (seqno 
61).
INFO:cephadm:/usr/bin/podman:stderr  stderr: Releasing logical volume 
"osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69"
INFO:cephadm:/usr/bin/podman:stderr  stderr: Creating volume group 
backup 
"/etc/lvm/backup/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b" 
(seqno 62).
INFO:cephadm:/usr/bin/podman:stderr --> Zapping successful for: /dev/ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-27 Thread Kai Stian Olstad
, but I'm not sure if that's already the default, I
 remember that there were issues in Nautilus because the default was
OR. I tried this just recently with a version similar to this, I
believe it was 15.2.8 and it worked for me, but again, it's just a
tiny virtual lab cluster.


Yes, AND is the default. I tried adding 'filter_logic: AND' but with the
same result.


In your virtual lab cluster, do you have multiple HDDs sharing the same SSD
as I do?


To me it looks like Cephadm can't find or use the 357.71GB of free space on
the VG; it can only find devices that are available.

Here is the "orch device ls" output for that host:

$ ceph orch device ls --wide | egrep "Hostname|hd-7"
Hostname    Path      Type  Vendor   Model            Size   Available  Reject Reasons

pech-hd-7   /dev/sdt  hdd   WDC      WUH721414AL5200  13.7T  Yes
pech-hd-7   /dev/sdb  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdc  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdd  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sde  ssd   SAMSUNG  MZILT1T9HAJQ0D3  1920G  No         LVM detected, locked
pech-hd-7   /dev/sdf  ssd   SAMSUNG  MZILT1T9HAJQ0D3  1920G  No         LVM detected, locked
pech-hd-7   /dev/sdg  ssd   SAMSUNG  MZILT1T9HAJQ0D3  1920G  No         LVM detected, locked
pech-hd-7   /dev/sdi  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdj  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdk  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdl  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdm  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdn  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdo  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdp  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdq  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sdr  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked
pech-hd-7   /dev/sds  hdd   SEAGATE  ST14000NM0168    13.7T  No         Insufficient space (<10 extents) on vgs, LVM detected, locked


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-26 Thread Kai Stian Olstad

On 26.05.2021 11:16, Eugen Block wrote:

Yes, the LVs are not removed automatically, you need to free up the
VG, there are a couple of ways to do so, for example remotely:

pacific1:~ # ceph orch device zap pacific4 /dev/vdb --force

or directly on the host with:

pacific1:~ # cephadm ceph-volume lvm zap --destroy 
/dev//


Thanks,

I used the cephadm command and deleted the LV, and the VG now has free
space:


# vgs | egrep "VG|dbs"
  VG                                                   #PV #LV #SN Attr   VSize  VFree
  ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b    3  14   0 wz--n- <5.24t 357.74g


But it doesn't seem to be able to use it, because it can't find anything:

# ceph orch apply osd -i hdd.yml --dry-run

OSDSPEC PREVIEWS

+-+--+-+--++-+
|SERVICE  |NAME  |HOST |DATA  |DB  |WAL  |
+-+--+-+--++-+
+-+--+-+--++-+

I tried adding size as you have in your configuration
db_devices:
  rotational: 0
  size: '30G:'

Still it was unable to create the OSD.

If I removed the ":" so it is an exact size of 30GB, it did find the disk, but
the DB is not placed on an SSD since I do not have one with an exact size of 30 GB.


OSDSPEC PREVIEWS

+-+--+-+--++-+
|SERVICE  |NAME  |HOST |DATA  |DB  |WAL  |
+-+--+-+--++-+
|osd  |hdd   |pech-hd-7|/dev/sdt  |-   |-|
+-+--+-+--++-+


To me it looks like Cephadm can't find or use the free space on the VG and
use it for a new LV for the OSD.
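
A quick way to see what the host itself reports for that VG, as a sketch:

# vgs -o vg_name,vg_size,vg_free ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b
# cephadm ceph-volume inventory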



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm: How to replace failed HDD where DB is on SSD

2021-05-26 Thread Kai Stian Olstad

On 26.05.2021 08:22, Eugen Block wrote:

Hi,

did you wipe the LV on the SSD that was assigned to the failed HDD? I
just did that on a fresh Pacific install successfully, a couple of
weeks ago it also worked on an Octopus cluster.


No, I did not wipe the LV.
Not sure what you mean by wipe, so I tried overwriting the LV with
/dev/zero, but that did not solve it.

So I guess with wipe do you mean delete the LV with lvremove?
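
I.e. something along the lines of:

# lvremove ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b/osd-block-db-449bd001-eb32-46de-ab80-a1cbcd293d69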


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm: How to replace failed HDD where DB is on SSD

2021-05-25 Thread Kai Stian Olstad

Hi

The server runs 15.2.9 and has 15 HDDs and 3 SSDs.
The OSDs were created with this YAML file:

hdd.yml

service_type: osd
service_id: hdd
placement:
  host_pattern: 'pech-hd-*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0


The result was that the 3 SSDs were added to 1 VG with 15 LVs on it.

# vgs | egrep "VG|dbs"
  VG                                                   #PV #LV #SN Attr   VSize  VFree
  ceph-block-dbs-563432b7-f52d-4cfe-b952-11542594843b    3  15   0 wz--n- <5.24t 48.00m



One of the OSDs failed and I ran rm with replace:

# ceph orch osd rm 178 --replace

and the result is

# ceph osd tree | grep "ID|destroyed"
ID   CLASS  WEIGHT    TYPE NAME  STATUS     REWEIGHT  PRI-AFF
178  hdd    12.82390  osd.178    destroyed         0  1.0



But I'm not able to replace the disk with the same YAML file as shown 
above.



# ceph orch apply osd -i hdd.yml --dry-run

OSDSPEC PREVIEWS

+-+--+--+--++-+
|SERVICE  |NAME  |HOST  |DATA  |DB  |WAL  |
+-+--+--+--++-+
+-+--+--+--++-+

I guess this is the wrong way to do it, but I can't find the answer in 
the documentation.

So how can I replace this failed disk in Cephadm?


--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad

On 11.03.2021 15:47, Sebastian Wagner wrote:

yes

Am 11.03.21 um 15:46 schrieb Kai Stian Olstad:


To resolve it, could I just remove it with "cephadm rm-daemon"?


That worked like a charm, and the upgrade is resumed.

Thank you Sebastian.

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad

Hi Sebastian

On 11.03.2021 13:13, Sebastian Wagner wrote:

looks like

$ ssh pech-hd-009
# cephadm ls

is returning this non-existent OSDs.

can you verify that `cephadm ls` on that host doesn't
print osd.355 ?


"cephadm ls" on the node does list this drive

{
"style": "cephadm:v1",
"name": "osd.355",
"fsid": "3614abcc-201c-11eb-995a-2794bcc75ae0",
"systemd_unit": "ceph-3614abcc-201c-11eb-995a-2794bcc75ae0@osd.355",
"enabled": true,
"state": "stopped",
"container_id": null,
"container_image_name": 
"goharbor.example.com/library/ceph/ceph:v15.2.5",

"container_image_id": null,
"version": null,
"started": null,
"created": "2021-01-20T09:53:22.229080",
"deployed": "2021-02-09T09:24:02.855576",
"configured": "2021-02-09T09:24:04.211587"
}


To resolve it, could I just remove it with "cephadm rm-daemon"?
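
I.e., if I have the flags right, something like this on pech-hd-009, with the
fsid taken from the "cephadm ls" output above:

# cephadm rm-daemon --name osd.355 --fsid 3614abcc-201c-11eb-995a-2794bcc75ae0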

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad
Before I started the upgrade the cluster was healthy, but one
OSD (osd.355) was down; I can't remember if it was in or out.

Upgrade was started with
ceph orch upgrade start --image 
goharbor.example.com/library/ceph/ceph:v15.2.9


The upgrade started but when Ceph tried to upgrade osd.355 it paused 
with the following messages:


2021-03-11T09:15:35.638104+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Target is goharbor.example.com/library/ceph/ceph:v15.2.9 with id dfc48307963697ff48acd9dd6fda4a7a24017b9d8124f86c2a542b0802fe77ba
2021-03-11T09:15:35.639882+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
Checking mgr daemons...
2021-03-11T09:15:35.644170+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
All mgr daemons are up to date.
2021-03-11T09:15:35.644376+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
Checking mon daemons...
2021-03-11T09:15:35.647669+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
All mon daemons are up to date.
2021-03-11T09:15:35.647866+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
Checking crash daemons...
2021-03-11T09:15:35.652035+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
Setting container_image for all crash...
2021-03-11T09:15:35.653683+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
All crash daemons are up to date.
2021-03-11T09:15:35.653896+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
Checking osd daemons...
2021-03-11T09:15:36.273345+ mgr.pech-mon-2.cjeiyc [INF] It is 
presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273504+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273887+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: 
Redeploying osd.355
2021-03-11T09:15:36.276673+ mgr.pech-mon-2.cjeiyc [ERR] Upgrade: 
Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.355 on host 
pech-hd-009 failed.



One of the first things the upgrade did was to upgrade the mons, so they were
restarted, and now osd.355 no longer exists:


$ ceph osd info osd.355
Error EINVAL: osd.355 does not exist

But if I run a resume
ceph orch upgrade resume
it still tries to upgrade osd.355, same message as above.

I tried to stop and start the upgrade again with
ceph orch upgrade stop
ceph orch upgrade start --image 
goharbor.example.com/library/ceph/ceph:v15.2.9

it still tries to upgrade osd.355, with the same message as above.

Looking at the source code, it looks like it gets the daemons to upgrade from
the mgr cache, so I restarted both mgrs, but it still tries to upgrade
osd.355.



Does anyone know how I can get the upgrade to continue?

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io