[ceph-users] pgs stuck backfill_toofull

2020-10-28 Thread Mark Johnson
I've been struggling with this one for a few days now.  We had an OSD report as 
near full a few days ago.  Had this happen a couple of times before and a 
reweight-by-utilization has sorted it out in the past.  Tried the same again 
but this time we ended up with a couple of pgs in a state of backfill_toofull 
and a handful of misplaced objects as a result.

Tried doing the reweight a few more times and it's been moving data around.  We 
did have another osd trigger the near full alert but running the reweight a 
couple more times seems to have moved some of that data around a bit better.  
However, the original near_full osd doesn't seem to have changed much and the 
backfill_toofull pgs are still there.  I'd keep doing the 
reweight-by-utilization but I'm not sure if I'm heading down the right path and 
if it will eventually sort it out.

We have 14 pools, but the vast majority of data resides in just one of those 
pools (pool 20).  The pgs in the backfill state are in pool 2 (as far as I can 
tell).  That particular pool is used for some cephfs stuff and has a handful of 
large files in there (not sure if this is significant to the problem).

All up, our utilization is showing as 55.13%, but some of our OSDs are showing 
as 76% in use, with the one problem OSD sitting at 85.02%.  Right now, I'm just not 
sure what the proper corrective action is.  The last couple of reweights I've 
run have been a bit more targeted in that I've set them to act on only two 
OSDs at a time.  If I run a test-reweight targeting only one OSD, it does say 
it will reweight OSD 9 (the one at 85.02%).  I gather this will move data away 
from this OSD and potentially get it below the threshold.  However, at one 
point in the past couple of days it showed no OSDs in a near full state, 
yet the two pgs in backfill_toofull didn't change.  That's why I'm not sure 
continually reweighting is going to solve this issue.

I'm a long way from knowledgeable on Ceph, so I'm not really sure what 
information is useful here.  Here's a bit of info on what I'm seeing; I can 
provide anything else that might help.


Basically, we have a three node cluster, but only two nodes have OSDs.  The third is 
there simply to enable a quorum to be established.  The OSDs are evenly spread 
across these two nodes and the configuration of each is identical.  We are 
running Jewel and are not in a position to upgrade at this stage.
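
For reference, these are the kinds of commands in play here; the threshold and 
weight values below are illustrative examples rather than exactly what I ran:

# ceph osd test-reweight-by-utilization 120 0.05 2
  (dry run: reports which OSDs would be reweighted, here at most 2 OSDs with a 
  maximum weight change of 0.05)
# ceph osd reweight-by-utilization 120 0.05 2
  (applies the same change for real)
# ceph osd reweight 9 0.65
  (manually pushes data away from a single over-full OSD)
# ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'
  (temporarily raises Jewel's default 85% backfill_toofull threshold so the 
  stuck backfills can drain; it should be set back afterwards)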




# ceph --version
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)


# ceph health detail
HEALTH_WARN 2 pgs backfill_toofull; 2 pgs stuck unclean; recovery 33/62099566 
objects misplaced (0.000%); 1 near full osd(s)
pg 2.52 is stuck unclean for 201822.031280, current state 
active+remapped+backfill_toofull, last acting [17,3]
pg 2.18 is stuck unclean for 202114.617682, current state 
active+remapped+backfill_toofull, last acting [18,2]
pg 2.18 is active+remapped+backfill_toofull, acting [18,2]
pg 2.52 is active+remapped+backfill_toofull, acting [17,3]
recovery 33/62099566 objects misplaced (0.000%)
osd.9 is near full at 85%


# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
 2 1.37790  1.0  1410G   842G   496G 59.75 1.08  33
 3 1.37790  0.45013  1410G  1079G   259G 76.49 1.39  21
 4 1.37790  0.95001  1410G  1086G   253G 76.98 1.40  44
 5 1.37790  1.0  1410G   617G   722G 43.74 0.79  43
 6 1.37790  0.65009  1410G   616G   722G 43.69 0.79  39
 7 1.37790  0.95001  1410G   495G   844G 35.10 0.64  40
 8 1.37790  1.0  1410G   732G   606G 51.93 0.94  52
 9 1.37790  0.70007  1410G  1199G   139G 85.02 1.54  37
10 1.37790  1.0  1410G   611G   727G 43.35 0.79  41
11 1.37790  0.75006  1410G   495G   843G 35.11 0.64  32
 0 1.37790  1.0  1410G   731G   608G 51.82 0.94  43
12 1.37790  1.0  1410G   851G   487G 60.36 1.09  44
13 1.37790  1.0  1410G   378G   960G 26.82 0.49  38
14 1.37790  1.0  1410G   969G   370G 68.68 1.25  37
15 1.37790  1.0  1410G   724G   614G 51.35 0.93  35
16 1.37790  1.0  1410G   491G   847G 34.84 0.63  43
17 1.37790  1.0  1410G   862G   476G 61.16 1.11  50
18 1.37790  0.80005  1410G  1083G   255G 76.78 1.39  26
19 1.37790  0.65009  1410G   963G   375G 68.29 1.24  23
20 1.37790  1.0  1410G   724G   614G 51.38 0.93  42
  TOTAL 28219G 15557G 11227G 55.13
MIN/MAX VAR: 0.49/1.54  STDDEV: 15.57


# ceph pg ls backfill_toofull
pg_stat objects mip degr misp unf bytes log disklog state state_stamp v 
reported up up_primary acting acting_primary last_scrub scrub_stamp 
last_deep_scrub deep_scrub_stamp
2.18 9 0 0 18 0 0 3653 3653 active+remapped+backfill_toofull 2020-10-29 
05:31:20.429912 610'549153 656:390372 [9,12] 9 [18,2] 18 594'547482 2020-10-25 
20:28:39.680744 594'543841 2020-10-21 21:21:33.092868
2.52 15 0 0 15 0 0 4883 4883 active+remapped+backfill_toofull 2020-10-29 
05:31:28.277898 652'502085 656:367288 [17,9] 17 [17,3] 17 594'499108 2020-10-26 
11:06:48.417825 594'499108 2020-10-26 11:06:48.417825


pool : 17 18 19 11 20 

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Andrei Mikhailovsky
Eugen, I've got four physical servers and I've installed mon on all of them. 
I've discussed it with Wido and a few other chaps from ceph and there is no 
issue in doing it. The quorum issues would happen if you have 2 mons. If you've 
got more than 2 you should be fine.
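
For anyone else hitting this, the usual first checks look something like the 
following (the mon ID below is one of mine, adjust to your own):

ceph mon stat
ceph quorum_status --format json-pretty
journalctl -u ceph-mon@arh-ibstorage1-ib --since "2 hours ago"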

Andrei

- Original Message -
> From: "Eugen Block" 
> To: "Andrei Mikhailovsky" 
> Cc: "ceph-users" 
> Sent: Wednesday, 28 October, 2020 20:19:15
> Subject: Re: [ceph-users] Re: frequent Monitor down

> Why do you have 4 MONs in the first place? That way a quorum is
> difficult to achieve, could it be related to that?
> 
> Zitat von Andrei Mikhailovsky :
> 
>> Yes, I have, Eugen, I see no obvious reason / error / etc. I see a
>> lot of entries relating to Compressing as well as monitor going down.
>>
>> Andrei
>>
>>
>>
>> - Original Message -
>>> From: "Eugen Block" 
>>> To: "ceph-users" 
>>> Sent: Wednesday, 28 October, 2020 11:51:20
>>> Subject: [ceph-users] Re: frequent Monitor down
>>
>>> Have you looked into syslog and mon logs?
>>>
>>>
>>> Zitat von Andrei Mikhailovsky :
>>>
 Hello everyone,

 I am having regular messages that the Monitors are going down and up:

 2020-10-27T09:50:49.032431+ mon .arh-ibstorage2-ib ( mon .1)
 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum
 arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
 2020-10-27T09:50:49.123511+ mon .arh-ibstorage2-ib ( mon .1)
 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
 BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
 arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)
 set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
 2020-10-27T09:50:52.735457+ mon .arh-ibstorage1-ib ( mon .0)
 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons
 down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
 2020-10-27T12:35:20.556458+ mon .arh-ibstorage2-ib ( mon .1)
 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum
 arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
 2020-10-27T12:35:20.643282+ mon .arh-ibstorage2-ib ( mon .1)
 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
 BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
 arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)
 set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time


 This happens on a daily basis several times a day.

 Could you please let me know how to fix this annoying problem?

 I am running ceph version 15.2.4
 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on
 Ubuntu 18.04 LTS with latest updates.

 Thanks

 Andrei
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Monitor persistently out-of-quorum

2020-10-28 Thread Ki Wong
Hello,

I am at my wit's end.

So I made a mistake in the configuration of my router, and one
of the monitors (out of 3) dropped out of the quorum and nothing
I’ve done allows it to rejoin. That includes reinstalling the
monitor with ceph-ansible.

The connectivity issue is fixed. I’ve tested it using “nc” and
the host can connect to both port 3300 and 6789 of the other
monitors. But the wayward monitor continues to stay out of quorum.
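
Here is roughly what I have been checking to compare the wayward monitor
against the quorum, plus the offline monmap injection I understand can be
tried as a last resort (a sketch using my mon names, not a verified
procedure):

  ceph mon dump                          # from a monitor that is in quorum
  ceph daemon mon.mgmt03 mon_status      # on the wayward monitor's admin socket

  systemctl stop ceph-mon@mgmt03
  ceph mon getmap -o /tmp/monmap         # fetch the current map from the quorum
  ceph-mon -i mgmt03 --inject-monmap /tmp/monmap
  systemctl start ceph-mon@mgmt03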

What is wrong? I see a bunch of “EBUSY” errors in the log, with
the message:

  e1 handle_auth_request haven't formed initial quorum, EBUSY

How do I fix this? Any help would be greatly appreciated.

Many thanks,

-kc


With debug_mon at 1/10, I got these log snippets:

2020-10-28 15:40:05.961 7fb79253a700  4 mon.mgmt03@2(probing) e1 probe_timeout 
0x564050353ec0
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
sync_reset_requester
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
unregister_cluster_logger - not registered
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 
mons at 
{mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 
_set_mon_num_rank num 0 rank 0
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
timecheck_finish
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
scrub_event_cancel
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_reset
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
reset_probe_timeout 0x564050347ce0 after 2 seconds
2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 probing other 
monitors
2020-10-28 15:40:07.961 7fb79253a700  4 mon.mgmt03@2(probing) e1 probe_timeout 
0x564050347ce0
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
sync_reset_requester
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
unregister_cluster_logger - not registered
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 
mons at 
{mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 
_set_mon_num_rank num 0 rank 0
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
timecheck_finish
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
scrub_event_cancel
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_reset
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
reset_probe_timeout 0x564050360660 after 2 seconds
2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 probing other 
monitors
2020-10-28 15:40:09.107 7fb79253a700 -1 mon.mgmt03@2(probing) e1 
get_health_metrics reporting 7 slow ops, oldest is log(1 entries from seq 1 at 
2020-10-27 23:03:41.586915)
2020-10-28 15:40:09.961 7fb79253a700  4 mon.mgmt03@2(probing) e1 probe_timeout 
0x564050360660
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
sync_reset_requester
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
unregister_cluster_logger - not registered
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 
mons at 
{mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 
_set_mon_num_rank num 0 rank 0
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
cancel_probe_timeout (none scheduled)
2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Andrei Mikhailovsky
Yes, I have, Eugen, I see no obvious reason / error / etc. I see a lot of 
entries relating to Compressing as well as monitor going down.

Andrei



- Original Message -
> From: "Eugen Block" 
> To: "ceph-users" 
> Sent: Wednesday, 28 October, 2020 11:51:20
> Subject: [ceph-users] Re: frequent Monitor down

> Have you looked into syslog and mon logs?
> 
> 
> Zitat von Andrei Mikhailovsky :
> 
>> Hello everyone,
>>
>> I am having regular messages that the Monitors are going down and up:
>>
>> 2020-10-27T09:50:49.032431+ mon .arh-ibstorage2-ib ( mon .1)
>> 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum
>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>> 2020-10-27T09:50:49.123511+ mon .arh-ibstorage2-ib ( mon .1)
>> 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
>> BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)
>> set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
>> 2020-10-27T09:50:52.735457+ mon .arh-ibstorage1-ib ( mon .0)
>> 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons
>> down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
>> 2020-10-27T12:35:20.556458+ mon .arh-ibstorage2-ib ( mon .1)
>> 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum
>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>> 2020-10-27T12:35:20.643282+ mon .arh-ibstorage2-ib ( mon .1)
>> 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
>> BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
>> arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)
>> set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time
>>
>>
>> This happens on a daily basis several times a day.
>>
>> Could you please let me know how to fix this annoying problem?
>>
>> I am running ceph version 15.2.4
>> (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on
>> Ubuntu 18.04 LTS with latest updates.
>>
>> Thanks
>>
>> Andrei
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph User Survey 2020 - Working Group Invite

2020-10-28 Thread Mike Perez
Hi all,

please join here:
https://meet.google.com/_meet/ush-zpjg-wab?ijlm=1603919117370=130

On Fri, Oct 9, 2020 at 10:25 AM  wrote:

> Hello all,
>
> This is an invite to all interested to join a working group being formed
> for 2020 Ceph User Survey planning. The focus is to augment the
> questionnaire coverage, explore survey delivery formats and to expand the
> survey reach to audiences across the world.  The popularity and adoption of
> Ceph are growing steadily and so are the deployment options. Survey feedback
> has certainly helped the community to focus on users' asks and make Ceph
> better for their needs.  This time we want to make the survey experience
> more enriching to the community of developers and to the user community.
>
>
> As a sample, here are a few questions that have been collected to help
> build better hardware options for Ceph across Enterprise and CSPs.  The
> working group can help refine them for content and importance. There will
> be questions in other categories, like Ceph configuration, that members will
> help assess for relevance and inclusion and find some innovative approaches
> to reach a broader audience.
>
> Do you prefer single socket servers for Ceph OSD nodes?
> Which drive form factors are important to you (NVMe drives: U.2 or ruler
> (E1.L))?
> How many drives per server fits your need?
> What drive capacities are important to you?
> Do you separate metadata and data on different classes of media?
> Do you use  Optane 3D XPoint or  NAND for BlueStore metadata?
> Which caching method, client side vs OSD side is more useful to you?
>
> As always, many many thanks to Mike Perez who is driving the user survey
> effort and making it better every passing year.
>
> Thank you,
> Anantha Adiga
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

Mike Perez

he/him

Ceph / Rook / RDO / Gluster Community Architect

Open-Source Program Office (OSPO)


M: +1-951-572-2633

494C 5D25 2968 D361 65FB 3829 94BC D781 ADA8 8AEA
@Thingee

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Eugen Block
Why do you have 4 MONs in the first place? That way a quorum is  
difficult to achieve, could it be related to that?


Zitat von Andrei Mikhailovsky :

Yes, I have, Eugen, I see no obvious reason / error / etc. I see a  
lot of entries relating to Compressing as well as monitor going down.


Andrei



- Original Message -

From: "Eugen Block" 
To: "ceph-users" 
Sent: Wednesday, 28 October, 2020 11:51:20
Subject: [ceph-users] Re: frequent Monitor down



Have you looked into syslog and mon logs?


Zitat von Andrei Mikhailovsky :


Hello everyone,

I am having regular messages that the Monitors are going down and up:

2020-10-27T09:50:49.032431+ mon .arh-ibstorage2-ib ( mon .1)
2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T09:50:49.123511+ mon .arh-ibstorage2-ib ( mon .1)
2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)
set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
2020-10-27T09:50:52.735457+ mon .arh-ibstorage1-ib ( mon .0)
31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons
down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
2020-10-27T12:35:20.556458+ mon .arh-ibstorage2-ib ( mon .1)
2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T12:35:20.643282+ mon .arh-ibstorage2-ib ( mon .1)
2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)
set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time


This happens on a daily basis several times a day.

Could you please let me know how to fix this annoying problem?

I am running ceph version 15.2.4
(7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on
Ubuntu 18.04 LTS with latest updates.

Thanks

Andrei
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] frequent Monitor down

2020-10-28 Thread Andrei Mikhailovsky
Hello everyone, 

I am having regular messages that the Monitors are going down and up: 

2020-10-27T09:50:49.032431+ mon.arh-ibstorage2-ib (mon.1) 2248 : cluster 
[WRN] Health check failed: 1/4 mons down, quorum 
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN) 
2020-10-27T09:50:49.123511+ mon.arh-ibstorage2-ib (mon.1) 2250 : cluster 
[WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap 
objects; 1/4 mons down, quorum 
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 43 
pgs not deep-scrubbed in time; 12 pgs not scrubbed in time 
2020-10-27T09:50:52.735457+ mon.arh-ibstorage1-ib (mon.0) 31287 : 
cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons down, quorum 
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib) 
2020-10-27T12:35:20.556458+ mon.arh-ibstorage2-ib (mon.1) 2260 : cluster 
[WRN] Health check failed: 1/4 mons down, quorum 
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN) 
2020-10-27T12:35:20.643282+ mon.arh-ibstorage2-ib (mon.1) 2262 : cluster 
[WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap 
objects; 1/4 mons down, quorum 
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 47 
pgs not deep-scrubbed in time; 14 pgs not scrubbed in time 


This happens on a daily basis several times a day. 

Could you please let me know how to fix this annoying problem? 

I am running ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) 
octopus (stable) on Ubuntu 18.04 LTS with latest updates. 

Thanks 

Andrei 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Eugen Block
You have many unknown PGs because you removed lots of OSDs, this is  
likely to be a problem. Are the removed OSDs still at hand? It's  
possible that you could extract PGs which are missing and import them  
on healthy OSDs, but that's a lot of manual work. Do you have backups  
of the data? Then it would be easier to delete the EC pool and import  
the backups to a healthy pool.
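
A very rough sketch of what that extraction could look like, assuming the  
removed OSD's data directory is still intact and both OSD daemons are stopped  
(the OSD paths and the PG ID below are placeholders):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 4.1a --op export --file /tmp/pg.4.1a.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
      --op import --file /tmp/pg.4.1a.export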



Zitat von "Ing. Luis Felipe Domínguez Vega" :

Great response, thanks, i will use now only one site, but need first  
stabilice the cluster to remove the EC erasure coding and use  
replicate. Could you help me?


So the thing is that i have 2 pool, cinder-ceph and data_storage.  
data_storage is only as data_path for cinder-ceph pool, but now i  
use only the cinder-ceph with replication 3. How can i move all data  
from data_storage to cinder-ceph and remove the EC.


El 2020-10-28 06:55, Frank Schilder escribió:

Hi all, I need to go back to a small piece of information:


I was 3 mons, but i have 2 physical datacenters, one of them breaks with
not short term fix, so i remove all osds and ceph mon (2 of them) and
now i have only the osds of 1 datacenter with the monitor.


When I look at the data about pools and crush map, I don't see
anything that is multi-site. Maybe the physical location was 2-site,
but the crush rules don't reflect that. Consequently, the ceph cluster
was configured single-site and will act accordingly when you loose 50%
of it.

Quick interlude: when people recommend to add servers, they do not
necessarily mean *new* servers. They mean you have to go to ground
zero, dig out as much hardware as you can, drive it to the working
site and make it rejoin the cluster.

A hypothesis. Assume we want to build a 2-site cluster (sites A and B)
that can sustain the total loss of any 1 site, capacity at each site
is equal (mirrored).

Short answer: this is not exactly possible due to the fact that you
always need a qualified majority of monitors for quorum and you cannot
distribute both, N MONs and a qualified majority evenly and
simultaneously over 2 sites. We have already an additional constraint:
site A will have 2 and site B 1 monitor. The condition is, that in
case site A goes down the monitors from the site A can be rescued and
moved to site B to restore data access. Hence, a loss of site A will
imply temporary loss of service (Note that 2+2=4 MONs will not help,
because now 3 MONs are required for a qualified majority; again MONs
need to be rescued from the down site). If this constraint is
satisfied, then one can configure pools as follows:

replicated: size 4, min_size 2, crush rule places 2 copies at each site
erasure coded: k+m with min_size=k+1, m even and m>=k+2, for example,
k=2, m=4, crush rule places 3 shards at each site

With such a configuration, it is possible to sustain the loss of any
one site if the monitors can be recovered from site A. Note that such
EC pools will be very compute intense and have high latency (use
option fast_read to get at least reasonable read speeds). Essentially,
EC is not really suitable for multi-site redundancy, but the above EC
set up will require a bit less capacity than 4 copies.

This setup can sustain the total loss of 1 site (minus MONs on site A)
and will rebuild all data once a large enough second site is brought
up again.

When I look at the information you posted, I see replication 3(2) and
EC 5+2 pools, all having crush root default. I do not see any of these
mandatory configurations, the sites are ignored in the crush rules.
Hence, if you can't get material from the down site back up, you look
at permanent data loss.

You may be able to recover some more data in the replicated pools by
setting min_size=1 for some time. However, you will loose any writes
that are on the other 2 but not on the 1 disk now used for recovery
and it will certainly not recover PGs with all 3 copies on the down
site. Therefore, I would not attempt this, also because for the EC
pools, you will need to get hold of the hosts from the down site and
re-integrate these into the cluster any ways. If you can't do this,
the data is lost.

In the long run, given your crush map and rules, you either stop
placing stuff at 2 sites, or you create a proper 2-site set-up and
copy data over.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Ing. Luis Felipe Domínguez Vega 
Sent: 28 October 2020 05:14:27
To: Eugen Block
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

Well recovering not working yet... i was started 6 servers more and the
cluster not yet recovered.
Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules 

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Frank Schilder
Hi all, I need to go back to a small piece of information:

> I was 3 mons, but i have 2 physical datacenters, one of them breaks with
> not short term fix, so i remove all osds and ceph mon (2 of them) and
> now i have only the osds of 1 datacenter with the monitor.

When I look at the data about pools and crush map, I don't see anything that is 
multi-site. Maybe the physical location was 2-site, but the crush rules don't 
reflect that. Consequently, the ceph cluster was configured single-site and 
will act accordingly when you lose 50% of it.

Quick interlude: when people recommend to add servers, they do not necessarily 
mean *new* servers. They mean you have to go to ground zero, dig out as much 
hardware as you can, drive it to the working site and make it rejoin the 
cluster.

A hypothesis. Assume we want to build a 2-site cluster (sites A and B) that can 
sustain the total loss of any 1 site, capacity at each site is equal (mirrored).

Short answer: this is not exactly possible due to the fact that you always need 
a qualified majority of monitors for quorum and you cannot distribute both, N 
MONs and a qualified majority evenly and simultaneously over 2 sites. We have 
already an additional constraint: site A will have 2 and site B 1 monitor. The 
condition is, that in case site A goes down the monitors from the site A can be 
rescued and moved to site B to restore data access. Hence, a loss of site A 
will imply temporary loss of service (Note that 2+2=4 MONs will not help, 
because now 3 MONs are required for a qualified majority; again MONs need to be 
rescued from the down site). If this constraint is satisfied, then one can 
configure pools as follows:

replicated: size 4, min_size 2, crush rule places 2 copies at each site
erasure coded: k+m with min_size=k+1, m even and m>=k+2, for example, k=2, m=4, 
crush rule places 3 shards at each site

With such a configuration, it is possible to sustain the loss of any one site 
if the monitors can be recovered from site A. Note that such EC pools will be 
very compute intense and have high latency (use option fast_read to get at 
least reasonable read speeds). Essentially, EC is not really suitable for 
multi-site redundancy, but the above EC set up will require a bit less capacity 
than 4 copies.
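
As an illustration, a crush rule for the replicated variant could look roughly 
like the following (rule name and id are examples, and the crush map must 
actually contain datacenter buckets):

rule replicated_2dc {
    id 10
    type replicated
    min_size 2
    max_size 4
    step take default
    step choose firstn 2 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}

The rule would be compiled with crushtool, loaded with "ceph osd setcrushmap 
-i", and the pool then gets size 4, min_size 2 and this rule via "ceph osd 
pool set".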

This setup can sustain the total loss of 1 site (minus MONs on site A) and will 
rebuild all data once a large enough second site is brought up again.

When I look at the information you posted, I see replication 3(2) and EC 5+2 
pools, all having crush root default. I do not see any of these mandatory 
configurations, the sites are ignored in the crush rules. Hence, if you can't 
get material from the down site back up, you look at permanent data loss.

You may be able to recover some more data in the replicated pools by setting 
min_size=1 for some time. However, you will lose any writes that are on the 
other 2 but not on the 1 disk now used for recovery and it will certainly not 
recover PGs with all 3 copies on the down site. Therefore, I would not attempt 
this, also because for the EC pools, you will need to get hold of the hosts 
from the down site and re-integrate these into the cluster anyway. If you 
can't do this, the data is lost.

In the long run, given your crush map and rules, you either stop placing stuff 
at 2 sites, or you create a proper 2-site set-up and copy data over.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Ing. Luis Felipe Domínguez Vega 
Sent: 28 October 2020 05:14:27
To: Eugen Block
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

Well recovering not working yet... i was started 6 servers more and the
cluster not yet recovered.
Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/

El 2020-10-27 09:59, Eugen Block escribió:
> Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
> erasure-coded) and the rule requires each chunk on a different host
> but you currently have only 5 hosts available, that's why the recovery
>  is not progressing. It's waiting for two more hosts. Unfortunately,
> you can't change the EC profile or the rule of that pool. I'm not sure
>  if it would work in the current cluster state, but if you can't add
> two more hosts (which would be your best option for recovery) it might
>  be possible to create a new replicated pool (you seem to have enough
> free space) and copy the contents from that EC pool. But as I said,
> I'm not sure if that would work in a degraded state, I've never tried
> that.
>
> So your best bet is to 

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Ing . Luis Felipe Domínguez Vega

EC profile: https://pastebin.ubuntu.com/p/kjbdQXbs85/
ceph pg dump pgs | grep -v "active+clean": 
https://pastebin.ubuntu.com/p/g6TdZXNXBR/


El 2020-10-28 02:23, Eugen Block escribió:

If you have that many spare hosts I would recommend to deploy two more
 MONs on them, and probably also additional MGRs so they can failover.

What is the EC profile for the data_storage pool?

Can you also share

ceph pg dump pgs | grep -v "active+clean"

to see which PGs are affected.
The remaining issue with unfound objects and unknown PGs could be
because you removed OSDs. That could mean data loss, but maybe there's
 a chance to recover anyway.


Zitat von "Ing. Luis Felipe Domínguez Vega" :

Well recovering not working yet... i was started 6 servers more and  
the cluster not yet recovered.

Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)   
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/


El 2020-10-27 09:59, Eugen Block escribió:

Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
erasure-coded) and the rule requires each chunk on a different host
but you currently have only 5 hosts available, that's why the 
recovery

is not progressing. It's waiting for two more hosts. Unfortunately,
you can't change the EC profile or the rule of that pool. I'm not 
sure

if it would work in the current cluster state, but if you can't add
two more hosts (which would be your best option for recovery) it 
might

be possible to create a new replicated pool (you seem to have enough
free space) and copy the contents from that EC pool. But as I said,
I'm not sure if that would work in a degraded state, I've never tried
that.

So your best bet is to get two more hosts somehow.


pool 4 'data_storage' erasure profile desoft size 7 min_size 5   
crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32   
autoscale_mode off last_change 154384 lfor 0/121016/121014 flags   
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384   
application rbd



Zitat von "Ing. Luis Felipe Domínguez Vega" 
:



Needed data:

ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
ceph osd tree   : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
ceph osd df : (later, because i'm waiting since 10   
minutes and not output yet)

ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)   
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/


El 2020-10-27 07:14, Eugen Block escribió:
I understand, but i delete the OSDs from CRUSH map, so ceph  don't 
  wait for these OSDs, i'm right?


It depends on your actual crush tree and rules. Can you share 
(maybe

you already did)

ceph osd tree
ceph osd df
ceph osd pool ls detail

and a dump of your crush rules?

As I already said, if you have rules in place that distribute data
across 2 DCs and one of them is down the PGs will never recover 
even

if you delete the OSDs from the failed DC.



Zitat von "Ing. Luis Felipe Domínguez Vega" 
:


I understand, but i delete the OSDs from CRUSH map, so ceph  don't 
  wait for these OSDs, i'm right?


El 2020-10-27 04:06, Eugen Block escribió:

Hi,

just to clarify so I don't miss anything: you have two DCs and 
one of

them is down. And two of the MONs were in that failed DC? Now you
removed all OSDs and two MONs from the failed DC hoping that your
cluster will recover? If you have reasonable crush rules in place
(e.g. to recover from a failed DC) your cluster will never 
recover in

the current state unless you bring OSDs back up on the second DC.
That's why you don't see progress in the recovery process, the 
PGs are
waiting for their peers in the other DC so they can follow the 
crush

rules.

Regards,
Eugen


Zitat von "Ing. Luis Felipe Domínguez Vega" 
:


I was 3 mons, but i have 2 physical datacenters, one of them
breaks  with not short term fix, so i remove all osds and ceph   
mon  (2 of  them) and now i have only the osds of 1  datacenter  
with the  monitor.  I was stopped the ceph  manager, but i was  
see that when  i restart a  ceph manager  then ceph -s show  
recovering info for a  short term of  20  min more or less, then 
 dissapear all info.


The thing is that sems the cluster is not self recovering and   
the   ceph monitor is "eating" all of the HDD.


El 2020-10-26 15:57, Eugen Block escribió:
The recovery process (ceph -s) is independent of the MGR 
service but
only depends on the MON service. It seems you only have the one 
MON,
if the MGR is overloading it (not clear why) it could help to 
leave
MGR off and see if the MON service then has enough RAM to 
proceed with
the recovery. Do you have any chance to add two more MONs? A 
single


[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Ing . Luis Felipe Domínguez Vega
Great response, thanks. I will now use only one site, but first I need to 
stabilize the cluster, remove the EC erasure coding, and use replication. 
Could you help me?


So the thing is that I have 2 pools, cinder-ceph and data_storage. 
data_storage is only used as the data pool for the cinder-ceph pool, but now 
I use only cinder-ceph with replication 3. How can I move all data from 
data_storage to cinder-ceph and remove the EC?


El 2020-10-28 06:55, Frank Schilder escribió:

Hi all, I need to go back to a small piece of information:

I was 3 mons, but i have 2 physical datacenters, one of them breaks 
with

not short term fix, so i remove all osds and ceph mon (2 of them) and
now i have only the osds of 1 datacenter with the monitor.


When I look at the data about pools and crush map, I don't see
anything that is multi-site. Maybe the physical location was 2-site,
but the crush rules don't reflect that. Consequently, the ceph cluster
was configured single-site and will act accordingly when you loose 50%
of it.

Quick interlude: when people recommend to add servers, they do not
necessarily mean *new* servers. They mean you have to go to ground
zero, dig out as much hardware as you can, drive it to the working
site and make it rejoin the cluster.

A hypothesis. Assume we want to build a 2-site cluster (sites A and B)
that can sustain the total loss of any 1 site, capacity at each site
is equal (mirrored).

Short answer: this is not exactly possible due to the fact that you
always need a qualified majority of monitors for quorum and you cannot
distribute both, N MONs and a qualified majority evenly and
simultaneously over 2 sites. We have already an additional constraint:
site A will have 2 and site B 1 monitor. The condition is, that in
case site A goes down the monitors from the site A can be rescued and
moved to site B to restore data access. Hence, a loss of site A will
imply temporary loss of service (Note that 2+2=4 MONs will not help,
because now 3 MONs are required for a qualified majority; again MONs
need to be rescued from the down site). If this constraint is
satisfied, then one can configure pools as follows:

replicated: size 4, min_size 2, crush rule places 2 copies at each site
erasure coded: k+m with min_size=k+1, m even and m>=k+2, for example,
k=2, m=4, crush rule places 3 shards at each site

With such a configuration, it is possible to sustain the loss of any
one site if the monitors can be recovered from site A. Note that such
EC pools will be very compute intense and have high latency (use
option fast_read to get at least reasonable read speeds). Essentially,
EC is not really suitable for multi-site redundancy, but the above EC
set up will require a bit less capacity than 4 copies.

This setup can sustain the total loss of 1 site (minus MONs on site A)
and will rebuild all data once a large enough second site is brought
up again.

When I look at the information you posted, I see replication 3(2) and
EC 5+2 pools, all having crush root default. I do not see any of these
mandatory configurations, the sites are ignored in the crush rules.
Hence, if you can't get material from the down site back up, you look
at permanent data loss.

You may be able to recover some more data in the replicated pools by
setting min_size=1 for some time. However, you will loose any writes
that are on the other 2 but not on the 1 disk now used for recovery
and it will certainly not recover PGs with all 3 copies on the down
site. Therefore, I would not attempt this, also because for the EC
pools, you will need to get hold of the hosts from the down site and
re-integrate these into the cluster any ways. If you can't do this,
the data is lost.

In the long run, given your crush map and rules, you either stop
placing stuff at 2 sites, or you create a proper 2-site set-up and
copy data over.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Ing. Luis Felipe Domínguez Vega 
Sent: 28 October 2020 05:14:27
To: Eugen Block
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

Well recovering not working yet... i was started 6 servers more and the
cluster not yet recovered.
Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/

El 2020-10-27 09:59, Eugen Block escribió:

Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
erasure-coded) and the rule requires each chunk on a different host
but you currently have only 5 hosts available, that's why the recovery
 is not progressing. It's waiting for two more hosts. Unfortunately,
you can't change the EC profile or the 

[ceph-users] Re: OSD down, how to reconstruct it from its main and block.db parts ?

2020-10-28 Thread David Caro

Hi Wladimir, according to the logs you first sent it seems that there is an
authentication issue (the osd daemon not being able to fetch the mon config):

> жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300
> 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at
> /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config
> (--no-mon-config to skip)
> жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process
> exited, code=exited, status=1/FAILURE


The file it fails to load the keyring from is where the auth details for the
osd daemon should be in.
Some more info here:
  https://docs.ceph.com/en/latest/man/8/ceph-authtool/
  https://docs.ceph.com/en/latest/rados/configuration/auth-config-ref/
  https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/
  (specifically step 5)

I'm not sure if you were able to fix it or not, but I'd start trying to get
that fixed before playing with ceph-volume.
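
If the keyring file is simply missing, something along these lines should
restore it (osd.1 and the path are taken from your log, double-check the
ownership matches your setup before restarting):

  ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring
  chown ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
  chmod 600 /var/lib/ceph/osd/ceph-1/keyring
  systemctl restart ceph-osd@1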


On 10/27 10:24, Wladimir Mutel wrote:
> Dear David,
> 
> I assimilated most of my Ceph configuration into the cluster itself as this 
> feature was announced by Mimic.
> I see some fsid in [global] section of /etc/ceph/ceph.conf , and some key in 
> [client.admin] section of /etc/ceph/ceph.client.admin.keyring
> The rest is pretty uninteresting, some minimal adjustments in config file and 
> cluster's config dump.
> 
> Looking into Python scripts of ceph-volume, I noticed that tmpfs is mounted 
> during the run "ceph-colume lvm activate",
> and "ceph-bluestore-tool prime-osd-dir" is started from the same script 
> afterwards.
> Should I try starting "ceph-volume lvm activate" in some manual way to see 
> where it stumbles and why ?
> 
> David Caro wrote:
> > Hi Wladim,
> > 
> > If the "unable to find keyring" message disappeared, what was the error 
> > after that fix?
> > 
> > If it's still failing to fetch the mon config, check your authentication 
> > (you might have to add the osd key to the keyring again), and/or that the 
> > mons ips are correct in your osd ceph.conf file.
> > 
> > On 23 October 2020 16:08:02 CEST, Wladimir Mutel  wrote:
> > > Dear all,
> > > 
> > > after breaking my experimental 1-host Ceph cluster and making one its
> > > pg 'incomplete' I left it in abandoned state for some time.
> > > Now I decided to bring it back into life and found that it can not
> > > start one of its OSDs (osd.1 to name it)
> > > 
> > > "ceph osd df" shows :
> > > 
> > > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> > > 0   hdd    0        1.0       2.7 TiB  1.6 TiB  1.6 TiB  113 MiB  4.7 GiB  1.1 TiB  59.77  0.69  102  up
> > > 1   hdd    2.84549  0         0 B      0 B      0 B      0 B      0 B      0 B      0      0     0    down
> > > 2   hdd    2.84549  1.0       2.8 TiB  2.6 TiB  2.5 TiB  57 MiB   3.8 GiB  275 GiB  90.58  1.05  176  up
> > > 3   hdd    2.84549  1.0       2.8 TiB  2.6 TiB  2.5 TiB  57 MiB   3.9 GiB  271 GiB  90.69  1.05  185  up
> > > 4   hdd    2.84549  1.0       2.8 TiB  2.6 TiB  2.5 TiB  63 MiB   4.2 GiB  263 GiB  90.98  1.05  184  up
> > > 5   hdd    2.84549  1.0       2.8 TiB  2.6 TiB  2.5 TiB  52 MiB   3.8 GiB  263 GiB  90.96  1.05  178  up
> > > 6   hdd    2.53400  1.0       2.5 TiB  2.3 TiB  2.3 TiB  173 MiB  5.2 GiB  228 GiB  91.21  1.05  178  up
> > > 7   hdd    2.53400  1.0       2.5 TiB  2.3 TiB  2.3 TiB  147 MiB  5.2 GiB  230 GiB  91.12  1.05  168  up
> > >  TOTAL   19 TiB   17 TiB   16 TiB  662 MiB   31 GiB  2.6 TiB  86.48
> > > MIN/MAX VAR: 0.69/1.05  STDDEV: 10.90
> > > 
> > > "ceph device ls" shows :
> > > 
> > > DEVICE                                      HOST:DEV      DAEMONS                        LIFE EXPECTANCY
> > > GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380  p10s:nvme0n1  osd.1 osd.2 osd.3 osd.4 osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVST        p10s:sdd      osd.1
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRA        p10s:sda      osd.6
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NT        p10s:sdb      osd.7
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74        p10s:sde      osd.2
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1F        p10s:sdh      osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87        p10s:sdf      osd.3
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP        p10s:sdc      osd.0
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K7FXSCRN        p10s:sdg      osd.4
> > > 
> > > In my last migration, I created a bluestore volume with external
> > > block.db like this :
> > > 
> > > "ceph-volume lvm prepare --bluestore --data /dev/sdd1 --block.db
> > > /dev/nvme0n1p4"
> > > 
> > > And I can see this metadata by
> > > 
> > > "ceph-bluestore-tool show-label --dev
> > > /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202"
> > > :
> > > 
> > > {
> > > 

[ceph-users] Re: frequent Monitor down

2020-10-28 Thread Eugen Block

Have you looked into syslog and mon logs?
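
For example, something along these lines (assuming default log locations; the  
mon ID is usually the short hostname):

  journalctl -u ceph-mon@arh-ibstorage1-ib --since "2 hours ago"
  less /var/log/ceph/ceph-mon.arh-ibstorage1-ib.log

  # temporarily raise mon debug logging while the flapping happens
  ceph config set mon debug_mon 10/10
  # and back to the default afterwards
  ceph config set mon debug_mon 1/5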


Zitat von Andrei Mikhailovsky :


Hello everyone,

I am having regular messages that the Monitors are going down and up:

2020-10-27T09:50:49.032431+ mon .arh-ibstorage2-ib ( mon .1)  
2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum  
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T09:50:49.123511+ mon .arh-ibstorage2-ib ( mon .1)  
2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing  
BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum  
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)  
set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
2020-10-27T09:50:52.735457+ mon .arh-ibstorage1-ib ( mon .0)  
31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons  
down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
2020-10-27T12:35:20.556458+ mon .arh-ibstorage2-ib ( mon .1)  
2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum  
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T12:35:20.643282+ mon .arh-ibstorage2-ib ( mon .1)  
2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing  
BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum  
arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s)  
set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time



This happens on a daily basis several times a day.

Could you please let me know how to fix this annoying problem?

I am running ceph version 15.2.4  
(7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on  
Ubuntu 18.04 LTS with latest updates.


Thanks

Andrei
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-28 Thread Eugen Block
If you have that many spare hosts I would recommend to deploy two more  
MONs on them, and probably also additional MGRs so they can failover.


What is the EC profile for the data_storage pool?

Can you also share

ceph pg dump pgs | grep -v "active+clean"

to see which PGs are affected.
The remaining issue with unfound objects and unknown PGs could be  
because you removed OSDs. That could mean data loss, but maybe there's  
a chance to recover anyway.



Zitat von "Ing. Luis Felipe Domínguez Vega" :

Well recovering not working yet... i was started 6 servers more and  
the cluster not yet recovered.

Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)   
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/


El 2020-10-27 09:59, Eugen Block escribió:

Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
erasure-coded) and the rule requires each chunk on a different host
but you currently have only 5 hosts available, that's why the recovery
is not progressing. It's waiting for two more hosts. Unfortunately,
you can't change the EC profile or the rule of that pool. I'm not sure
if it would work in the current cluster state, but if you can't add
two more hosts (which would be your best option for recovery) it might
be possible to create a new replicated pool (you seem to have enough
free space) and copy the contents from that EC pool. But as I said,
I'm not sure if that would work in a degraded state, I've never tried
that.

So your best bet is to get two more hosts somehow.


pool 4 'data_storage' erasure profile desoft size 7 min_size 5   
crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32   
autoscale_mode off last_change 154384 lfor 0/121016/121014 flags   
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384   
application rbd



Zitat von "Ing. Luis Felipe Domínguez Vega" :


Needed data:

ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
ceph osd tree   : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
ceph osd df : (later, because i'm waiting since 10   
minutes and not output yet)

ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)   
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/


El 2020-10-27 07:14, Eugen Block escribió:
I understand, but i delete the OSDs from CRUSH map, so ceph  
don't   wait for these OSDs, i'm right?


It depends on your actual crush tree and rules. Can you share (maybe
you already did)

ceph osd tree
ceph osd df
ceph osd pool ls detail

and a dump of your crush rules?

As I already said, if you have rules in place that distribute data
across 2 DCs and one of them is down the PGs will never recover even
if you delete the OSDs from the failed DC.



Zitat von "Ing. Luis Felipe Domínguez Vega" :

I understand, but i delete the OSDs from CRUSH map, so ceph  
don't   wait for these OSDs, i'm right?


El 2020-10-27 04:06, Eugen Block escribió:

Hi,

just to clarify so I don't miss anything: you have two DCs and one of
them is down. And two of the MONs were in that failed DC? Now you
removed all OSDs and two MONs from the failed DC hoping that your
cluster will recover? If you have reasonable crush rules in place
(e.g. to recover from a failed DC) your cluster will never recover in
the current state unless you bring OSDs back up on the second DC.
That's why you don't see progress in the recovery process, the PGs are
waiting for their peers in the other DC so they can follow the crush
rules.

Regards,
Eugen


Zitat von "Ing. Luis Felipe Domínguez Vega" :

I was 3 mons, but i have 2 physical datacenters, one of them
breaks  with not short term fix, so i remove all osds and ceph  
 mon  (2 of  them) and now i have only the osds of 1  
datacenter  with the  monitor.  I was stopped the ceph  
manager, but i was  see that when  i restart a  ceph manager  
then ceph -s show  recovering info for a  short term of  20  
min more or less, then  dissapear all info.


The thing is that sems the cluster is not self recovering and   
the   ceph monitor is "eating" all of the HDD.


El 2020-10-26 15:57, Eugen Block escribió:

The recovery process (ceph -s) is independent of the MGR service but
only depends on the MON service. It seems you only have the one MON,
if the MGR is overloading it (not clear why) it could help to leave
MGR off and see if the MON service then has enough RAM to proceed with
the recovery. Do you have any chance to add two more MONs? A single
MON is of course a single point of failure.


Zitat von "Ing. Luis Felipe Domínguez Vega"  
:



El 2020-10-26 15:16, Eugen Block escribió:
You could stop the MGRs and wait for the recovery to  
finish, MGRs are