> ... I will now use only one site, but first need to stabilize the
> cluster to remove the EC erasure coding and use replication ...

If you change to one site only, there is no point in getting rid of the EC
pool. Your main problem will be restoring the lost data. Do you have a backup
of everything? Do you still have the old OSDs? You never answered these questions.

To give you an idea why this is important: with Ceph, losing 1% of data on an
RBD pool does *not* mean you lose 1% of the disks. It means that, on average,
every disk loses 1% of its blocks. In other words, getting everything up again
will be a lot of work either way.

The best path to follow is what Eugen suggested: add MONs to have at least 3
and dig out the old disks to be able to export and import PGs. Look at Eugen's
last two e-mails; they are a starting point. You might be able to recover more
by temporarily reducing min_size to 1 on the replicated pools and to 4 on the
EC pool. If possible, make sure there is no client access during that time. The
rest of the missing data will need to be scraped off the OSDs you deleted from
the cluster.
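
A rough sketch of how those two steps could look (OSD IDs, pool names and PG
IDs below are placeholders, and the OSDs involved must be stopped while
ceph-objectstore-tool runs on them):

  # on a host holding one of the old disks: export a PG the cluster is missing
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<old-id> \
      --pgid <pgid> --op export --file /tmp/<pgid>.export

  # on a host with a healthy (stopped) OSD: import that PG
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<new-id> \
      --op import --file /tmp/<pgid>.export

  # temporarily relax min_size as described above
  ceph osd pool set <replicated-pool> min_size 1
  ceph osd pool set data_storage min_size 4

Treat this as a sketch only, restart the OSDs afterwards, and set min_size back
to the original values as soon as the PGs are active+clean again.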

If you have a backup of everything, starting from scratch and repopulating the
Ceph cluster from backup might be the fastest option.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: 28 October 2020 07:23:09
To: Ing. Luis Felipe Domínguez Vega
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

If you have that many spare hosts I would recommend deploying two more
MONs on them, and probably also additional MGRs so they can fail over.
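
For example, assuming the cluster is managed by cephadm (host names below are
placeholders), something along these lines could do it:

  ceph orch apply mon --placement="3 <host1> <host2> <host3>"
  ceph orch apply mgr --placement="2 <host1> <host2>"

On a manually deployed cluster you would instead add the MONs the classic way
(create the mon data directory and start additional ceph-mon daemons).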

What is the EC profile for the data_storage pool?

Can you also share

ceph pg dump pgs | grep -v "active+clean"

to see which PGs are affected.
The remaining issue with unfound objects and unknown PGs could be
because you removed OSDs. That could mean data loss, but maybe there's
a chance to recover anyway.
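
To narrow that down, something like the following might help (the PG ID is a
placeholder, taken from the pg dump above):

  ceph health detail
  ceph pg <pgid> query
  ceph pg <pgid> list_unfound

The query output shows which OSDs a PG is still probing; if those are the
removed ones, the data can only come back from the old disks (or, as a last
resort, be declared lost).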


Quoting "Ing. Luis Felipe Domínguez Vega" <luis.doming...@desoft.cu>:

> Well, recovery is not working yet... I started 6 more servers and
> the cluster still has not recovered.
> Ceph status does not show any recovery progress.
>
> ceph -s                 : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
> ceph osd tree           : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
> ceph osd df             : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
> ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
> crush rules             : (ceph osd crush rule dump)
> https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
>
> On 2020-10-27 09:59, Eugen Block wrote:
>> Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
>> erasure-coded) and the rule requires each chunk on a different host
>> but you currently have only 5 hosts available, that's why the recovery
>> is not progressing. It's waiting for two more hosts. Unfortunately,
>> you can't change the EC profile or the rule of that pool. I'm not sure
>> if it would work in the current cluster state, but if you can't add
>> two more hosts (which would be your best option for recovery) it might
>> be possible to create a new replicated pool (you seem to have enough
>> free space) and copy the contents from that EC pool. But as I said,
>> I'm not sure if that would work in a degraded state, I've never tried
>> that.
>>
>> So your best bet is to get two more hosts somehow.
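
If it ever came to that copy route, a rough sketch could look like the
following (pool and image names are placeholders, the rbd options should be
double-checked, and since data_storage is used as an RBD data pool the copy
would have to be done per image):

  ceph osd pool create storage_repl 32 32 replicated
  ceph osd pool application enable storage_repl rbd
  rbd deep cp --data-pool storage_repl <pool>/<image> <pool>/<image>-copy

Whether this works at all while the EC pool is degraded is exactly the open
question above.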
>>
>>
>>> pool 4 'data_storage' erasure profile desoft size 7 min_size 5
>>> crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32
>>> autoscale_mode off last_change 154384 lfor 0/121016/121014 flags
>>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>>> application rbd
>>
>>
>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.doming...@desoft.cu>:
>>
>>> Needed data:
>>>
>>> ceph -s                 : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
>>> ceph osd tree           : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
>>> ceph osd df             : (later, because I've been waiting for 10
>>> minutes and there is no output yet)
>>> ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
>>> crush rules             : (ceph osd crush rule dump)
>>> https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
>>>
>>> On 2020-10-27 07:14, Eugen Block wrote:
>>>>> I understand, but I deleted the OSDs from the CRUSH map, so Ceph
>>>>> won't wait for those OSDs, am I right?
>>>>
>>>> It depends on your actual crush tree and rules. Can you share (maybe
>>>> you already did)
>>>>
>>>> ceph osd tree
>>>> ceph osd df
>>>> ceph osd pool ls detail
>>>>
>>>> and a dump of your crush rules?
>>>>
>>>> As I already said, if you have rules in place that distribute data
>>>> across 2 DCs and one of them is down, the PGs will never recover even
>>>> if you delete the OSDs from the failed DC.
>>>>
>>>>
>>>>
>>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.doming...@desoft.cu>:
>>>>
>>>>> I understand, but I deleted the OSDs from the CRUSH map, so Ceph
>>>>> won't wait for those OSDs, am I right?
>>>>>
>>>>> On 2020-10-27 04:06, Eugen Block wrote:
>>>>>> Hi,
>>>>>>
>>>>>> just to clarify so I don't miss anything: you have two DCs and one of
>>>>>> them is down. And two of the MONs were in that failed DC? Now you
>>>>>> removed all OSDs and two MONs from the failed DC hoping that your
>>>>>> cluster will recover? If you have reasonable crush rules in place
>>>>>> (e.g. to recover from a failed DC) your cluster will never recover in
>>>>>> the current state unless you bring OSDs back up on the second DC.
>>>>>> That's why you don't see progress in the recovery process, the PGs are
>>>>>> waiting for their peers in the other DC so they can follow the crush
>>>>>> rules.
>>>>>>
>>>>>> Regards,
>>>>>> Eugen
>>>>>>
>>>>>>
>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.doming...@desoft.cu>:
>>>>>>
>>>>>>> I had 3 mons, but I have 2 physical datacenters and one of them
>>>>>>> broke with no short-term fix, so I removed all OSDs and Ceph
>>>>>>> mons (2 of them), and now I only have the OSDs of 1 datacenter
>>>>>>> with the remaining monitor. I had stopped the Ceph manager, but
>>>>>>> I saw that when I restart a Ceph manager, ceph -s shows
>>>>>>> recovery info for roughly 20 minutes, and then all the info
>>>>>>> disappears.
>>>>>>>
>>>>>>> The thing is that the cluster does not seem to be recovering on
>>>>>>> its own, and the Ceph monitor is "eating" all of the HDD.
>>>>>>>
>>>>>>> On 2020-10-26 15:57, Eugen Block wrote:
>>>>>>>> The recovery process (ceph -s) is independent of the MGR service and
>>>>>>>> only depends on the MON service. It seems you only have the one MON;
>>>>>>>> if the MGR is overloading it (not clear why), it could help to leave
>>>>>>>> the MGR off and see if the MON service then has enough RAM to proceed
>>>>>>>> with the recovery. Do you have any chance to add two more MONs? A
>>>>>>>> single MON is of course a single point of failure.
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>>>>>>> <luis.doming...@desoft.cu>:
>>>>>>>>
>>>>>>>>> On 2020-10-26 15:16, Eugen Block wrote:
>>>>>>>>>> You could stop the MGRs and wait for the recovery to
>>>>>>>>>> finish; MGRs are
>>>>>>>>>> not a critical component. You won't have a dashboard or metrics
>>>>>>>>>> during that time, but it would prevent the high RAM usage.
>>>>>>>>>>
>>>>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>>>>>>>>> <luis.doming...@desoft.cu>:
>>>>>>>>>>
>>>>>>>>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>>>>>>>> On 26 Oct 2020, at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>>>>>>>> <luis.doming...@desoft.cu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> mgr: fond-beagle(active, since 39s)
>>>>>>>>>>>>
>>>>>>>>>>>> Your manager seems to be crash looping; it has only been up
>>>>>>>>>>>> for 39s. Looking
>>>>>>>>>>>> at the mgr logs may help you identify why your cluster is not
>>>>>>>>>>>> recovering.
>>>>>>>>>>>> You may be hitting some bug in the mgr.
>>>>>>>>>>> Nope, I'm restarting the ceph manager because it eats all the
>>>>>>>>>>> server RAM; I have a script that restarts the manager when only
>>>>>>>>>>> 1 GB of free RAM is left (the server has 94 GB of RAM). I don't
>>>>>>>>>>> know why, and the manager logs are:
>>>>>>>>>>>
>>>>>>>>>>> -----------------------------------
>>>>>>>>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# tail -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>>>>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700  0 log_channel(cluster) log [DBG] : pgmap v584: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700  0 log_channel(cluster) do_log log to syslog
>>>>>>>>>>> [the same pgmap line repeats every 2 seconds for v585 through v588, with identical PG and object counts]
>>>>>>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700  0 log_channel(cluster) log [DBG] : pgmap v589: 2305 pgs: 4 active+undersized+degraded+remapped, 4 active+recovery_unfound+undersized+degraded+remapped, 2104 active+clean, 5 active+undersized+degraded, 34 incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used, 21 TiB / 24 TiB avail; 347248/2606900 objects degraded (13.320%); 107570/2606900 objects misplaced (4.126%); 19/404328 objects unfound (0.005%)
>>>>>>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700  0 log_channel(cluster) do_log log to syslog
>>>>>>>>>>> ---------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OK, I will do that... but the thing is that the cluster does not
>>>>>>>>> show any recovery, it does not show that it is doing anything,
>>>>>>>>> like the recovery info in the ceph -s output, so I don't know
>>>>>>>>> whether it is recovering or what it is doing.


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
