Re: [ceph-users] Help Ceph Cluster Down

Caspar Smit Fri, 04 Jan 2019 04:54:10 -0800

Hi Arun,

How did you end up with a 'working' cluster with so many pgs per OSD?


"too many PGs per OSD (2968 > max 200)"

To (temporarily) allow this kind of pgs per osd you could try this:

Change these values in the global section in your ceph.conf:

mon max pg per osd = 200
osd max pg per osd hard ratio = 2

It allows 200*2 = 400 Pgs per OSD before disabling the creation of new
pgs.

Above are the defaults (for Luminous, maybe other versions too)
You can check your current settings with:

ceph daemon mon.ceph-mon01 config show |grep pg_per_osd

Since your current pgs per osd ratio is way higher then the default you
could set them to for instance:

mon max pg per osd = 1000
osd max pg per osd hard ratio = 5

Which allow for 5000 pgs per osd before disabling creation of new pgs.

You'll need to inject the setting into the mons/osds and restart mgrs to
make them active.

ceph tell mon.* injectargs ‘--mon_max_pg_per_osd 1000’
ceph tell mon.* injectargs ‘--osd_max_pg_per_osd_hard_ratio 5’
ceph tell osd.* injectargs ‘--mon_max_pg_per_osd 1000’
ceph tell osd.* injectargs ‘--osd_max_pg_per_osd_hard_ratio 5’
restart mgrs

Kind regards,
Caspar


Op vr 4 jan. 2019 om 04:28 schreef Arun POONIA <
arun.poo...@nuagenetworks.net>:

> Hi Chris,
>
> Indeed that's what happened. I didn't set noout flag either and I did
> zapped disk on new server every time. In my cluster status fre201 is only
> new server.
>
> Current Status after enabling 3 OSDs on fre201 host.
>
> [root@fre201 ~]# ceph osd tree
> ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
>  -1       70.92137 root default
>  -2        5.45549     host fre101
>   0   hdd  1.81850         osd.0       up  1.00000 1.00000
>   1   hdd  1.81850         osd.1       up  1.00000 1.00000
>   2   hdd  1.81850         osd.2       up  1.00000 1.00000
>  -9        5.45549     host fre103
>   3   hdd  1.81850         osd.3       up  1.00000 1.00000
>   4   hdd  1.81850         osd.4       up  1.00000 1.00000
>   5   hdd  1.81850         osd.5       up  1.00000 1.00000
>  -3        5.45549     host fre105
>   6   hdd  1.81850         osd.6       up  1.00000 1.00000
>   7   hdd  1.81850         osd.7       up  1.00000 1.00000
>   8   hdd  1.81850         osd.8       up  1.00000 1.00000
>  -4        5.45549     host fre107
>   9   hdd  1.81850         osd.9       up  1.00000 1.00000
>  10   hdd  1.81850         osd.10      up  1.00000 1.00000
>  11   hdd  1.81850         osd.11      up  1.00000 1.00000
>  -5        5.45549     host fre109
>  12   hdd  1.81850         osd.12      up  1.00000 1.00000
>  13   hdd  1.81850         osd.13      up  1.00000 1.00000
>  14   hdd  1.81850         osd.14      up  1.00000 1.00000
>  -6        5.45549     host fre111
>  15   hdd  1.81850         osd.15      up  1.00000 1.00000
>  16   hdd  1.81850         osd.16      up  1.00000 1.00000
>  17   hdd  1.81850         osd.17      up  0.79999 1.00000
>  -7        5.45549     host fre113
>  18   hdd  1.81850         osd.18      up  1.00000 1.00000
>  19   hdd  1.81850         osd.19      up  1.00000 1.00000
>  20   hdd  1.81850         osd.20      up  1.00000 1.00000
>  -8        5.45549     host fre115
>  21   hdd  1.81850         osd.21      up  1.00000 1.00000
>  22   hdd  1.81850         osd.22      up  1.00000 1.00000
>  23   hdd  1.81850         osd.23      up  1.00000 1.00000
> -10        5.45549     host fre117
>  24   hdd  1.81850         osd.24      up  1.00000 1.00000
>  25   hdd  1.81850         osd.25      up  1.00000 1.00000
>  26   hdd  1.81850         osd.26      up  1.00000 1.00000
> -11        5.45549     host fre119
>  27   hdd  1.81850         osd.27      up  1.00000 1.00000
>  28   hdd  1.81850         osd.28      up  1.00000 1.00000
>  29   hdd  1.81850         osd.29      up  1.00000 1.00000
> -12        5.45549     host fre121
>  30   hdd  1.81850         osd.30      up  1.00000 1.00000
>  31   hdd  1.81850         osd.31      up  1.00000 1.00000
>  32   hdd  1.81850         osd.32      up  1.00000 1.00000
> -13        5.45549     host fre123
>  33   hdd  1.81850         osd.33      up  1.00000 1.00000
>  34   hdd  1.81850         osd.34      up  1.00000 1.00000
>  35   hdd  1.81850         osd.35      up  1.00000 1.00000
> -27        5.45549     host fre201
>  36   hdd  1.81850         osd.36      up  1.00000 1.00000
>  37   hdd  1.81850         osd.37      up  1.00000 1.00000
>  38   hdd  1.81850         osd.38      up  1.00000 1.00000
> [root@fre201 ~]#
> [root@fre201 ~]#
> [root@fre201 ~]#
> [root@fre201 ~]#
> [root@fre201 ~]#
> [root@fre201 ~]# ceph -s
>   cluster:
>     id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
>     health: HEALTH_ERR
>             3 pools have many more objects per pg than average
>             585791/12391450 objects misplaced (4.727%)
>             2 scrub errors
>             2374 PGs pending on creation
>             Reduced data availability: 6578 pgs inactive, 2025 pgs down,
> 74 pgs peering, 1234 pgs stale
>             Possible data damage: 2 pgs inconsistent
>             Degraded data redundancy: 64969/12391450 objects degraded
> (0.524%), 616 pgs degraded, 20 pgs undersized
>             96242 slow requests are blocked > 32 sec
>             228 stuck requests are blocked > 4096 sec
>             too many PGs per OSD (2768 > max 200)
>
>   services:
>     mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>     mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>     osd: 39 osds: 39 up, 39 in; 96 remapped pgs
>     rgw: 1 daemon active
>
>   data:
>     pools:   18 pools, 54656 pgs
>     objects: 6050k objects, 10942 GB
>     usage:   21900 GB used, 50721 GB / 72622 GB avail
>     pgs:     0.002% pgs unknown
>              12.050% pgs not active
>              64969/12391450 objects degraded (0.524%)
>              585791/12391450 objects misplaced (4.727%)
>              47489 active+clean
>              3670  activating
>              1098  stale+down
>              923   down
>              575   activating+degraded
>              563   stale+active+clean
>              105   stale+activating
>              78    activating+remapped
>              72    peering
>              25    stale+activating+degraded
>              23    stale+activating+remapped
>              9     stale+active+undersized
>              6     stale+activating+undersized+degraded+remapped
>              5     stale+active+undersized+degraded
>              4     down+remapped
>              4     activating+degraded+remapped
>              2     active+clean+inconsistent
>              1     stale+activating+degraded+remapped
>              1     stale+active+clean+remapped
>              1     stale+remapped+peering
>              1     remapped+peering
>              1     unknown
>
>   io:
>     client:   0 B/s rd, 208 kB/s wr, 22 op/s rd, 22 op/s wr
>
>
>
> Thanks
> Arun
>
>
> On Thu, Jan 3, 2019 at 7:19 PM Chris <bitskr...@bitskrieg.net> wrote:
>
>> If you added OSDs and then deleted them repeatedly without waiting for
>> replication to finish as the cluster attempted to re-balance across them,
>> its highly likely that you are permanently missing PGs (especially if the
>> disks were zapped each time).
>>
>> If those 3 down OSDs can be revived there is a (small) chance that you
>> can right the ship, but 1400pg/OSD is pretty extreme.  I'm surprised the
>> cluster even let you do that - this sounds like a data loss event.
>>
>> Bring back the 3 OSD and see what those 2 inconsistent pgs look like with
>> ceph pg query.
>>
>> On January 3, 2019 21:59:38 Arun POONIA <arun.poo...@nuagenetworks.net>
>> wrote:
>>
>>> Hi,
>>>
>>> Recently I tried adding a new node (OSD) to ceph cluster using
>>> ceph-deploy tool. Since I was experimenting with tool and ended up deleting
>>> OSD nodes on new server couple of times.
>>>
>>> Now since ceph OSDs are running on new server cluster PGs seems to be
>>> inactive (10-15%) and they are not recovering or rebalancing. Not sure what
>>> to do. I tried shutting down OSDs on new server.
>>>
>>> Status:
>>> [root@fre105 ~]# ceph -s
>>> 2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0)
>>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
>>> bind the UNIX domain socket to
>>> '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2)
>>> No such file or directory
>>>   cluster:
>>>     id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
>>>     health: HEALTH_ERR
>>>             3 pools have many more objects per pg than average
>>>             373907/12391198 objects misplaced (3.018%)
>>>             2 scrub errors
>>>             9677 PGs pending on creation
>>>             Reduced data availability: 7145 pgs inactive, 6228 pgs down,
>>> 1 pg peering, 2717 pgs stale
>>>             Possible data damage: 2 pgs inconsistent
>>>             Degraded data redundancy: 178350/12391198 objects degraded
>>> (1.439%), 346 pgs degraded, 1297 pgs undersized
>>>             52486 slow requests are blocked > 32 sec
>>>             9287 stuck requests are blocked > 4096 sec
>>>             too many PGs per OSD (2968 > max 200)
>>>
>>>   services:
>>>     mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>>>     mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>>>     osd: 39 osds: 36 up, 36 in; 51 remapped pgs
>>>     rgw: 1 daemon active
>>>
>>>   data:
>>>     pools:   18 pools, 54656 pgs
>>>     objects: 6050k objects, 10941 GB
>>>     usage:   21727 GB used, 45308 GB / 67035 GB avail
>>>     pgs:     13.073% pgs not active
>>>              178350/12391198 objects degraded (1.439%)
>>>              373907/12391198 objects misplaced (3.018%)
>>>              46177 active+clean
>>>              5054  down
>>>              1173  stale+down
>>>              1084  stale+active+undersized
>>>              547   activating
>>>              201   stale+active+undersized+degraded
>>>              158   stale+activating
>>>              96    activating+degraded
>>>              46    stale+active+clean
>>>              42    activating+remapped
>>>              34    stale+activating+degraded
>>>              23    stale+activating+remapped
>>>              6     stale+activating+undersized+degraded+remapped
>>>              6     activating+undersized+degraded+remapped
>>>              2     activating+degraded+remapped
>>>              2     active+clean+inconsistent
>>>              1     stale+activating+degraded+remapped
>>>              1     stale+active+clean+remapped
>>>              1     stale+remapped
>>>              1     down+remapped
>>>              1     remapped+peering
>>>
>>>   io:
>>>     client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr
>>>
>>> Thanks
>>> --
>>> Arun Poonia
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
> --
> Arun Poonia
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Help Ceph Cluster Down

Reply via email to