Are the numbers still decreasing? This one for instance:
"3883 PGs pending on creation" Caspar Op vr 4 jan. 2019 om 14:23 schreef Arun POONIA < arun.poo...@nuagenetworks.net>: > Hi Caspar, > > Yes, cluster was working fine with number of PGs per OSD warning up until > now. I am not sure how to recover from stale down/inactive PGs. If you > happen to know about this can you let me know? > > Current State: > > [root@fre101 ~]# ceph -s > 2019-01-04 05:22:05.942349 7f314f613700 -1 asok(0x7f31480017a0) > AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to > bind the UNIX domain socket to > '/var/run/ceph-guests/ceph-client.admin.1053724.139849638091088.asok': (2) > No such file or directory > cluster: > id: adb9ad8e-f458-4124-bf58-7963a8d1391f > health: HEALTH_ERR > 3 pools have many more objects per pg than average > 505714/12392650 objects misplaced (4.081%) > 3883 PGs pending on creation > Reduced data availability: 6519 pgs inactive, 1870 pgs down, 1 > pg peering, 886 pgs stale > Degraded data redundancy: 42987/12392650 objects degraded > (0.347%), 634 pgs degraded, 16 pgs undersized > 125827 slow requests are blocked > 32 sec > 2 stuck requests are blocked > 4096 sec > too many PGs per OSD (2758 > max 200) > > services: > mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 > mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02 > osd: 39 osds: 39 up, 39 in; 76 remapped pgs > rgw: 1 daemon active > > data: > pools: 18 pools, 54656 pgs > objects: 6051k objects, 10944 GB > usage: 21933 GB used, 50688 GB / 72622 GB avail > pgs: 11.927% pgs not active > 42987/12392650 objects degraded (0.347%) > 505714/12392650 objects misplaced (4.081%) > 48080 active+clean > 3885 activating > 1111 down > 759 stale+down > 614 activating+degraded > 74 activating+remapped > 46 stale+active+clean > 35 stale+activating > 21 stale+activating+remapped > 9 stale+active+undersized > 9 stale+activating+degraded > 5 stale+activating+undersized+degraded+remapped > 3 activating+degraded+remapped > 1 stale+activating+degraded+remapped > 1 stale+active+undersized+degraded > 1 remapped+peering > 1 active+clean+remapped > 1 activating+undersized+degraded+remapped > > io: > client: 0 B/s rd, 25397 B/s wr, 4 op/s rd, 4 op/s wr > > I will update number of PGs per OSD once these inactive or stale PGs come > online. I am not able to access VMs (VMs, Images) which are using Ceph. > > Thanks > Arun > > On Fri, Jan 4, 2019 at 4:53 AM Caspar Smit <caspars...@supernas.eu> wrote: > >> Hi Arun, >> >> How did you end up with a 'working' cluster with so many pgs per OSD? >> >> "too many PGs per OSD (2968 > max 200)" >> >> To (temporarily) allow this kind of pgs per osd you could try this: >> >> Change these values in the global section in your ceph.conf: >> >> mon max pg per osd = 200 >> osd max pg per osd hard ratio = 2 >> >> It allows 200*2 = 400 Pgs per OSD before disabling the creation of new >> pgs. >> >> Above are the defaults (for Luminous, maybe other versions too) >> You can check your current settings with: >> >> ceph daemon mon.ceph-mon01 config show |grep pg_per_osd >> >> Since your current pgs per osd ratio is way higher then the default you >> could set them to for instance: >> >> mon max pg per osd = 1000 >> osd max pg per osd hard ratio = 5 >> >> Which allow for 5000 pgs per osd before disabling creation of new pgs. >> >> You'll need to inject the setting into the mons/osds and restart mgrs to >> make them active. 
>>
>> You'll need to inject the setting into the mons/osds and restart mgrs to
>> make them active:
>>
>> ceph tell mon.* injectargs '--mon_max_pg_per_osd 1000'
>> ceph tell mon.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'
>> ceph tell osd.* injectargs '--mon_max_pg_per_osd 1000'
>> ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'
>> restart mgrs
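>>
>> For the mgr restart, on a systemd-managed deployment something along
>> these lines should do it (a sketch; the instance names are an assumption
>> based on the mon/mgr names shown in your ceph -s output):
>>
>> systemctl restart ceph-mgr@ceph-mon01
>> systemctl restart ceph-mgr@ceph-mon02
>> systemctl restart ceph-mgr@ceph-mon03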
>>
>> Kind regards,
>> Caspar
>>
>> On Fri, Jan 4, 2019 at 04:28 Arun POONIA <arun.poo...@nuagenetworks.net> wrote:
>>
>>> Hi Chris,
>>>
>>> Indeed that's what happened. I didn't set the noout flag either, and I
>>> zapped the disks on the new server every time. In my cluster fre201 is
>>> the only new server.
>>>
>>> Current status after enabling the 3 OSDs on the fre201 host:
>>>
>>> [root@fre201 ~]# ceph osd tree
>>> ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
>>>  -1       70.92137 root default
>>>  -2        5.45549     host fre101
>>>   0   hdd  1.81850         osd.0       up  1.00000 1.00000
>>>   1   hdd  1.81850         osd.1       up  1.00000 1.00000
>>>   2   hdd  1.81850         osd.2       up  1.00000 1.00000
>>>  -9        5.45549     host fre103
>>>   3   hdd  1.81850         osd.3       up  1.00000 1.00000
>>>   4   hdd  1.81850         osd.4       up  1.00000 1.00000
>>>   5   hdd  1.81850         osd.5       up  1.00000 1.00000
>>>  -3        5.45549     host fre105
>>>   6   hdd  1.81850         osd.6       up  1.00000 1.00000
>>>   7   hdd  1.81850         osd.7       up  1.00000 1.00000
>>>   8   hdd  1.81850         osd.8       up  1.00000 1.00000
>>>  -4        5.45549     host fre107
>>>   9   hdd  1.81850         osd.9       up  1.00000 1.00000
>>>  10   hdd  1.81850         osd.10      up  1.00000 1.00000
>>>  11   hdd  1.81850         osd.11      up  1.00000 1.00000
>>>  -5        5.45549     host fre109
>>>  12   hdd  1.81850         osd.12      up  1.00000 1.00000
>>>  13   hdd  1.81850         osd.13      up  1.00000 1.00000
>>>  14   hdd  1.81850         osd.14      up  1.00000 1.00000
>>>  -6        5.45549     host fre111
>>>  15   hdd  1.81850         osd.15      up  1.00000 1.00000
>>>  16   hdd  1.81850         osd.16      up  1.00000 1.00000
>>>  17   hdd  1.81850         osd.17      up  0.79999 1.00000
>>>  -7        5.45549     host fre113
>>>  18   hdd  1.81850         osd.18      up  1.00000 1.00000
>>>  19   hdd  1.81850         osd.19      up  1.00000 1.00000
>>>  20   hdd  1.81850         osd.20      up  1.00000 1.00000
>>>  -8        5.45549     host fre115
>>>  21   hdd  1.81850         osd.21      up  1.00000 1.00000
>>>  22   hdd  1.81850         osd.22      up  1.00000 1.00000
>>>  23   hdd  1.81850         osd.23      up  1.00000 1.00000
>>> -10        5.45549     host fre117
>>>  24   hdd  1.81850         osd.24      up  1.00000 1.00000
>>>  25   hdd  1.81850         osd.25      up  1.00000 1.00000
>>>  26   hdd  1.81850         osd.26      up  1.00000 1.00000
>>> -11        5.45549     host fre119
>>>  27   hdd  1.81850         osd.27      up  1.00000 1.00000
>>>  28   hdd  1.81850         osd.28      up  1.00000 1.00000
>>>  29   hdd  1.81850         osd.29      up  1.00000 1.00000
>>> -12        5.45549     host fre121
>>>  30   hdd  1.81850         osd.30      up  1.00000 1.00000
>>>  31   hdd  1.81850         osd.31      up  1.00000 1.00000
>>>  32   hdd  1.81850         osd.32      up  1.00000 1.00000
>>> -13        5.45549     host fre123
>>>  33   hdd  1.81850         osd.33      up  1.00000 1.00000
>>>  34   hdd  1.81850         osd.34      up  1.00000 1.00000
>>>  35   hdd  1.81850         osd.35      up  1.00000 1.00000
>>> -27        5.45549     host fre201
>>>  36   hdd  1.81850         osd.36      up  1.00000 1.00000
>>>  37   hdd  1.81850         osd.37      up  1.00000 1.00000
>>>  38   hdd  1.81850         osd.38      up  1.00000 1.00000
>>> [root@fre201 ~]#
>>> [root@fre201 ~]# ceph -s
>>>   cluster:
>>>     id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
>>>     health: HEALTH_ERR
>>>             3 pools have many more objects per pg than average
>>>             585791/12391450 objects misplaced (4.727%)
>>>             2 scrub errors
>>>             2374 PGs pending on creation
>>>             Reduced data availability: 6578 pgs inactive, 2025 pgs down, 74 pgs peering, 1234 pgs stale
>>>             Possible data damage: 2 pgs inconsistent
>>>             Degraded data redundancy: 64969/12391450 objects degraded (0.524%), 616 pgs degraded, 20 pgs undersized
>>>             96242 slow requests are blocked > 32 sec
>>>             228 stuck requests are blocked > 4096 sec
>>>             too many PGs per OSD (2768 > max 200)
>>>
>>>   services:
>>>     mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>>>     mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>>>     osd: 39 osds: 39 up, 39 in; 96 remapped pgs
>>>     rgw: 1 daemon active
>>>
>>>   data:
>>>     pools:   18 pools, 54656 pgs
>>>     objects: 6050k objects, 10942 GB
>>>     usage:   21900 GB used, 50721 GB / 72622 GB avail
>>>     pgs:     0.002% pgs unknown
>>>              12.050% pgs not active
>>>              64969/12391450 objects degraded (0.524%)
>>>              585791/12391450 objects misplaced (4.727%)
>>>              47489 active+clean
>>>              3670  activating
>>>              1098  stale+down
>>>              923   down
>>>              575   activating+degraded
>>>              563   stale+active+clean
>>>              105   stale+activating
>>>              78    activating+remapped
>>>              72    peering
>>>              25    stale+activating+degraded
>>>              23    stale+activating+remapped
>>>              9     stale+active+undersized
>>>              6     stale+activating+undersized+degraded+remapped
>>>              5     stale+active+undersized+degraded
>>>              4     down+remapped
>>>              4     activating+degraded+remapped
>>>              2     active+clean+inconsistent
>>>              1     stale+activating+degraded+remapped
>>>              1     stale+active+clean+remapped
>>>              1     stale+remapped+peering
>>>              1     remapped+peering
>>>              1     unknown
>>>
>>>   io:
>>>     client: 0 B/s rd, 208 kB/s wr, 22 op/s rd, 22 op/s wr
>>>
>>> Thanks
>>> Arun
>>>
>>> On Thu, Jan 3, 2019 at 7:19 PM Chris <bitskr...@bitskrieg.net> wrote:
>>>
>>>> If you added OSDs and then deleted them repeatedly, without waiting for
>>>> replication to finish as the cluster attempted to rebalance across them,
>>>> it's highly likely that you are permanently missing PGs (especially if
>>>> the disks were zapped each time).
>>>>
>>>> If those 3 down OSDs can be revived there is a (small) chance that you
>>>> can right the ship, but 1400 PGs/OSD is pretty extreme. I'm surprised
>>>> the cluster even let you do that - this sounds like a data loss event.
>>>>
>>>> Bring back the 3 OSDs and see what those 2 inconsistent PGs look like
>>>> with ceph pg query.
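>>>>
>>>> For reference, one way to find and inspect them (a sketch; the pg id
>>>> below is only a placeholder, substitute whatever ceph health detail
>>>> actually reports):
>>>>
>>>> ceph health detail | grep inconsistent
>>>> ceph pg 1.2f3 query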
>>>>
>>>> On January 3, 2019 21:59:38 Arun POONIA <arun.poo...@nuagenetworks.net>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Recently I tried adding a new node (OSD) to the ceph cluster using the
>>>>> ceph-deploy tool. I was experimenting with the tool and ended up
>>>>> deleting the OSDs on the new server a couple of times.
>>>>>
>>>>> Now that ceph OSDs are running on the new server, cluster PGs seem to
>>>>> be inactive (10-15%) and they are not recovering or rebalancing. Not
>>>>> sure what to do. I tried shutting down the OSDs on the new server.
>>>>>
>>>>> Status:
>>>>> [root@fre105 ~]# ceph -s
>>>>> 2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0)
>>>>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
>>>>> bind the UNIX domain socket to
>>>>> '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2)
>>>>> No such file or directory
>>>>>   cluster:
>>>>>     id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
>>>>>     health: HEALTH_ERR
>>>>>             3 pools have many more objects per pg than average
>>>>>             373907/12391198 objects misplaced (3.018%)
>>>>>             2 scrub errors
>>>>>             9677 PGs pending on creation
>>>>>             Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
>>>>>             Possible data damage: 2 pgs inconsistent
>>>>>             Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
>>>>>             52486 slow requests are blocked > 32 sec
>>>>>             9287 stuck requests are blocked > 4096 sec
>>>>>             too many PGs per OSD (2968 > max 200)
>>>>>
>>>>>   services:
>>>>>     mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>>>>>     mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>>>>>     osd: 39 osds: 36 up, 36 in; 51 remapped pgs
>>>>>     rgw: 1 daemon active
>>>>>
>>>>>   data:
>>>>>     pools:   18 pools, 54656 pgs
>>>>>     objects: 6050k objects, 10941 GB
>>>>>     usage:   21727 GB used, 45308 GB / 67035 GB avail
>>>>>     pgs:     13.073% pgs not active
>>>>>              178350/12391198 objects degraded (1.439%)
>>>>>              373907/12391198 objects misplaced (3.018%)
>>>>>              46177 active+clean
>>>>>              5054  down
>>>>>              1173  stale+down
>>>>>              1084  stale+active+undersized
>>>>>              547   activating
>>>>>              201   stale+active+undersized+degraded
>>>>>              158   stale+activating
>>>>>              96    activating+degraded
>>>>>              46    stale+active+clean
>>>>>              42    activating+remapped
>>>>>              34    stale+activating+degraded
>>>>>              23    stale+activating+remapped
>>>>>              6     stale+activating+undersized+degraded+remapped
>>>>>              6     activating+undersized+degraded+remapped
>>>>>              2     activating+degraded+remapped
>>>>>              2     active+clean+inconsistent
>>>>>              1     stale+activating+degraded+remapped
>>>>>              1     stale+active+clean+remapped
>>>>>              1     stale+remapped
>>>>>              1     down+remapped
>>>>>              1     remapped+peering
>>>>>
>>>>>   io:
>>>>>     client: 0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr
>>>>>
>>>>> Thanks
>>>>> --
>>>>> Arun Poonia
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>
>>> --
>>> Arun Poonia
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> --
> Arun Poonia
>
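If it helps to keep an eye on whether they are going down, something like
this (just a sketch) prints the relevant counters once a minute:

watch -n 60 "ceph -s | grep -E 'pending on creation|pgs inactive'"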
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com