Hi Caspar,

Yes, the cluster was working fine with the "too many PGs per OSD" warning up
until now. I am not sure how to recover from the stale down/inactive PGs. If
you happen to know how to do this, could you let me know?

Current State:

[root@fre101 ~]# ceph -s
2019-01-04 05:22:05.942349 7f314f613700 -1 asok(0x7f31480017a0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
bind the UNIX domain socket to
'/var/run/ceph-guests/ceph-client.admin.1053724.139849638091088.asok': (2)
No such file or directory
  cluster:
    id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            505714/12392650 objects misplaced (4.081%)
            3883 PGs pending on creation
            Reduced data availability: 6519 pgs inactive, 1870 pgs down, 1
pg peering, 886 pgs stale
            Degraded data redundancy: 42987/12392650 objects degraded
(0.347%), 634 pgs degraded, 16 pgs undersized
            125827 slow requests are blocked > 32 sec
            2 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2758 > max 200)

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 39 up, 39 in; 76 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   18 pools, 54656 pgs
    objects: 6051k objects, 10944 GB
    usage:   21933 GB used, 50688 GB / 72622 GB avail
    pgs:     11.927% pgs not active
             42987/12392650 objects degraded (0.347%)
             505714/12392650 objects misplaced (4.081%)
             48080 active+clean
             3885  activating
             1111  down
             759   stale+down
             614   activating+degraded
             74    activating+remapped
             46    stale+active+clean
             35    stale+activating
             21    stale+activating+remapped
             9     stale+active+undersized
             9     stale+activating+degraded
             5     stale+activating+undersized+degraded+remapped
             3     activating+degraded+remapped
             1     stale+activating+degraded+remapped
             1     stale+active+undersized+degraded
             1     remapped+peering
             1     active+clean+remapped
             1     activating+undersized+degraded+remapped

  io:
    client:   0 B/s rd, 25397 B/s wr, 4 op/s rd, 4 op/s wr

I will adjust the number of PGs per OSD once these inactive or stale PGs come
back online. At the moment I am not able to access the VMs and images that are
backed by Ceph.
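
To dig into the stuck PGs I was planning to use something along these lines,
unless there is a better way (the PG id below is just a placeholder, to be
replaced with real ids from the dump_stuck output):

ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg 2.1ab query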

Thanks
Arun

On Fri, Jan 4, 2019 at 4:53 AM Caspar Smit <caspars...@supernas.eu> wrote:

> Hi Arun,
>
> How did you end up with a 'working' cluster with so many pgs per OSD?
>
> "too many PGs per OSD (2968 > max 200)"
>
> To (temporarily) allow this many PGs per OSD you could try the following:
>
> Change these values in the global section in your ceph.conf:
>
> mon max pg per osd = 200
> osd max pg per osd hard ratio = 2
>
> That allows 200 * 2 = 400 PGs per OSD before the creation of new PGs is
> disabled.
>
> The above are the defaults (for Luminous, and possibly other versions too).
> You can check your current settings with:
>
> ceph daemon mon.ceph-mon01 config show |grep pg_per_osd
>
> Since your current PGs-per-OSD ratio is way higher than the default, you
> could set them to, for instance:
>
> mon max pg per osd = 1000
> osd max pg per osd hard ratio = 5
>
> That allows 1000 * 5 = 5000 PGs per OSD before the creation of new PGs is
> disabled.
>
> You'll need to inject the settings into the mons/OSDs and restart the mgrs
> to make them active.
>
> ceph tell mon.* injectargs '--mon_max_pg_per_osd 1000'
> ceph tell mon.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'
> ceph tell osd.* injectargs '--mon_max_pg_per_osd 1000'
> ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 5'
> restart mgrs
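>
> To double-check that the new values are active, you could for example query
> the running config again (osd.0 here is just an example; the ceph daemon
> command has to be run on the host where that daemon is running):
>
> ceph daemon mon.ceph-mon01 config show | grep pg_per_osd
> ceph daemon osd.0 config show | grep pg_per_osd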
>
> Kind regards,
> Caspar
>
>
> Op vr 4 jan. 2019 om 04:28 schreef Arun POONIA <
> arun.poo...@nuagenetworks.net>:
>
>> Hi Chris,
>>
>> Indeed, that's what happened. I didn't set the noout flag either, and I
>> zapped the disks on the new server every time. In my cluster status fre201
>> is the only new server.
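>>
>> For future reference, my understanding is that the rebalancing could have
>> been avoided by setting the noout flag before taking OSDs down, along the
>> lines of:
>>
>> ceph osd set noout
>> (remove / re-add the OSDs)
>> ceph osd unset noout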
>>
>> Current status after enabling the 3 OSDs on host fre201:
>>
>> [root@fre201 ~]# ceph osd tree
>> ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
>>  -1       70.92137 root default
>>  -2        5.45549     host fre101
>>   0   hdd  1.81850         osd.0       up  1.00000 1.00000
>>   1   hdd  1.81850         osd.1       up  1.00000 1.00000
>>   2   hdd  1.81850         osd.2       up  1.00000 1.00000
>>  -9        5.45549     host fre103
>>   3   hdd  1.81850         osd.3       up  1.00000 1.00000
>>   4   hdd  1.81850         osd.4       up  1.00000 1.00000
>>   5   hdd  1.81850         osd.5       up  1.00000 1.00000
>>  -3        5.45549     host fre105
>>   6   hdd  1.81850         osd.6       up  1.00000 1.00000
>>   7   hdd  1.81850         osd.7       up  1.00000 1.00000
>>   8   hdd  1.81850         osd.8       up  1.00000 1.00000
>>  -4        5.45549     host fre107
>>   9   hdd  1.81850         osd.9       up  1.00000 1.00000
>>  10   hdd  1.81850         osd.10      up  1.00000 1.00000
>>  11   hdd  1.81850         osd.11      up  1.00000 1.00000
>>  -5        5.45549     host fre109
>>  12   hdd  1.81850         osd.12      up  1.00000 1.00000
>>  13   hdd  1.81850         osd.13      up  1.00000 1.00000
>>  14   hdd  1.81850         osd.14      up  1.00000 1.00000
>>  -6        5.45549     host fre111
>>  15   hdd  1.81850         osd.15      up  1.00000 1.00000
>>  16   hdd  1.81850         osd.16      up  1.00000 1.00000
>>  17   hdd  1.81850         osd.17      up  0.79999 1.00000
>>  -7        5.45549     host fre113
>>  18   hdd  1.81850         osd.18      up  1.00000 1.00000
>>  19   hdd  1.81850         osd.19      up  1.00000 1.00000
>>  20   hdd  1.81850         osd.20      up  1.00000 1.00000
>>  -8        5.45549     host fre115
>>  21   hdd  1.81850         osd.21      up  1.00000 1.00000
>>  22   hdd  1.81850         osd.22      up  1.00000 1.00000
>>  23   hdd  1.81850         osd.23      up  1.00000 1.00000
>> -10        5.45549     host fre117
>>  24   hdd  1.81850         osd.24      up  1.00000 1.00000
>>  25   hdd  1.81850         osd.25      up  1.00000 1.00000
>>  26   hdd  1.81850         osd.26      up  1.00000 1.00000
>> -11        5.45549     host fre119
>>  27   hdd  1.81850         osd.27      up  1.00000 1.00000
>>  28   hdd  1.81850         osd.28      up  1.00000 1.00000
>>  29   hdd  1.81850         osd.29      up  1.00000 1.00000
>> -12        5.45549     host fre121
>>  30   hdd  1.81850         osd.30      up  1.00000 1.00000
>>  31   hdd  1.81850         osd.31      up  1.00000 1.00000
>>  32   hdd  1.81850         osd.32      up  1.00000 1.00000
>> -13        5.45549     host fre123
>>  33   hdd  1.81850         osd.33      up  1.00000 1.00000
>>  34   hdd  1.81850         osd.34      up  1.00000 1.00000
>>  35   hdd  1.81850         osd.35      up  1.00000 1.00000
>> -27        5.45549     host fre201
>>  36   hdd  1.81850         osd.36      up  1.00000 1.00000
>>  37   hdd  1.81850         osd.37      up  1.00000 1.00000
>>  38   hdd  1.81850         osd.38      up  1.00000 1.00000
>> [root@fre201 ~]#
>> [root@fre201 ~]# ceph -s
>>   cluster:
>>     id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
>>     health: HEALTH_ERR
>>             3 pools have many more objects per pg than average
>>             585791/12391450 objects misplaced (4.727%)
>>             2 scrub errors
>>             2374 PGs pending on creation
>>             Reduced data availability: 6578 pgs inactive, 2025 pgs down,
>> 74 pgs peering, 1234 pgs stale
>>             Possible data damage: 2 pgs inconsistent
>>             Degraded data redundancy: 64969/12391450 objects degraded
>> (0.524%), 616 pgs degraded, 20 pgs undersized
>>             96242 slow requests are blocked > 32 sec
>>             228 stuck requests are blocked > 4096 sec
>>             too many PGs per OSD (2768 > max 200)
>>
>>   services:
>>     mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>>     mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>>     osd: 39 osds: 39 up, 39 in; 96 remapped pgs
>>     rgw: 1 daemon active
>>
>>   data:
>>     pools:   18 pools, 54656 pgs
>>     objects: 6050k objects, 10942 GB
>>     usage:   21900 GB used, 50721 GB / 72622 GB avail
>>     pgs:     0.002% pgs unknown
>>              12.050% pgs not active
>>              64969/12391450 objects degraded (0.524%)
>>              585791/12391450 objects misplaced (4.727%)
>>              47489 active+clean
>>              3670  activating
>>              1098  stale+down
>>              923   down
>>              575   activating+degraded
>>              563   stale+active+clean
>>              105   stale+activating
>>              78    activating+remapped
>>              72    peering
>>              25    stale+activating+degraded
>>              23    stale+activating+remapped
>>              9     stale+active+undersized
>>              6     stale+activating+undersized+degraded+remapped
>>              5     stale+active+undersized+degraded
>>              4     down+remapped
>>              4     activating+degraded+remapped
>>              2     active+clean+inconsistent
>>              1     stale+activating+degraded+remapped
>>              1     stale+active+clean+remapped
>>              1     stale+remapped+peering
>>              1     remapped+peering
>>              1     unknown
>>
>>   io:
>>     client:   0 B/s rd, 208 kB/s wr, 22 op/s rd, 22 op/s wr
>>
>>
>>
>> Thanks
>> Arun
>>
>>
>> On Thu, Jan 3, 2019 at 7:19 PM Chris <bitskr...@bitskrieg.net> wrote:
>>
>>> If you added OSDs and then deleted them repeatedly without waiting for
>>> replication to finish as the cluster attempted to re-balance across them,
>>> it's highly likely that you are permanently missing PGs (especially if the
>>> disks were zapped each time).
>>>
>>> If those 3 down OSDs can be revived there is a (small) chance that you
>>> can right the ship, but 1400 PGs/OSD is pretty extreme.  I'm surprised
>>> the cluster even let you do that - this sounds like a data loss event.
>>>
>>> Bring back the 3 OSDs and see what those 2 inconsistent PGs look like
>>> with ceph pg query.
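>>>
>>> For example, something along these lines (the PG id is just a placeholder;
>>> take the real ids from ceph health detail):
>>>
>>> ceph health detail | grep inconsistent
>>> ceph pg 2.1ab query
>>> rados list-inconsistent-obj 2.1ab --format=json-pretty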
>>>
>>> On January 3, 2019 21:59:38 Arun POONIA <arun.poo...@nuagenetworks.net>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently I tried adding a new node (OSD host) to the Ceph cluster using
>>>> the ceph-deploy tool. I was experimenting with the tool and ended up
>>>> deleting the OSDs on the new server a couple of times.
>>>>
>>>> Now that the Ceph OSDs are running on the new server, the cluster's PGs
>>>> seem to be inactive (10-15%) and they are not recovering or rebalancing.
>>>> I am not sure what to do. I tried shutting down the OSDs on the new
>>>> server.
>>>>
>>>> Status:
>>>> [root@fre105 ~]# ceph -s
>>>> 2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0)
>>>> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
>>>> bind the UNIX domain socket to
>>>> '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2)
>>>> No such file or directory
>>>>   cluster:
>>>>     id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
>>>>     health: HEALTH_ERR
>>>>             3 pools have many more objects per pg than average
>>>>             373907/12391198 objects misplaced (3.018%)
>>>>             2 scrub errors
>>>>             9677 PGs pending on creation
>>>>             Reduced data availability: 7145 pgs inactive, 6228 pgs
>>>> down, 1 pg peering, 2717 pgs stale
>>>>             Possible data damage: 2 pgs inconsistent
>>>>             Degraded data redundancy: 178350/12391198 objects degraded
>>>> (1.439%), 346 pgs degraded, 1297 pgs undersized
>>>>             52486 slow requests are blocked > 32 sec
>>>>             9287 stuck requests are blocked > 4096 sec
>>>>             too many PGs per OSD (2968 > max 200)
>>>>
>>>>   services:
>>>>     mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
>>>>     mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
>>>>     osd: 39 osds: 36 up, 36 in; 51 remapped pgs
>>>>     rgw: 1 daemon active
>>>>
>>>>   data:
>>>>     pools:   18 pools, 54656 pgs
>>>>     objects: 6050k objects, 10941 GB
>>>>     usage:   21727 GB used, 45308 GB / 67035 GB avail
>>>>     pgs:     13.073% pgs not active
>>>>              178350/12391198 objects degraded (1.439%)
>>>>              373907/12391198 objects misplaced (3.018%)
>>>>              46177 active+clean
>>>>              5054  down
>>>>              1173  stale+down
>>>>              1084  stale+active+undersized
>>>>              547   activating
>>>>              201   stale+active+undersized+degraded
>>>>              158   stale+activating
>>>>              96    activating+degraded
>>>>              46    stale+active+clean
>>>>              42    activating+remapped
>>>>              34    stale+activating+degraded
>>>>              23    stale+activating+remapped
>>>>              6     stale+activating+undersized+degraded+remapped
>>>>              6     activating+undersized+degraded+remapped
>>>>              2     activating+degraded+remapped
>>>>              2     active+clean+inconsistent
>>>>              1     stale+activating+degraded+remapped
>>>>              1     stale+active+clean+remapped
>>>>              1     stale+remapped
>>>>              1     down+remapped
>>>>              1     remapped+peering
>>>>
>>>>   io:
>>>>     client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr
>>>>
>>>> Thanks
>>>> --
>>>> Arun Poonia
>>>>
>>>
>>
>> --
>> Arun Poonia
>>


-- 
Arun Poonia
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
