If you added OSDs and then deleted them repeatedly, without waiting for replication to finish while the cluster tried to rebalance onto them, it's highly likely that you are permanently missing PGs (especially if the disks were zapped each time).
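A quick way to gauge how much is actually unreachable is to list the stuck PGs directly (plain ceph CLI, nothing specific to your setup; just a sketch of where I'd look first):

  # list PGs that are stuck inactive and the OSDs they map to
  ceph pg dump_stuck inactive
  # health detail names the down/incomplete PGs and which OSDs they are waiting on
  ceph health detail | less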

If those 3 down OSDs can be revived, there is a (small) chance that you can right the ship, but 1400 PGs per OSD is pretty extreme. I'm surprised the cluster even let you do that - this sounds like a data loss event.
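To find the 3 down OSDs and where they live (standard ceph CLI; the systemctl unit name assumes a systemd-managed deployment, adjust if yours differs):

  # find the down OSD ids and their host
  ceph osd tree | grep -i down
  # on that host, try to bring each one back up
  systemctl start ceph-osd@<id>
  # then watch whether the down/stale PGs start peering again
  ceph -w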


Bring back the 3 OSDs and see what those 2 inconsistent PGs look like with ceph pg query.
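Roughly like this, with <pgid> standing in for the two PG ids that health detail reports:

  # identify the 2 inconsistent PGs
  ceph health detail | grep inconsistent
  # see which objects/shards the scrub flagged
  rados list-inconsistent-obj <pgid> --format=json-pretty
  # peering state, acting set and recovery info for that PG
  ceph pg <pgid> query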

On January 3, 2019 21:59:38 Arun POONIA <arun.poo...@nuagenetworks.net> wrote:
Hi,

Recently I tried adding a new node (OSD) to the Ceph cluster using the ceph-deploy tool. I was experimenting with the tool and ended up deleting the OSDs on the new server a couple of times.

Now that the Ceph OSDs are running on the new server, the cluster's PGs seem to be inactive (10-15%) and they are not recovering or rebalancing. Not sure what to do. I tried shutting down the OSDs on the new server.

Status:
[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2) No such file or directory
 cluster:
   id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
   health: HEALTH_ERR
           3 pools have many more objects per pg than average
           373907/12391198 objects misplaced (3.018%)
           2 scrub errors
           9677 PGs pending on creation
            Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
           52486 slow requests are blocked > 32 sec
           9287 stuck requests are blocked > 4096 sec
           too many PGs per OSD (2968 > max 200)

 services:
   mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
   mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
   osd: 39 osds: 36 up, 36 in; 51 remapped pgs
   rgw: 1 daemon active

 data:
   pools:   18 pools, 54656 pgs
   objects: 6050k objects, 10941 GB
   usage:   21727 GB used, 45308 GB / 67035 GB avail
   pgs:     13.073% pgs not active
            178350/12391198 objects degraded (1.439%)
            373907/12391198 objects misplaced (3.018%)
            46177 active+clean
            5054  down
            1173  stale+down
            1084  stale+active+undersized
            547   activating
            201   stale+active+undersized+degraded
            158   stale+activating
            96    activating+degraded
            46    stale+active+clean
            42    activating+remapped
            34    stale+activating+degraded
            23    stale+activating+remapped
            6     stale+activating+undersized+degraded+remapped
            6     activating+undersized+degraded+remapped
            2     activating+degraded+remapped
            2     active+clean+inconsistent
            1     stale+activating+degraded+remapped
            1     stale+active+clean+remapped
            1     stale+remapped
            1     down+remapped
            1     remapped+peering

 io:
   client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
--
Arun Poonia

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
