If you added OSDs and then deleted them repeatedly, without waiting for replication to finish while the cluster tried to rebalance onto them, it's highly likely that you are permanently missing PGs (especially if the disks were zapped each time).
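A quick way to gauge how much is actually unreachable is to list the stuck PGs directly (plain ceph CLI, nothing specific to your setup; just a sketch of where I'd look first):

  # list PGs that are stuck inactive and the OSDs they map to
  ceph pg dump_stuck inactive
  # health detail names the down/incomplete PGs and which OSDs they are waiting on
  ceph health detail | less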

If those 3 down OSDs can be revived, there is a (small) chance that you can right the ship, but 1400 PGs per OSD is pretty extreme. I'm surprised the cluster even let you do that - this sounds like a data loss event.
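To find the 3 down OSDs and where they live (standard ceph CLI; the systemctl unit name assumes a systemd-managed deployment, adjust if yours differs):

  # find the down OSD ids and their host
  ceph osd tree | grep -i down
  # on that host, try to bring each one back up
  systemctl start ceph-osd@<id>
  # then watch whether the down/stale PGs start peering again
  ceph -w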


Bring back the 3 OSDs and see what those 2 inconsistent PGs look like with ceph pg query.
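Roughly like this, with <pgid> standing in for the two PG ids that health detail reports:

  # identify the 2 inconsistent PGs
  ceph health detail | grep inconsistent
  # see which objects/shards the scrub flagged
  rados list-inconsistent-obj <pgid> --format=json-pretty
  # peering state, acting set and recovery info for that PG
  ceph pg <pgid> query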

On January 3, 2019 21:59:38 Arun POONIA <arun.poo...@nuagenetworks.net> wrote:
Hi,

Recently I tried adding a new node (OSD) to the Ceph cluster using the ceph-deploy tool. I was experimenting with the tool and ended up deleting the OSDs on the new server a couple of times.

Now that the Ceph OSDs are running on the new server, the cluster's PGs seem to be inactive (10-15%) and they are not recovering or rebalancing. Not sure what to do. I tried shutting down the OSDs on the new server.

Status:
[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2) No such file or directory
 cluster:
   id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
   health: HEALTH_ERR
           3 pools have many more objects per pg than average
           373907/12391198 objects misplaced (3.018%)
           2 scrub errors
           9677 PGs pending on creation
            Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
           52486 slow requests are blocked > 32 sec
           9287 stuck requests are blocked > 4096 sec
           too many PGs per OSD (2968 > max 200)

 services:
   mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
   mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
   osd: 39 osds: 36 up, 36 in; 51 remapped pgs
   rgw: 1 daemon active

 data:
   pools:   18 pools, 54656 pgs
   objects: 6050k objects, 10941 GB
   usage:   21727 GB used, 45308 GB / 67035 GB avail
   pgs:     13.073% pgs not active
            178350/12391198 objects degraded (1.439%)
            373907/12391198 objects misplaced (3.018%)
            46177 active+clean
            5054  down
            1173  stale+down
            1084  stale+active+undersized
            547   activating
            201   stale+active+undersized+degraded
            158   stale+activating
            96    activating+degraded
            46    stale+active+clean
            42    activating+remapped
            34    stale+activating+degraded
            23    stale+activating+remapped
            6     stale+activating+undersized+degraded+remapped
            6     activating+undersized+degraded+remapped
            2     activating+degraded+remapped
            2     active+clean+inconsistent
            1     stale+activating+degraded+remapped
            1     stale+active+clean+remapped
            1     stale+remapped
            1     down+remapped
            1     remapped+peering

 io:
   client:   0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
--
Arun Poonia

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
