If you added OSDs and then deleted them repeatedly without waiting for
replication to finish as the cluster attempted to rebalance across them,
it's highly likely that you are permanently missing PGs (especially if
the disks were zapped each time).
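A quick way to see what is actually lost versus merely stuck (a generic
sketch using only standard CLI queries, run from an admin node):

  ceph health detail          # names the down/stale/inconsistent PGs
  ceph pg dump_stuck stale    # PGs whose primary hasn't reported to the mons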
If those 3 down OSDs can be revived, there is a (small) chance that you
can right the ship, but ~1400 PGs per OSD is pretty extreme. I'm
surprised the cluster even let you do that - this sounds like a data
loss event.
Bring back the 3 OSDs and see what those 2 inconsistent PGs look like
with ceph pg query.
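Something along these lines (the OSD and PG IDs here are placeholders -
take the real ones from ceph osd tree and ceph health detail):

  # on the node hosting the three down OSDs
  systemctl start ceph-osd@36 ceph-osd@37 ceph-osd@38

  # once they are back up and in, inspect the two inconsistent PGs
  ceph health detail | grep inconsistent
  ceph pg 2.1ab query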
On January 3, 2019 21:59:38 Arun POONIA <arun.poo...@nuagenetworks.net> wrote:
Hi,
I recently tried adding a new OSD node to the Ceph cluster using the
ceph-deploy tool. While experimenting with the tool, I ended up deleting
the OSDs on the new server a couple of times.

Now that the OSDs are running on the new server, 10-15% of the cluster's
PGs are inactive and they are not recovering or rebalancing. I'm not
sure what to do; I tried shutting down the OSDs on the new server.
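In case it helps, I'm listing the stuck PGs with the standard queries
(nothing cluster-specific):

  ceph pg dump_stuck inactive
  ceph osd tree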
Status:
[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0)
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
bind the UNIX domain socket to
'/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2)
No such file or directory
  cluster:
    id:     adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            373907/12391198 objects misplaced (3.018%)
            2 scrub errors
            9677 PGs pending on creation
            Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
            52486 slow requests are blocked > 32 sec
            9287 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2968 > max 200)

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 36 up, 36 in; 51 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   18 pools, 54656 pgs
    objects: 6050k objects, 10941 GB
    usage:   21727 GB used, 45308 GB / 67035 GB avail
    pgs:     13.073% pgs not active
             178350/12391198 objects degraded (1.439%)
             373907/12391198 objects misplaced (3.018%)
             46177 active+clean
             5054  down
             1173  stale+down
             1084  stale+active+undersized
             547   activating
             201   stale+active+undersized+degraded
             158   stale+activating
             96    activating+degraded
             46    stale+active+clean
             42    activating+remapped
             34    stale+activating+degraded
             23    stale+activating+remapped
             6     stale+activating+undersized+degraded+remapped
             6     activating+undersized+degraded+remapped
             2     activating+degraded+remapped
             2     active+clean+inconsistent
             1     stale+activating+degraded+remapped
             1     stale+active+clean+remapped
             1     stale+remapped
             1     down+remapped
             1     remapped+peering

  io:
    client: 0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr
Thanks
--
Arun Poonia
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com