Yes, the PG *should* get remapped, but that is not always the case. For discussion on this, check out the tracker below. Your particular circumstances may be a little different, but the idea is the same.

http://tracker.ceph.com/issues/3806
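If it turns out to be the same CRUSH behaviour discussed in that tracker, the usual first step is to query one of the stuck PGs and check which tunables profile the cluster is running. A sketch only (pg 7.a is taken from the health output below; switching tunables triggers data movement and assumes every client is new enough to support the newer profile):

    ceph pg 7.a query                 # compare the "up" and "acting" sets
    ceph osd crush show-tunables      # show the tunables currently in effect
    ceph osd crush tunables optimal   # let CRUSH retry more times when mapping PGs

On small clusters running legacy tunables, CRUSH can simply fail to find a replacement OSD for some PGs after a failure, which leaves them undersized exactly as described below.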
On Tue, May 3, 2016 at 9:16 AM, Gaurav Bafna <baf...@gmail.com> wrote:
> Thanks, Tupper, for replying.
>
> Shouldn't the PGs be remapped to other OSDs?
>
> Yes, removing the OSD from the cluster results in a full recovery,
> but that should not be needed, right?
>
> On Tue, May 3, 2016 at 6:31 PM, Tupper Cole <tc...@redhat.com> wrote:
> > The degraded PGs are mapped to the down OSD and have not mapped to a
> > new OSD. Removing the OSD would likely result in a full recovery.
> >
> > As a note, having two monitors (or any even number of monitors) is not
> > recommended. If either monitor goes down you will lose quorum. The
> > recommended number of monitors for any cluster is at least three.
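For reference, the removal mentioned above is normally this short sequence (a sketch only; osd.12 is a placeholder, substitute the ID of the OSD that was killed):

    ceph osd out 12                # mark the OSD out so its data remaps
    ceph osd crush remove osd.12   # remove it from the CRUSH map
    ceph auth del osd.12           # delete its cephx key
    ceph osd rm 12                 # remove it from the OSD map

Marking the OSD out normally triggers remapping on its own; when it does not, as here, removing the OSD from the CRUSH map changes the map enough to force new placements, which is why full removal leads to full recovery.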
> > On Tue, May 3, 2016 at 8:42 AM, Gaurav Bafna <baf...@gmail.com> wrote:
> >> Hi Cephers,
> >>
> >> I am running a very small cluster of 3 storage and 2 monitor nodes.
> >>
> >> After I kill one OSD daemon, the cluster never fully recovers: 9 PGs
> >> remain undersized for no apparent reason.
> >>
> >> After I restart that one OSD daemon, the cluster recovers in no time.
> >>
> >> The size of all pools is 3 and min_size is 2.
> >>
> >> Can anybody please help?
> >>
> >> Output of "ceph -s":
> >>
> >>     cluster fac04d85-db48-4564-b821-deebda046261
> >>      health HEALTH_WARN
> >>             9 pgs degraded
> >>             9 pgs stuck degraded
> >>             9 pgs stuck unclean
> >>             9 pgs stuck undersized
> >>             9 pgs undersized
> >>             recovery 3327/195138 objects degraded (1.705%)
> >>             pool .users pg_num 512 > pgp_num 8
> >>      monmap e2: 2 mons at {dssmon2=10.140.13.13:6789/0,dssmonleader1=10.140.13.11:6789/0}
> >>             election epoch 1038, quorum 0,1 dssmonleader1,dssmon2
> >>      osdmap e857: 69 osds: 68 up, 68 in
> >>       pgmap v106601: 896 pgs, 9 pools, 435 MB data, 65047 objects
> >>             279 GB used, 247 TB / 247 TB avail
> >>             3327/195138 objects degraded (1.705%)
> >>                  887 active+clean
> >>                    9 active+undersized+degraded
> >>   client io 395 B/s rd, 0 B/s wr, 0 op/s
> >>
> >> "ceph health detail" output:
> >>
> >> HEALTH_WARN 9 pgs degraded; 9 pgs stuck degraded; 9 pgs stuck unclean;
> >> 9 pgs stuck undersized; 9 pgs undersized; recovery 3327/195138 objects
> >> degraded (1.705%); pool .users pg_num 512 > pgp_num 8
> >> pg 7.a is stuck unclean for 322742.938959, current state active+undersized+degraded, last acting [38,2]
> >> pg 5.27 is stuck unclean for 322754.823455, current state active+undersized+degraded, last acting [26,19]
> >> pg 5.32 is stuck unclean for 322750.685684, current state active+undersized+degraded, last acting [39,19]
> >> pg 6.13 is stuck unclean for 322732.665345, current state active+undersized+degraded, last acting [30,16]
> >> pg 5.4e is stuck unclean for 331869.103538, current state active+undersized+degraded, last acting [16,38]
> >> pg 5.72 is stuck unclean for 331871.208948, current state active+undersized+degraded, last acting [16,49]
> >> pg 4.17 is stuck unclean for 331822.771240, current state active+undersized+degraded, last acting [47,20]
> >> pg 5.2c is stuck unclean for 323021.274535, current state active+undersized+degraded, last acting [47,18]
> >> pg 5.37 is stuck unclean for 323007.574395, current state active+undersized+degraded, last acting [43,1]
> >> pg 7.a is stuck undersized for 322487.284302, current state active+undersized+degraded, last acting [38,2]
> >> pg 5.27 is stuck undersized for 322487.287164, current state active+undersized+degraded, last acting [26,19]
> >> pg 5.32 is stuck undersized for 322487.285566, current state active+undersized+degraded, last acting [39,19]
> >> pg 6.13 is stuck undersized for 322487.287168, current state active+undersized+degraded, last acting [30,16]
> >> pg 5.4e is stuck undersized for 331351.476170, current state active+undersized+degraded, last acting [16,38]
> >> pg 5.72 is stuck undersized for 331351.475707, current state active+undersized+degraded, last acting [16,49]
> >> pg 4.17 is stuck undersized for 322487.280309, current state active+undersized+degraded, last acting [47,20]
> >> pg 5.2c is stuck undersized for 322487.286347, current state active+undersized+degraded, last acting [47,18]
> >> pg 5.37 is stuck undersized for 322487.280027, current state active+undersized+degraded, last acting [43,1]
> >> pg 7.a is stuck degraded for 322487.284340, current state active+undersized+degraded, last acting [38,2]
> >> pg 5.27 is stuck degraded for 322487.287202, current state active+undersized+degraded, last acting [26,19]
> >> pg 5.32 is stuck degraded for 322487.285604, current state active+undersized+degraded, last acting [39,19]
> >> pg 6.13 is stuck degraded for 322487.287207, current state active+undersized+degraded, last acting [30,16]
> >> pg 5.4e is stuck degraded for 331351.476209, current state active+undersized+degraded, last acting [16,38]
> >> pg 5.72 is stuck degraded for 331351.475746, current state active+undersized+degraded, last acting [16,49]
> >> pg 4.17 is stuck degraded for 322487.280348, current state active+undersized+degraded, last acting [47,20]
> >> pg 5.2c is stuck degraded for 322487.286386, current state active+undersized+degraded, last acting [47,18]
> >> pg 5.37 is stuck degraded for 322487.280066, current state active+undersized+degraded, last acting [43,1]
> >> pg 5.72 is active+undersized+degraded, acting [16,49]
> >> pg 5.4e is active+undersized+degraded, acting [16,38]
> >> pg 5.32 is active+undersized+degraded, acting [39,19]
> >> pg 5.37 is active+undersized+degraded, acting [43,1]
> >> pg 5.2c is active+undersized+degraded, acting [47,18]
> >> pg 5.27 is active+undersized+degraded, acting [26,19]
> >> pg 6.13 is active+undersized+degraded, acting [30,16]
> >> pg 4.17 is active+undersized+degraded, acting [47,20]
> >> pg 7.a is active+undersized+degraded, acting [38,2]
> >> recovery 3327/195138 objects degraded (1.705%)
> >> pool .users pg_num 512 > pgp_num 8
> >>
> >> My crush map is default.
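One thing worth fixing regardless: the warning "pool .users pg_num 512 > pgp_num 8" in the output above means the pool has 512 placement groups, but placement is still being computed as if it had only 8, so those 512 PGs are concentrated on far fewer OSDs than intended. Raising pgp_num to match is a one-liner (it will trigger some data movement):

    ceph osd pool set .users pgp_num 512

That alone may not clear the nine stuck PGs, but it removes the warning and lets the pool spread out properly.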
> >> Ceph.conf is:
> >>
> >> [osd]
> >> osd mkfs type=xfs
> >> osd recovery threads=2
> >> osd disk thread ioprio class=idle
> >> osd disk thread ioprio priority=7
> >> osd journal=/var/lib/ceph/osd/ceph-$id/journal
> >> filestore flusher=False
> >> osd op num shards=3
> >> debug osd=5
> >> osd disk threads=2
> >> osd data=/var/lib/ceph/osd/ceph-$id
> >> osd op num threads per shard=5
> >> osd op threads=4
> >> keyring=/var/lib/ceph/osd/ceph-$id/keyring
> >> osd journal size=4096
> >>
> >> [global]
> >> filestore max sync interval=10
> >> auth cluster required=cephx
> >> osd pool default min size=3
> >> osd pool default size=3
> >> public network=10.140.13.0/26
> >> objecter inflight op_bytes=1073741824
> >> auth service required=cephx
> >> filestore min sync interval=1
> >> fsid=fac04d85-db48-4564-b821-deebda046261
> >> keyring=/etc/ceph/keyring
> >> cluster network=10.140.13.0/26
> >> auth client required=cephx
> >> filestore xattr use omap=True
> >> max open files=65536
> >> objecter inflight ops=2048
> >> osd pool default pg num=512
> >> log to syslog = true
> >> #err to syslog = true
> >>
> >> --
> >> Gaurav Bafna
> >> 9540631400

--
Thanks,
Tupper Cole
Senior Storage Consultant
Global Storage Consulting, Red Hat
tc...@redhat.com
phone: + 01 919-720-2612
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com