Hello,
your log extract shows the following:

2019-02-15 21:40:08 OSD.29 DOWN
2019-02-15 21:40:09 PG_AVAILABILITY warning start
2019-02-15 21:40:15 PG_AVAILABILITY warning cleared

2019-02-15 21:44:06 OSD.29 UP
2019-02-15 21:44:08 PG_AVAILABILITY warning start
2019-02-15 21:44:15 PG_AVAILABILITY warning cleared
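
(A timeline like the one above can be pulled out of the cluster log with a
trivial filter; the sketch below is only illustrative, and both the log path
/var/log/ceph/ceph.log and the keywords are assumptions to adapt to your
setup.)

#!/usr/bin/env python3
# Illustrative filter: print the osd.29 / PG_AVAILABILITY events from the
# mon cluster log. Adjust the path and keywords as needed.
KEYWORDS = ("osd.29", "PG_AVAILABILITY")

with open("/var/log/ceph/ceph.log") as log:
    for line in log:
        if any(k in line for k in KEYWORDS):
            print(line.rstrip())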

What you saw is the natural consequence of an OSD state change. The two
short periods of limited PG availability (about 6 seconds each) correspond
to the peering that takes place shortly after an OSD goes down or comes
back up. The placement groups stored on that OSD have to re-peer so that
client I/O can be redirected to the other (alive) OSDs, and yes, during
those few seconds the affected data is not accessible.
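
If you want to see how short that window really is, you can watch the PG
states while an OSD restarts. Below is a minimal, untested sketch (my own,
not from your logs) that polls "ceph -s -f json" once per second and counts
PGs whose state contains "peering"; it assumes the ceph CLI and a keyring
with sufficient caps are available on the node:

#!/usr/bin/env python3
# Rough sketch: count PGs currently peering by polling "ceph -s".
import json
import subprocess
import time

def peering_pgs():
    # "ceph -s -f json" reports per-state PG counts under pgmap/pgs_by_state.
    status = json.loads(subprocess.check_output(["ceph", "-s", "-f", "json"]))
    return sum(s["count"]
               for s in status["pgmap"].get("pgs_by_state", [])
               if "peering" in s["state_name"])

# Poll once per second for a minute, e.g. while an OSD is being restarted.
for _ in range(60):
    print(time.strftime("%H:%M:%S"), "peering PGs:", peering_pgs())
    time.sleep(1)

The count should spike for a few seconds right after the "osd.29 failed" /
"osd.29 boot" messages and then drop back to zero, which is exactly what the
PG_AVAILABILITY start/clear pairs in your log show.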

Kind regards,
Maks


Sat, 16 Feb 2019 at 07:25 <jes...@krogh.cc> wrote:

> Yesterday I saw this one.. it puzzles me:
> 2019-02-15 21:00:00.000126 mon.torsk1 mon.0 10.194.132.88:6789/0 604164 :
> cluster [INF] overall HEALTH_OK
> 2019-02-15 21:39:55.793934 mon.torsk1 mon.0 10.194.132.88:6789/0 604304 :
> cluster [WRN] Health check failed: 2 slow requests are blocked > 32 sec.
> Implicated osds 58 (REQUEST_SLOW)
> 2019-02-15 21:40:00.887766 mon.torsk1 mon.0 10.194.132.88:6789/0 604305 :
> cluster [WRN] Health check update: 6 slow requests are blocked > 32 sec.
> Implicated osds 9,19,52,58,68 (REQUEST_SLOW)
> 2019-02-15 21:40:06.973901 mon.torsk1 mon.0 10.194.132.88:6789/0 604306 :
> cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec.
> Implicated osds 3,9,19,29,32,52,55,58,68,69 (REQUEST_SLOW)
> 2019-02-15 21:40:08.466266 mon.torsk1 mon.0 10.194.132.88:6789/0 604307 :
> cluster [INF] osd.29 failed (root=default,host=bison) (6 reporters from
> different host after 33.862482 >= grace 29.247323)
> 2019-02-15 21:40:08.473703 mon.torsk1 mon.0 10.194.132.88:6789/0 604308 :
> cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
> 2019-02-15 21:40:09.489494 mon.torsk1 mon.0 10.194.132.88:6789/0 604310 :
> cluster [WRN] Health check failed: Reduced data availability: 6 pgs
> peering (PG_AVAILABILITY)
> 2019-02-15 21:40:11.008906 mon.torsk1 mon.0 10.194.132.88:6789/0 604312 :
> cluster [WRN] Health check failed: Degraded data redundancy:
> 3828291/700353996 objects degraded (0.547%), 77 pgs degraded (PG_DEGRADED)
> 2019-02-15 21:40:13.474777 mon.torsk1 mon.0 10.194.132.88:6789/0 604313 :
> cluster [WRN] Health check update: 9 slow requests are blocked > 32 sec.
> Implicated osds 3,9,32,55,58,69 (REQUEST_SLOW)
> 2019-02-15 21:40:15.060165 mon.torsk1 mon.0 10.194.132.88:6789/0 604314 :
> cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
> availability: 17 pgs peering)
> 2019-02-15 21:40:17.128185 mon.torsk1 mon.0 10.194.132.88:6789/0 604315 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897139/700354131 objects degraded (1.413%), 200 pgs degraded
> (PG_DEGRADED)
> 2019-02-15 21:40:17.128219 mon.torsk1 mon.0 10.194.132.88:6789/0 604316 :
> cluster [INF] Health check cleared: REQUEST_SLOW (was: 2 slow requests are
> blocked > 32 sec. Implicated osds 32,55)
> 2019-02-15 21:40:22.137090 mon.torsk1 mon.0 10.194.132.88:6789/0 604317 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897140/700354194 objects degraded (1.413%), 200 pgs degraded
> (PG_DEGRADED)
> 2019-02-15 21:40:27.249354 mon.torsk1 mon.0 10.194.132.88:6789/0 604318 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897142/700354287 objects degraded (1.413%), 200 pgs degraded
> (PG_DEGRADED)
> 2019-02-15 21:40:33.335147 mon.torsk1 mon.0 10.194.132.88:6789/0 604322 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897143/700354356 objects degraded (1.413%), 200 pgs degraded
> (PG_DEGRADED)
> ....... shortened ......
> 2019-02-15 21:43:48.496536 mon.torsk1 mon.0 10.194.132.88:6789/0 604366 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897168/700356693 objects degraded (1.413%), 200 pgs degraded, 201 pgs
> undersized (PG_DEGRADED)
> 2019-02-15 21:43:53.496924 mon.torsk1 mon.0 10.194.132.88:6789/0 604367 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897170/700356804 objects degraded (1.413%), 200 pgs degraded, 201 pgs
> undersized (PG_DEGRADED)
> 2019-02-15 21:43:58.497313 mon.torsk1 mon.0 10.194.132.88:6789/0 604368 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897172/700356879 objects degraded (1.413%), 200 pgs degraded, 201 pgs
> undersized (PG_DEGRADED)
> 2019-02-15 21:44:03.497696 mon.torsk1 mon.0 10.194.132.88:6789/0 604369 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897174/700356996 objects degraded (1.413%), 200 pgs degraded, 201 pgs
> undersized (PG_DEGRADED)
> 2019-02-15 21:44:06.939331 mon.torsk1 mon.0 10.194.132.88:6789/0 604372 :
> cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
> 2019-02-15 21:44:06.965401 mon.torsk1 mon.0 10.194.132.88:6789/0 604373 :
> cluster [INF] osd.29 10.194.133.58:6844/305358 boot
> 2019-02-15 21:44:08.498060 mon.torsk1 mon.0 10.194.132.88:6789/0 604376 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 9897174/700357056 objects degraded (1.413%), 200 pgs degraded, 201 pgs
> undersized (PG_DEGRADED)
> 2019-02-15 21:44:08.996099 mon.torsk1 mon.0 10.194.132.88:6789/0 604377 :
> cluster [WRN] Health check failed: Reduced data availability: 12 pgs
> peering (PG_AVAILABILITY)
> 2019-02-15 21:44:13.498472 mon.torsk1 mon.0 10.194.132.88:6789/0 604378 :
> cluster [WRN] Health check update: Degraded data redundancy: 55/700357161
> objects degraded (0.000%), 33 pgs degraded (PG_DEGRADED)
> 2019-02-15 21:44:15.081437 mon.torsk1 mon.0 10.194.132.88:6789/0 604379 :
> cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data
> availability: 12 pgs peering)
> 2019-02-15 21:44:18.498808 mon.torsk1 mon.0 10.194.132.88:6789/0 604380 :
> cluster [WRN] Health check update: Degraded data redundancy: 14/700357230
> objects degraded (0.000%), 9 pgs degraded (PG_DEGRADED)
> 2019-02-15 21:44:19.132797 mon.torsk1 mon.0 10.194.132.88:6789/0 604381 :
> cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data
> redundancy: 14/700357230 objects degraded (0.000%), 9 pgs degraded)
> 2019-02-15 21:44:19.132824 mon.torsk1 mon.0 10.194.132.88:6789/0 604382 :
> cluster [INF] Cluster is now healthy
> 2019-02-15 22:00:00.000117 mon.torsk1 mon.0 10.194.132.88:6789/0 604402 :
> cluster [INF] overall HEALTH_OK
>
> Why do I end up with a PG_AVAILABILITY warning with just one OSD down? We
> have 3x replicated pools and 4+2 EC pools in the system. Or am I just
> misreading what PG_AVAILABILITY means in the docs? I came to the
> conclusion that it means "some data is inaccessible".
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
