On Wed, 24 Sep 2014, Sahana Lokeshappa wrote:

> pg_stat  objects  mip  degr  misp  unf  bytes       log   disklog  state               state_stamp                 v           reported     up          up_primary  acting      acting_primary  last_scrub  scrub_stamp                 last_deep_scrub  deep_scrub_stamp
> 2.a9     518      0    0     0     0    2172649472  3001  3001     active+clean        2014-09-22 17:49:35.357586  6826'35762  17842:72706  [12,7,28]   12          [12,7,28]   12              6826'35762  2014-09-22 11:33:55.985449  0'0              2014-09-16 20:11:32.693864

Can you verify that 2.a9 exists in the data directory for 12, 7, and/or 28? If so, the next step would be to enable logging (debug osd = 20, debug ms = 1) and see why peering is stuck...

sage
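Both checks can be done from the command line; what follows is a minimal sketch, assuming a FileStore OSD with the default data path /var/lib/ceph/osd/ceph-<id> and the default log location (adjust both for your deployment):

    # on each OSD host in the acting set: does a directory for PG 2.a9 exist?
    ls -d /var/lib/ceph/osd/ceph-12/current/2.a9_head

    # raise debug levels on the running daemon, no restart needed
    ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'

    # then watch the OSD log for peering activity
    tail -f /var/log/ceph/ceph-osd.12.log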
> pg_stat  objects  mip  degr  misp  unf  bytes       log   disklog  state               state_stamp                 v           reported     up          up_primary  acting      acting_primary  last_scrub  scrub_stamp                 last_deep_scrub  deep_scrub_stamp
> 0.59     0        0    0     0     0    0           0     0        active+clean        2014-09-22 17:50:00.751218  0'0         17842:4472   [12,41,2]   12          [12,41,2]   12              0'0         2014-09-22 16:47:09.315499  0'0              2014-09-16 12:20:48.618726
> 0.4d     0        0    0     0     0    0           4     4        stale+down+peering  2014-09-18 17:51:10.038247  186'4       11134:498    [12,56,27]  12          [12,56,27]  12              186'4       2014-09-18 17:30:32.393188  0'0              2014-09-16 12:20:48.615322
> 0.49     0        0    0     0     0    0           0     0        stale+down+peering  2014-09-18 17:44:52.681513  0'0         11134:498    [12,6,25]   12          [12,6,25]   12              0'0         2014-09-18 17:16:12.986658  0'0              2014-09-16 12:20:48.614192
> 0.1c     0        0    0     0     0    0           12    12       stale+down+peering  2014-09-18 17:51:16.735549  186'12      11134:522    [12,25,23]  12          [12,25,23]  12              186'12      2014-09-18 17:16:04.457863  186'10           2014-09-16 14:23:58.731465
> 2.17     510      0    0     0     0    2139095040  3001  3001     active+clean        2014-09-22 17:52:20.364754  6784'30742  17842:72033  [12,27,23]  12          [12,27,23]  12              6784'30742  2014-09-22 00:19:39.905291  0'0              2014-09-16 20:11:17.016299
> 2.7e8    508      0    0     0     0    2130706432  3433  3433     active+clean        2014-09-22 17:52:20.365083  6702'21132  17842:64769  [12,25,23]  12          [12,25,23]  12              6702'21132  2014-09-22 17:01:20.546126  0'0              2014-09-16 14:42:32.079187
> 2.6a5    528      0    0     0     0    2214592512  2840  2840     active+clean        2014-09-22 22:50:38.092084  6775'34416  17842:83221  [12,58,0]   12          [12,58,0]   12              6775'34416  2014-09-22 22:50:38.091989  0'0              2014-09-16 20:11:32.703368
>
> And we couldn't observe any peering events happening on the primary OSD:
>
> $ sudo ceph pg 0.49 query
> Error ENOENT: i don't have pgid 0.49
> $ sudo ceph pg 0.4d query
> Error ENOENT: i don't have pgid 0.4d
> $ sudo ceph pg 0.1c query
> Error ENOENT: i don't have pgid 0.1c
>
> We're not able to explain why the peering is stuck. BTW, the rbd pool doesn't contain any data.
>
> Varada
>
> From: Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] On Behalf Of Sage Weil
> Sent: Monday, September 22, 2014 10:44 PM
> To: Sahana Lokeshappa; ceph-users@lists.ceph.com; ceph-us...@ceph.com; ceph-commun...@lists.ceph.com
> Subject: Re: [Ceph-community] Pgs are in stale+down+peering state
>
> Stale means that the primary OSD for the PG went down and the status is stale. They all seem to be from OSD.12... Seems like something is preventing that OSD from reporting to the mon?
>
> sage
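That theory can be cross-checked from the mon side; a short sketch (exact output varies a bit by release). Note that 'ceph pg map' is answered by the monitors, so it works even when 'ceph pg <pgid> query', which is served by the PG's primary OSD itself, returns ENOENT as above:

    # where do the mons think 0.49 maps?
    ceph pg map 0.49

    # the mons' view of osd.12 (host and address)
    ceph osd find 12

    # on the osd.12 host: daemon state and the newest osdmap epoch it has seen;
    # if this lags far behind the cluster's epoch, the OSD is not keeping up
    ceph daemon osd.12 status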
> On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa <sahana.lokesha...@sandisk.com> wrote:
>
> > Hi all,
> >
> > I ran the 'ceph osd thrash' command, and after all OSDs were back up and in, 3 PGs are in stale+down+peering state:
> >
> > sudo ceph -s
> >     cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
> >      health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
> >      monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
> >      osdmap e17031: 64 osds: 64 up, 64 in
> >      pgmap v76728: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
> >            12501 GB used, 10975 GB / 23476 GB avail
> >                2145 active+clean
> >                   3 stale+down+peering
> >
> > sudo ceph health detail
> > HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
> > pg 0.4d is stuck inactive for 341048.948643, current state stale+down+peering, last acting [12,56,27]
> > pg 0.49 is stuck inactive for 341048.948667, current state stale+down+peering, last acting [12,6,25]
> > pg 0.1c is stuck inactive for 341048.949362, current state stale+down+peering, last acting [12,25,23]
> > pg 0.4d is stuck unclean for 341048.948665, current state stale+down+peering, last acting [12,56,27]
> > pg 0.49 is stuck unclean for 341048.948687, current state stale+down+peering, last acting [12,6,25]
> > pg 0.1c is stuck unclean for 341048.949382, current state stale+down+peering, last acting [12,25,23]
> > pg 0.4d is stuck stale for 339823.956929, current state stale+down+peering, last acting [12,56,27]
> > pg 0.49 is stuck stale for 339823.956930, current state stale+down+peering, last acting [12,6,25]
> > pg 0.1c is stuck stale for 339823.956925, current state stale+down+peering, last acting [12,25,23]
> >
> > Please, can anyone explain why the PGs are in this state?
> >
> > Sahana Lokeshappa
> > Test Development Engineer I
> > SanDisk Corporation
> > 3rd Floor, Bagmane Laurel, Bagmane Tech Park
> > C V Raman Nagar, Bangalore 560093
> > T: +918042422283
> > sahana.lokesha...@sandisk.com
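When reproducing this, the stuck PGs can be listed directly rather than picked out of 'ceph -s'; a brief sketch:

    # PGs marked stale (primary has not reported fresh stats to the mons)
    ceph pg dump_stuck stale

    # PGs stuck non-active, which covers down+peering
    ceph pg dump_stuck inactive

    # once a PG's primary is up and reachable again, this returns full peering detail
    ceph pg 0.4d query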