On Wed, 24 Sep 2014, Sahana Lokeshappa wrote:
> 2.a9    518     0       0       0       0       2172649472      3001    3001    active+clean    2014-09-22 17:49:35.357586      6826'35762      17842:72706     [12,7,28]       12      [12,7,28]       12      6826'35762      2014-09-22 11:33:55.985449      0'0     2014-09-16 20:11:32.693864

Can you verify that 2.a9 exists in the data directory for 12, 7, and/or 
28?  If so, the next step would be to enable logging (debug osd = 20, debug 
ms = 1) and see why peering is stuck...
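
For example, assuming the default osd data dir layout (adjust the path if 
yours is different), something like

  # on the host running osd.12 (repeat for osd.7 and osd.28)
  ls /var/lib/ceph/osd/ceph-12/current/ | grep '^2\.a9_'

should show a 2.a9_head directory if the PG data is present.  The debug 
levels can be bumped at runtime with

  ceph tell osd.12 injectargs '--debug-osd 20 --debug-ms 1'

or by adding the two options to the [osd] section of ceph.conf and 
restarting the daemon.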

sage

> 
> 0.59    0       0       0       0       0       0       0       0       active+clean    2014-09-22 17:50:00.751218      0'0     17842:4472      [12,41,2]       12      [12,41,2]       12      0'0     2014-09-22 16:47:09.315499      0'0     2014-09-16 12:20:48.618726
> 
> 0.4d    0       0       0       0       0       0       4       4       stale+down+peering      2014-09-18 17:51:10.038247      186'4   11134:498       [12,56,27]      12      [12,56,27]      12      186'4   2014-09-18 17:30:32.393188      0'0     2014-09-16 12:20:48.615322
> 
> 0.49    0       0       0       0       0       0       0       0       stale+down+peering      2014-09-18 17:44:52.681513      0'0     11134:498       [12,6,25]       12      [12,6,25]       12      0'0     2014-09-18 17:16:12.986658      0'0     2014-09-16 12:20:48.614192
> 
> 0.1c    0       0       0       0       0       0       12      12      stale+down+peering      2014-09-18 17:51:16.735549      186'12  11134:522       [12,25,23]      12      [12,25,23]      12      186'12  2014-09-18 17:16:04.457863      186'10  2014-09-16 14:23:58.731465
> 
> 2.17    510     0       0       0       0       2139095040      3001    3001    active+clean    2014-09-22 17:52:20.364754      6784'30742      17842:72033     [12,27,23]      12      [12,27,23]      12      6784'30742      2014-09-22 00:19:39.905291      0'0     2014-09-16 20:11:17.016299
> 
> 2.7e8   508     0       0       0       0       2130706432      3433    3433    active+clean    2014-09-22 17:52:20.365083      6702'21132      17842:64769     [12,25,23]      12      [12,25,23]      12      6702'21132      2014-09-22 17:01:20.546126      0'0     2014-09-16 14:42:32.079187
> 
> 2.6a5   528     0       0       0       0       2214592512      2840    2840    active+clean    2014-09-22 22:50:38.092084      6775'34416      17842:83221     [12,58,0]       12      [12,58,0]       12      6775'34416      2014-09-22 22:50:38.091989      0'0     2014-09-16 20:11:32.703368
> 
>  
> 
> And we couldn't observe any peering events happening on the primary OSD.
> 
>  
> 
> $ sudo ceph pg 0.49 query
> 
> Error ENOENT: i don't have pgid 0.49
> 
> $ sudo ceph pg 0.4d query
> 
> Error ENOENT: i don't have pgid 0.4d
> 
> $ sudo ceph pg 0.1c query
> 
> Error ENOENT: i don't have pgid 0.1c
> 
>  
> 
> We are not able to explain why peering is stuck. BTW, the rbd pool doesn't
> contain any data.
> 
>  
> 
> Varada
> 
>  
> 
> From: Ceph-community [mailto:ceph-community-boun...@lists.ceph.com] On
> Behalf Of Sage Weil
> Sent: Monday, September 22, 2014 10:44 PM
> To: Sahana Lokeshappa; ceph-users@lists.ceph.com; ceph-us...@ceph.com;
> ceph-commun...@lists.ceph.com
> Subject: Re: [Ceph-community] Pgs are in stale+down+peering state
> 
>  
> 
> Stale means that the primary OSD for the PG went down and the status is
> stale.  They all seem to be from OSD.12... Seems like something is
> preventing that OSD from reporting to the mon?
> 
> sage
> 
>  
> 
> On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa
> <sahana.lokesha...@sandisk.com> wrote:
> 
>       Hi all,
> 
>        
> 
>       I used the 'ceph osd thrash' command, and after all OSDs are up
>       and in, 3 pgs are in stale+down+peering state
> 
>        
> 
>       sudo ceph -s
> 
>           cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
> 
>            health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale;
>       3 pgs stuck inactive; 3 pgs stuck stale; 3 pgs stuck unclean
> 
>            monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
> 
>            osdmap e17031: 64 osds: 64 up, 64 in
> 
>             pgmap v76728: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
> 
>                   12501 GB used, 10975 GB / 23476 GB avail
> 
>                       2145 active+clean
> 
>                          3 stale+down+peering
> 
>        
> 
>       sudo ceph health detail
> 
>       HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck
>       inactive; 3 pgs stuck stale; 3 pgs stuck unclean
> 
>       pg 0.4d is stuck inactive for 341048.948643, current state
>       stale+down+peering, last acting [12,56,27]
> 
>       pg 0.49 is stuck inactive for 341048.948667, current state
>       stale+down+peering, last acting [12,6,25]
> 
>       pg 0.1c is stuck inactive for 341048.949362, current state
>       stale+down+peering, last acting [12,25,23]
> 
>       pg 0.4d is stuck unclean for 341048.948665, current state
>       stale+down+peering, last acting [12,56,27]
> 
>       pg 0.49 is stuck unclean for 341048.948687, current state
>       stale+down+peering, last acting [12,6,25]
> 
>       pg 0.1c is stuck unclean for 341048.949382, current state
>       stale+down+peering, last acting [12,25,23]
> 
>       pg 0.4d is stuck stale for 339823.956929, current state
>       stale+down+peering, last acting [12,56,27]
> 
>       pg 0.49 is stuck stale for 339823.956930, current state
>       stale+down+peering, last acting [12,6,25]
> 
>       pg 0.1c is stuck stale for 339823.956925, current state
>       stale+down+peering, last acting [12,25,23]
> 
>        
> 
>        
> 
>       Please, can anyone explain why the pgs are in this state?
> 
>       Sahana Lokeshappa
>       Test Development Engineer I
>       SanDisk Corporation
>       3rd Floor, Bagmane Laurel, Bagmane Tech Park
> 
>       C V Raman nagar, Bangalore 560093
>       T: +918042422283
> 
>       sahana.lokesha...@sandisk.com
> 
>        
> 
>        
> 
> 
> 
> Ceph-community mailing list
> ceph-commun...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
> 
> 
> --
> Sent from Kaiten Mail. Please excuse my brevity.
> 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
