Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-13 Thread Samuel Just
Cool!
-Sam

On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow j...@rtr.com wrote:
 Sam,

 Thanks, that did it :-)

   health HEALTH_OK
   monmap e17: 5 mons at {a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0}, election epoch 9794, quorum 0,1,2,3,4 a,b,c,d,e
   osdmap e23445: 14 osds: 13 up, 13 in
   pgmap v13552855: 2102 pgs: 2102 active+clean; 531 GB data, 1564 GB used, 9350 GB / 10914 GB avail; 13104KB/s rd, 4007KB/s wr, 560op/s
   mdsmap e3: 0/0/1 up




Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Can you attach the output of ceph osd tree?

Also, can you run

ceph osd getmap -o /tmp/osdmap

and attach /tmp/osdmap?
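
(If it's easier, the map can also be inspected locally; a sketch, assuming
the osdmaptool and crushtool binaries that ship with ceph:)

   osdmaptool /tmp/osdmap --print                  # decode and print the osd map
   osdmaptool /tmp/osdmap --export-crush /tmp/cm   # extract the CRUSH map,
   crushtool -d /tmp/cm -o /tmp/cm.txt             # then decompile it to text
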
-Sam

On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote:
 Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
 then restarting it, waiting 2 minutes, and then doing the next one (all OSDs
 eventually restarted).  I tried this twice.



Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam,

I've attached both files.

Thanks!
Jeff

On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
 Can you attach the output of ceph osd tree?
 
 Also, can you run
 
 ceph osd getmap -o /tmp/osdmap
 
 and attach /tmp/osdmap?
 -Sam
 
 On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote:
  Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
  then restarting it, waiting 2 minutes, and then doing the next one (all OSDs
  eventually restarted).  I tried this twice.
 

-- 

# id    weight  type name       up/down reweight
-1  14.61   root default
-3  14.61   rack unknownrack
-2  2.783   host ceph1
0   0.919   osd.0   up  1   
1   0.932   osd.1   up  1   
2   0.932   osd.2   up  0   
-5  2.783   host ceph2
3   0.919   osd.3   down0   
4   0.932   osd.4   up  1   
5   0.932   osd.5   up  1   
-4  3.481   host ceph3
10  0.699   osd.10  up  1   
6   0.685   osd.6   up  1   
7   0.699   osd.7   up  1   
8   0.699   osd.8   up  1   
9   0.699   osd.9   up  1   
-6  2.783   host ceph4
14  0.919   osd.14  down0   
15  0.932   osd.15  up  1   
16  0.932   osd.16  down0   
-7  2.782   host ceph5
11  0.92    osd.11  up  0
12  0.931   osd.12  up  1   
13  0.931   osd.13  up  1   



(Attachment: osdmap, binary data)


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Are you using any kernel clients?  Will osds 3,14,16 be coming back?
-Sam

On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow j...@rtr.com wrote:
 Sam,

 I've attached both files.

 Thanks!
 Jeff

 On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
 Can you attach the output of ceph osd tree?

 Also, can you run

 ceph osd getmap -o /tmp/osdmap

 and attach /tmp/osdmap?
 -Sam

 On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote:
  Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
  then restarting it, waiting 2 minutes, and then doing the next one (all OSDs
  eventually restarted).  I tried this twice.
 


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Jeff Moskow
Sam,

3, 14 and 16 have been down for a while.  I'll eventually replace those
drives (I could do it now), but I didn't want to introduce more variables.
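
(For whenever those drives are retired, the usual removal sequence is
something like the following sketch; osd.3 shown, same for 14 and 16:)

   ceph osd out 3                 # already out here (reweight 0); shown for completeness
   ceph osd crush remove osd.3    # drop it from CRUSH so nothing maps to it
   ceph auth del osd.3            # delete its auth key
   ceph osd rm 3                  # remove the OSD id from the cluster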

We are using RBD with Proxmox, so I think the answer about kernel clients
is yes.
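
(A quick way to confirm that on a given node, assuming the standard rbd CLI
and kernel module:)

   lsmod | grep rbd     # is the rbd kernel module loaded at all?
   rbd showmapped       # which images are kernel-mapped on this host?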

Jeff

On Mon, Aug 12, 2013 at 02:41:11PM -0700, Samuel Just wrote:
 Are you using any kernel clients?  Will osds 3,14,16 be coming back?
 -Sam
 
 On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow j...@rtr.com wrote:
  Sam,
 
  I've attached both files.
 
  Thanks!
  Jeff
 
  On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
  Can you attach the output of ceph osd tree?
 
  Also, can you run
 
  ceph osd getmap -o /tmp/osdmap
 
  and attach /tmp/osdmap?
  -Sam
 
  On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote:
   Thanks for the suggestion.  I had tried stopping each OSD for 30 seconds,
   then restarting it, waiting 2 minutes, and then doing the next one (all OSDs
   eventually restarted).  I tried this twice.
  


[ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Hi,

I have a 5-node ceph cluster that is running well (no problems using any of
the rbd images, and that's really all we use).

I have replication set to 3 on all three pools (data, metadata and rbd).
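
(For reference, the per-pool setting can be confirmed like this; a sketch
assuming the default pool names:)

   ceph osd pool get data size
   ceph osd pool get metadata size
   ceph osd pool get rbd size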

ceph -s reports:
   health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; recovery 5746/384795 degraded (1.493%)

I have tried everything I could think of to clear/fix those errors and 
they persist.

Most of them appear to be a problem with not having 3 copies:

0.2a0   0   0   0   0   0   0   0   active+remapped 2013-08-06 05:40:07.874427  0'0 21920'388   [4,7]   [4,7,8] 0'0 2013-08-04 08:59:34.035198  0'0 2013-07-29 01:49:40.018625
4.1d9   260 0   238 0   1021055488  0   0   active+remapped 2013-08-06 05:56:20.447612  21920'12710 21920'53408 [6,13]  [6,13,4]    0'0 2013-08-05 06:59:44.717555  0'0 2013-08-05 06:59:44.717555
1.1dc   0   0   0   0   0   0   0   active+remapped 2013-08-06 05:55:44.687830  0'0 21920'3003  [6,13]  [6,13,4]    0'0 2013-08-04 10:56:51.226012  0'0 2013-07-28 23:47:13.404512
0.1dd   0   0   0   0   0   0   0   active+remapped 2013-08-06 05:55:44.687525  0'0 21920'3003  [6,13]  [6,13,4]    0'0 2013-08-04 10:56:45.258459  0'0 2013-08-01 05:58:17.141625
1.29f   0   0   0   0   0   0   0   active+remapped 2013-08-06 05:40:07.882865  0'0 21920'388   [4,7]   [4,7,8] 0'0 2013-08-04 09:01:40.075441  0'0 2013-07-29 01:53:10.068503
1.118   0   0   0   0   0   0   0   active+remapped 2013-08-06 05:50:34.081067  0'0 21920'208   [8,15]  [8,15,5]    0'0 2034-02-12 23:20:03.933842  0'0 2034-02-12 23:20:03.933842
0.119   0   0   0   0   0   0   0   active+remapped 2013-08-06 05:50:34.095446  0'0 21920'208   [8,15]  [8,15,5]    0'0 2034-02-12 23:18:07.310080  0'0 2034-02-12 23:18:07.310080
4.115   248 0   226 0   987364352   0   0   active+remapped 2013-08-06 05:50:34.112139  21920'6840  21920'42982 [8,15]  [8,15,5]    0'0 2013-08-05 06:59:18.303823  0'0 2013-08-05 06:59:18.303823
4.4a241 0   286 0   941573120   0   0   active+degraded 2013-08-06 12:00:47.758742  21920'85238 21920'206648    [4,6]   [4,6]   0'0 2013-08-05 06:58:36.681726  0'0 2013-08-05 06:58:36.681726
0.4e0   0   0   0   0   0   0   active+remapped 2013-08-06 12:00:47.765391  0'0 21920'489   [4,6]   [4,6,1] 0'0 2013-08-04 08:58:12.783265  0'0 2013-07-28 14:21:38.227970
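
(For anyone retracing this, a list like the above can be regenerated and a
single PG examined in depth with, e.g.:)

   ceph pg dump_stuck unclean   # list PGs stuck in an unclean state
   ceph pg 4.1d9 query          # full peering/recovery detail for one PG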


Can anyone suggest a way to clear this up?

Thanks!
Jeff




Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Wido den Hollander

On 08/09/2013 10:58 AM, Jeff Moskow wrote:

Hi,

I have a 5-node ceph cluster that is running well (no problems using any of
the rbd images, and that's really all we use).

I have replication set to 3 on all three pools (data, metadata and rbd).

ceph -s reports:
   health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; recovery 5746/384795 degraded (1.493%)

I have tried everything I could think of to clear/fix those errors and they persist.



Did you restart the primary OSD for those PGs?
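
(The primary is the first OSD in a PG's acting set; it can be read off with,
e.g.:)

   ceph pg map 4.1d9    # prints the up and acting sets for that PG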

Wido







--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-09 Thread Jeff Moskow
Thanks for the suggestion.  I had tried stopping each OSD for 30
seconds, then restarting it, waiting 2 minutes, and then doing the next
one (all OSDs eventually restarted).  I tried this twice.
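
(A sketch of that rolling restart, run on each node for its local OSDs;
the init-script invocation is an assumption, since a 2013-era install may
use /etc/init.d/ceph rather than service:)

   # restart this node's OSDs one at a time, e.g. osd.0 through osd.2 on ceph1
   for i in 0 1 2; do
       service ceph stop osd.$i
       sleep 30
       service ceph start osd.$i
       sleep 120    # let the cluster settle before the next one
   done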

