Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
Cool!
-Sam

On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow <j...@rtr.com> wrote:
> Sam,
>
> Thanks -- that did it :-)
>
>    health HEALTH_OK
>    monmap e17: 5 mons at {a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0}, election epoch 9794, quorum 0,1,2,3,4 a,b,c,d,e
>    osdmap e23445: 14 osds: 13 up, 13 in
>    pgmap v13552855: 2102 pgs: 2102 active+clean; 531 GB data, 1564 GB used, 9350 GB / 10914 GB avail; 13104 KB/s rd, 4007 KB/s wr, 560 op/s
>    mdsmap e3: 0/0/1 up

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap?
-Sam

On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow <j...@rtr.com> wrote:
> Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2 minutes, and then doing the next one (all OSDs eventually restarted). I tried this twice.
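The diagnostics Sam asks for here can be gathered in one pass. A minimal sketch, with an arbitrary output directory; the script only collects and prints the commands (a dry run) so they can be reviewed before being run against a live cluster:

```shell
#!/bin/sh
# Gather the cluster state requested above. Dry run: the commands are
# printed, not executed; remove the final echo-and-review step and run
# each line directly on a live cluster.
OUT=/tmp/ceph-diag   # arbitrary collection directory (an assumption)
CMDS="mkdir -p $OUT
ceph osd tree > $OUT/osd-tree.txt
ceph osd getmap -o $OUT/osdmap
ceph health detail > $OUT/health.txt"
echo "$CMDS"
```

`ceph osd tree` shows the CRUSH hierarchy with up/down and reweight state, and `ceph osd getmap -o <file>` writes the binary osdmap for offline inspection, which is exactly what Sam asks Jeff to attach.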
Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
Sam,

I've attached both files. Thanks!

Jeff

On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
> Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap?
> -Sam
>
> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow <j...@rtr.com> wrote:
>> Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2 minutes, and then doing the next one (all OSDs eventually restarted). I tried this twice.

# id	weight	type name	up/down	reweight
-1	14.61	root default
-3	14.61		rack unknownrack
-2	2.783			host ceph1
0	0.919				osd.0	up	1
1	0.932				osd.1	up	1
2	0.932				osd.2	up	0
-5	2.783			host ceph2
3	0.919				osd.3	down	0
4	0.932				osd.4	up	1
5	0.932				osd.5	up	1
-4	3.481			host ceph3
10	0.699				osd.10	up	1
6	0.685				osd.6	up	1
7	0.699				osd.7	up	1
8	0.699				osd.8	up	1
9	0.699				osd.9	up	1
-6	2.783			host ceph4
14	0.919				osd.14	down	0
15	0.932				osd.15	up	1
16	0.932				osd.16	down	0
-7	2.782			host ceph5
11	0.92				osd.11	up	0
12	0.931				osd.12	up	1
13	0.931				osd.13	up	1

[Attachment: osdmap (binary data)]
Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
Are you using any kernel clients? Will osds 3, 14, 16 be coming back?
-Sam

On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow <j...@rtr.com> wrote:
> Sam,
>
> I've attached both files. Thanks!
>
> Jeff
>
> On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
>> Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap?
>> -Sam
>>
>> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow <j...@rtr.com> wrote:
>>> Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2 minutes, and then doing the next one (all OSDs eventually restarted). I tried this twice.
Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
Sam,

3, 14 and 16 have been down for a while, and I'll eventually replace those drives (I could do it now), but I didn't want to introduce more variables. We are using RBD with Proxmox, so I think the answer about kernel clients is yes.

Jeff

On Mon, Aug 12, 2013 at 02:41:11PM -0700, Samuel Just wrote:
> Are you using any kernel clients? Will osds 3,14,16 be coming back?
> -Sam
>
> On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow <j...@rtr.com> wrote:
>> Sam,
>>
>> I've attached both files. Thanks!
>>
>> Jeff
>>
>> On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote:
>>> Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap?
>>> -Sam
>>>
>>> On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow <j...@rtr.com> wrote:
>>>> Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2 minutes, and then doing the next one (all OSDs eventually restarted). I tried this twice.
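Since osds 3, 14 and 16 are down for good until the drives are replaced, the usual general procedure is to remove them from the cluster so CRUSH stops mapping PGs to them. A sketch of that standard removal sequence (not the fix confirmed anywhere in this thread, just the generic steps); dry run, commands are printed for review:

```shell
#!/bin/sh
# Generic removal of dead OSDs: mark out, drop from the CRUSH map,
# delete the auth key, and remove the OSD id. Dry run: printed only.
CMDS=""
for id in 3 14 16; do   # the three down OSDs from Jeff's osd tree
  CMDS="$CMDS
ceph osd out $id
ceph osd crush remove osd.$id
ceph auth del osd.$id
ceph osd rm $id"
done
echo "$CMDS"
```

Removing the dead OSDs from CRUSH lets the cluster remap and backfill the affected PGs onto the remaining OSDs instead of waiting for the departed ones.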
[ceph-users] pgs stuck unclean -- how to fix? (fwd)
Hi,

I have a 5-node ceph cluster that is running well (no problems using any of the rbd images, and that's really all we use). I have replication set to 3 on all three pools (data, metadata and rbd).

ceph -s reports:
   health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; recovery 5746/384795 degraded (1.493%)

I have tried everything I could think of to clear/fix those errors and they persist. Most of them appear to be a problem with not having 3 copies:

0.2a	0	0	0	0	0	0	0	active+remapped	2013-08-06 05:40:07.874427	0'0	21920'388	[4,7]	[4,7,8]	0'0	2013-08-04 08:59:34.035198	0'0	2013-07-29 01:49:40.018625
4.1d9	260	0	238	0	1021055488	0	0	active+remapped	2013-08-06 05:56:20.447612	21920'12710	21920'53408	[6,13]	[6,13,4]	0'0	2013-08-05 06:59:44.717555	0'0	2013-08-05 06:59:44.717555
1.1dc	0	0	0	0	0	0	0	active+remapped	2013-08-06 05:55:44.687830	0'0	21920'3003	[6,13]	[6,13,4]	0'0	2013-08-04 10:56:51.226012	0'0	2013-07-28 23:47:13.404512
0.1dd	0	0	0	0	0	0	0	active+remapped	2013-08-06 05:55:44.687525	0'0	21920'3003	[6,13]	[6,13,4]	0'0	2013-08-04 10:56:45.258459	0'0	2013-08-01 05:58:17.141625
1.29f	0	0	0	0	0	0	0	active+remapped	2013-08-06 05:40:07.882865	0'0	21920'388	[4,7]	[4,7,8]	0'0	2013-08-04 09:01:40.075441	0'0	2013-07-29 01:53:10.068503
1.118	0	0	0	0	0	0	0	active+remapped	2013-08-06 05:50:34.081067	0'0	21920'208	[8,15]	[8,15,5]	0'0	2034-02-12 23:20:03.933842	0'0	2034-02-12 23:20:03.933842
0.119	0	0	0	0	0	0	0	active+remapped	2013-08-06 05:50:34.095446	0'0	21920'208	[8,15]	[8,15,5]	0'0	2034-02-12 23:18:07.310080	0'0	2034-02-12 23:18:07.310080
4.115	248	0	226	0	987364352	0	0	active+remapped	2013-08-06 05:50:34.112139	21920'6840	21920'42982	[8,15]	[8,15,5]	0'0	2013-08-05 06:59:18.303823	0'0	2013-08-05 06:59:18.303823
4.4a	241	0	286	0	941573120	0	0	active+degraded	2013-08-06 12:00:47.758742	21920'85238	21920'206648	[4,6]	[4,6]	0'0	2013-08-05 06:58:36.681726	0'0	2013-08-05 06:58:36.681726
0.4e	0	0	0	0	0	0	0	active+remapped	2013-08-06 12:00:47.765391	0'0	21920'489	[4,6]	[4,6,1]	0'0	2013-08-04 08:58:12.783265	0'0	2013-07-28 14:21:38.227970

Can anyone suggest a way to clear this up?

Thanks!
	Jeff
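When PGs sit in active+remapped or active+degraded like this, the usual first step is to ask the cluster directly which PGs are stuck and why. A sketch (dry run, commands are printed only; the pgid is just one example taken from the listing above):

```shell
#!/bin/sh
# Inspect one stuck PG from the listing above. Dry run: the commands
# are collected and printed so they can be reviewed first.
PGID=4.4a   # the active+degraded PG in the dump; substitute any pgid
CMDS="ceph pg dump_stuck unclean
ceph pg $PGID query
ceph pg map $PGID"
echo "$CMDS"
```

`ceph pg dump_stuck unclean` lists the stuck PGs with their up and acting sets, and `ceph pg <pgid> query` shows the peering state for one PG, including which OSDs are blocking it from going clean.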
Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
On 08/09/2013 10:58 AM, Jeff Moskow wrote:
> Hi,
>
> I have a 5-node ceph cluster that is running well (no problems using any of the rbd images, and that's really all we use). I have replication set to 3 on all three pools (data, metadata and rbd).
>
> ceph -s reports:
>    health HEALTH_WARN 3 pgs degraded; 114 pgs stuck unclean; recovery 5746/384795 degraded (1.493%)
>
> I have tried everything I could think of to clear/fix those errors and they persist.

Did you restart the primary OSD for those PGs?

Wido

> Most of them appear to be a problem with not having 3 copies:
>
> [pg listing quoted from the original message snipped]
>
> Can anyone suggest a way to clear this up?
>
> Thanks!
> 	Jeff

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)
Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2 minutes, and then doing the next one (all OSDs eventually restarted). I tried this twice.
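The rolling restart Jeff describes can be scripted roughly as follows; a sketch assuming sysvinit-style Ceph service scripts (as shipped in that era) and written as a dry run that prints the commands rather than executing them:

```shell
#!/bin/sh
# One-at-a-time OSD restart: stop an OSD, wait 30 s, start it, then
# wait 2 min before touching the next one. Dry run: printed only.
CMDS=""
for id in 0 1 2; do   # extend the list to cover every OSD id in use
  CMDS="$CMDS
/etc/init.d/ceph stop osd.$id
sleep 30
/etc/init.d/ceph start osd.$id
sleep 120"
done
echo "$CMDS"
```

Restarting OSDs one at a time with a settling delay lets each PG re-peer before the next disruption, which is why the procedure takes so long on a larger cluster.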