Re: ceph status reporting non-existing osd

2012-07-19 Thread Andrey Korolyov
On Thu, Jul 19, 2012 at 1:28 AM, Gregory Farnum g...@inktank.com wrote:
 On Wed, Jul 18, 2012 at 12:07 PM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
 On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did 
  you take it out? It sounds like maybe you just marked it in the OUT 
  state (and turned it off quite quickly) without actually taking it out 
  of the cluster?
  -Greg



 As I did the removal, it was definitely not like that - in the first
 place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
 crushmap and then killed the osd processes. As I mentioned before, osd.4
 does not exist in the crushmap and therefore it shouldn't be reported at
 all (theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means 
 all the data gets moved off it, but that doesn't remove it from all the 
 places where it's registered in the monitor and in the map, for a couple 
 reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be 
 able to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a 
 new location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and 
 all will be well.


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

 Hrm. How did you create your new crush map? All the normal avenues of
 removing an OSD from the map set a flag which the PGMap uses to delete
 its records (which would prevent it reappearing in the full list), and
 I can't see how setcrushmap would remove an OSD from the map (although
 there might be a code path I haven't found).

 Manually, by deleting the osd.4 and osd.5 entries and reweighting the remaining nodes.

 So you extracted the CRUSH map, edited it, and injected it using ceph
 osd setcrushmap?

Yep, exactly.


Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
 On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  ceph pg set_full_ratio 0.95
  ceph pg set_nearfull_ratio 0.94
   
   
  On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
   
   On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
 On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
 (mailto:s...@inktank.com) wrote:
  On Fri, 13 Jul 2012, Gregory Farnum wrote:
   On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
   (mailto:and...@xdel.ru) wrote:
Hi,
 
Recently I`ve reduced my test suite from 6 to 4 osds at ~60% 
usage on
six-node,
and I have removed a bunch of rbd objects during recovery to 
avoid
overfill.
Right now I`m constantly receiving a warn about nearfull state 
on
non-existing osd:
 
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
{0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
election epoch 240, quorum 0,1,2 0,1,2
osdmap e2098: 4 osds: 4 up, 4 in
pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
used, 143 GB / 324 GB avail
mdsmap e181: 1/1/1 up {0=a=up:active}
 
HEALTH_WARN 1 near full osd(s)
osd.4 is near full at 89%
 
Needless to say, osd.4 remains only in ceph.conf, but not at 
crushmap.
Reducing has been done 'on-line', e.g. without restart entire 
cluster.







   Whoops! It looks like Sage has written some patches to fix this, 
   but
   for now you should be good if you just update your ratios to a 
   larger
   number, and then bring them back down again. :)
   
   
   
   
   
   
   
  Restarting ceph-mon should also do the trick.
   
  Thanks for the bug report!
  sage
  
  
  
  
  
  
  
 Should I restart mons simultaneously?
I don't think restarting will actually do the trick for you — you 
actually will need to set the ratios again.
 
 Restarting one by one has no
 effect, same as filling up data pool up to ~95 percent(btw, when I
 deleted this 50Gb file on cephfs, mds was stuck permanently and usage
 remained same until I dropped and recreated data pool - hope it`s one
 of known posix layer bugs). I also deleted entry from config, and then
 restarted mons, with no effect. Any suggestions?
 
 
 
 
 
I'm not sure what you're asking about here?
-Greg





   Oh, sorry, I have mislooked and thought that you suggested filling up
   osds. How do I can set full/nearfull ratios correctly?

   $ceph injectargs '--mon_osd_full_ratio 96'
   parsed options
   $ ceph injectargs '--mon_osd_near_full_ratio 94'
   parsed options

   ceph pg dump | grep 'full'
   full_ratio 0.95
   nearfull_ratio 0.85

   Setting parameters in the ceph.conf and then restarting mons does not
   affect ratios either.
   
  
  
  
 Thanks, it worked, but setting the values back brings the warning back.
Hrm. That shouldn't be possible if the OSD has been removed. How did you take 
it out? It sounds like maybe you just marked it in the OUT state (and turned it 
off quite quickly) without actually taking it out of the cluster?  
-Greg
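
For context, the distinction Greg is drawing, as a minimal sketch; ceph osd rm appears elsewhere in this thread, while ceph osd out is an assumption here:

$ ceph osd out 4      # marks the OSD out: data migrates off it, but it stays registered
$ ceph osd rm 4       # removes the osd.4 entry from the osdmap entirely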



Re: ceph status reporting non-existing osd

2012-07-18 Thread Andrey Korolyov
On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com wrote:
 On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
 On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  ceph pg set_full_ratio 0.95
  ceph pg set_nearfull_ratio 0.94
 
 
  On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
 
   On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
 On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
 (mailto:s...@inktank.com) wrote:
  On Fri, 13 Jul 2012, Gregory Farnum wrote:
   On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
   (mailto:and...@xdel.ru) wrote:
Hi,
   
Recently I`ve reduced my test suite from 6 to 4 osds at ~60% 
usage on
six-node,
and I have removed a bunch of rbd objects during recovery to 
avoid
overfill.
Right now I`m constantly receiving a warn about nearfull state 
on
non-existing osd:
   
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
{0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
election epoch 240, quorum 0,1,2 0,1,2
osdmap e2098: 4 osds: 4 up, 4 in
pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
used, 143 GB / 324 GB avail
mdsmap e181: 1/1/1 up {0=a=up:active}
   
HEALTH_WARN 1 near full osd(s)
osd.4 is near full at 89%
   
Needless to say, osd.4 remains only in ceph.conf, but not at 
crushmap.
Reducing has been done 'on-line', e.g. without restart entire 
cluster.
  
  
  
  
  
  
  
   Whoops! It looks like Sage has written some patches to fix this, 
   but
   for now you should be good if you just update your ratios to a 
   larger
   number, and then bring them back down again. :)
 
 
 
 
 
 
 
  Restarting ceph-mon should also do the trick.
 
  Thanks for the bug report!
  sage







 Should I restart mons simultaneously?
I don't think restarting will actually do the trick for you — you 
actually will need to set the ratios again.
   
 Restarting one by one has no
 effect, same as filling up data pool up to ~95 percent(btw, when I
 deleted this 50Gb file on cephfs, mds was stuck permanently and usage
 remained same until I dropped and recreated data pool - hope it`s one
 of known posix layer bugs). I also deleted entry from config, and 
 then
 restarted mons, with no effect. Any suggestions?
   
   
   
   
   
I'm not sure what you're asking about here?
-Greg
  
  
  
  
  
   Oh, sorry, I have mislooked and thought that you suggested filling up
   osds. How do I can set full/nearfull ratios correctly?
  
   $ceph injectargs '--mon_osd_full_ratio 96'
   parsed options
   $ ceph injectargs '--mon_osd_near_full_ratio 94'
   parsed options
  
   ceph pg dump | grep 'full'
   full_ratio 0.95
   nearfull_ratio 0.85
  
   Setting parameters in the ceph.conf and then restarting mons does not
   affect ratios either.
 



 Thanks, it worked, but setting the values back brings the warning back.
 Hrm. That shouldn't be possible if the OSD has been removed. How did you take 
 it out? It sounds like maybe you just marked it in the OUT state (and turned 
 it off quite quickly) without actually taking it out of the cluster?
 -Greg


As I did the removal, it was definitely not like that - in the first
place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
crushmap and then killed the osd processes. As I mentioned before, osd.4
does not exist in the crushmap and therefore it shouldn't be reported at
all (theoretically).
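
Spelled out as commands, that sequence would look roughly like the sketch below; the exact invocations are assumptions, since only the steps are named above:

$ ceph osd out 4                   # data drains off osd.4, but it stays registered
$ ceph osd out 5
# ...extract the crushmap, delete the osd.4/osd.5 entries by hand, re-inject it...
$ kill PID_OF_OSD_4                # hypothetical: stop the two ceph-osd daemons
$ kill PID_OF_OSD_5
# note: this never runs "ceph osd rm 4", the step Greg recommends elsewhere in the thread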


Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
   On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.94
 
 
On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
 
 On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
   On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
   (mailto:s...@inktank.com) wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov 
 and...@xdel.ru (mailto:and...@xdel.ru) wrote:
  Hi,
   
  Recently I`ve reduced my test suite from 6 to 4 osds at 
  ~60% usage on
  six-node,
  and I have removed a bunch of rbd objects during recovery 
  to avoid
  overfill.
  Right now I`m constantly receiving a warn about nearfull 
  state on
  non-existing osd:
   
  health HEALTH_WARN 1 near full osd(s)
  monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
  osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 
  181 GB
  used, 143 GB / 324 GB avail
  mdsmap e181: 1/1/1 up {0=a=up:active}
   
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
   
  Needless to say, osd.4 remains only in ceph.conf, but not 
  at crushmap.
  Reducing has been done 'on-line', e.g. without restart 
  entire cluster.
  
  
  
  
  
  
  
  
  
 Whoops! It looks like Sage has written some patches to fix 
 this, but
 for now you should be good if you just update your ratios to 
 a larger
 number, and then bring them back down again. :)
 
 
 
 
 
 
 
 
 
Restarting ceph-mon should also do the trick.
 
Thanks for the bug report!
sage









   Should I restart mons simultaneously?
  I don't think restarting will actually do the trick for you — you 
  actually will need to set the ratios again.
   
   Restarting one by one has no
   effect, same as filling up data pool up to ~95 percent(btw, when I
   deleted this 50Gb file on cephfs, mds was stuck permanently and 
   usage
   remained same until I dropped and recreated data pool - hope it`s 
   one
   of known posix layer bugs). I also deleted entry from config, and 
   then
   restarted mons, with no effect. Any suggestions?
   
   
   
   
   
   
   
  I'm not sure what you're asking about here?
  -Greg
  
  
  
  
  
  
  
 Oh, sorry, I have mislooked and thought that you suggested filling up
 osds. How do I can set full/nearfull ratios correctly?
  
 $ceph injectargs '--mon_osd_full_ratio 96'
 parsed options
 $ ceph injectargs '--mon_osd_near_full_ratio 94'
 parsed options
  
 ceph pg dump | grep 'full'
 full_ratio 0.95
 nearfull_ratio 0.85
  
 Setting parameters in the ceph.conf and then restarting mons does not
 affect ratios either.
 





   Thanks, it worked, but setting the values back brings the warning back.
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state (and 
  turned it off quite quickly) without actually taking it out of the cluster?
  -Greg
  
  
  
 As I did the removal, it was definitely not like that - in the first
 place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
 crushmap and then killed the osd processes. As I mentioned before, osd.4
 does not exist in the crushmap and therefore it shouldn't be reported at
 all (theoretically).

Okay, that's what happened — marking an OSD out in the CRUSH map means all the 
data gets moved off it, but that doesn't remove it from all the places where 
it's registered in the monitor and in the map, for a couple reasons:  
1) You might want to mark an OSD out before taking it down, to allow for more 
orderly data movement.
2) OSDs can get marked out automatically, but the system shouldn't be able to 
forget about them on its own.
3) You might want to remove an OSD from the CRUSH map in the process of 
placing it somewhere else (perhaps you moved the physical machine to a new 
location).
etc.

You want to run ceph osd rm 4 5 and that should unregister both of them from 
everything[1]. :)
-Greg
[1]: Except for the full lists, which have a bug in the version of code you're 
running — remove the OSDs, then adjust the full ratios again, and all will be 
well.
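
For later readers, a removal that unregisters the OSD everywhere might look like the sketch below. Only ceph osd rm and the ratio commands are confirmed by this thread; ceph osd crush remove, ceph auth del and the init-script call are assumptions:

$ ceph osd out 4                   # optional: let the data drain off first
$ /etc/init.d/ceph stop osd.4      # stop the daemon (assumed init-script syntax)
$ ceph osd crush remove osd.4      # take it out of the CRUSH map (assumed command)
$ ceph auth del osd.4              # drop its authentication key (assumed command)
$ ceph osd rm 4                    # unregister it from the osdmap
$ ceph pg set_full_ratio 0.95      # per footnote [1]: re-set the ratios afterwards
$ ceph pg set_nearfull_ratio 0.94  # to flush the stale nearfull entry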

Re: ceph status reporting non-existing osd

2012-07-18 Thread Andrey Korolyov
On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
   On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com 
   (mailto:g...@inktank.com) wrote:
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.94
   
   
On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
   
 On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
   On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
   (mailto:s...@inktank.com) wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov 
 and...@xdel.ru (mailto:and...@xdel.ru) wrote:
  Hi,
 
  Recently I`ve reduced my test suite from 6 to 4 osds at 
  ~60% usage on
  six-node,
  and I have removed a bunch of rbd objects during recovery 
  to avoid
  overfill.
  Right now I`m constantly receiving a warn about nearfull 
  state on
  non-existing osd:
 
  health HEALTH_WARN 1 near full osd(s)
  monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
  osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 
  181 GB
  used, 143 GB / 324 GB avail
  mdsmap e181: 1/1/1 up {0=a=up:active}
 
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
 
  Needless to say, osd.4 remains only in ceph.conf, but not 
  at crushmap.
  Reducing has been done 'on-line', e.g. without restart 
  entire cluster.









 Whoops! It looks like Sage has written some patches to fix 
 this, but
 for now you should be good if you just update your ratios to 
 a larger
 number, and then bring them back down again. :)
   
   
   
   
   
   
   
   
   
Restarting ceph-mon should also do the trick.
   
Thanks for the bug report!
sage
  
  
  
  
  
  
  
  
  
   Should I restart mons simultaneously?
  I don't think restarting will actually do the trick for you — you 
  actually will need to set the ratios again.
 
   Restarting one by one has no
   effect, same as filling up data pool up to ~95 percent(btw, when 
   I
   deleted this 50Gb file on cephfs, mds was stuck permanently and 
   usage
   remained same until I dropped and recreated data pool - hope 
   it`s one
   of known posix layer bugs). I also deleted entry from config, 
   and then
   restarted mons, with no effect. Any suggestions?
 
 
 
 
 
 
 
  I'm not sure what you're asking about here?
  -Greg







 Oh, sorry, I have mislooked and thought that you suggested filling up
 osds. How do I can set full/nearfull ratios correctly?

 $ceph injectargs '--mon_osd_full_ratio 96'
 parsed options
 $ ceph injectargs '--mon_osd_near_full_ratio 94'
 parsed options

 ceph pg dump | grep 'full'
 full_ratio 0.95
 nearfull_ratio 0.85

 Setting parameters in the ceph.conf and then restarting mons does not
 affect ratios either.
   
  
  
  
  
  
   Thanks, it worked, but setting the values back brings the warning back.
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state (and 
  turned it off quite quickly) without actually taking it out of the cluster?
  -Greg



 As I did the removal, it was definitely not like that - in the first
 place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
 crushmap and then killed the osd processes. As I mentioned before, osd.4
 does not exist in the crushmap and therefore it shouldn't be reported at
 all (theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for more 
 orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able to 
 forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of
 placing it somewhere else (perhaps you moved the physical machine to a new
 location).

Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state 
  (and turned it off quite quickly) without actually taking it out of the 
  cluster?
  -Greg



 As I did the removal, it was definitely not like that - in the first
 place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
 crushmap and then killed the osd processes. As I mentioned before, osd.4
 does not exist in the crushmap and therefore it shouldn't be reported at
 all (theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able 
 to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and all 
 will be well.


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

Hrm. How did you create your new crush map? All the normal avenues of
removing an OSD from the map set a flag which the PGMap uses to delete
its records (which would prevent it reappearing in the full list), and
I can't see how setcrushmap would remove an OSD from the map (although
there might be a code path I haven't found).
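
To see where the stale record actually lives, a few checks; ceph osd dump and ceph pg dump appear elsewhere in this thread, while ceph osd tree is an assumption:

$ ceph osd dump | grep osd.4       # is it still in the osdmap? (here it is not - max_osd is 4)
$ ceph osd tree                    # what the injected CRUSH map actually contains (assumed)
$ ceph pg dump | grep full         # the full/nearfull ratios the PGMap is using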


Re: ceph status reporting non-existing osd

2012-07-18 Thread Andrey Korolyov
On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
 On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did you 
  take it out? It sounds like maybe you just marked it in the OUT state 
  (and turned it off quite quickly) without actually taking it out of the 
  cluster?
  -Greg



 As I did the removal, it was definitely not like that - in the first
 place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
 crushmap and then killed the osd processes. As I mentioned before, osd.4
 does not exist in the crushmap and therefore it shouldn't be reported at
 all (theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able 
 to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and 
 all will be well.


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

 Hrm. How did you create your new crush map? All the normal avenues of
 removing an OSD from the map set a flag which the PGMap uses to delete
 its records (which would prevent it reappearing in the full list), and
 I can't see how setcrushmap would remove an OSD from the map (although
 there might be a code path I haven't found).

Manually, by deleting the osd.4 and osd.5 entries and reweighting the remaining nodes.


Re: ceph status reporting non-existing osd

2012-07-18 Thread Gregory Farnum
On Wed, Jul 18, 2012 at 12:07 PM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
 On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
 On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
 On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
 On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  Hrm. That shouldn't be possible if the OSD has been removed. How did 
  you take it out? It sounds like maybe you just marked it in the OUT 
  state (and turned it off quite quickly) without actually taking it out 
  of the cluster?
  -Greg



 As I did the removal, it was definitely not like that - in the first
 place, I marked the osds (4 and 5, on the same host) out, then rebuilt the
 crushmap and then killed the osd processes. As I mentioned before, osd.4
 does not exist in the crushmap and therefore it shouldn't be reported at
 all (theoretically).

 Okay, that's what happened — marking an OSD out in the CRUSH map means all 
 the data gets moved off it, but that doesn't remove it from all the places 
 where it's registered in the monitor and in the map, for a couple reasons:
 1) You might want to mark an OSD out before taking it down, to allow for 
 more orderly data movement.
 2) OSDs can get marked out automatically, but the system shouldn't be able 
 to forget about them on its own.
 3) You might want to remove an OSD from the CRUSH map in the process of 
 placing it somewhere else (perhaps you moved the physical machine to a new 
 location).
 etc.

 You want to run ceph osd rm 4 5 and that should unregister both of them 
 from everything[1]. :)
 -Greg
 [1]: Except for the full lists, which have a bug in the version of code 
 you're running — remove the OSDs, then adjust the full ratios again, and 
 all will be well.


 $ ceph osd rm 4
 osd.4 does not exist
 $ ceph -s
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 58, quorum 0,1,2 0,1,2
osdmap e2198: 4 osds: 4 up, 4 in
 pgmap v586056: 464 pgs: 464 active+clean; 66645 MB data, 231 GB
 used, 95877 MB / 324 GB avail
mdsmap e207: 1/1/1 up {0=a=up:active}

 $ ceph health detail
 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 $ ceph osd dump
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2183 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6800/4030
 192.168.10.128:6801/4030 192.168.10.128:6802/4030 exists,up
 68b3deec-e80a-48b7-9c29-1b98f5de4f62
 osd.1 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6800/2980
 192.168.10.129:6801/2980 192.168.10.129:6802/2980 exists,up
 b2a26fe9-aaa8-445f-be1f-fa7d2a283b57
 osd.2 up   in  weight 1 up_from 2181 up_thru 2187 down_at 2172
 last_clean_interval [2136,2171) 192.168.10.128:6803/4128
 192.168.10.128:6804/4128 192.168.10.128:6805/4128 exists,up
 378d367a-f7fb-4892-9ec9-db8ffdd2eb20
 osd.3 up   in  weight 1 up_from 2136 up_thru 2186 down_at 2135
 last_clean_interval [2115,2134) 192.168.10.129:6803/3069
 192.168.10.129:6804/3069 192.168.10.129:6805/3069 exists,up
 faf8eda8-55fc-4a0e-899f-47dbd32b81b8
 

 Hrm. How did you create your new crush map? All the normal avenues of
 removing an OSD from the map set a flag which the PGMap uses to delete
 its records (which would prevent it reappearing in the full list), and
 I can't see how setcrushmap would remove an OSD from the map (although
 there might be a code path I haven't found).

 Manually, by deleting the osd.4 and osd.5 entries and reweighting the remaining nodes.

So you extracted the CRUSH map, edited it, and injected it using ceph
osd setcrushmap?
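
For reference, that workflow as a sketch; only setcrushmap is named in this thread, so the crushtool invocations and getcrushmap are assumptions:

$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt     # decompile to editable text
$ vi crush.txt                            # delete the osd.4/osd.5 entries, adjust weights
$ crushtool -c crush.txt -o crush.new     # recompile
$ ceph osd setcrushmap -i crush.new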


Re: ceph status reporting non-existing osd

2012-07-16 Thread Gregory Farnum
On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
 On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
 (mailto:s...@inktank.com) wrote:
  On Fri, 13 Jul 2012, Gregory Farnum wrote:
   On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
   (mailto:and...@xdel.ru) wrote:
Hi,
 
Recently I`ve reduced my test suite from 6 to 4 osds at ~60% usage on
six-node,
and I have removed a bunch of rbd objects during recovery to avoid
overfill.
Right now I`m constantly receiving a warn about nearfull state on
non-existing osd:
 
health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
{0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
election epoch 240, quorum 0,1,2 0,1,2
osdmap e2098: 4 osds: 4 up, 4 in
pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
used, 143 GB / 324 GB avail
mdsmap e181: 1/1/1 up {0=a=up:active}
 
HEALTH_WARN 1 near full osd(s)
osd.4 is near full at 89%
 
Needless to say, osd.4 remains only in ceph.conf, but not at crushmap.
Reducing has been done 'on-line', e.g. without restart entire cluster.



   Whoops! It looks like Sage has written some patches to fix this, but
   for now you should be good if you just update your ratios to a larger
   number, and then bring them back down again. :)
   
   
   
  Restarting ceph-mon should also do the trick.
   
  Thanks for the bug report!
  sage
  
  
  
 Should I restart mons simultaneously?
I don't think restarting will actually do the trick for you — you actually will 
need to set the ratios again.
  
 Restarting one by one has no
 effect, same as filling up data pool up to ~95 percent(btw, when I
 deleted this 50Gb file on cephfs, mds was stuck permanently and usage
 remained same until I dropped and recreated data pool - hope it`s one
 of known posix layer bugs). I also deleted entry from config, and then
 restarted mons, with no effect. Any suggestions?

I'm not sure what you're asking about here?  
-Greg



Re: ceph status reporting non-existing osd

2012-07-16 Thread Gregory Farnum
ceph pg set_full_ratio 0.95  
ceph pg set_nearfull_ratio 0.94
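
That is the earlier "raise the ratios, then bring them back down" workaround spelled out; the values are illustrative, the 0.85 default comes from the pg dump output quoted below, and note that in this thread lowering the ratio again brought the warning back:

$ ceph pg set_full_ratio 0.95
$ ceph pg set_nearfull_ratio 0.94    # raise nearfull above the stale "89%" entry
$ ceph pg dump | grep full           # confirm the PGMap picked up the new ratios
$ ceph pg set_nearfull_ratio 0.85    # lower it again once the warning has cleared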


On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:

 On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
   On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
   (mailto:s...@inktank.com) wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
 (mailto:and...@xdel.ru) wrote:
  Hi,
   
  Recently I`ve reduced my test suite from 6 to 4 osds at ~60% usage 
  on
  six-node,
  and I have removed a bunch of rbd objects during recovery to avoid
  overfill.
  Right now I`m constantly receiving a warn about nearfull state on
  non-existing osd:
   
  health HEALTH_WARN 1 near full osd(s)
  monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
  osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
  used, 143 GB / 324 GB avail
  mdsmap e181: 1/1/1 up {0=a=up:active}
   
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
   
  Needless to say, osd.4 remains only in ceph.conf, but not at 
  crushmap.
  Reducing has been done 'on-line', e.g. without restart entire 
  cluster.
  
  
  
  
  
 Whoops! It looks like Sage has written some patches to fix this, but
 for now you should be good if you just update your ratios to a larger
 number, and then bring them back down again. :)
 
 
 
 
 
Restarting ceph-mon should also do the trick.
 
Thanks for the bug report!
sage





   Should I restart mons simultaneously?
  I don't think restarting will actually do the trick for you — you actually 
  will need to set the ratios again.
   
   Restarting one by one has no
   effect, same as filling up data pool up to ~95 percent(btw, when I
   deleted this 50Gb file on cephfs, mds was stuck permanently and usage
   remained same until I dropped and recreated data pool - hope it`s one
   of known posix layer bugs). I also deleted entry from config, and then
   restarted mons, with no effect. Any suggestions?
   
   
   
  I'm not sure what you're asking about here?
  -Greg
  
  
  
 Oh, sorry, I misread and thought that you were suggesting filling up the
 osds. How can I set the full/nearfull ratios correctly?
  
 $ceph injectargs '--mon_osd_full_ratio 96'
 parsed options
 $ ceph injectargs '--mon_osd_near_full_ratio 94'
 parsed options
  
 ceph pg dump | grep 'full'
 full_ratio 0.95
 nearfull_ratio 0.85
  
 Setting parameters in the ceph.conf and then restarting mons does not
 affect ratios either.





Re: ceph status reporting non-existing osd

2012-07-16 Thread Andrey Korolyov
On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com wrote:
 ceph pg set_full_ratio 0.95
 ceph pg set_nearfull_ratio 0.94


 On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:

 On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com 
 (mailto:g...@inktank.com) wrote:
  On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
   On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com 
   (mailto:s...@inktank.com) wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru 
 (mailto:and...@xdel.ru) wrote:
  Hi,
 
  Recently I`ve reduced my test suite from 6 to 4 osds at ~60% usage 
  on
  six-node,
  and I have removed a bunch of rbd objects during recovery to avoid
  overfill.
  Right now I`m constantly receiving a warn about nearfull state on
  non-existing osd:
 
  health HEALTH_WARN 1 near full osd(s)
  monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
  osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
  used, 143 GB / 324 GB avail
  mdsmap e181: 1/1/1 up {0=a=up:active}
 
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
 
  Needless to say, osd.4 remains only in ceph.conf, but not at 
  crushmap.
  Reducing has been done 'on-line', e.g. without restart entire 
  cluster.





 Whoops! It looks like Sage has written some patches to fix this, but
 for now you should be good if you just update your ratios to a larger
 number, and then bring them back down again. :)
   
   
   
   
   
Restarting ceph-mon should also do the trick.
   
Thanks for the bug report!
sage
  
  
  
  
  
   Should I restart mons simultaneously?
  I don't think restarting will actually do the trick for you — you actually 
  will need to set the ratios again.
 
   Restarting one by one has no
   effect, same as filling up data pool up to ~95 percent(btw, when I
   deleted this 50Gb file on cephfs, mds was stuck permanently and usage
   remained same until I dropped and recreated data pool - hope it`s one
   of known posix layer bugs). I also deleted entry from config, and then
   restarted mons, with no effect. Any suggestions?
 
 
 
  I'm not sure what you're asking about here?
  -Greg



 Oh, sorry, I misread and thought that you were suggesting filling up the
 osds. How can I set the full/nearfull ratios correctly?

 $ceph injectargs '--mon_osd_full_ratio 96'
 parsed options
 $ ceph injectargs '--mon_osd_near_full_ratio 94'
 parsed options

 ceph pg dump | grep 'full'
 full_ratio 0.95
 nearfull_ratio 0.85

 Setting parameters in the ceph.conf and then restarting mons does not
 affect ratios either.




Thanks, it worked, but setting the values back brings the warning back.


Re: ceph status reporting non-existing osd

2012-07-14 Thread Andrey Korolyov
On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com wrote:
 On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru wrote:
  Hi,
 
  Recently I've reduced my test suite from 6 to 4 osds at ~60% usage on a
  six-node cluster, and I have removed a bunch of rbd objects during recovery
  to avoid overfilling. Right now I'm constantly receiving a warning about a
  nearfull state on a non-existing osd:
 
 health HEALTH_WARN 1 near full osd(s)
 monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
 osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
  used, 143 GB / 324 GB avail
 mdsmap e181: 1/1/1 up {0=a=up:active}
 
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
 
  Needless to say, osd.4 remains only in ceph.conf, but not in the crushmap.
  The reduction was done 'on-line', i.e. without restarting the entire cluster.

 Whoops! It looks like Sage has written some patches to fix this, but
 for now you should be good if you just update your ratios to a larger
 number, and then bring them back down again. :)

 Restarting ceph-mon should also do the trick.

 Thanks for the bug report!
 sage

Should I restart the mons simultaneously? Restarting them one by one has no
effect, same as filling the data pool up to ~95 percent (btw, when I
deleted this 50 GB file on cephfs, the mds was stuck permanently and usage
remained the same until I dropped and recreated the data pool - I hope it's
one of the known POSIX-layer bugs). I also deleted the entry from the config
and then restarted the mons, with no effect. Any suggestions?


ceph status reporting non-existing osd

2012-07-13 Thread Andrey Korolyov
Hi,

Recently I've reduced my test suite from 6 to 4 osds at ~60% usage on a
six-node cluster, and I have removed a bunch of rbd objects during recovery
to avoid overfilling. Right now I'm constantly receiving a warning about a
nearfull state on a non-existing osd:

   health HEALTH_WARN 1 near full osd(s)
   monmap e3: 3 mons at
{0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
election epoch 240, quorum 0,1,2 0,1,2
   osdmap e2098: 4 osds: 4 up, 4 in
pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
used, 143 GB / 324 GB avail
   mdsmap e181: 1/1/1 up {0=a=up:active}

HEALTH_WARN 1 near full osd(s)
osd.4 is near full at 89%

Needless to say, osd.4 remains only in ceph.conf, but not in the crushmap.
The reduction was done 'on-line', i.e. without restarting the entire cluster.
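
A quick way to confirm the situation described above, with osd.4 absent from the CRUSH map yet still named in the health output; the getcrushmap/crushtool pair is an assumption, while ceph health detail appears elsewhere in the thread:

$ ceph osd getcrushmap -o /tmp/crush.bin
$ crushtool -d /tmp/crush.bin -o /tmp/crush.txt
$ grep osd.4 /tmp/crush.txt          # no output: osd.4 is not in the CRUSH map
$ ceph health detail                 # ...but the warning still names it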


Re: ceph status reporting non-existing osd

2012-07-13 Thread Gregory Farnum
On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru wrote:
 Hi,

 Recently I've reduced my test suite from 6 to 4 osds at ~60% usage on a
 six-node cluster, and I have removed a bunch of rbd objects during recovery
 to avoid overfilling. Right now I'm constantly receiving a warning about a
 nearfull state on a non-existing osd:

health HEALTH_WARN 1 near full osd(s)
monmap e3: 3 mons at
 {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
 election epoch 240, quorum 0,1,2 0,1,2
osdmap e2098: 4 osds: 4 up, 4 in
 pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
 used, 143 GB / 324 GB avail
mdsmap e181: 1/1/1 up {0=a=up:active}

 HEALTH_WARN 1 near full osd(s)
 osd.4 is near full at 89%

 Needless to say, osd.4 remains only in ceph.conf, but not in the crushmap.
 The reduction was done 'on-line', i.e. without restarting the entire cluster.

Whoops! It looks like Sage has written some patches to fix this, but
for now you should be good if you just update your ratios to a larger
number, and then bring them back down again. :)
-Greg


Re: ceph status reporting non-existing osd

2012-07-13 Thread Sage Weil
On Fri, 13 Jul 2012, Gregory Farnum wrote:
 On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru wrote:
  Hi,
 
  Recently I've reduced my test suite from 6 to 4 osds at ~60% usage on a
  six-node cluster, and I have removed a bunch of rbd objects during recovery
  to avoid overfilling. Right now I'm constantly receiving a warning about a
  nearfull state on a non-existing osd:
 
 health HEALTH_WARN 1 near full osd(s)
 monmap e3: 3 mons at
  {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
  election epoch 240, quorum 0,1,2 0,1,2
 osdmap e2098: 4 osds: 4 up, 4 in
  pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data, 181 GB
  used, 143 GB / 324 GB avail
 mdsmap e181: 1/1/1 up {0=a=up:active}
 
  HEALTH_WARN 1 near full osd(s)
  osd.4 is near full at 89%
 
  Needless to say, osd.4 remains only in ceph.conf, but not in the crushmap.
  The reduction was done 'on-line', i.e. without restarting the entire cluster.
 
 Whoops! It looks like Sage has written some patches to fix this, but
 for now you should be good if you just update your ratios to a larger
 number, and then bring them back down again. :)

Restarting ceph-mon should also do the trick.

Thanks for the bug report!
sage
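
For completeness, restarting the monitors one at a time with the init scripts of that era might look like the sketch below; the exact service syntax is an assumption, and as the rest of the thread shows, it was re-setting the ratios rather than the restart that actually cleared the warning:

$ /etc/init.d/ceph restart mon.0     # or: service ceph restart mon.0
$ /etc/init.d/ceph restart mon.1
$ /etc/init.d/ceph restart mon.2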