On Wed, 11 Sep 2013, Chris Dunlop wrote:
On Fri, Sep 06, 2013 at 08:21:07AM -0700, Sage Weil wrote:
On Fri, 6 Sep 2013, Chris Dunlop wrote:
On Thu, Sep 05, 2013 at 07:55:52PM -0700, Sage Weil wrote:
Also, you should upgrade to dumpling. :)
I've been considering it. It was initially a little scary with
the various issues that were cropping up but that all seems to
have quietened down.
Of course I'd like my cluster to be clean before attempting an upgrade!
Definitely. Let us know how it goes! :)
Upgraded, directly from bobtail to dumpling.
Well, that was a mite more traumatic than I expected. I had two
issues, both my fault...
Firstly, I didn't realise I should have restarted the osds one
at a time rather than doing 'service ceph restart' on each host
quickly in succession. Restarting them all at once meant
everything was offline whilst the PGs were upgrading.
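For anyone following along, a rolling restart along these lines avoids taking everything offline at once (a sketch only; assumes the sysvinit scripts, and the osd ids are examples):

```shell
# Restart OSDs one at a time, waiting for the cluster to settle
# before moving on to the next one.
for id in 0 7; do
    service ceph restart osd.$id
    # Block until the cluster reports HEALTH_OK again
    while ! ceph health | grep -q HEALTH_OK; do
        sleep 10
    done
done
```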
Secondly, whilst I saw the 'osd crush update on start' issue in
the release notes, and checked that my crush map hostnames match
the actual hostnames, I have two separate pools (for fast SAS vs
bulk SATA disks) and I stupidly only noticed the one which
matched, but not the other which didn't match. So on restart all
the osds moved into the one pool, and started rebalancing.
The two issues at the same time produced quite the adrenaline
rush! :-)
I can imagine!
My current crush configuration is below (host b2 is recently
added and I haven't added it into the pools yet). Is there a
better/recommended way of using the crush map to support
separate pools to avoid setting 'osd crush update on start =
false'? It doesn't seem that I can use the same 'host' names
under the separate 'sas' and 'default' roots?
For now we don't have a better solution than setting 'osd crush update on
start = false'. Sorry! I'm guessing that it is pretty uncommon for
disks to switch hosts, at least. :/
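For reference, the setting goes in the [osd] section of ceph.conf on each OSD host (a minimal fragment):

```ini
[osd]
    osd crush update on start = false
```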
We could come up with a 'standard' way of structuring these sorts of maps
with prefixes or suffixes on the bucket names; I'm open to suggestions.
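For what it's worth, the separate-root layout does already let each tier be targeted with its own rule; something like the following sketch (rule name and ruleset number are illustrative), compiled into the map below, can then be attached to the fast pool with 'ceph osd pool set <pool> crush_ruleset 1':

```
rule sas {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take sas
        step chooseleaf firstn 0 type host
        step emit
}
```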
However, I'm also wondering if we should take the next step at the same
time and embed another dimension in the CRUSH tree so that CRUSH itself
understands that it is host=b4 (say) but it is only looking at the sas or
ssd items. This would (help) allow rules along the lines of pick 3
hosts; choose the ssd from the first and sas disks from the other two.
I'm not convinced that is an especially good idea for most users, but it's
probably worth considering.
sage
Cheers,
Chris
--
# ceph osd tree
# id weight type name up/down reweight
-8 2 root sas
-7 2 rack sas-rack-1
-5 1 host b4-sas
4 0.5 osd.4 up 1
5 0.5 osd.5 up 1
-6 1 host b5-sas
2 0.5 osd.2 up 1
3 0.5 osd.3 up 1
-1 12.66 root default
-3 8 rack unknownrack
-2 4 host b4
0 2 osd.0 up 1
7 2 osd.7 up 1
-4 4 host b5
1 2 osd.1 up 1
6 2 osd.6 up 1
-9 4.66 host b2
10 1.82 osd.10 up 1
11 1.82 osd.11 up 1
8 0.51 osd.8 up 1
9 0.51 osd.9 up 1
--
# begin crush map
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root
# buckets
host b4 {
id -2 # do not change unnecessarily
# weight 4.000
alg straw
hash 0 # rjenkins1
item osd.0 weight 2.000
item osd.7 weight 2.000
}
host b5 {
id -4 # do not change unnecessarily
# weight 4.000
alg straw
hash 0 # rjenkins1
item osd.1 weight 2.000
item osd.6 weight 2.000
}
rack unknownrack {
id -3 # do not change unnecessarily
# weight 8.000
alg straw
hash 0 # rjenkins1
item b4 weight 4.000
item b5 weight 4.000
}
host b2 {
id -9 # do not change unnecessarily
# weight 4.660
alg straw
hash 0 # rjenkins1
item osd.10 weight 1.820
item