Hi all, I've been working on the Ceph charm with the intention of making it much more powerful when it comes to the selection of OSD devices. I wanted to knock a few ideas around to see what might be possible.
The main problem I'm trying to address is that with the existing implementation, when a new SAS controller is added, or drive caddies get swapped around, drive letters (/dev/sd[a-z]) get swapped around. As the current charm just asks for a list of devices, and that list of devices is global across the entire cluster, it pretty much requires all machines to be identical and unchanging. I also looked into using /dev/disk/by-id, but found this to be too inflexible.

Below I've pasted a patch I wrote as a stop-gap for myself. This patch allows you to list model numbers for your drives instead of /dev/XXXX devices. It then dynamically generates the list of /dev/ devices on each host. The patch is pretty unsophisticated, but it solves my immediate problem.

However, I think we can do better than this. I've been thinking that xpath strings might be a better way to go. I played around with this idea a little. This will give some idea of how it could work:

==========================================
root@ceph-store1:~# lshw -xml -class disk > /tmp/disk.xml
root@ceph-store1:~# echo 'cat //node[contains(product,"MG03SCA400")]/logicalname/text()'|xmllint --shell /tmp/disk.xml|grep '^/dev/'
/dev/sdc
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg
/dev/sdh
/dev/sdi
/dev/sdj
/dev/sdk
/dev/sdl
==========================================

So, that takes care of selecting by model number. How about selecting drives that are larger than 3TB?

==========================================
root@ceph-store1:~# echo 'cat //node[size>3000000000000]/logicalname/text()'|xmllint --shell /tmp/disk.xml|grep '^/dev/'
/dev/sdc
/dev/sdd
/dev/sde
/dev/sdf
/dev/sdg
/dev/sdh
/dev/sdi
/dev/sdj
/dev/sdk
/dev/sdl
==========================================

Just to give some idea of the power of this, take a look at the info lshw compiles:

<node id="disk:3" claimed="true" class="disk" handle="GUID:aaaaaaaa-a5c7-4657-924d-8ed94e1b1aaa">
 <description>SCSI Disk</description>
 <product>MG03SCA400</product>
 <vendor>TOSHIBA</vendor>
 <physid>0.3.0</physid>
 <businfo>scsi@1:0.3.0</businfo>
 <logicalname>/dev/sdf</logicalname>
 <dev>8:80</dev>
 <version>DG02</version>
 <serial>X470A0XXXXXX</serial>
 <size units="bytes">4000787030016</size>
 <capacity units="bytes">5334969415680</capacity>
 <configuration>
  <setting id="ansiversion" value="6" />
  <setting id="guid" value="aaaaaaaa-a5c7-4657-924d-8ed94e1b1aaa" />
  <setting id="sectorsize" value="512" />
 </configuration>
 <capabilities>
  <capability id="7200rpm" >7200 rotations per minute</capability>
  <capability id="gpt-1.00" >GUID Partition Table version 1.00</capability>
  <capability id="partitioned" >Partitioned disk</capability>
  <capability id="partitioned:gpt" >GUID partition table</capability>
 </capabilities>
</node>

So, you could be selecting your drives by vendor, size, model, sector size, or any combination of these and other attributes.

The only reason I haven't gone any further with this idea yet is that "lshw -C disk" is incredibly slow. I tried messing around with disabling tests, but it still crawls along. I figure that this wouldn't be that big a deal if you could cache the resulting xml file, but that's not fully satisfactory either. What if I want to hot-plug a new hard drive into the system? lshw would need to be run again. I thought that maybe udev could be used for doing this, but I certainly don't want udev running lshw once per drive at boot time as the drives are detected.
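
For illustration, here is a rough sketch of how the same two selections (by model and by size) might be done from Python inside a hook, working on XML text that lshw has already produced (for example, read from a cached file). This is only a sketch under stated assumptions: the function name select_devices, its parameters, and the second disk in the sample data are hypothetical and not part of the charm. It uses the standard-library xml.etree.ElementTree, whose XPath subset has no contains() or numeric comparison, so the filtering is done in plain Python instead of in the xpath string.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the shape of the lshw output shown above.
# The second disk is made up purely so the filters have something to reject.
SAMPLE_XML = """<list>
  <node id="disk:3" claimed="true" class="disk">
    <product>MG03SCA400</product>
    <vendor>TOSHIBA</vendor>
    <logicalname>/dev/sdf</logicalname>
    <size units="bytes">4000787030016</size>
  </node>
  <node id="disk:0" claimed="true" class="disk">
    <product>EXAMPLE-SSD</product>
    <vendor>ACME</vendor>
    <logicalname>/dev/sda</logicalname>
    <size units="bytes">120034123776</size>
  </node>
</list>"""


def select_devices(xml_text, product_contains=None, min_size=None):
    """Return /dev names of disk nodes that match every criterion given.

    Hypothetical helper: product_contains is a substring of <product>,
    min_size is a strict lower bound on <size> in bytes.
    """
    root = ET.fromstring(xml_text)
    matches = []
    # iter() also visits the root element, so a bare <node> document works too.
    for node in root.iter("node"):
        if node.get("class") != "disk":
            continue
        product = node.findtext("product", default="")
        if product_contains is not None and product_contains not in product:
            continue
        if min_size is not None:
            size_text = node.findtext("size")
            if size_text is None or int(size_text) <= min_size:
                continue
        # Only direct <logicalname> children, and only real device paths.
        matches.extend(
            name.text
            for name in node.findall("logicalname")
            if name.text and name.text.startswith("/dev/")
        )
    return matches


by_model = select_devices(SAMPLE_XML, product_contains="MG03SCA400")
by_size = select_devices(SAMPLE_XML, min_size=3000000000000)
print(by_model, by_size)
```

Since the function only takes XML text, the slow lshw run and the selection logic stay separate, so the XML could be generated once per hook run and reused for every selector.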

I'm really wondering if anyone else has any advice on either speeding up lshw, or on any other simple way of pulling this kind of functionality off. Maybe I'm worrying too much about this. As long as the charm only fires this hook rarely, and caches the data for the duration of the hook run, maybe I don't need to worry?

John

Patch to match against model number (NOT REGRESSION TESTED):

=== modified file 'config.yaml'
--- config.yaml 2014-10-06 22:07:41 +0000
+++ config.yaml 2014-11-29 15:42:41 +0000
@@ -42,16 +42,35 @@
       These devices are the range of devices that will be checked for and
       used across all service units.
       .
+      This can be a list of devices, or a list of model numbers which will
+      be used to automatically compile a list of matching devices.
+      .
       For ceph >= 0.56.6 these can also be directories instead of devices - the
       charm assumes anything not starting with /dev is a directory instead.
+      Any device not starting with a / is assumed to be a model number
   osd-journal:
     type: string
     default:

=== modified file 'hooks/charmhelpers/contrib/storage/linux/utils.py'
--- hooks/charmhelpers/contrib/storage/linux/utils.py 2014-09-22 08:51:15 +0000
+++ hooks/charmhelpers/contrib/storage/linux/utils.py 2014-11-29 15:30:25 +0000
@@ -1,5 +1,6 @@
 import os
 import re
+import subprocess
 from stat import S_ISBLK
 
 from subprocess import (
@@ -51,3 +52,7 @@
     if is_partition:
         return bool(re.search(device + r"\b", out))
     return bool(re.search(device + r"[0-9]+\b", out))
+
+def devices_by_model(model):
+    proc = subprocess.Popen(['lsblk', '-nio', 'KNAME,MODEL'],stdout=subprocess.PIPE)
+    return [ '/dev/' + dev.split()[0] for dev in [line.strip() for line in proc.stdout] if re.search(model+'$',dev) ]

=== modified file 'hooks/hooks.py'
--- hooks/hooks.py 2014-09-30 03:06:10 +0000
+++ hooks/hooks.py 2014-11-29 15:22:48 +0000
@@ -44,6 +44,9 @@
     get_ipv6_addr,
     format_ipv6_addr
 )
+from charmhelpers.contrib.storage.linux.utils import (
+    devices_by_model
+)
 
 from utils import (
     render_template,
@@ -166,14 +169,18 @@
     else:
         return False
 
-
 def get_devices():
     if config('osd-devices'):
-        return config('osd-devices').split(' ')
+        results = []
+        for dev in config('osd-devices').split(' '):
+            if dev.startswith('/'):
+                results.append(dev)
+            else:
+                results += devices_by_model(dev)
+        return results
     else:
         return []
 
-
 @hooks.hook('mon-relation-joined')
 def mon_relation_joined():
     for relid in relation_ids('mon'):

--
-----------------------------
John McEleney
Netservers Ltd.
21 Signet Court
Cambridge
CB5 8LA
http://www.netservers.co.uk
-----------------------------
Tel. 01223 446000
Fax. 0870 4861970
-----------------------------
Registered in England
Number: 04028770
-----------------------------