[ceph-users] Proper procedure for osd/host removal
Hello, I've been working to upgrade the hardware on a semi-production ceph cluster, following the instructions for OSD removal from http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual. Basically, I've added the new hosts to the cluster and now I'm removing the old ones from it. What I found curious is that after the sync triggered by 'ceph osd out <id>' finishes and I stop the osd process and remove it from the crush map, another session of synchronization is triggered - sometimes this one takes longer than the first. Also, removing an empty host bucket from the crush map triggered another resynchronization. I noticed that the overall weight of the host bucket does not change in the crush map as a result of one OSD being out, so what is happening is more or less normal behavior - however, it remains time-consuming. Is there something that can be done to avoid the double resync? I'm running 0.72.2 on top of Ubuntu 12.04 on the OSD hosts. Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
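For reference, the manual removal sequence from the documentation cited above boils down to the following commands, run once per OSD being retired (<id> is the OSD number; the stop command is upstart syntax for Ubuntu - sysvinit installs use 'service ceph stop osd.<id>' instead):

  # rebalance the data off the OSD and wait for "ceph -s" to report HEALTH_OK again
  ceph osd out <id>
  # stop the daemon on its host
  stop ceph-osd id=<id>
  # remove it from the CRUSH map, delete its key and its OSD entry
  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>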
Re: [ceph-users] Proper procedure for osd/host removal
Thanks - I suspected as much. I was thinking of a course of action that would allow setting the weight of an entire host to zero in the crush map - thus forcing the migration of the data out of the OSDs of that host, followed by the crush and osd removal, one by one (hopefully this time without another backfill session). The problem is I don't have anywhere to test how that would work and/or what the side-effects would be (if any). On 15 Dec 2014, at 21:07, Adeel Nazir ad...@ziptel.ca wrote: I'm going through something similar, and it seems like the double backfill you're experiencing is about par for the course. According to the CERN presentation (http://www.slideshare.net/Inktank_Ceph/scaling-ceph-at-cern slide 19), doing a 'ceph osd crush rm osd.<id>' should save the double backfill, but I haven't experienced that in my 0.80.5 cluster. Even after I do the crush rm and finally remove it via 'ceph osd rm osd.<id>', it computes a new map and does the backfill again. As far as I can tell, there's no way around it without editing the map manually, making whatever changes you require and then pushing the new map. I personally am not experienced enough to feel comfortable making that kind of a change. Adeel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
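A hedged sketch of the approach discussed above - draining the host in a single rebalance by zeroing the CRUSH weights first, so that the later removals find the OSDs already empty. The OSD ids and host name are placeholders, and this is untested on the releases mentioned in the thread:

  # zero the CRUSH weight of every OSD on the old host; this triggers one backfill
  for id in 10 11 12; do
      ceph osd crush reweight osd.$id 0
  done
  # wait for the cluster to return to HEALTH_OK, then remove the now-empty OSDs
  for id in 10 11 12; do
      ceph osd out $id
      stop ceph-osd id=$id
      ceph osd crush remove osd.$id
      ceph auth del osd.$id
      ceph osd rm $id
  done
  # finally drop the empty host bucket from the CRUSH map
  ceph osd crush remove old-host-1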
Re: [ceph-users] Openstack Havana root fs resize don't work
There’s a known issue with Havana’s rbd driver in nova and it has nothing to do with ceph. Unfortunately, it is only fixed in icehouse. See https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1219658 for more details. I can confirm that applying the patch manually works. On 05 Aug 2014, at 11:00, Hauke Bruno Wollentin hauke-bruno.wollen...@innovo-cloud.de wrote: Hi folks, we use Ceph Dumpling as the storage backend for Openstack Havana. However, our instances are not able to resize their root filesystem. This issue only occurs for the virtual root disk. If we start instances with an attached volume, the virtual volume disk's size is correct. Our infrastructure: - 1 OpenStack Controller - 1 OpenStack Neutron Node - 1 OpenStack Cinder Node - 4 KVM Hypervisors - 4 Ceph-Storage Nodes including mons - 1 dedicated mon As OS we use Ubuntu 12.04. Our cinder.conf on the Cinder Node: volume_driver = cinder.volume.driver.RBDDriver rbd_pool = volumes rbd_secret = SECRET rbd_user = cinder rbd_ceph_conf = /etc/ceph/ceph.conf rbd_max_clone_depth = 5 glance_api_version = 2 Our nova.conf on the hypervisors: libvirt_images_type=rbd libvirt_images_rbd_pool=volumes libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf rbd_user=admin rbd_secret_uuid=SECRET libvirt_inject_password=false libvirt_inject_key=false libvirt_inject_partition=-2 In our instances we see that the virtual disk isn't _updated_ in its size. It still uses the size specified in the image. We use growrootfs in our images as described in the documentation and have verified its functionality (we switched temporarily to LVM as the storage backend; that works). Our images are manually created according to the documentation (meaning only 1 partition, no swap, cloud-utils etc.). Does anyone have any hints on how to solve this issue? Cheers, Hauke ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
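Until a patched nova is in place, one possible one-off workaround is to grow the ephemeral RBD image by hand and let the in-guest tooling pick up the extra space on the next boot - a sketch only, with a hypothetical instance name and the pool taken from the nova.conf above:

  # check the actual size of the image backing the instance's root disk
  rbd -p volumes info instance-000004d2_disk
  # grow it to the flavor's root disk size (here 20 GB, given in MB)
  rbd -p volumes resize --size 20480 instance-000004d2_disk
  # then reboot the instance so growroot/cloud-init can extend the partition and filesystem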
[ceph-users] Move osd disks between hosts
I'm running a ceph cluster with 3 mon and 4 osd nodes (32 disks total) and I've been looking at the possibility of migrating the data to 2 new nodes. The operation should happen by relocating the disks - I'm not getting any new hard-drives. The cluster is used as a backend for an openstack cloud, so downtime should be as short as possible - preferably no more than 24 h over a weekend. I'd like a second opinion on the process - since I do not have the resources to test the move scenario. I'm running emperor (0.72.1) at the moment. All pools in the cluster have size 2. Each existing OSD node has an SSD for journals; /dev/disk/by-id paths were used. Here's what I think would work: 1 - stop ceph on the existing OSD nodes (all of them) and shut down nodes 1 & 2; 2 - take drives 1-16/SSDs 1-2 out and put them in the new node #1; start it up with ceph's upstart script set to manual and check/correct the journal paths 3 - edit the CRUSH map on the monitors to reflect the new situation 4 - start ceph on the new node #1 and old nodes 3 & 4; wait for the rebuild to happen 5 - repeat steps 1-4 for the rest of the nodes/drives; Any opinions? Or a better path to follow? Thanks! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Move osd disks between hosts
Hello Sage, Yes, the original deployment was done via ceph-deploy - and I am very happy to read this :) Thank you! Dinu On May 14, 2014, at 4:17 PM, Sage Weil s...@inktank.com wrote: Hi Dinu, If you used ceph-deploy and/or ceph-disk to set up these OSDs (that is, if they are stored on labeled GPT partitions such that upstart is automagically starting up the ceph-osd daemons for you without you putting anything in /etc/fstab to manually mount the volumes) then all of this should be plug and play for you--including step #3. By default, the startup process will 'fix' the CRUSH hierarchy position based on the hostname and (if present) other positional data configured for 'crush location' in ceph.conf. The only real requirement is that both the osd data and journal volumes get moved so that the daemon has everything it needs to start up. sage ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
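A minimal sketch of the per-node sequence, building on the note above about GPT/udev auto-activation; the noout flag keeps the cluster from starting recovery while the disks are physically in transit (upstart job names assume Ubuntu):

  # before powering anything down
  ceph osd set noout
  # on the node being emptied
  stop ceph-osd-all          # or: stop ceph-osd id=<n> for each OSD
  # ...move the data disks and their journal SSDs to the new chassis...
  # on the new node, udev/upstart should detect the labeled GPT partitions and
  # start the ceph-osd daemons, re-registering them under the new hostname in CRUSH
  # once recovery has finished
  ceph osd unset noout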
[ceph-users] rados federated gateway - selective replication
I'm trying to figure out a way to configure selective replication of objects between 2 geographically-separated ceph clusters, via the radosgw-agent. Ideally that should happen at the bucket level - but as far as I can figure that seems impossible (running ceph emperor, 0.72.1). Is there any way to achieve this (with the current ceph stable release)? Thanks! -- Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Questions about federated rados gateway configuration
Hi all, I was going through the documentation (http://ceph.com/docs/master/radosgw/federated-config/), having in mind a (future) replicated swift object store between 2 geographically separated datacenters (and 2 different Ceph clusters) and a few things caught my attention. Considering I'm planning for 3 gateways in each datacenter: 1. Concerning the list of pools that need to be pre-created: ALL pools for both zones have to exist on both clusters? 2. When using keystone integration, can I have all gateways (from both zones) authenticate using the same keystone instance? 3. Concerning the Create a keyring section: is it necessary to have the same keyring file present on all nodes (monitors, osd, gateways) from both clusters? 4. What would be the process to add another gateway to a working federated environment? I'd appreciate any input on the matters above. Thanks, -- Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ephemeral RBD with Havana and Dumpling
Thank you all for the info. Any chance this may make it into mainline? Thanks, Dinu On Nov 14, 2013, at 4:27 PM, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: On Thu, Nov 14, 2013 at 9:12 PM, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: We have migration working partially - it works through Horizon (to a random host) and sometimes through the CLI. random host? Do you mean cold-migration? Live-migration requires a specified destination host. I have just been digging through Horizon to find out what it does: it calls migrate (reading some docs: ah, this is indeed cold migration) We are using the nova fork by Josh Durgin https://github.com/jdurgin/nova/commits/havana-ephemeral-rbd - are there more patches that need to be integrated? I hope I can release or push commits to this branch - containing live-migration, the incorrect filesystem size fix and ceph-snapshot support - in a few days. great - looking very much forward to that! cheers jc ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ephemeral RBD with Havana and Dumpling
Out of curiosity - can you live-migrate instances with this setup? On Nov 12, 2013, at 10:38 PM, Dmitry Borodaenko dborodae...@mirantis.com wrote: And to answer my own question, I was missing a meaningful error message: what the ObjectNotFound exception I got from librados didn't tell me was that I didn't have the images keyring file in /etc/ceph/ on my compute node. After 'ceph auth get-or-create client.images /etc/ceph/ceph.client.images.keyring' and reverting images caps back to original state, it all works! On Tue, Nov 12, 2013 at 12:19 PM, Dmitry Borodaenko dborodae...@mirantis.com wrote: I can get ephemeral storage for Nova to work with RBD backend, but I don't understand why it only works with the admin cephx user? With a different user starting a VM fails, even if I set its caps to 'allow *'. Here's what I have in nova.conf: libvirt_images_type=rbd libvirt_images_rbd_pool=images rbd_secret_uuid=fd9a11cc-6995-10d7-feb4-d338d73a4399 rbd_user=images The secret UUID is defined following the same steps as for Cinder and Glance: http://ceph.com/docs/master/rbd/libvirt/ BTW rbd_user option doesn't seem to be documented anywhere, is that a documentation bug? And here's what 'ceph auth list' tells me about my cephx users: client.admin key: AQCoSX1SmIo0AxAAnz3NffHCMZxyvpz65vgRDg== caps: [mds] allow caps: [mon] allow * caps: [osd] allow * client.images key: AQC1hYJS0LQhDhAAn51jxI2XhMaLDSmssKjK+g== caps: [mds] allow caps: [mon] allow * caps: [osd] allow * client.volumes key: AQALSn1ScKruMhAAeSETeatPLxTOVdMIt10uRg== caps: [mon] allow r caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images Setting rbd_user to images or volumes doesn't work. What am I missing? Thanks, -- Dmitry Borodaenko -- Dmitry Borodaenko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
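For comparison, the usual setup for a dedicated cephx user looks roughly like this - the caps shown are illustrative, and the secret UUID must match rbd_secret_uuid in nova.conf:

  # create the client and drop its keyring on every compute node
  ceph auth get-or-create client.images \
      mon 'allow r' \
      osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' \
      -o /etc/ceph/ceph.client.images.keyring
  # register the key with libvirt so qemu can authenticate as that user
  virsh secret-define --file secret.xml        # secret.xml carries the uuid above
  virsh secret-set-value --secret fd9a11cc-6995-10d7-feb4-d338d73a4399 \
      --base64 $(ceph auth get-key client.images)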
Re: [ceph-users] ceph 0.72 with zfs
Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
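A hedged sketch of the source build described above, for an Ubuntu host - the libzfs development package name depends on how ZFS-on-Linux was installed and is only an assumption here:

  apt-get install libzfs-dev            # assumed package name for the ZoL headers
  git clone --branch v0.72.2 https://github.com/ceph/ceph.git
  cd ceph
  ./autogen.sh
  ./configure --with-zfs
  make && make install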
Re: [ceph-users] ceph cluster performance
I had great results from the older 530 series too. In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. I'm still having fun with the configuration so I'll be able to use Mike Dawson's suggested tools to check for latencies. On Nov 6, 2013, at 11:35 PM, ja...@peacon.co.uk wrote: On 2013-11-06 20:25, Mike Dawson wrote: We just fixed a performance issue on our cluster related to spikes of high latency on some of our SSDs used for osd journals. In our case, the slow SSDs showed spikes of 100x higher latency than expected. Many SSDs show this behaviour when 100% provisioned and/or never TRIM'd, since the pool of ready erased cells is quickly depleted under steady write workload, so it has to wait for cells to charge to accommodate the write. The Intel 3700 SSDs look to have some of the best consistency ratings of any of the more reasonably priced drives at the moment, and good IOPS too: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html Obviously the quoted IOPS numbers are dependent on quite a deep queue mind. There is a big range of performance in the market currently; some Enterprise SSDs are quoted at just 4,000 IOPS yet cost as many pounds! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph 0.72 with zfs
Looking forward to it. Tests done so far show some interesting results - so I'm considering it for future production use. On Nov 7, 2013, at 1:01 PM, Sage Weil s...@newdream.net wrote: The challenge here is that libzfs is currently a build time dependency, which means it needs to be included in the target distro already, or we need to bundle it in the Ceph.com repos. I am currently looking at the possibility of making the OSD back end dynamically linked at runtime, which would allow a separately packaged zfs back end; that may (or may not!) help. sage Dinu Vlad dinuvla...@gmail.com wrote: Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Havana RBD - a few problems
Under grizzly we completely disabled image injection via libvirt_inject_partition = -2 in nova.conf. I'm not sure rbd images can even be mounted that way - but then again, I don't have experience with havana. We're using config disks (which break live migrations) and/or the metadata service (which does not), in combination with cloud-init, to bootstrap instances. On Nov 7, 2013, at 6:15 PM, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: Hi all, we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because RBD.path is not set. I have patched this to read:

@@ -419,10 +419,12 @@ class Rbd(Image):
         if path:
             try:
                 self.rbd_name = path.split('/')[1]
+                self.path = path
             except IndexError:
                 raise exception.InvalidDevicePath(path=path)
         else:
             self.rbd_name = '%s_%s' % (instance['name'], disk_name)
+            self.path = 'volumes/%s' % self.rbd_name
         self.snapshot_name = snapshot_name
         if not CONF.libvirt_images_rbd_pool:
             raise RuntimeError(_('You should specify'

but am not sure this is correct. I have the following problems: 1) can't inject data into image: 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/instance-0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) - possibly the self.path = … is wrong, but what are the correct values? 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? 3) When creating a new instance from an image (non-ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image). I would be really grateful if those people who have Grizzly/Havana running with an RBD backend could pipe in here… thanks Jens-Christian -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/socialmedia ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
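For reference, the grizzly-era injection settings mentioned above are plain nova.conf options on the compute nodes (the same three already appear in the Havana configs quoted elsewhere in this archive):

  # /etc/nova/nova.conf
  libvirt_inject_partition = -2     # never mount the image for file injection
  libvirt_inject_password = false
  libvirt_inject_key = false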
Re: [ceph-users] ceph cluster performance
I was under the same impression - using a small portion of the SSD via partitioning (in my case - 30 gigs out of 240) would have the same effect as activating the HPA explicitly. Am I wrong? On Nov 7, 2013, at 8:16 PM, ja...@peacon.co.uk wrote: On 2013-11-07 17:47, Gruher, Joseph R wrote: I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. If it's under-provisioned (so the device knows there are unused cells), the device would simply write to an empty cell and flag the old cell for erasing, so there should be no change. Latency would rise when sustained write rate exceeded the devices' ability to clear cells, so eventually the stock of ready cells would be depleted. FWIW, I think there is considerable mileage in the larger-consumer grade argument. Assuming drives will be half the price in a years time, so selecting devices that can last only a year is preferable to spending 3x the price on one that can survive three. That though opens the tin of worms that is SMART reporting and moving journals at some future point mind. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
I have 2 SSDs (same model, smaller capacity) for / connected on the mainboard. Their sync write performance is also poor - less than 600 iops, 4k blocks. On Nov 7, 2013, at 9:44 PM, Kyle Bader kyle.ba...@gmail.com wrote: ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. The problem might be SATA transport protocol overhead at the expander. Have you tried directly connecting the SSDs to SATA2/3 ports on the mainboard? -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
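A sketch of the kind of test that exposes this - small synchronous writes, which is what the OSD journal issues. Run it against a scratch partition only, since it writes directly to the device (the device name is a placeholder):

  fio --name=journal-sync --filename=/dev/sdX1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 \
      --direct=1 --sync=1 --runtime=60 --time_based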
Re: [ceph-users] ceph cluster performance
I'm using the latest 3.8.0 branch from raring. Is there a more recent/better kernel recommended? Meanwhile, I think I might have identified the culprit - my SSD drives are extremely slow on sync writes, doing 5-600 iops max with 4k blocksize. By comparison, an Intel 530 in another server (also installed behind a SAS expander is doing the same test with ~ 8k iops. I guess I'm good for replacing them. Removing the SSD drives from the setup and re-testing with ceph = 595 MB/s throughput under the same conditions (only mechanical drives, journal on a separate partition on each one, 8 rados bench processes, 16 threads each). On Nov 5, 2013, at 4:38 PM, Mark Nelson mark.nel...@inktank.com wrote: Ok, some more thoughts: 1) What kernel are you using? 2) Mixing SATA and SAS on an expander backplane can some times have bad effects. We don't really know how bad this is and in what circumstances, but the Nexenta folks have seen problems with ZFS on solaris and it's not impossible linux may suffer too: http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html 3) If you are doing tests and look at disk throughput with something like collectl -sD -oT do the writes look balanced across the spinning disks? Do any devices have much really high service times or queue times? 4) Also, after the test is done, you can try: find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} dump_historic_ops \; foo and then grep for duration in foo. You'll get a list of the slowest operations over the last 10 minutes from every osd on the node. Once you identify a slow duration, you can go back and in an editor search for the slow duration and look at where in the OSD it hung up. That might tell us more about slow/latent operations. 5) Something interesting here is that I've heard from another party that in a 36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 SSDs on a SAS9207-8i controller and were pushing significantly faster throughput than you are seeing (even given the greater number of drives). So it's very interesting to me that you are pushing so much less. The 36 drive supermicro chassis I have with no expanders and 30 drives with 6 SSDs can push about 2100MB/s with a bunch of 9207-8i controllers and XFS (no replication). Mark On 11/05/2013 05:15 AM, Dinu Vlad wrote: Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph settings I was able to get 440 MB/s from 8 rados bench instances, over a single osd node (pool pg_num = 1800, size = 1) This still looks awfully slow to me - fio throughput across all disks reaches 2.8 GB/s!! I'd appreciate any suggestion, where to look for the issue. Thanks! On Oct 31, 2013, at 6:35 PM, Dinu Vlad dinuvla...@gmail.com wrote: I tested the osd performance from a single node. For this purpose I deployed a new cluster (using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster configuration stayed default, with the same additions about xfs mount mkfs.xfs as before. 
With a single host, the pgs were stuck unclean (active only, not active+clean): # ceph -s cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062 health HEALTH_WARN 1800 pgs stuck unclean monmap e1: 3 mons at {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0}, election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3 osdmap e101: 18 osds: 18 up, 18 in pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 16759 GB avail mdsmap e1: 0/0/1 up Test results: Local test, 1 process, 16 threads: 241.7 MB/s Local test, 8 processes, 128 threads: 374.8 MB/s Remote test, 1 process, 16 threads: 231.8 MB/s Remote test, 8 processes, 128 threads: 366.1 MB/s Maybe it's just me, but it seems on the low side too. Thanks, Dinu On Oct 30, 2013, at 8:59 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 01:51 PM, Dinu Vlad wrote: Mark, The SSDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021 and the HDDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS. The chasis is a SiliconMechanics C602 - but I don't have the exact model. It's based on Supermicro, has 24 slots front and 2 in the back and a SAS expander. I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to what the driver reports in dmesg). here are the results (filtered): Sequential: Run status group 0 (all jobs): WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, mint=60444msec, maxt=61463msec Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 153:189 MB/s Ok
[ceph-users] ceph 0.72 with zfs
Hello, I'm testing the 0.72 release and thought I'd give the zfs support a spin. While I managed to set up a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A couple of questions: - is the 0.72-rc1 package currently in the raring repository compiled with zfs support? - if yes, how can I tell ceph-osd to use the ZFSFileStoreBackend? Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. By fixed - you mean replaced the SSDs? Thanks, Dinu On Nov 6, 2013, at 10:25 PM, Mike Dawson mike.daw...@cloudapt.com wrote: We just fixed a performance issue on our cluster related to spikes of high latency on some of our SSDs used for osd journals. In our case, the slow SSDs showed spikes of 100x higher latency than expected. What SSDs were you using that were so slow? Cheers, Mike On 11/6/2013 12:39 PM, Dinu Vlad wrote: I'm using the latest 3.8.0 branch from raring. Is there a more recent/better kernel recommended? Meanwhile, I think I might have identified the culprit - my SSD drives are extremely slow on sync writes, doing 5-600 iops max with 4k blocksize. By comparison, an Intel 530 in another server (also installed behind a SAS expander is doing the same test with ~ 8k iops. I guess I'm good for replacing them. Removing the SSD drives from the setup and re-testing with ceph = 595 MB/s throughput under the same conditions (only mechanical drives, journal on a separate partition on each one, 8 rados bench processes, 16 threads each). On Nov 5, 2013, at 4:38 PM, Mark Nelson mark.nel...@inktank.com wrote: Ok, some more thoughts: 1) What kernel are you using? 2) Mixing SATA and SAS on an expander backplane can some times have bad effects. We don't really know how bad this is and in what circumstances, but the Nexenta folks have seen problems with ZFS on solaris and it's not impossible linux may suffer too: http://gdamore.blogspot.com/2010/08/why-sas-sata-is-not-such-great-idea.html 3) If you are doing tests and look at disk throughput with something like collectl -sD -oT do the writes look balanced across the spinning disks? Do any devices have much really high service times or queue times? 4) Also, after the test is done, you can try: find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} dump_historic_ops \; foo and then grep for duration in foo. You'll get a list of the slowest operations over the last 10 minutes from every osd on the node. Once you identify a slow duration, you can go back and in an editor search for the slow duration and look at where in the OSD it hung up. That might tell us more about slow/latent operations. 5) Something interesting here is that I've heard from another party that in a 36 drive Supermicro SC847E16 chassis they had 30 7.2K RPM disks and 6 SSDs on a SAS9207-8i controller and were pushing significantly faster throughput than you are seeing (even given the greater number of drives). So it's very interesting to me that you are pushing so much less. The 36 drive supermicro chassis I have with no expanders and 30 drives with 6 SSDs can push about 2100MB/s with a bunch of 9207-8i controllers and XFS (no replication). Mark On 11/05/2013 05:15 AM, Dinu Vlad wrote: Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph settings I was able to get 440 MB/s from 8 rados bench instances, over a single osd node (pool pg_num = 1800, size = 1) This still looks awfully slow to me - fio throughput across all disks reaches 2.8 GB/s!! I'd appreciate any suggestion, where to look for the issue. Thanks! On Oct 31, 2013, at 6:35 PM, Dinu Vlad dinuvla...@gmail.com wrote: I tested the osd performance from a single node. For this purpose I deployed a new cluster (using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the rados bench both on the osd server and on a remote one. 
Cluster configuration stayed default, with the same additions about xfs mount mkfs.xfs as before. With a single host, the pgs were stuck unclean (active only, not active+clean): # ceph -s cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062 health HEALTH_WARN 1800 pgs stuck unclean monmap e1: 3 mons at {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0}, election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3 osdmap e101: 18 osds: 18 up, 18 in pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 16759 GB avail mdsmap e1: 0/0/1 up Test results: Local test, 1 process, 16 threads: 241.7 MB/s Local test, 8 processes, 128 threads: 374.8 MB/s Remote test, 1 process, 16 threads: 231.8 MB/s Remote test, 8 processes, 128 threads: 366.1 MB/s Maybe it's just me, but it seems on the low side too. Thanks, Dinu On Oct 30, 2013, at 8:59 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 01:51 PM, Dinu Vlad wrote: Mark, The SSDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021 and the HDDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS. The chasis is a SiliconMechanics C602 - but I don't have
Re: [ceph-users] ceph cluster performance
Ok, so after tweaking the deadline scheduler and the filestore_wbthrottle* ceph settings I was able to get 440 MB/s from 8 rados bench instances, over a single osd node (pool pg_num = 1800, size = 1) This still looks awfully slow to me - fio throughput across all disks reaches 2.8 GB/s!! I'd appreciate any suggestion, where to look for the issue. Thanks! On Oct 31, 2013, at 6:35 PM, Dinu Vlad dinuvla...@gmail.com wrote: I tested the osd performance from a single node. For this purpose I deployed a new cluster (using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster configuration stayed default, with the same additions about xfs mount mkfs.xfs as before. With a single host, the pgs were stuck unclean (active only, not active+clean): # ceph -s cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062 health HEALTH_WARN 1800 pgs stuck unclean monmap e1: 3 mons at {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0}, election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3 osdmap e101: 18 osds: 18 up, 18 in pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 16759 GB avail mdsmap e1: 0/0/1 up Test results: Local test, 1 process, 16 threads: 241.7 MB/s Local test, 8 processes, 128 threads: 374.8 MB/s Remote test, 1 process, 16 threads: 231.8 MB/s Remote test, 8 processes, 128 threads: 366.1 MB/s Maybe it's just me, but it seems on the low side too. Thanks, Dinu On Oct 30, 2013, at 8:59 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 01:51 PM, Dinu Vlad wrote: Mark, The SSDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021 and the HDDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS. The chasis is a SiliconMechanics C602 - but I don't have the exact model. It's based on Supermicro, has 24 slots front and 2 in the back and a SAS expander. I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to what the driver reports in dmesg). here are the results (filtered): Sequential: Run status group 0 (all jobs): WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, mint=60444msec, maxt=61463msec Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 153:189 MB/s Ok, that looks like what I'd expect to see given the controller being used. SSDs are probably limited by total aggregate throughput. Random: Run status group 0 (all jobs): WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, mint=60404msec, maxt=61875msec Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one out of 6 doing 101) This is on just one of the osd servers. Where the ceph tests to one OSD server or across all servers? It might be worth trying tests against a single server with no replication using multiple rados bench instances and just seeing what happens. 
Thanks, Dinu On Oct 30, 2013, at 6:38 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 09:05 AM, Dinu Vlad wrote: Hello, I've been doing some tests on a newly installed ceph cluster: # ceph osd create bench1 2048 2048 # ceph osd create bench2 2048 2048 # rbd -p bench1 create test # rbd -p bench1 bench-write test --io-pattern rand elapsed: 483 ops: 396579 ops/sec: 820.23 bytes/sec: 2220781.36 # rados -p bench2 bench 300 write --show-time # (run 1) Total writes made: 20665 Write size: 4194304 Bandwidth (MB/sec): 274.923 Stddev Bandwidth: 96.3316 Max bandwidth (MB/sec): 748 Min bandwidth (MB/sec): 0 Average Latency:0.23273 Stddev Latency: 0.262043 Max latency:1.69475 Min latency:0.057293 These results seem to be quite poor for the configuration: MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for journal, attached to a LSI 9207-8i controller. All servers have dual 10GE network cards, connected to a pair of dedicated switches. Each SSD has 3 10 GB partitions for journals. Agreed, you should see much higher throughput with that kind of storage setup. What brand/model SSDs are these? Also, what brand and model of chassis? With 24 drives and 8 SSDs I could push 2GB/s (no replication though) with a couple of concurrent rados bench processes going on our SC847A chassis, so ~550MB/s aggregate throughput for 18 drives and 6 SSDs is definitely on the low side. I'm actually not too familiar with what the RBD benchmarking commands are doing behind the scenes. Typically I've tested
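Roughly what those tweaks look like; the wbthrottle option names and values below are placeholders and should be verified against 'ceph daemon osd.N config show' for the release in use:

  # switch the spinning disks to the deadline elevator (per disk)
  echo deadline > /sys/block/sdb/queue/scheduler

  # [osd] section of ceph.conf - filestore writeback throttle knobs (example values)
  filestore wbthrottle xfs bytes start flusher = 41943040
  filestore wbthrottle xfs ios start flusher = 500
  filestore wbthrottle xfs inodes start flusher = 500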
Re: [ceph-users] testing ceph
Is disk sda on server1 empty or does it contain already a partition? On Nov 4, 2013, at 5:25 PM, charles L charlesboy...@hotmail.com wrote: Pls can somebody help? Im getting this error. ceph@CephAdmin:~$ ceph-deploy osd create server1:sda:/dev/sdj1 [ceph_deploy.cli][INFO ] Invoked (1.3): /usr/bin/ceph-deploy osd create server1:sda:/dev/sdj1 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks server1:/dev/sda:/dev/sdj1 [server1][DEBUG ] connected to host: server1 [server1][DEBUG ] detect platform information from remote host [server1][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 12.04 precise [ceph_deploy.osd][DEBUG ] Deploying osd to server1 [server1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [server1][INFO ] Running command: sudo udevadm trigger --subsystem-match=block --action=add [ceph_deploy.osd][DEBUG ] Preparing host server1 disk /dev/sda journal /dev/sdj1 activate True [server1][INFO ] Running command: sudo ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sda /dev/sdj1 [server1][ERROR ] WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data [server1][ERROR ] Could not create partition 1 from 34 to 2047 [server1][ERROR ] Error encountered; not saving changes. [server1][ERROR ] ceph-disk: Error: Command '['sgdisk', '--largest-new=1', '--change-name=1:ceph data', '--partition-guid=1:d3ca8a92-7ba5-412e-abf5-06af958b788d', '--typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be', '--', '/dev/sda']' returned non-zero exit status 4 [server1][ERROR ] Traceback (most recent call last): [server1][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py, line 68, in run [server1][ERROR ] reporting(conn, result, timeout) [server1][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py, line 13, in reporting [server1][ERROR ] received = result.receive(timeout) [server1][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py, line 455, in receive [server1][ERROR ] raise self._getremoteerror() or EOFError() [server1][ERROR ] RemoteError: Traceback (most recent call last): [server1][ERROR ] File string, line 806, in executetask [server1][ERROR ] File , line 35, in _remote_run [server1][ERROR ] RuntimeError: command returned non-zero exit status: 1 [server1][ERROR ] [server1][ERROR ] [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sda /dev/sdj1 [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs Date: Thu, 31 Oct 2013 10:55:56 + From: joao.l...@inktank.com To: charlesboy...@hotmail.com; ceph-de...@vger.kernel.org Subject: Re: testing ceph On 10/31/2013 04:54 AM, charles L wrote: Hi, Pls is this a good setup for a production environment test of ceph? My focus is on the SSD ... should it be partitioned(sdf1,2 ,3,4) and shared by the four OSDs on a host? or is this a better configuration for the SSD to be just one partition(sdf1) while all osd uses that one partition? my setup: - 6 Servers with one 250gb boot disk for OS(sda), four-2Tb Disks each for the OSDs i.e Total disks = 6x4 = 24 disks (sdb -sde) and one-60GB SSD for Osd Journal(sdf). -RAM = 32GB on each server with 2 GB network link. hostname for servers: Server1 -Server6 Charles, What you are describing on the ceph.conf below is definitely not a good idea. 
If you really want to use just one SSD and share it across multiple OSDs, then you have two possible approaches: - partition that disk and assign a *different* partition to each OSD; or - keep only one partition, format it with some filesystem, and assign a *different* journal file within that fs to each OSD. What you are describing has you using the same partition for all OSDs. This will likely create issues due to multiple OSDs writing and reading from a single journal. TBH I'm not familiar enough with the journal mechanism to know whether the OSDs will detect that situation. -Joao [osd.0] host = server1 devs = /dev/sdb osd journal = /dev/sdf1 [osd.1] host = server1 devs = /dev/sdc osd journal = /dev/sdf2 [osd.3] host = server1 devs = /dev/sdd osd journal = /dev/sdf2 [osd.4] host = server1 devs = /dev/sde osd journal = /dev/sdf2 [osd.5] host = server2 devs = /dev/sdb osd journal = /dev/sdf2 ... [osd.23] host = server6 devs = /dev/sde osd journal = /dev/sdf2 Thanks. -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Joao Eduardo Luis Software Engineer | http://inktank.com | http://ceph.com
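The 'Could not create partition 1 from 34 to 2047' message usually means /dev/sda on server1 already carries partition or GPT data. A hedged sketch of clearing it before retrying - note that this wipes the disk completely:

  # either let ceph-deploy zap it from the admin node ...
  ceph-deploy disk zap server1:sda
  # ... or wipe the partition tables by hand on server1
  sgdisk --zap-all /dev/sda
  # then retry
  ceph-deploy osd create server1:sda:/dev/sdj1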
Re: [ceph-users] Openstack Instances and RBDs
I don't know of any guide besides the official install docs from grizzly/havana, but I'm running openstack grizzly on top of rbd storage using glance cinder and it makes (almost) no use of /var/lib/nova/instances. Live migrations also work. The only files there should be config.xml and console - otherwise, live-migrations won't work OR the path should be a mounted shared storage (NFS, GlusterFS etc). Nova-compute stores disk* files under that path in the following cases: - when one starts an instance only by using --image image id argument to nova-boot, without a pre-created cinder volume and without the --block-device-mapping argument - when one uses a config disk for bootstrapping instances - when one configures a swap disk in the flavor used to start the instance On Nov 2, 2013, at 2:32 AM, Gaylord Holder ghol...@cs.drexel.edu wrote: http://www.sebastien-han.fr/blog/2013/06/03/ceph-integration-in-openstack-grizzly-update-and-roadmap-for-havana/ suggests it is possible to run openstack instances (not only images) off of RBDs in grizzly and havana (which I'm running), and to use RBDs in lieu of a shared file system. I've followed http://ceph.com/docs/next/rbd/libvirt/ but I can only get boot-from-volume to work. Instances still are being housed in /var/lib/nova/instances, making live-migration a non-starter. Is there a better guide for running openstack instances out of RBDs, or is it just not ready yet? Thanks, -Gaylord ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
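A sketch of the boot-from-volume flow referred to above, with grizzly-era CLI syntax and placeholder ids (some builds also insist on a dummy --image argument):

  # create a bootable 20 GB cinder volume from a glance image (it lands in the RBD pool)
  cinder create --image-id <image-uuid> --display-name vm1-root 20
  # boot directly from that volume; only the libvirt config/console files end up
  # under /var/lib/nova/instances
  nova boot --flavor m1.small \
      --block-device-mapping vda=<volume-uuid>:::0 \
      vm1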
Re: [ceph-users] ceph cluster performance
Any other options or ideas? Thanks, Dinu On Oct 31, 2013, at 6:35 PM, Dinu Vlad dinuvla...@gmail.com wrote: I tested the osd performance from a single node. For this purpose I deployed a new cluster (using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster configuration stayed default, with the same additions about xfs mount mkfs.xfs as before. With a single host, the pgs were stuck unclean (active only, not active+clean): # ceph -s cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062 health HEALTH_WARN 1800 pgs stuck unclean monmap e1: 3 mons at {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0}, election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3 osdmap e101: 18 osds: 18 up, 18 in pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 16759 GB avail mdsmap e1: 0/0/1 up Test results: Local test, 1 process, 16 threads: 241.7 MB/s Local test, 8 processes, 128 threads: 374.8 MB/s Remote test, 1 process, 16 threads: 231.8 MB/s Remote test, 8 processes, 128 threads: 366.1 MB/s Maybe it's just me, but it seems on the low side too. Thanks, Dinu On Oct 30, 2013, at 8:59 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 01:51 PM, Dinu Vlad wrote: Mark, The SSDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021 and the HDDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS. The chasis is a SiliconMechanics C602 - but I don't have the exact model. It's based on Supermicro, has 24 slots front and 2 in the back and a SAS expander. I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to what the driver reports in dmesg). here are the results (filtered): Sequential: Run status group 0 (all jobs): WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, mint=60444msec, maxt=61463msec Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 153:189 MB/s Ok, that looks like what I'd expect to see given the controller being used. SSDs are probably limited by total aggregate throughput. Random: Run status group 0 (all jobs): WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, mint=60404msec, maxt=61875msec Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one out of 6 doing 101) This is on just one of the osd servers. Where the ceph tests to one OSD server or across all servers? It might be worth trying tests against a single server with no replication using multiple rados bench instances and just seeing what happens. 
Thanks, Dinu On Oct 30, 2013, at 6:38 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 09:05 AM, Dinu Vlad wrote: Hello, I've been doing some tests on a newly installed ceph cluster: # ceph osd create bench1 2048 2048 # ceph osd create bench2 2048 2048 # rbd -p bench1 create test # rbd -p bench1 bench-write test --io-pattern rand elapsed: 483 ops: 396579 ops/sec: 820.23 bytes/sec: 2220781.36 # rados -p bench2 bench 300 write --show-time # (run 1) Total writes made: 20665 Write size: 4194304 Bandwidth (MB/sec): 274.923 Stddev Bandwidth: 96.3316 Max bandwidth (MB/sec): 748 Min bandwidth (MB/sec): 0 Average Latency:0.23273 Stddev Latency: 0.262043 Max latency:1.69475 Min latency:0.057293 These results seem to be quite poor for the configuration: MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for journal, attached to a LSI 9207-8i controller. All servers have dual 10GE network cards, connected to a pair of dedicated switches. Each SSD has 3 10 GB partitions for journals. Agreed, you should see much higher throughput with that kind of storage setup. What brand/model SSDs are these? Also, what brand and model of chassis? With 24 drives and 8 SSDs I could push 2GB/s (no replication though) with a couple of concurrent rados bench processes going on our SC847A chassis, so ~550MB/s aggregate throughput for 18 drives and 6 SSDs is definitely on the low side. I'm actually not too familiar with what the RBD benchmarking commands are doing behind the scenes. Typically I've tested fio on top of a filesystem on RBD. Using ubuntu 13.04, ceph 0.67.4, XFS for backend storage. Cluster was installed using ceph-deploy. ceph.conf pretty much out of the box (diff from default follows) osd_journal_size = 10240 osd mount options xfs = rw,noatime,nobarrier,inode64 osd mkfs options xfs = -f
Re: [ceph-users] ceph cluster performance
I tested the osd performance from a single node. For this purpose I deployed a new cluster (using ceph-deploy, as before) and on fresh/repartitioned drives. I created a single pool, 1800 pgs. I ran the rados bench both on the osd server and on a remote one. Cluster configuration stayed default, with the same additions about xfs mount mkfs.xfs as before. With a single host, the pgs were stuck unclean (active only, not active+clean): # ceph -s cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062 health HEALTH_WARN 1800 pgs stuck unclean monmap e1: 3 mons at {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0}, election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3 osdmap e101: 18 osds: 18 up, 18 in pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 16759 GB avail mdsmap e1: 0/0/1 up Test results: Local test, 1 process, 16 threads: 241.7 MB/s Local test, 8 processes, 128 threads: 374.8 MB/s Remote test, 1 process, 16 threads: 231.8 MB/s Remote test, 8 processes, 128 threads: 366.1 MB/s Maybe it's just me, but it seems on the low side too. Thanks, Dinu On Oct 30, 2013, at 8:59 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 01:51 PM, Dinu Vlad wrote: Mark, The SSDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021 and the HDDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS. The chasis is a SiliconMechanics C602 - but I don't have the exact model. It's based on Supermicro, has 24 slots front and 2 in the back and a SAS expander. I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to what the driver reports in dmesg). here are the results (filtered): Sequential: Run status group 0 (all jobs): WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, mint=60444msec, maxt=61463msec Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 153:189 MB/s Ok, that looks like what I'd expect to see given the controller being used. SSDs are probably limited by total aggregate throughput. Random: Run status group 0 (all jobs): WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, mint=60404msec, maxt=61875msec Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one out of 6 doing 101) This is on just one of the osd servers. Where the ceph tests to one OSD server or across all servers? It might be worth trying tests against a single server with no replication using multiple rados bench instances and just seeing what happens. 
Thanks, Dinu On Oct 30, 2013, at 6:38 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 09:05 AM, Dinu Vlad wrote: Hello, I've been doing some tests on a newly installed ceph cluster: # ceph osd create bench1 2048 2048 # ceph osd create bench2 2048 2048 # rbd -p bench1 create test # rbd -p bench1 bench-write test --io-pattern rand elapsed: 483 ops: 396579 ops/sec: 820.23 bytes/sec: 2220781.36 # rados -p bench2 bench 300 write --show-time # (run 1) Total writes made: 20665 Write size: 4194304 Bandwidth (MB/sec): 274.923 Stddev Bandwidth: 96.3316 Max bandwidth (MB/sec): 748 Min bandwidth (MB/sec): 0 Average Latency:0.23273 Stddev Latency: 0.262043 Max latency:1.69475 Min latency:0.057293 These results seem to be quite poor for the configuration: MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for journal, attached to a LSI 9207-8i controller. All servers have dual 10GE network cards, connected to a pair of dedicated switches. Each SSD has 3 10 GB partitions for journals. Agreed, you should see much higher throughput with that kind of storage setup. What brand/model SSDs are these? Also, what brand and model of chassis? With 24 drives and 8 SSDs I could push 2GB/s (no replication though) with a couple of concurrent rados bench processes going on our SC847A chassis, so ~550MB/s aggregate throughput for 18 drives and 6 SSDs is definitely on the low side. I'm actually not too familiar with what the RBD benchmarking commands are doing behind the scenes. Typically I've tested fio on top of a filesystem on RBD. Using ubuntu 13.04, ceph 0.67.4, XFS for backend storage. Cluster was installed using ceph-deploy. ceph.conf pretty much out of the box (diff from default follows) osd_journal_size = 10240 osd mount options xfs = rw,noatime,nobarrier,inode64 osd mkfs options xfs = -f -i size=2048 [osd] public network = 10.4.0.0/24 cluster network = 10.254.254.0/24 All tests were run from a server outside
[ceph-users] ceph cluster performance
Hello, I've been doing some tests on a newly installed ceph cluster: # ceph osd pool create bench1 2048 2048 # ceph osd pool create bench2 2048 2048 # rbd -p bench1 create test # rbd -p bench1 bench-write test --io-pattern rand elapsed: 483 ops: 396579 ops/sec: 820.23 bytes/sec: 2220781.36 # rados -p bench2 bench 300 write --show-time # (run 1) Total writes made: 20665 Write size: 4194304 Bandwidth (MB/sec): 274.923 Stddev Bandwidth: 96.3316 Max bandwidth (MB/sec): 748 Min bandwidth (MB/sec): 0 Average Latency:0.23273 Stddev Latency: 0.262043 Max latency:1.69475 Min latency:0.057293 These results seem to be quite poor for the configuration: MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for journal, attached to a LSI 9207-8i controller. All servers have dual 10GE network cards, connected to a pair of dedicated switches. Each SSD has 3 x 10 GB partitions for journals. Using Ubuntu 13.04, ceph 0.67.4, XFS for backend storage. Cluster was installed using ceph-deploy. ceph.conf pretty much out of the box (diff from default follows) osd_journal_size = 10240 osd mount options xfs = rw,noatime,nobarrier,inode64 osd mkfs options xfs = -f -i size=2048 [osd] public network = 10.4.0.0/24 cluster network = 10.254.254.0/24 All tests were run from a server outside the cluster, connected to the storage network with 2x 10 GE nics. I've done a few other tests of the individual components: - network: avg. 7.6 Gbit/s (iperf, mtu=1500), 9.6 Gbit/s (mtu=9000) - md raid0 write across all 18 HDDs - 1.4 GB/s sustained throughput - fio SSD write (xfs, 4k blocks, directio): ~ 250 MB/s, ~55K IOPS I'd appreciate any suggestion that might help improve the performance or identify a bottleneck. Thanks Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
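A sketch of running several rados bench writers in parallel against the same pool, which is how the multi-process numbers elsewhere in this thread were obtained (each instance uses its own object prefix, so they don't collide):

  for i in $(seq 1 8); do
      rados -p bench2 bench 300 write -t 16 --no-cleanup &
  done
  wait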
Re: [ceph-users] ceph cluster performance
Mark, The SSDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021 and the HDDs are http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS. The chasis is a SiliconMechanics C602 - but I don't have the exact model. It's based on Supermicro, has 24 slots front and 2 in the back and a SAS expander. I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according to what the driver reports in dmesg). here are the results (filtered): Sequential: Run status group 0 (all jobs): WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, mint=60444msec, maxt=61463msec Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 153:189 MB/s Random: Run status group 0 (all jobs): WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, mint=60404msec, maxt=61875msec Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one out of 6 doing 101) This is on just one of the osd servers. Thanks, Dinu On Oct 30, 2013, at 6:38 PM, Mark Nelson mark.nel...@inktank.com wrote: On 10/30/2013 09:05 AM, Dinu Vlad wrote: Hello, I've been doing some tests on a newly installed ceph cluster: # ceph osd create bench1 2048 2048 # ceph osd create bench2 2048 2048 # rbd -p bench1 create test # rbd -p bench1 bench-write test --io-pattern rand elapsed: 483 ops: 396579 ops/sec: 820.23 bytes/sec: 2220781.36 # rados -p bench2 bench 300 write --show-time # (run 1) Total writes made: 20665 Write size: 4194304 Bandwidth (MB/sec): 274.923 Stddev Bandwidth: 96.3316 Max bandwidth (MB/sec): 748 Min bandwidth (MB/sec): 0 Average Latency:0.23273 Stddev Latency: 0.262043 Max latency:1.69475 Min latency:0.057293 These results seem to be quite poor for the configuration: MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for journal, attached to a LSI 9207-8i controller. All servers have dual 10GE network cards, connected to a pair of dedicated switches. Each SSD has 3 10 GB partitions for journals. Agreed, you should see much higher throughput with that kind of storage setup. What brand/model SSDs are these? Also, what brand and model of chassis? With 24 drives and 8 SSDs I could push 2GB/s (no replication though) with a couple of concurrent rados bench processes going on our SC847A chassis, so ~550MB/s aggregate throughput for 18 drives and 6 SSDs is definitely on the low side. I'm actually not too familiar with what the RBD benchmarking commands are doing behind the scenes. Typically I've tested fio on top of a filesystem on RBD. Using ubuntu 13.04, ceph 0.67.4, XFS for backend storage. Cluster was installed using ceph-deploy. ceph.conf pretty much out of the box (diff from default follows) osd_journal_size = 10240 osd mount options xfs = rw,noatime,nobarrier,inode64 osd mkfs options xfs = -f -i size=2048 [osd] public network = 10.4.0.0/24 cluster network = 10.254.254.0/24 All tests were run from a server outside the cluster, connected to the storage network with 2x 10 GE nics. I've done a few other tests of the individual components: - network: avg. 
7.6 Gbit/s (iperf, mtu=1500), 9.6 Gbit/s (mtu=9000) - md raid0 write across all 18 HDDs - 1.4 GB/s sustained throughput - fio SSD write (xfs, 4k blocks, directio): ~ 250 MB/s, ~55K IOPS What you might want to try doing is 4M direct IO writes using libaio and a high iodepth to all drives (spinning disks and SSDs) concurrently and see how both the per-drive and aggregate throughput is. With just SSDs, I've been able to push the 9207-8i up to around 3GB/s with Ceph writes (1.5GB/s if you don't count journal writes), but perhaps there is something interesting about the way the hardware is setup on your system. I'd appreciate any suggestion that might help improve the performance or identify a bottleneck. Thanks Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com