[ceph-users] RBD back
Hi all, the Ceph Object Store service can span geographical locales. Ceph also provides FS and RBD. If our applications need the RBD service, can we provide backup and disaster recovery for it via the gateway through some transformation? In fact the cluster stores RBD data as objects in pools (rbd by default); in other words, can we back up some pools in a Ceph cluster (without S3) via the gateway? lixuehui ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
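The gateway only serves data written through its own S3/Swift APIs and keeps that data in its own pools, so it is not a natural backup path for RBD. A minimal sketch of backing an image up at the RBD level instead, using the export/diff commands available in recent releases (pool, image and snapshot names, backup paths and the remote cluster config are assumptions for illustration):

  # full copy of one image to a file
  rbd export rbd/myimage /backup/myimage.img
  # incremental: keep periodic snapshots and ship only the changes between them
  rbd snap create rbd/myimage@backup-2013-11-07
  rbd export-diff --from-snap backup-2013-11-06 rbd/myimage@backup-2013-11-07 /backup/myimage.diff
  # replay the diff onto a standby copy of the image in a second cluster
  rbd -c /etc/ceph/remote.conf import-diff /backup/myimage.diff rbd/myimage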
Re: [ceph-users] ceph 0.72 with zfs
Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
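For anyone who wants to try this before packaged builds exist, a rough sketch of the build Sage describes, assuming the libzfs development headers are already installed (the tag is an example; adjust to the release being tested):

  git clone --branch v0.72 https://github.com/ceph/ceph.git
  cd ceph
  ./autogen.sh
  ./configure --with-zfs
  make && sudo make install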
Re: [ceph-users] ceph 0.72 with zfs
The challenge here is that libzfs is currently a build time dependency, which means it needs to be included in the target distro already, or we need to bundle it in the Ceph.com repos. I am currently looking at the possibility of making the OSD back end dynamically linked at runtime, which would allow a separately packaged zfs back end; that may (or may not!) help. sage Dinu Vlad dinuvla...@gmail.com wrote: Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
I had great results from the older 530 series too. In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. I'm still having fun with the configuration so I'll be able to use Mike Dawson's suggested tools to check for latencies. On Nov 6, 2013, at 11:35 PM, ja...@peacon.co.uk wrote: On 2013-11-06 20:25, Mike Dawson wrote: We just fixed a performance issue on our cluster related to spikes of high latency on some of our SSDs used for osd journals. In our case, the slow SSDs showed spikes of 100x higher latency than expected. Many SSDs show this behaviour when 100% provisioned and/or never TRIM'd, since the pool of ready erased cells is quickly depleted under steady write workload, so it has to wait for cells to charge to accommodate the write. The Intel 3700 SSDs look to have some of the best consistency ratings of any of the more reasonably priced drives at the moment, and good IOPS too: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html Obviously the quoted IOPS numbers are dependent on quite a deep queue mind. There is a big range of performance in the market currently; some Enterprise SSDs are quoted at just 4,000 IOPS yet cost as many pounds! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
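Two quick ways to look for the journal-latency spikes discussed in this thread are to watch the journal device itself and to ask the OSD for its own counters; a sketch, where the device path, OSD id and admin socket path are examples and the exact counter names can vary between releases:

  # per-device latency, refreshed every second; sustained high await on the journal SSD is the symptom
  iostat -x 1 /dev/sdb
  # the OSD's own view, via its admin socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | grep -A 3 journal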
Re: [ceph-users] ceph 0.72 with zfs
Looking forward to it. Tests done so far show some interesting results - so I'm considering it for future production use. On Nov 7, 2013, at 1:01 PM, Sage Weil s...@newdream.net wrote: The challenge here is that libzfs is currently a build time dependency, which means it needs to be included in the target distro already, or we need to bundle it in the Ceph.com repos. I am currently looking at the possibility of making the OSD back end dynamically linked at runtime, which would allow a separately packaged zfs back end; that may (or may not!) help. sage Dinu Vlad dinuvla...@gmail.com wrote: Any chance this option will be included for future emperor binaries? I don't mind compiling software, but I would like to keep things upgradable via apt-get … Thanks, Dinu On Nov 7, 2013, at 4:05 AM, Sage Weil s...@inktank.com wrote: Hi Dinu, You currently need to compile yourself, and pass --with-zfs to ./configure. Once it is built in, ceph-osd will detect whether the underlying fs is zfs on its own. sage On Wed, 6 Nov 2013, Dinu Vlad wrote: Hello, I'm testing the 0.72 release and thought to give a spin to the zfs support. While I managed to setup a cluster on top of a number of zfs datasets, the ceph-osd logs show it's using the genericfilestorebackend: 2013-11-06 09:27:59.386392 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is NOT supported 2013-11-06 09:27:59.386409 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2013-11-06 09:27:59.391026 7fdfee0ab7c0 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) I noticed however that the ceph sources include some files related to zfs: # find . | grep -i zfs ./src/os/ZFS.cc ./src/os/ZFS.h ./src/os/ZFSFileStoreBackend.cc ./src/os/ZFSFileStoreBackend.h A coupel of questions: - is 0.72-rc1 package currently in the raring repository compiled with zfs support ? - if yes - how can I inform ceph-osd to use the ZFSFileStoreBackend ? Thanks, Dinu ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [ANN] ceph-deploy 1.3 released!
Hi everyone, The version 1.3 of ceph-deploy I installed yesterday from the official repo used: sudo wget ... | apt-key add to install, which failed because the apt-key command was not run with sudo, but the version 1.3.1 I got this morning seems to work (no pipe anymore, it uses a file, and sudo for both commands). The version 1.3 also used Python's os.rename() in a weird way, which triggered errors about cross-device renaming (my root filesystem has separate mountpoints for /tmp and /var), for instance with ceph-deploy config push, or when ceph-deploy osd create would call write_keyring(), and this bug also disappeared in 1.3.1. So, to sum up: all bugs went away, so I am happy. Could you indicate where release notes for ceph-deploy may be found, so I do not have to blindly struggle with that kind of issue again? Best regards, Nicolas Canceill Scalable Storage Systems SURFsara (Amsterdam, NL) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [ANN] ceph-deploy 1.3 released!
On Thu, Nov 7, 2013 at 7:53 AM, nicolasc nicolas.cance...@surfsara.nl wrote: Hi every one, The version 1.3 of ceph-deploy I installed yesterday from official repo used: sudo wget ... | apt-key add to install which failed because the apt-key command was not run with sudo, but the version 1.3.1 I got this morning seems to work (no pipe anymore, it uses a file, and sudo for both commands). The version 1.3 also used Python's os.rename() in a weird way, which triggered errors about cross-device renaming (my root filesystem has separate mountpoints for /tmp and /var), for instance with ceph-deploy config push, or when ceph-deploy osd create would call write_keyring(), and this bug also disappeared in 1.3.1. So, to sum up: all bugs went away, so I am happy Could you indicate where release notes for ceph-deploy may be found, so I do not have to blindly struggle with that kind of issue again? You are correct, all of those issues have been corrected and released as ceph-deploy 1.3.1 The problem here is that you beat me to the punch :) We try to release ceph-deploy as often as possible and even more so when there are bugs that are clearly causing issues for users and preventing them to complete core functionality (like installing). The announcement with the complete changelog (and link) is not sent out immediately however, because the packaging and repository synchronization can take a few hours, so I usually wait until that is complete to make sure the announcement goes out when all the repositories are in sync and there were no issues getting ceph-deploy out. You can find the changelog here: https://github.com/ceph/ceph-deploy/blob/master/docs/source/changelog.rst But I should get the announcement out next. Sorry for all the trouble! Best regards, Nicolas Canceill Scalable Storage Systems SURFsara (Amsterdam, NL) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [ANN] ceph-deploy 1.3.1 released
Hi All, There is a new (bug-fix) release of ceph-deploy, the easy deployment tool for Ceph. A couple of issues related to GPG keys when installing on Debian and Debian-based distros were addressed. A fix was also added to the way temporary files are moved when overwriting files like ceph.conf, which was preventing some OSD operations. The full changelog can be found at: https://github.com/ceph/ceph-deploy/blob/master/docs/source/changelog.rst Make sure you update! Thanks, Alfredo ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
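Picking up the fixes is an ordinary package upgrade; roughly, depending on how ceph-deploy was installed:

  # Debian/Ubuntu, from the ceph.com repository
  sudo apt-get update && sudo apt-get install ceph-deploy
  # RPM-based distros
  sudo yum update ceph-deploy
  # installed from PyPI
  sudo pip install --upgrade ceph-deploy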
[ceph-users] computing PG IDs
Hi everyone, I just started to look at the documentation of Ceph and I've hit something I don't understand. It's about something on http://ceph.com/docs/master/architecture/ : use the following steps to compute PG IDs. The client inputs the pool ID and the object ID. (e.g., pool = 'liverpool' and object-id = 'john') CRUSH takes the object ID and hashes it. CRUSH calculates the hash modulo the number of OSDs (e.g., 0x58) to get a PG ID. CRUSH gets the pool ID given the pool name (e.g., 'liverpool' = 4). CRUSH prepends the pool ID to the PG ID (e.g., 4.0x58). Shouldn't this be 'CRUSH calculates the hash modulo the number of PGs to get a PG ID'? But then what happens if you add more PGs to the pool? Then most of the data will be reallocated to another PG? Thanks for your help! Kind Regards, Kenneth Waegeman ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
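The mapping can also be checked on a live cluster. The first command below prints the PG ID (pool number, a dot, then the hashed placement group) together with the acting OSDs, and the second shows the pg_num the hash is reduced against; pool and object names follow the example in the question:

  ceph osd map liverpool john
  ceph osd pool get liverpool pg_num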
[ceph-users] Can't activate OSD with journal and data on the same disk
Hi! I have a question about activating OSD on whole disk. I can't bypass this issue. Conf spec: 8 VMs - ceph-deploy; ceph-admin; ceph-mon0-2 and ceph-node0-2; I started from creating MON - all good . After that I want to prepare and activate 3x OSD with dm-crypt. So I put on ceph.conf this [osd.0] host = ceph-node0 cluster addr = 10.0.0.75:6800 public addr = 10.0.0.75:6801 devs = /dev/sdb Next I use ceph-deploy to activate a OSD and this shows root@ceph-deploy:~/ceph# ceph-deploy osd prepare ceph-node0:/dev/sdb --dmcrypt [ceph_deploy.cli][INFO ] Invoked (1.3.1): /usr/bin/ceph-deploy osd prepare ceph-node0:/dev/sdb --dmcrypt [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node0:/dev/sdb: [ceph-node0][DEBUG ] connected to host: ceph-node0 [ceph-node0][DEBUG ] detect platform information from remote host [ceph-node0][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [ceph_deploy.osd][DEBUG ] Deploying osd to ceph-node0 [ceph-node0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph-node0][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add [ceph_deploy.osd][DEBUG ] Preparing host ceph-node0 disk /dev/sdb journal None activate False [ceph-node0][INFO ] Running command: ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb [ceph-node0][ERROR ] INFO:ceph-disk:Will colocate journal with data on /dev/sdb [ceph-node0][ERROR ] ceph-disk: Error: partition 1 for /dev/sdb does not appear to exist [ceph-node0][DEBUG ] Information: Moved requested sector from 34 to 2048 in [ceph-node0][DEBUG ] order to align on 2048-sector boundaries. [ceph-node0][DEBUG ] The operation has completed successfully. [ceph-node0][DEBUG ] Information: Moved requested sector from 2097153 to 2099200 in [ceph-node0][DEBUG ] order to align on 2048-sector boundaries. [ceph-node0][DEBUG ] Warning: The kernel is still using the old partition table. [ceph-node0][DEBUG ] The new table will be used at the next reboot. [ceph-node0][DEBUG ] The operation has completed successfully. [ceph-node0][ERROR ] Traceback (most recent call last): [ceph-node0][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/process.py, line 68, in run [ceph-node0][ERROR ] reporting(conn, result, timeout) [ceph-node0][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/log.py, line 13, in reporting [ceph-node0][ERROR ] received = result.receive(timeout) [ceph-node0][ERROR ] File /usr/lib/python2.7/dist-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py, line 455, in receive [ceph-node0][ERROR ] raise self._getremoteerror() or EOFError() [ceph-node0][ERROR ] RemoteError: Traceback (most recent call last): [ceph-node0][ERROR ] File string, line 806, in executetask [ceph-node0][ERROR ] File , line 35, in _remote_run [ceph-node0][ERROR ] RuntimeError: command returned non-zero exit status: 1 [ceph-node0][ERROR ] [ceph-node0][ERROR ] [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs It's looks like ceph-disk-prepare can't mount (activate?) the one of disk. So I go to ceph-node0 and listed disk, this shows: root@ceph-node0:~# ls /dev/sd sda sda1 sda2 sda5 sdb sdb2 Ups - there are no sdb1. 
So I printed all partitions on /dev/sdb and there are two: Number Beg End Size Filesystem Name Flags 2 1049kB 1074MB 1073MB ceph journal 1 1075MB 16,1GB 15,0GB ceph data Where sdb1 should be for data and sdb2 for journal. When I restart the VM, /dev/sdb1 starts showing. root@ceph-node0:~# ls /dev/sd sda sda1 sda2 sda5 sdb sdb1 sdb2 But I can't mount it. When I put the journal on a separate file/disk, there is no problem with activating (the journal is on a separate disk, and all the data partition is on sdb1). Here is the log from this action (I put the journal in a file in /mnt/sdb2) root@ceph-deploy:~/ceph# ceph-deploy osd prepare ceph-node0:/dev/sdb:/mnt/sdb2 --dmcrypt [ceph_deploy.cli][INFO ] Invoked (1.3.1): /usr/bin/ceph-deploy osd prepare ceph-node0:/dev/sdb:/mnt/sdb2 --dmcrypt [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node0:/dev/sdb:/mnt/sdb2 [ceph-node0][DEBUG ] connected to host: ceph-node0 [ceph-node0][DEBUG ] detect platform information from remote host [ceph-node0][DEBUG ] detect machine type [ceph_deploy.osd][INFO ] Distro info: Ubuntu 13.04 raring [ceph_deploy.osd][DEBUG ] Deploying osd to ceph-node0 [ceph-node0][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf [ceph-node0][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add [ceph_deploy.osd][DEBUG ] Preparing host ceph-node0 disk /dev/sdb journal /mnt/sdb2
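The 'partition 1 for /dev/sdb does not appear to exist' error lines up with the kernel still using the old partition table. One possible workaround, rather than rebooting, is to force a re-read and retry the activation; the device and host names are taken from the log above, and the activate target may need adjusting for the dm-crypt case:

  sudo partprobe /dev/sdb        # or: sudo partx -a /dev/sdb
  ls /dev/sdb*                   # both /dev/sdb1 and /dev/sdb2 should now be present
  ceph-deploy osd activate ceph-node0:/dev/sdb1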
[ceph-users] please help me.problem with my ceph
1. I have installed ceph with one mon/mds and one osd. When I use 'ceph -s', there is a warning: health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) 2. I mounted a client: '192.168.3.189:/ 100G 1009M 97G 2% /mnt/ceph' but I can't create a file or a directory because of no permission. My conf is listed below. Please tell me how to fix these problems, thanks. ; ; Sample ceph ceph.conf file. ; ; This file defines cluster membership, the various locations ; that Ceph stores data, and any other runtime options. ; If a 'host' is defined for a daemon, the init.d start/stop script will ; verify that it matches the hostname (or else ignore it). If it is ; not defined, it is assumed that the daemon is intended to start on ; the current host (e.g., in a setup with a startup.conf on each ; node). ; The variables $type, $id and $name are available to use in paths ; $type = The type of daemon, possible values: mon, mds and osd ; $id = The ID of the daemon, for mon.alpha, $id will be alpha ; $name = $type.$id ; For example: ; osd.0 ; $type = osd ; $id = 0 ; $name = osd.0 ; mon.beta ; $type = mon ; $id = beta ; $name = mon.beta ; global [global] ; enable secure authentication auth supported = cephx ; allow ourselves to open a lot of files max open files = 131072 ; set log file log file = /var/log/ceph/$name.log ; log_to_syslog = true; uncomment this line to log to syslog ; set up pid files pid file = /var/run/ceph/$name.pid ; If you want to run a IPv6 cluster, set this to true. Dual-stack isn't possible ;ms bind ipv6 = true ; monitors ; You need at least one. You need at least three if you want to ; tolerate any node failures. Always create an odd number. [mon] mon data = /data/$name ; If you are using for example the RADOS Gateway and want to have your newly created ; pools a higher replication level, you can set a default osd pool default size = 1 ; You can also specify a CRUSH rule for new pools ; Wiki: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH ;osd pool default crush rule = 0 ; Timing is critical for monitors, but if you want to allow the clocks to drift a ; bit more, you can specify the max drift. ;mon clock drift allowed = 1 ; Tell the monitor to backoff from this warning for 30 seconds ;mon clock drift warn backoff = 30 ; logging, for debugging monitor crashes, in order of ; their likelihood of being helpful :) ;debug ms = 1 ;debug mon = 20 ;debug paxos = 20 ;debug auth = 20 [mon.alpha] host = ca189 mon addr = 192.168.3.189:6789 ; mds ; You need at least one. Define two to get a standby. [mds] ; where the mds keeps it's secret encryption keys keyring = /data/keyring.$name ; mds logging to debug issues. ;debug ms = 1 ;debug mds = 20 [mds.alpha] host = ca189 ; osd ; You need at least one. Two if you want data to be replicated. ; Define as many as you like. [osd] ; This is where the osd expects its data osd data = /data/$name ; Ideally, make the journal a separate disk or partition. ; 1-10GB should be enough; more if you have fast or many ; disks. You can use a file under the osd data dir if need be ; (e.g. /data/$name/journal), but it will be slower than a ; separate disk or partition. ; This is an example of a file-based journal. 
osd journal = /data/$name/journal osd journal size = 1000 ; journal size, in megabytes ; If you want to run the journal on a tmpfs (don't), disable DirectIO ;journal dio = false ; You can change the number of recovery operations to speed up recovery ; or slow it down if your machines can't handle it ; osd recovery max active = 3 ; osd logging to debug osd issues, in order of likelihood of being ; helpful ;debug ms = 1 ;debug osd = 20 ;debug filestore = 20 ;debug journal = 20 ; ### The below options only apply if you're using mkcephfs ; ### and the devs options ; The filesystem used on the volumes osd mkfs type = btrfs ; If you want to specify some other mount options, you can do so. ; for other filesystems use 'osd mount options $fstype' osd mount options btrfs = rw,noatime ; The options used to format the filesystem via mkfs.$fstype ; for other filesystems use 'osd mkfs options $fstype' ; osd mkfs options btrfs = [osd.0] host = ca191 ; if 'devs' is not specified, you're responsible for ; setting up the 'osd data' dir. devs =
Re: [ceph-users] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel), Requires: python-sphinx
On Wed, Nov 6, 2013 at 8:25 PM, Eyal Gutkind ey...@mellanox.com wrote: Trying to install ceph on my machines. Using RHEL6.3 I get the following error while invoking ceph-deploy. Tried to install sphinx on ceph-node, seems to be success full and installed. Still, it seems that during the installation there is an unresolved dependency. This looks like you don't have the right repos enabled on your box. I think you need to enable the EPEL repositories to resolve these. [apollo006][INFO ] Running command: sudo yum -y -q install ceph [apollo006][ERROR ] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel) [apollo006][ERROR ]Requires: python-sphinx Below is the deploying command line $ ceph-deploy install apollo006 [ceph_deploy.cli][INFO ] Invoked (1.3): /usr/bin/ceph-deploy install apollo006 [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster ceph hosts apollo006 [ceph_deploy.install][DEBUG ] Detecting platform for host apollo006 ... [apollo006][DEBUG ] connected to host: apollo006 [apollo006][DEBUG ] detect platform information from remote host [apollo006][DEBUG ] detect machine type [ceph_deploy.install][INFO ] Distro info: Red Hat Enterprise Linux Server 6.3 Santiago [apollo006][INFO ] installing ceph on apollo006 [apollo006][INFO ] Running command: sudo rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc [apollo006][INFO ] Running command: sudo rpm -Uvh --replacepkgs http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm [apollo006][DEBUG ] Retrieving http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm [apollo006][DEBUG ] Preparing... ## [apollo006][DEBUG ] ceph-release ## [apollo006][INFO ] Running command: sudo yum -y -q install ceph [apollo006][ERROR ] Error: Package: 1:python-flask-0.9-5.el6.noarch (epel) [apollo006][ERROR ]Requires: python-sphinx [apollo006][DEBUG ] You could try using --skip-broken to work around the problem [apollo006][DEBUG ] You could try running: rpm -Va --nofiles --nodigest [apollo006][ERROR ] Traceback (most recent call last): [apollo006][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/process.py, line 68, in run [apollo006][ERROR ] reporting(conn, result, timeout) [apollo006][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/log.py, line 13, in reporting [apollo006][ERROR ] received = result.receive(timeout) [apollo006][ERROR ] File /usr/lib/python2.6/site-packages/ceph_deploy/lib/remoto/lib/execnet/gateway_base.py, line 455, in receive [apollo006][ERROR ] raise self._getremoteerror() or EOFError() [apollo006][ERROR ] RemoteError: Traceback (most recent call last): [apollo006][ERROR ] File string, line 806, in executetask [apollo006][ERROR ] File , line 35, in _remote_run [apollo006][ERROR ] RuntimeError: command returned non-zero exit status: 1 [apollo006][ERROR ] [apollo006][ERROR ] [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: yum -y -q install ceph Thank you for your help, EyalG ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
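A sketch of enabling EPEL on the target node before re-running the install; the exact epel-release package version for EL6 may have changed, so verify the URL first:

  sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
  sudo yum repolist enabled | grep -i epel
  ceph-deploy install apollo006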
[ceph-users] Havana RBD - a few problems
Hi all we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because the RBD.path it not set. I have patched this to read: @@ -419,10 +419,12 @@ class Rbd(Image): if path: try: self.rbd_name = path.split('/')[1] +self.path = path except IndexError: raise exception.InvalidDevicePath(path=path) else: self.rbd_name = '%s_%s' % (instance['name'], disk_name) +self.path = 'volumes/%s' % self.rbd_name self.snapshot_name = snapshot_name if not CONF.libvirt_images_rbd_pool: raise RuntimeError(_('You should specify' but am not sure this is correct. I have the following problems: 1) can't inject data into image 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/ instance- 0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) possibly the self.path = … is wrong - but what are the correct values? 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? 3) When creating a new instance from an image (non ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image) I would be really grateful, if those people that have Grizzly/Havana running with an RBD backend could pipe in here… thanks Jens-Christian -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/socialmedia ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
-Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- boun...@lists.ceph.com] On Behalf Of Dinu Vlad Sent: Thursday, November 07, 2013 3:30 AM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph cluster performance In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. If your SSD capacity is well in excess of your journal capacity requirements you could consider overprovisioning the SSD. Overprovisioning should increase SSD performance and lifetime. This achieves the same effect as trim to some degree (lets the SSD better understand what cells have real data and which can be treated as free). I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Havana RBD - a few problems
Under grizzly we disabled completely the image injection via libvirt_inject_partition = -2 in nova.conf. I'm not sure rbd images can even be mounted that way - but then again, I don't have experience with havana. We're using config disks (which break live migrations) and/or the metadata service (which does not) in combination with cloud-init, to bootstrap instances. On Nov 7, 2013, at 6:15 PM, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: Hi all we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because the RBD.path it not set. I have patched this to read: @@ -419,10 +419,12 @@ class Rbd(Image): if path: try: self.rbd_name = path.split('/')[1] +self.path = path except IndexError: raise exception.InvalidDevicePath(path=path) else: self.rbd_name = '%s_%s' % (instance['name'], disk_name) +self.path = 'volumes/%s' % self.rbd_name self.snapshot_name = snapshot_name if not CONF.libvirt_images_rbd_pool: raise RuntimeError(_('You should specify' but am not sure this is correct. I have the following problems: 1) can't inject data into image 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/ instance- 0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) possibly the self.path = … is wrong - but what are the correct values? 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? 3) When creating a new instance from an image (non ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image) I would be really grateful, if those people that have Grizzly/Havana running with an RBD backend could pipe in here… thanks Jens-Christian -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/socialmedia ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
On 11/07/2013 11:47 AM, Gruher, Joseph R wrote: -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users- boun...@lists.ceph.com] On Behalf Of Dinu Vlad Sent: Thursday, November 07, 2013 3:30 AM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph cluster performance In this case however, the SSDs were only used for journals and I don't know if ceph-osd sends TRIM to the drive in the process of journaling over a block device. They were also under-subscribed, with just 3 x 10G partitions out of 240 GB raw capacity. I did a manual trim, but it hasn't changed anything. If your SSD capacity is well in excess of your journal capacity requirements you could consider overprovisioning the SSD. Overprovisioning should increase SSD performance and lifetime. This achieves the same effect as trim to some degree (lets the SSD better understand what cells have real data and which can be treated as free). I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. over-provisioning is definitely something to consider, especially if you aren't buying SSDs with high write endurance. The more cells you can spread the load out over the better. We've had some interesting conversations on here in the past about whether or not it's more cost effective to buy large capacity consumer grade SSDs with more cells or shell out for smaller capacity enterprise grade drives. My personal opinion is that it's worth paying a bit extra for a drive that employs something like MLC-HET, but there's a lot of enterprise grade drives out there with low write endurance that you really have to watch out for. If you are going to pay extra, at least get something with high write endurance and reasonable write speeds. Mark ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
On 2013-11-07 17:47, Gruher, Joseph R wrote: I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. If it's under-provisioned (so the device knows there are unused cells), the device would simply write to an empty cell and flag the old cell for erasing, so there should be no change. Latency would only rise when the sustained write rate exceeded the device's ability to clear cells, so that the stock of ready cells was eventually depleted. FWIW, I think there is considerable mileage in the larger consumer-grade argument: assuming drives will be half the price in a year's time, selecting devices that can last only a year is preferable to spending 3x the price on one that can survive three. That does, though, open the can of worms of SMART reporting and moving journals at some future point. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Manual Installation steps without ceph-deploy
I've seen this before too. CentOS starts up without networking on by default. In my case, the problem was that the monitors cannot form a quorum and OSDs cannot find each other or monitors. Hence, you get that broken pipe error. You either need to have the networking start on startup before the OSDs, or start ceph after you boot up and ensure the network is running properly. The nodes have to be able to reach each other for Ceph to work. As for Ubuntu, I believe the networking is on by default. On Wed, Nov 6, 2013 at 1:35 PM, Trivedi, Narendra narendra.triv...@savvis.com wrote: Hi All, I did a fresh install of Ceph (this might be like 10th or 11th install) on 4 new VMs (one admin, one MON and two OSDs) built from CentOS 6.4 (x64) .iso , did a yum update on all of them. They are all running on vmware ESXi 5.1.0. I did everything sage et al suggested (i.e. creation of /ceph/osd* and making sure /etc/ceph is present on all nodes. /etc/ceph gets created all the ceph-deploy install and contains rbdmap FYI). Unusually, I ended up with the same problem while activating OSDs (the last 4 lines keep going on and on forever): 2013-11-06 14:37:39,626 [ceph_deploy.cli][INFO ] Invoked (1.3): /usr/bin/ceph-deploy osd activate ceph-node2-osd0-centos-6-4:/ceph/osd0 ceph-node3-osd1-centos-6-4:/ceph/osd1 2013-11-06 14:37:39,627 [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph-node2-osd0-centos-6-4:/ceph/osd0: ceph-node3-osd1-centos-6-4:/ceph/osd1: 2013-11-06 14:37:39,901 [ceph-node2-osd0-centos-6-4][DEBUG ] connected to host: ceph-node2-osd0-centos-6-4 2013-11-06 14:37:39,902 [ceph-node2-osd0-centos-6-4][DEBUG ] detect platform information from remote host 2013-11-06 14:37:39,917 [ceph-node2-osd0-centos-6-4][DEBUG ] detect machine type 2013-11-06 14:37:39,925 [ceph_deploy.osd][INFO ] Distro info: CentOS 6.4 Final 2013-11-06 14:37:39,925 [ceph_deploy.osd][DEBUG ] activating host ceph-node2-osd0-centos-6-4 disk /ceph/osd0 2013-11-06 14:37:39,925 [ceph_deploy.osd][DEBUG ] will use init type: sysvinit 2013-11-06 14:37:39,925 [ceph-node2-osd0-centos-6-4][INFO ] Running command: sudo ceph-disk-activate --mark-init sysvinit --mount /ceph/osd0 2013-11-06 14:37:40,145 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:41.075310 7fac2414c700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac20024480 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac200246e0).fault 2013-11-06 14:37:43,167 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:44.071697 7fac1ebfd700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac14000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14000e60).fault 2013-11-06 14:37:46,140 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:47.071938 7fac2414c700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac14003010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14003270).fault 2013-11-06 14:37:50,165 [ceph-node2-osd0-centos-6-4][ERROR ] 2013-11-06 14:37:51.071245 7fac1ebfd700 0 -- :/1029546 10.12.0.70:6789/0 pipe(0x7fac14003a70 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7fac14003cd0).fault It might be bad luck but I want to try a manual installation without ceph-deploy because it seems I am jinxed with ceph-deploy. Could anyone please forward me the steps. I am happy to share the ceph.log with anyone who would like to research on this error but I don’t a have clue. Thanks a lot! Narendra Trivedi | savviscloud This message contains information which may be confidential and/or privileged. 
Unless you are the intended recipient (or authorized to receive for the intended recipient), you may not read, use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender by reply e-mail and delete the message and any attachment(s) thereto without retaining any copies. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Intank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
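On CentOS the corresponding checks are roughly the following; the monitor address is taken from the log above:

  # make sure networking and ceph both start at boot, networking first
  sudo chkconfig network on
  sudo chkconfig ceph on
  # from the OSD node, confirm the monitor is reachable before activating OSDs
  ping -c 3 10.12.0.70
  telnet 10.12.0.70 6789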
Re: [ceph-users] ceph cluster performance
I was under the same impression - using a small portion of the SSD via partitioning (in my case - 30 gigs out of 240) would have the same effect as activating the HPA explicitly. Am I wrong? On Nov 7, 2013, at 8:16 PM, ja...@peacon.co.uk wrote: On 2013-11-07 17:47, Gruher, Joseph R wrote: I wonder how effective trim would be on a Ceph journal area. If the journal empties and is then trimmed the next write cycle should be faster, but if the journal is active all the time the benefits would be lost almost immediately, as those cells are going to receive data again almost immediately and go back to an untrimmed state until the next trim occurs. If it's under-provisioned (so the device knows there are unused cells), the device would simply write to an empty cell and flag the old cell for erasing, so there should be no change. Latency would rise when sustained write rate exceeded the devices' ability to clear cells, so eventually the stock of ready cells would be depleted. FWIW, I think there is considerable mileage in the larger-consumer grade argument. Assuming drives will be half the price in a years time, so selecting devices that can last only a year is preferable to spending 3x the price on one that can survive three. That though opens the tin of worms that is SMART reporting and moving journals at some future point mind. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
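The difference between the two approaches is whether the drive knows the unused space is free: leaving most of the device unpartitioned only acts as extra spare area if those cells have never been written, or have been erased. A rough sketch assuming the journals go onto a freshly secure-erased drive; the device name and partition sizes are examples, and the erase destroys all data on the drive:

  # ATA secure erase returns every cell to the free pool
  sudo hdparm --user-master u --security-set-pass p /dev/sdb
  sudo hdparm --user-master u --security-erase p /dev/sdb
  # create only the journal partitions and leave the rest of the device untouched
  sudo sgdisk --new=1:0:+10G /dev/sdb
  sudo sgdisk --new=2:0:+10G /dev/sdb
  sudo sgdisk --new=3:0:+10G /dev/sdb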
Re: [ceph-users] radosgw questions
For #2, I just wrote a document on setting up a federated architecture. You can view it here: http://ceph.com/docs/master/radosgw/federated-config/ This functionality will be available in the Emperor release. The use case I described involved two zones in a master region talking to the same underlying Ceph Storage Cluster, but with different sets of pools for each zone. You can certainly set up pools for zones on completely different Ceph Storage Clusters. I assumed that was overkill, but you can certainly do it. See http://ceph.com/docs/master/radosgw/federated-config/#configure-a-master-region for configuring a master region. If you want to use separate storage clusters for each zone, you need to: 1. Setup the set of pools for each zone in the respective ceph storage cluster for your data center. 2. http://ceph.com/docs/master/radosgw/federated-config/#create-a-keyring should use different cluster names to ensure that the keyring gets populated in both Ceph Storage Clusters. We assume the default -c /etc/ceph/ceph.conf for simplicity. 3. http://ceph.com/docs/master/radosgw/federated-config/#add-instances-to-ceph-config-file when adding the instances to the Ceph configuration file, you need to note that the storage cluster might be named. For example, instead of ceph.conf, it might be us-west.conf and us-east.conf for the respective zones, assuming you are setting up Ceph clusters specifically to run the gateways--or whatever naming convention you already use. 4. Most of the usage examples omit the Ceph configuration file (-c file/path.conf) and the admin key (-k path/to/admin.keyring). You may need to specify them explicitly when calling radosgw-admin so that you are issuing commands to the right Ceph Storage Cluster. I'd love to get your feedback on the document! For #3. Yes. In fact, if you just setup a master region with one master zone, that works fine. You don't have to respect pool naming. Whatever you create in the storage cluster and map to a zone pool will work. However, I would suggest following the conventions as laid out in the document. You can create a garbage collection pool called lemonade, but you will probably confuse the community when looking for help as they will expect .{region-name}-{zone-name}.rgw.gc. If you just use region-zone.{pool-name-default}, like us-west.rgw.root most people in the community will understand any questions you have and can more readily help you with additional questions. On Wed, Nov 6, 2013 at 3:17 AM, Alessandro Brega alessandro.bre...@gmail.com wrote: Good day ceph users, I'm new to ceph but installation went well so far. Now I have a lot of questions regarding radosgw. Hope you don't mind... 1. To build a high performance yet cheap radosgw storage, which pools should be placed on ssd and which on hdd backed pools? Upon installation of radosgw, it created the following pools: .rgw, .rgw.buckets, .rgw.buckets.index, .rgw.control, .rgw.gc, .rgw.root, .usage, .users, .users.email. 2. In order to have very high availability I like to setup two different ceph clusters, each in its own datacenter. How to configure radowsgw to make use of this layout? Can I have a multi-master setup with having a load balancer (or using geo-dns) which distributes the load to radosgw instances in both datacenters? 3. Is it possible to start with a simple setup now (only one ceph cluster) and later add the multi-datacenter redundancy described above without downtime? Do I have to respect any special pool-naming requirements? 4. 
What replication count would you suggest? In other words, what replication level is needed to achieve 99.9% durability like DreamObjects states? 5. Is it possible to map a FQDN custom domain to buckets, not only subdomains? 6. The command radosgw-admin pool list returns could not list placement set: (2) No such file or directory. But radosgw seems to work as expected anyway? Looking forward to your suggestions. Alessandro Brega ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- John Wilkins Senior Technical Writer Inktank john.wilk...@inktank.com (415) 425-9599 http://inktank.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
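For the separate-clusters case described above, the region and zone commands are simply pointed at the right cluster with -c and -k; a sketch for the us-east side, where the cluster name, region/zone names and JSON infiles follow the document's example and may differ in a real deployment:

  radosgw-admin region set --infile us.json -c /etc/ceph/us-east.conf -k /etc/ceph/us-east.client.admin.keyring
  radosgw-admin zone set --rgw-zone=us-east --infile us-east.json -c /etc/ceph/us-east.conf -k /etc/ceph/us-east.client.admin.keyring
  radosgw-admin regionmap update -c /etc/ceph/us-east.conf -k /etc/ceph/us-east.client.admin.keyring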
Re: [ceph-users] Running on disks that lose their head
Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is smart enough to prevent spreading corrupt data all over the cluster if I leave bad disks in place and they start doing terrible things to the data? I have a lot more disks than I have trust in disks. If a drive lost a head then I want it gone. I love the idea of using SMART data but can foresee some implementation issues. We have seen some RAID configurations where polling SMART will halt all RAID operations momentarily. Also, some controllers require you to use their CLI tool to poll for SMART rather than smartmontools. It would be similarly awesome to embed something like an Apdex score against each OSD, especially if it factored in hierarchy to identify poor-performing OSDs, nodes, racks, etc. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
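On the smartmontools side, a minimal sketch of a one-off check and of periodic polling; the device name is an example, and drives behind a RAID controller usually need an extra -d option or the vendor CLI, as noted above:

  # one-off health and attribute dump
  sudo smartctl -H -A /dev/sdc
  # periodic polling: short self-test daily at 02:00, mail on failing attributes
  echo '/dev/sdc -a -m root -s (S/../.././02)' | sudo tee -a /etc/smartd.conf
  sudo service smartd restart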
Re: [ceph-users] please help me.problem with my ceph
Hi 皓月, You can try ls -al /mnt/ceph , check if the current user have W/R access to the directory. Maybe you need to use chown to change the directory owner. Regards, Kai At 2013-11-06 22:03:31,皓月 suzhenh...@qq.com wrote: 1. I have installed ceph with one mon/mds and one osd.When i use 'ceph -s',there si a warning:health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) 2. i mount a client.'192.168.3.189:/ 100G 1009M 97G 2% /mnt/ceph' but i can't creat a file or a directory because of no permission. my conf is listed bellow.please tell my how to fix these problems,thanks ; ; Sample ceph ceph.conf file. ; ; This file defines cluster membership, the various locations ; that Ceph stores data, and any other runtime options. ; If a 'host' is defined for a daemon, the init.d start/stop script will ; verify that it matches the hostname (or else ignore it). If it is ; not defined, it is assumed that the daemon is intended to start on ; the current host (e.g., in a setup with a startup.conf on each ; node). ; The variables $type, $id and $name are available to use in paths ; $type = The type of daemon, possible values: mon, mds and osd ; $id = The ID of the daemon, for mon.alpha, $id will be alpha ; $name = $type.$id ; For example: ; osd.0 ; $type = osd ; $id = 0 ; $name = osd.0 ; mon.beta ; $type = mon ; $id = beta ; $name = mon.beta ; global [global] ; enable secure authentication auth supported = cephx ; allow ourselves to open a lot of files max open files = 131072 ; set log file log file = /var/log/ceph/$name.log ; log_to_syslog = true; uncomment this line to log to syslog ; set up pid files pid file = /var/run/ceph/$name.pid ; If you want to run a IPv6 cluster, set this to true. Dual-stack isn't possible ;ms bind ipv6 = true ; monitors ; You need at least one. You need at least three if you want to ; tolerate any node failures. Always create an odd number. [mon] mon data = /data/$name ; If you are using for example the RADOS Gateway and want to have your newly created ; pools a higher replication level, you can set a default osd pool default size = 1 ; You can also specify a CRUSH rule for new pools ; Wiki: http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH ;osd pool default crush rule = 0 ; Timing is critical for monitors, but if you want to allow the clocks to drift a ; bit more, you can specify the max drift. ;mon clock drift allowed = 1 ; Tell the monitor to backoff from this warning for 30 seconds ;mon clock drift warn backoff = 30 ; logging, for debugging monitor crashes, in order of ; their likelihood of being helpful :) ;debug ms = 1 ;debug mon = 20 ;debug paxos = 20 ;debug auth = 20 [mon.alpha] host = ca189 mon addr = 192.168.3.189:6789 ; mds ; You need at least one. Define two to get a standby. [mds] ; where the mds keeps it's secret encryption keys keyring = /data/keyring.$name ; mds logging to debug issues. ;debug ms = 1 ;debug mds = 20 [mds.alpha] host = ca189 ; osd ; You need at least one. Two if you want data to be replicated. ; Define as many as you like. [osd] ; This is where the osd expects its data osd data = /data/$name ; Ideally, make the journal a separate disk or partition. ; 1-10GB should be enough; more if you have fast or many ; disks. You can use a file under the osd data dir if need be ; (e.g. /data/$name/journal), but it will be slower than a ; separate disk or partition. ; This is an example of a file-based journal. 
osd journal = /data/$name/journal osd journal size = 1000 ; journal size, in megabytes ; If you want to run the journal on a tmpfs (don't), disable DirectIO ;journal dio = false ; You can change the number of recovery operations to speed up recovery ; or slow it down if your machines can't handle it ; osd recovery max active = 3 ; osd logging to debug osd issues, in order of likelihood of being ; helpful ;debug ms = 1 ;debug osd = 20 ;debug filestore = 20 ;debug journal = 20 ; ### The below options only apply if you're using mkcephfs ; ### and the devs options ; The filesystem used on the volumes osd mkfs type = btrfs ; If you want to specify some other mount options, you can do so. ; for other filesystems use 'osd mount options $fstype' osd mount options btrfs = rw,noatime ; The options used to format the filesystem via mkfs.$fstype ; for other filesystems use 'osd mkfs options $fstype' ; osd mkfs options btrfs = [osd.0] host = ca191 ; if 'devs' is not specified, you're responsible for ; setting up the 'osd data' dir. devs = /dev/mapper/vg_ca191-lv_ceph ___ ceph-users mailing list ceph-users@lists.ceph.com
Re: [ceph-users] please help me.problem with my ceph
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 皓月 Sent: Wednesday, November 06, 2013 10:04 PM To: ceph-users Subject: [ceph-users] please help me.problem with my ceph 1. I have installed ceph with one mon/mds and one osd. When I use 'ceph -s', there is a warning: health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) I would think this is because Ceph defaults to a replication level of 2 and you only have one OSD (nowhere to write a second copy), so you are degraded? You could add a second OSD or perhaps you could set the replication level to 1? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
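A sketch of the second option on a cluster of this vintage, where the default pools are data, metadata and rbd; this is only sensible for a test setup, since a single copy means no redundancy:

  ceph osd pool set data size 1
  ceph osd pool set metadata size 1
  ceph osd pool set rbd size 1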
Re: [ceph-users] radosgw questions
1. To build a high performance yet cheap radosgw storage, which pools should be placed on SSD and which on HDD backed pools? Upon installation of radosgw, it created the following pools: .rgw, .rgw.buckets, .rgw.buckets.index, .rgw.control, .rgw.gc, .rgw.root, .usage, .users, .users.email. There is a lot that goes into high performance; a few questions come to mind: Do you want high performance reads, writes or both? How hot is your data, and can you get better performance from buying more memory for caching? What size objects do you expect to handle, and how many per bucket? 4. What replication count would you suggest? In other words, what replication level is needed to achieve 99.9% durability like DreamObjects states? DreamObjects Engineer here, we used Ceph's durability modeling tools here: https://github.com/ceph/ceph-tools You will need to research your data disks' MTBF numbers and convert them to FITs, measure your OSD backfill MTTR and factor in your replication count. DreamObjects uses 3 replicas on enterprise SAS disks. The durability figures exclude black swan events like fires and other such datacenter or regional disasters, which is why having a second location is important for DR. 5. Is it possible to map a FQDN custom domain to buckets, not only subdomains? You could map a domain's A/AAAA records to an endpoint, but if the endpoint changes you're SOL; using a CNAME at the domain root violates DNS RFCs. Some DNS providers will fake a CNAME by doing a recursive lookup in response to an A/AAAA request as a workaround. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Running on disks that lose their head
Thanks, Mike Dawson Co-Founder Director of Cloud Architecture Cloudapt LLC 6330 East 75th Street, Suite 170 Indianapolis, IN 46250 On 11/7/2013 2:12 PM, Kyle Bader wrote: Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is smart enough to prevent spreading corrupt data all over the cluster if I leave bad disks in place and they start doing terrible things to the data? I have a lot more disks than I have trust in disks. If a drive lost a head then I want it gone. I love the idea of using smart data but can foresee see some implementation issues. We have seen some raid configurations where polling smart will halt all raid operations momentarily. Also, some controllers require you to use their CLI tool to pool for smart vs smartmontools. It would be similarly awesome to embed something like an apdex score against each osd, especially if it factored in hierarchy to identify poor performing osds, nodes, racks, etc.. Kyle, I think you are spot-on here. Apdex or similar scoring for gear performance is important for Ceph, IMO. Due to pseudo-random placement and replication, it can be quite difficult to identify 1) if hardware, software, or configuration are the cause of slowness, and 2) which hardware (if any) is slow. I recently discovered a method that seems address both points built. Zackc, Loicd, and I have been the main participants in a weekly Teuthology call the past few weeks. We've talked mostly about methods to extend Teuthology to capture performance metrics. Would you be willing to join us during the Teuthology and Ceph-Brag sessions at the Firefly Developer Summit? Cheers, Mike ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Ceph Block Storage QoS
Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph cluster performance
ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. The problem might be SATA transport protocol overhead at the expander. Have you tried directly connecting the SSDs to SATA2/3 ports on the mainboard? -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
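Before blaming the expander it may be worth measuring one journal partition directly with synchronous 4k writes, which is close to the journal's own write pattern. This overwrites the partition, so only run it against a journal that can be recreated; the device name is an example:

  sudo fio --name=journal-test --filename=/dev/sdb1 --rw=write --bs=4k --iodepth=1 --ioengine=libaio --direct=1 --sync=1 --runtime=60 --time_based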
Re: [ceph-users] Running on disks that lose their head
Zackc, Loicd, and I have been the main participants in a weekly Teuthology call the past few weeks. We've talked mostly about methods to extend Teuthology to capture performance metrics. Would you be willing to join us during the Teuthology and Ceph-Brag sessions at the Firefly Developer Summit? I'd be happy to! -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
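For reference, the Qemu throttling mentioned above is expressed per disk in the libvirt domain XML; a hedged sketch, where the pool/volume name, target device and limit values are placeholders rather than anything OpenStack or CloudStack generates verbatim:
  <disk type='network' device='disk'>
    <driver name='qemu' type='raw'/>
    <source protocol='rbd' name='volumes/volume-0001'/>
    <target dev='vdb' bus='virtio'/>
    <iotune>
      <total_iops_sec>1000</total_iops_sec>
      <total_bytes_sec>52428800</total_bytes_sec>
    </iotune>
  </disk>
Because the limit is enforced by Qemu on the client side, it throttles per VM disk rather than providing any cluster-wide QoS inside Ceph itself.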
Re: [ceph-users] Ceph Block Storage QoS
On Thu, Nov 7, 2013 at 11:50 PM, Wido den Hollander w...@42on.com wrote: On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Just a side note - current QEMU implements more gentle throttling than the rest of the versions, and it is a very useful thing for NBD I/O burst handling. Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to use Admin Ops API in Ceph Object Storage
I've looked around but could not find it. Can I open a ticket for this issue? Not being able to enumerate users via API is a road block for me and I'd like to work and get it resolved. Thanks. -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to use Admin Ops API in Ceph Object Storage
You can do it through the metadata api. Try doing something like: GET /admin/metadata/user Yehuda On Thu, Nov 7, 2013 at 12:06 PM, Nelson Jeppesen nelson.jeppe...@gmail.com wrote: I've looked around but could not find it. Can I open a ticket for this issue? Not being able to enumerate users via API is a road block for me and I'd like to work and get it resolved. Thanks. -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Can someone please help me here?
I can't install Ubuntu... I am not sure why it would do this on a new install of CentOS. I wanted to try this to see if I can use it as the RBD/Radosgw backend for OpenStack production, but I can't believe it has taken forever to get it running and I am not there yet! -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trivedi, Narendra Sent: Wednesday, November 06, 2013 4:45 PM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy Unfortunately, I don't have that luxury. Thanks! Narendra -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ja...@peacon.co.uk Sent: Wednesday, November 06, 2013 4:43 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy I also had some difficulty with ceph-deploy on CentOS. I eventually moved to Ubuntu 13.04 - and haven't looked back. On 2013-11-06 21:35, Trivedi, Narendra wrote: Hi All, I did a fresh install of Ceph (this might be like 10th or 11th install) on 4 new VMs (one admin, one MON and two OSDs) built from CentOS 6.4 (x64)... it seems I am jinxed with ceph-deploy. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
I think this is a great idea. One of the big questions users have is "what kind of hardware should I buy?" An easy way for users to publish information about their setup (hardware, software versions, use-case, performance) when they have successful deployments would be very valuable. Maybe a section of the wiki? It would be interesting to have a site where a Ceph admin can download an API key/package that could be optionally installed and report configuration information to a community API. The admin could then supplement/correct that base information. Having much of the data collection be automated lowers the barrier for contribution. Bonus points if this could be extended to SMART and failed drives so we could have a community-generated report similar to Google's disk population study they presented at FAST'07. -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to use Admin Ops API in Ceph Object Storage
Sweet, thanks! I had to add --caps=metadata=read ,but it worked great. On Thu, Nov 7, 2013 at 12:11 PM, Yehuda Sadeh yeh...@inktank.com wrote: You can do it through the metadata api. Try doing something like: GET /admin/metadata/user Yehuda On Thu, Nov 7, 2013 at 12:06 PM, Nelson Jeppesen nelson.jeppe...@gmail.com wrote: I've looked around but could not find it. Can I open a ticket for this issue? Not being able to enumerate users via API is a road block for me and I'd like to work and get it resolved. Thanks. -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Nelson Jeppesen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
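For anyone following along, a sketch of the two pieces involved (the uid is a placeholder for whatever system/admin user you call the admin API with):
  radosgw-admin caps add --uid=admin --caps="metadata=read"
  radosgw-admin metadata list user
The second command is the local equivalent of GET /admin/metadata/user; the REST request itself still has to be signed with that user's access and secret keys.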
Re: [ceph-users] deployment architecture practices / new ideas?
On 06.11.2013 15:05, Gautam Saxena wrote: We're looking to deploy CEPH on about 8 Dell servers to start, each of which typically contains 6 to 8 hard disks with Perc RAID controllers which support write-back cache (~512 MB usually). Most machines have between 32 and 128 GB RAM. Our questions are as follows. Please feel free to comment on even just one of the questions below if that's the area of your expertise/interest. 1. Based on various best practice guides, they suggest putting the OS on a separate disk. But we thought that would not be good because we'd sacrifice a whole disk on each machine (~3 TB) or even two whole disks (~6 TB) if we did a hardware RAID 1 on it. So, do people normally just sacrifice one whole disk? Specifically, we came up with this idea: 1. We set up all hard disks as pass-through in the raid controller, so that the RAID controller's cache is still in effect, but the OS sees just a bunch of disks (6 to 8 in our case) 2. We then do a software-based RAID 1 (using CentOS 6.4) for the OS across all 6 to 8 hard disks 3. We then do a software-based RAID 0 (using CentOS 6.4) for the SWAP space. 4. *Does anyone see any flaws in our idea above? We think that RAID 1 is not computationally expensive for the machines to compute, and most of the time, the OS should be in RAM. Similarly, we think RAID 0 should be easy for the CPU to compute, and hopefully, we won't hit much SWAP if we have enough RAM. And this way, we don't sacrifice 1 or 2 whole disks for just the OS.* Why not simply use smaller disks for the system? That is what we do: use e.g. 500G 2.5" disks (e.g. WD VelociRaptor) for the root system, and if needed put these disks into a RAID1 HW-RAID. I would always prefer hw or sw RAID. You could also use the unneeded space on these disks for the OSD journals if you use HDDs with 10,000 RPM. 2. Based on the performance benchmark blog of Mark Nelson ( http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/), has anything substantially changed since then? Specifically, it suggests that SSDs may not be really necessary if one has raid controllers with write-back cache. Is this still true even though the article was written with a version of CEPH that was over 1 year old? (Mark suggests that things may change with newer versions of CEPH) 3. Based on our understanding, it would seem that CEPH can deliver very high throughput performance (especially for reads) if dozens and dozens of hard disks are being accessed simultaneously across multiple machines. So, we could have several GB/s of throughput, right? (CEPH never advertises the advantage of read throughput with distributed architecture, so I'm wondering if I'm missing something.) If so, then is it reasonable to assume that one common bottleneck is the ethernet? So if we only use 1 NIC card at 1 Gb/s, that'll be a major bottleneck? If so, we're thinking of trying to bond multiple 1 Gb/s ethernet cards to make a bonded ethernet connection of 4 Gb/s (4 * 1 Gb/s). Even these 4 Gb/s could easily be your bottleneck. That depends on your workload. Especially if you use separated networks for the clients and the cluster OSD backend (replication, backfill, recovery). But we didn't see anyone discuss this strategy? Are there any holes in it? Or does CEPH automatically take advantage of multiple NIC cards without us having to deal with the complexity (and expense of buying a new switch which supports bonding) for doing bonding? 
That is, is it possible and a good idea to have CEPH OSDs be set up to use specific NICs, so that we spread the load? (We read through the recommendation of having different NICs for front-end traffic vs back-end traffic, but we're not worried about network attacks -- so we're thinking that just creating a big fat ethernet pipe gives us the most flexibility.) Depending on your budget it may make sense to use 10G cards instead. Separated traffic networks aren't only about DDoS; they're also there to make sure your replication traffic doesn't affect your client traffic and vice versa. Danny ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
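For what it's worth, the public/cluster split is just two settings in ceph.conf rather than something Ceph works out per NIC on its own; a sketch with example subnets:
  [global]
      public network = 192.168.10.0/24
      cluster network = 192.168.20.0/24
OSDs then use the cluster network for replication, backfill and recovery while clients and monitors talk over the public network; bonding or 10G sits below this and is invisible to Ceph.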
[ceph-users] cinder-volume rbd driver assumes that image is raw without checking
It appears that the RBD driver in Cinder only checks that the image is accessible, and if it is, assumes it is cloneable, regardless of its format. I think it would be more useful if the driver also confirmed the image format, and reverted to straight copy instead of copy-on-write if the format is anything else but raw. Background: https://bugs.launchpad.net/fuel/+bug/1246219 Thoughts? -- Dmitry Borodaenko ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
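Until a check like that lands in the driver, a quick manual way to see what an image really is (the image id and file path are placeholders):
  glance image-show <image-id> | grep disk_format
  qemu-img info <local-copy-of-image>
qemu-img will report raw, qcow2, vmdk and so on; only raw images are safe to clone copy-on-write from RBD, anything else needs the straight copy path.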
Re: [ceph-users] ceph cluster performance
I have 2 SSDs (same model, smaller capacity) for / connected on the mainboard. Their sync write performance is also poor - less than 600 iops, 4k blocks. On Nov 7, 2013, at 9:44 PM, Kyle Bader kyle.ba...@gmail.com wrote: ST240FN0021 connected via a SAS2x36 to a LSI 9207-8i. The problem might be SATA transport protocol overhead at the expander. Have you tried directly connecting the SSDs to SATA2/3 ports on the mainboard? -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
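One way to reproduce that kind of number outside of Ceph is to benchmark small synchronous writes directly against the device (destructive, so only on a disk you can wipe; the device name is a placeholder):
  fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based
If the SSD still caps out at a few hundred IOPS when cabled straight to the mainboard SATA ports, the expander is not the only problem; OSD journal writes look a lot like this access pattern.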
Re: [ceph-users] Can someone please help me here?
On Thu, Nov 7, 2013 at 3:25 PM, Trivedi, Narendra narendra.triv...@savvis.com wrote: I can't install Ubuntu... I am not sure why would it do on a new install of CentOS. I wanted to try this to if I can take it as RBD/Radosgw backend for OpenStack production but I can't believe it has taken forever to get it running and I am not there yet! -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trivedi, Narendra Sent: Wednesday, November 06, 2013 4:45 PM To: ja...@peacon.co.uk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy Unfortunately, I don't have that luxury. Thanks! Narendra -Original Message- From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ja...@peacon.co.uk Sent: Wednesday, November 06, 2013 4:43 PM To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Manual Installation steps without ceph-deploy I also had some difficulty with ceph-deploy on CentOS. It would be useful to know what didn't work for you so we can improve it, even if it is in the form of better/more docs. I eventually moved to Ubuntu 13.04 - and haven't looked back. On 2013-11-06 21:35, Trivedi, Narendra wrote: Hi All, I did a fresh install of Ceph (this might be like 10th or 11th install) on 4 new VMs (one admin, one MON and two OSDs) built from CentOS 6.4 (x64)... it seems I am jinxed with ceph-deploy. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com This message contains information which may be confidential and/or privileged. Unless you are the intended recipient (or authorized to receive for the intended recipient), you may not read, use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender by reply e-mail and delete the message and any attachment(s) thereto without retaining any copies. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com This message contains information which may be confidential and/or privileged. Unless you are the intended recipient (or authorized to receive for the intended recipient), you may not read, use, copy or disclose to anyone the message or any information contained in the message. If you have received the message in error, please advise the sender by reply e-mail and delete the message and any attachment(s) thereto without retaining any copies. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
On 08/11/2013 04:57, Kyle Bader wrote: I think this is a great idea. One of the big questions users have is what kind of hardware should I buy. An easy way for users to publish information about their setup (hardware, software versions, use-case, performance) when they have successful deployments would be very valuable. Maybe a section of wiki? It would be interesting to a site where a Ceph admin can download an API key/package that could be optionally installed and report configuration information to a community API. The admin could then supplement/correct that base information. Having much of the data collection be automated lowers the barrier for contribution. Bonus points if this could be extended to SMART and failed drives so we could have a community generated report similar to Google's disk population study they presented at FAST'07. Would this be something like http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? Cheers -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Running on disks that lose their head
On 2013-11-06 09:33, Sage Weil wrote: On Wed, 6 Nov 2013, Loic Dachary wrote: Hi Ceph, People from Western Digital suggested ways to better take advantage of the disk error reporting... when one head out of ten fails: disks can keep working with the nine remaining heads. Losing 1/10 of the disk is likely to result in a full re-install of the Ceph osd. But, again, the disk could keep going after that, with 9/10 of its original capacity. And Ceph is good at handling osd failures. Yeah...but if you lose 1/10 of a block device any existing local file system is going to blow up. I suspect this is something that newfangled interfaces like Kinetic will be much better at. I found some info on this at last in the SATA-IO 3.2 Spec; seemingly it's more to do with RAID rebuilds: Rebuild Assist: when a drive in a RAID configuration fails due to excessive data errors, it is possible to reconstruct the data from the failed drive from the remaining drives – this is called a Rebuild. The Rebuild Assist function speeds up the rebuild process by quickly recognizing which data on the failed drive is unreadable. Source: https://www.sata-io.org/sites/default/files/images/SATA-IO%20FAQ%20-%20071813a%20%283%29.pdf There is also some interesting info on SSHDs: SSHD Optimization: a Solid State Hybrid Drive (SSHD) is an HDD that contains some amount of Flash memory, thus increasing the performance of the drive. The Hybrid Information feature provides a mechanism wherein a host can tell the drive which data to cache, further enhancing the performance of the SSHD. In today's SATA drives, reading and writing log data required the use of non-queued commands, impacting overall system performance, especially SSHDs. A new feature in v3.2 allows such commands to be queued, minimizing the impact on performance. But it seems the manufacturers are taking a different path; the new Seagate 3.5" Hybrid drives don't even support the ATA-8 NV Cache feature set unfortunately, according to the product manual: http://www.seagate.com/files/staticfiles/support/docs/manual/desktop%20sshd/100726566.pdf ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
Would this be something like http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? Something very much like that :) -- Kyle ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
Hi, It looks like there indeed is enough interest to move forward :-) The next action items would be : * Setup a home page somewhere ( should it be a separate web site or could we simply take over http://ceph.com/ ? ) * Create the About page describing the User Committee and get a consensus from interested parties on its goal and scope. * Create an Event section to record meetups / conferences schedules ( http://ceph.com/event/ ? ) * Collect use cases ( publish them under http://ceph.com/community/blog/ ? elsewhere ? ) * Schedule a User Committee session for the next CDS Cloudwatt kindly agreed to assign someone ( starting next week ) to help with logistics to organize events. For instance I'll be talking about Ceph in two weeks in Toulouse http://capitoledulibre.org/ : I will happily distribute goodies, if they are there. But if I have to acquire them ... It will be even more useful to organize a Ceph booth during FOSDEM or even ( too short notice maybe ? ) for the Cloudstack event in Amsterdam. I'm willing to take action on setting up the home page, the about page and collecting use cases. Anyone willing to work on the other two ( event page, CDS session ) ? Or have ideas for even more action items ? Maybe organizing a local meetup ( the Cloudwatt person could help with that too, as long as she is told where and when ) ? Cheers P.S. If you're in Berlin november 30th 2013, we're having a Ceph oriented friendly poker game at http://c-base.org/ starting 7pm. I'm not sure that counts as an event but it's definitely an opportunity to discuss Ceph ;-) On 07/11/2013 01:35, Loic Dachary wrote: Hi Ceph, I would like to open a discussion about organizing a Ceph User Committee. We briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today during the OpenStack summit. A pad was created and roughly summarizes the idea: http://pad.ceph.com/p/user-committee If there is enough interest, I'm willing to devote one day a week working for the Ceph User Committee. And yes, that includes sitting at the Ceph booth during the FOSDEM :-) And interviewing Ceph users and describing their use cases, which I enjoy very much. But also contribute to a user centric roadmap, which is what ultimately matters for the company I work for. If you'd like to see this happen but don't have time to participate in this discussion, please add your name + email at the end of the pad. What do you think ? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Havana RBD - a few problems
On 11/08/2013 12:15 AM, Jens-Christian Fischer wrote: Hi all we have installed a Havana OpenStack cluster with RBD as the backing storage for volumes, images and the ephemeral images. The code as delivered in https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L498 fails, because the RBD.path is not set. I have patched this to read: Using libvirt_image_type=rbd to replace ephemeral disks is new with Havana, and unfortunately some bug fixes did not make it into the release. I've backported the current fixes on top of the stable/havana branch here: https://github.com/jdurgin/nova/tree/havana-ephemeral-rbd

@@ -419,10 +419,12 @@ class Rbd(Image):
         if path:
             try:
                 self.rbd_name = path.split('/')[1]
+                self.path = path
             except IndexError:
                 raise exception.InvalidDevicePath(path=path)
         else:
             self.rbd_name = '%s_%s' % (instance['name'], disk_name)
+            self.path = 'volumes/%s' % self.rbd_name
         self.snapshot_name = snapshot_name
         if not CONF.libvirt_images_rbd_pool:
             raise RuntimeError(_('You should specify'

but am not sure this is correct. I have the following problems: 1) can't inject data into image 2013-11-07 16:59:25.251 24891 INFO nova.virt.libvirt.driver [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] [instance: 2fa02e4f-f804-4679-9507-736eeebd9b8d] Injecting key into image fc8179d4-14f3-4f21-a76d-72b03b5c1862 2013-11-07 16:59:25.269 24891 WARNING nova.virt.disk.api [req-f813ef24-de7d-4a05-ad6f-558e27292495 c66a737acf0545fdb9a0a920df0794d9 2096e25f5e814882b5907bc5db342308] Ignoring error injecting data into image (Error mounting volumes/instance-0089_disk with libguestfs (volumes/instance-0089_disk: No such file or directory)) possibly the self.path = … is wrong - but what are the correct values? Like Dinu mentioned, I'd suggest disabling file injection and using the metadata service + cloud-init instead. We should probably change nova to log an error about this configuration when ephemeral volumes are rbd. 2) Creating a new instance from an ISO image fails completely - no bootable disk found, says the KVM console. Related? This sounds like a bug in the ephemeral rbd code - could you file it in launchpad if you can reproduce with file injection disabled? I suspect it's not being attached as a cdrom. 3) When creating a new instance from an image (non ISO images work), the disk is not resized to the size specified in the flavor (but left at the size of the original image) This one is fixed in the backports already. I would be really grateful, if those people that have Grizzly/Havana running with an RBD backend could pipe in here… You're seeing some issues in the ephemeral rbd code, which is new in Havana. None of these affect non-ephemeral rbd, or Grizzly. Thanks for reporting them! Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
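For completeness, turning file injection off in Havana is a nova.conf change on the compute nodes (option names as they appear in Havana's nova.conf; verify against your packaged sample before relying on them):
  [DEFAULT]
  libvirt_inject_password = false
  libvirt_inject_key = false
  libvirt_inject_partition = -2
With injection disabled, cloud-init fetches the SSH key and user data from the metadata service instead, which avoids the libguestfs mount of the rbd-backed disk entirely.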
Re: [ceph-users] Ceph User Committee
Thanks for floating this out there Loic! A few thoughts inline below: On Thu, Nov 7, 2013 at 6:45 PM, Loic Dachary l...@dachary.org wrote: Hi, It looks like there indeed is enough interest to move forward :-) The next action items would be : * Setup a home page somewhere ( should it be a separate web site or could we simply take over http://ceph.com/ ? ) I think the best idea would be to use Ceph.com. I don't have a problem with giving certain people access to help curate content and clean up site design during their tenure on the user committee. Ceph.com really belongs to the community anyway (and is in need of some love). * Create the About page describing the User Committee and get a consensus from interested parties on its goal and scope. Definitely. I'd like to have a place to define the what, keep a running tab of the who, and document some of the how. * Create an Event section to record meetups / conferences schedules ( http://ceph.com/event/ ? ) This is probably best as a wiki page so that anyone can add meetups. * Collect use cases ( publish them under http://ceph.com/community/blog/ ? elsewhere ? ) I'm guessing this starts as a wiki page for collection and can probably be curated into Ceph.com. * Schedule a User Committee session for the next CDS I can guarantee it. Cloudwatt kindly agreed to assign someone ( starting next week ) to help with logistics to organize events. For instance I'll be talking about Ceph in two weeks in Toulouse http://capitoledulibre.org/ : I will happily distribute goodies, if they are there. But if I have to acquire them ... It will be even more useful to organize a Ceph booth during FOSDEM or even ( too short notice maybe ? ) for the Cloudstack event in Amsterdam. Awesome, logistics are often the most difficult part. I'm guessing part of the user committee is going to be interfacing with corps (like Inktank and others) who are willing to support the efforts of the community. I'm willing to take action on setting up the home page, the about page and collecting use cases. Anyone willing to work on the other two ( event page, CDS session ) ? Or have ideas for even more action items ? Maybe organizing a local meetup ( the Cloudwatt person could help with that too, as long as she is told where and when ) ? I'm happy to make sure the CDS session is set up and can facilitate getting some folks access to ceph.com once we have a few volunteers that raise their hands for web dev/content work. Cheers P.S. If you're in Berlin november 30th 2013, we're having a Ceph oriented friendly poker game at http://c-base.org/ starting 7pm. I'm not sure that counts as an event but it's definitely an opportunity to discuss Ceph ;-) Awesome. Wish I was a wee bit closer. The only other thing I want to add here is that while I think it's important to outline a few things to give it structure (we don't want folks to just flounder around and get frustrated), I also don't want this to be extremely heavy-handed. The more that we can keep this a lightweight group the better off we'll be. For now I'm happy to keep this ad hoc and try to coordinate things around ceph.com; with the expectation being that we come out of CDS with a solid plan for moving forward. Best Regards, Patrick On 07/11/2013 01:35, Loic Dachary wrote: Hi Ceph, I would like to open a discussion about organizing a Ceph User Committee. We briefly discussed the idea with Ross Turk, Patrick McGarry and Sage Weil today during the OpenStack summit. 
A pad was created and roughly summarizes the idea: http://pad.ceph.com/p/user-committee If there is enough interest, I'm willing to devote one day a week working for the Ceph User Committee. And yes, that includes sitting at the Ceph booth during the FOSDEM :-) And interviewing Ceph users and describing their use cases, which I enjoy very much. But also contribute to a user centric roadmap, which is what ultimately matters for the company I work for. If you'd like to see this happen but don't have time to participate in this discussion, please add your name + email at the end of the pad. What do you think ? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Loïc Dachary, Artisan Logiciel Libre ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 11/08/2013 03:50 AM, Wido den Hollander wrote: On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Indeed, that was implemented in OpenStack Havana. I think the docs haven't been updated yet, but one of the related blueprints is: https://blueprints.launchpad.net/cinder/+spec/pass-ratelimit-info-to-nova Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw-agent failed to sync object
On 11/07/2013 09:48 AM, lixuehui wrote: Hi all: After we built a region with two zones distributed across two ceph clusters and started the agent, it works! But what we find in the radosgw-agent stdout is that it failed to sync objects all the time. Here is the info: (env)root@ceph-rgw41:~/myproject# ./radosgw-agent -c cluster-data-sync.conf -q region map is: {u'us': [u'us-west', u'us-east']} ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error ERROR:radosgw_agent.worker:failed to sync object new-east-bucket/new-east.json: state is error Metadata has already been copied from the master zone. I'd like to know the reason, and what 'state is error' means! This means the destination radosgw failed to fetch the object from the source radosgw. Does the system user from the secondary zone exist in the master zone? If you enable 'debug rgw=30' for both radosgw instances and share the logs we can see why the sync is failing. Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
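For reference, the debug settings go into ceph.conf on both gateway hosts under the rgw client section (the section name below is just the conventional one; use whatever your radosgw instances are actually called) and take effect after a radosgw restart:
  [client.radosgw.gateway]
      debug rgw = 30
      debug ms = 1
debug ms = 1 is optional, but it makes the failing fetch between the two gateways much easier to spot in the logs.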
Re: [ceph-users] Ceph Block Storage QoS
On Fri, Nov 8, 2013 at 9:31 AM, Josh Durgin josh.dur...@inktank.com wrote: On 11/08/2013 03:50 AM, Wido den Hollander wrote: On 11/07/2013 08:42 PM, Gruher, Joseph R wrote: Is there any plan to implement some kind of QoS in Ceph? Say I want to provide service level assurance to my OpenStack VMs and I might have to throttle bandwidth to some to provide adequate bandwidth to others - is anything like that planned for Ceph? Generally with regard to block storage (rbds), not object or filesystem. Or is there already a better way to do this elsewhere in the OpenStack cloud? I don't know if OpenStack supports it, but in CloudStack we recently implemented the I/O throttling mechanism of Qemu via libvirt. That might be a solution if OpenStack implements that as well? Indeed, that was implemented in OpenStack Havana. I think the docs haven't been updated yet, but one of the related blueprints is: https://blueprints.launchpad.net/cinder/+spec/pass-ratelimit-info-to-nova Yes, it seemed lack of necessary docs to guide users. I just list commands below to help users to understand: cinder qos-create high_read_low_write consumer=front-end read_iops_sec=1000 write_iops_sec=10 cinder type-create type1 cinder qos-associate [qos-spec-id] [type-id] cinder create --display-name high-read-low-write-volume --volume-type type1 100 nova volume-attach vm-1 high-read-low-write-volume /dev/vdb Thanks, Joe ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Best Regards, Wheat ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph User Committee
Hi Loic, On 08.11.2013 00:19, Loic Dachary wrote: On 08/11/2013 04:57, Kyle Bader wrote: I think this is a great idea. One of the big questions users have is what kind of hardware should I buy. An easy way for users to publish information about their setup (hardware, software versions, use-case, performance) when they have successful deployments would be very valuable. Maybe a section of wiki? It would be interesting to a site where a Ceph admin can download an API key/package that could be optionally installed and report configuration information to a community API. The admin could then supplement/correct that base information. Having much of the data collection be automated lowers the barrier for contribution. Bonus points if this could be extended to SMART and failed drives so we could have a community generated report similar to Google's disk population study they presented at FAST'07. Would this be something like http://wiki.ceph.com/01Planning/02Blueprints/Firefly/Ceph-Brag ? It seems that all eyes are looking in the same or very close directions :-) Sage initially said wiki page per reference setup - outlined overview of the context, specifics (e.g. defaults overrides and their reasoning), possibly essential notes on some regular maintenance activities, etc. In summary: the minimal readme or receipt enough for an admin to adapt and replicate a proven setup. Publishing of few concrete deployments in this form doesn't need any development and will generate positive effect immediately - I'm doing setup based on {wiki-page} with ... (differences), but ... You (Loic) are developing on the practical basis for scaling all of this at large: Convenient ceph-brag tool and online service - collecting of detailed snapshot of the setup as it is visible from a Ceph node. Kyle combines the two, saying: application of the collecting tool followed by handcrafted shaping, linking and annotations before/after publishing. Personally, I most like Kyle's workflow - iterations of: tool based collection - results in new version in the tool branch; applying fixes trough the web editor - merging handcrafted defs branch; publishing/communication. Once the working prototype goes live, various derivatives could be considered, e.g.: * Nice, possibly interactive diagrams (visual documentation) of the setup. * Standard reports with anchors for referencing in the mails. * Side projects for build and maintenance artifacts generation for various management platforms - ceph-deploy or different (of course assuming rejoining-back the private bits) * View/Report aiming extracting the essentials, roughly equivalent to the handcrafted Ceph setup receipt for the context. Regards, Alek ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Pool without a name, how to remove it?
I don't remember how this has come up or been dealt with in the past, but I believe it has been. Have you tried just doing it via the ceph or rados CLI tools with an empty pool name? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 6:58 AM, Wido den Hollander w...@42on.com wrote: Hi, On a Ceph cluster I have a pool without a name. I have no idea how it got there, but how do I remove it? pool 14 '' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 158 owner 18446744073709551615 Is there a way to remove a pool by it's ID? I couldn't find anything in librados do to so. -- Wido den Hollander 42on B.V. Phone: +31 (0)20 700 9902 Skype: contact42on ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
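Something along these lines may work; untested against a nameless pool, so treat it as a sketch of the suggestion above rather than a verified fix, and note the flag really does destroy the pool:
  rados rmpool "" "" --yes-i-really-really-mean-it
ceph osd pool delete takes the same name-twice-plus-flag form if the rados variant refuses the empty string.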
Re: [ceph-users] near full osd
It sounds like maybe your PG counts on your pools are too low and so you're just getting a bad balance. If that's the case, you can increase the PG count with ceph osd pool set <name> pg_num <higher value>. OSDs should get data approximately equal to node weight/sum of node weights, so higher weights get more data and all its associated traffic. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 8:30 AM, Kevin Weiler kevin.wei...@imc-chicago.com wrote: All of the disks in my cluster are identical and therefore all have the same weight (each drive is 2TB and the automatically generated weight is 1.82 for each one). Would the procedure here be to reduce the weight, let it rebalance, and then put the weight back to where it was? -- Kevin Weiler IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com From: Aronesty, Erik earone...@expressionanalysis.com Date: Tuesday, November 5, 2013 10:27 AM To: Greg Chavez greg.cha...@gmail.com, Kevin Weiler kevin.wei...@imc-chicago.com Cc: ceph-users@lists.ceph.com ceph-users@lists.ceph.com Subject: RE: [ceph-users] near full osd If there’s an underperforming disk, why on earth would more data be put on it? You’d think it would be less…. I would think an overperforming disk should (desirably) cause that case, right? From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Greg Chavez Sent: Tuesday, November 05, 2013 11:20 AM To: Kevin Weiler Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] near full osd Kevin, in my experience that usually indicates a bad or underperforming disk, or a too-high priority. Try running ceph osd crush reweight osd.## 1.0. If that doesn't do the trick, you may want to just out that guy. I don't think the crush algorithm guarantees balancing things out in the way you're expecting. --Greg On Tue, Nov 5, 2013 at 11:11 AM, Kevin Weiler kevin.wei...@imc-chicago.com wrote: Hi guys, I have an OSD in my cluster that is near full at 90%, but we're using a little less than half the available storage in the cluster. Shouldn't this be balanced out? -- Kevin Weiler IT IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/ Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.wei...@imc-chicago.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
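To make the pg_num suggestion at the top of this thread concrete (the pool name and target count are placeholders; pick the count based on OSD count and replica size, and note pg_num can only ever be increased):
  ceph osd pool set <pool-name> pg_num 1024
  ceph osd pool set <pool-name> pgp_num 1024
pgp_num has to be raised to match pg_num before any actual rebalancing of data takes place.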
Re: [ceph-users] About memory usage of ceph-mon on arm
I don't think this is anything we've observed before. Normally when a Ceph node is using more memory than its peers it's a consequence of something in that node getting backed up. You might try looking at the perf counters via the admin socket and seeing if something about them is different between your ARM and AMD processors. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 7:21 AM, Yu Changyuan rei...@gmail.com wrote: Finally, my tiny ceph cluster get 3 monitors, newly added mon.b and mon.c both running on cubieboard2, which is cheap but still with enough cpu power(dual-core arm A7 cpu, 1.2G) and memory(1G). But compare to mon.a which running on an amd64 cpu, both mon.b and mon.c easily consume too much memory, so I want to know whether this is caused by memory leak. Below is the output of 'ceph tell mon.a heap stats' and 'ceph tell mon.c heap stats'(mon.c only start 12hr ago, while mon.a already running for more than 10 days) mon.atcmalloc heap stats: MALLOC:5480160 (5.2 MiB) Bytes in use by application MALLOC: + 28065792 ( 26.8 MiB) Bytes in page heap freelist MALLOC: + 15242312 ( 14.5 MiB) Bytes in central cache freelist MALLOC: + 10116608 (9.6 MiB) Bytes in transfer cache freelist MALLOC: + 10432216 (9.9 MiB) Bytes in thread cache freelists MALLOC: + 1667224 (1.6 MiB) Bytes in malloc metadata MALLOC: MALLOC: = 71004312 ( 67.7 MiB) Actual memory used (physical + swap) MALLOC: + 57540608 ( 54.9 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: =128544920 ( 122.6 MiB) Virtual address space used MALLOC: MALLOC: 4655 Spans in use MALLOC: 34 Thread heaps in use MALLOC: 8192 Tcmalloc page size Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the mon.ctcmalloc heap stats: MALLOC: 175861640 ( 167.7 MiB) Bytes in use by application MALLOC: + 2220032 (2.1 MiB) Bytes in page heap freelist MALLOC: + 1007560 (1.0 MiB) Bytes in central cache freelist MALLOC: + 2871296 (2.7 MiB) Bytes in transfer cache freelist MALLOC: + 4686000 (4.5 MiB) Bytes in thread cache freelists MALLOC: + 2758880 (2.6 MiB) Bytes in malloc metadata MALLOC: MALLOC: =189405408 ( 180.6 MiB) Actual memory used (physical + swap) MALLOC: +0 (0.0 MiB) Bytes released to OS (aka unmapped) MALLOC: MALLOC: =189405408 ( 180.6 MiB) Virtual address space used MALLOC: MALLOC: 3445 Spans in use MALLOC: 14 Thread heaps in use MALLOC: 8192 Tcmalloc page size Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()). Bytes released to the The ceph versin is 0.67.4, compiled with tcmalloc enabled, gcc(armv7a-hardfloat-linux-gnueabi-gcc) version 4.7.3 and I also try to dump heap, but I can not find anything useful, below is a recent dump, output by command pprof --text /usr/bin/ceph-mon mon.c.profile.0021.heap. What extra step should I take to make the dump more meaningful? Using local file /usr/bin/ceph-mon. Using local file mon.c.profile.0021.heap. 
Total: 149.3 MB 146.2 97.9% 97.9%146.2 97.9% b6a7ce7c 1.4 0.9% 98.9% 1.4 0.9% std::basic_string::_Rep::_S_create ??:0 1.4 0.9% 99.8% 1.4 0.9% 002dd794 0.1 0.1% 99.9% 0.1 0.1% b6a81170 0.1 0.1% 99.9% 0.1 0.1% b6a80894 0.0 0.0% 100.0% 0.0 0.0% b6a7e2ac 0.0 0.0% 100.0% 0.0 0.0% b6a81410 0.0 0.0% 100.0% 0.0 0.0% 00367450 0.0 0.0% 100.0% 0.0 0.0% 001d4474 0.0 0.0% 100.0% 0.0 0.0% 0028847c 0.0 0.0% 100.0% 0.0 0.0% b6a7e8d8 0.0 0.0% 100.0% 0.0 0.0% 0020c80c 0.0 0.0% 100.0% 0.0 0.0% 0028bd20 0.0 0.0% 100.0% 0.0 0.0% b6a63248 0.0 0.0% 100.0% 0.0 0.0% b6a83478 0.0 0.0% 100.0% 0.0 0.0% b6a806f0 0.0 0.0% 100.0% 0.0 0.0% 002eb8b8 0.0 0.0% 100.0% 0.0 0.0% 0024efb4 0.0 0.0% 100.0% 0.0 0.0% 0027e550 0.0 0.0% 100.0% 0.0 0.0% b6a77104 0.0 0.0% 100.0% 0.0 0.0% _dl_mcount ??:0 0.0 0.0% 100.0% 0.0 0.0% 003673ec 0.0 0.0% 100.0% 0.0 0.0% b6a7a91c 0.0 0.0% 100.0% 0.0 0.0% 00295e44 0.0 0.0% 100.0%
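The perf counters mentioned above are available through the admin socket on each monitor host, for example (socket paths assume the default /var/run/ceph naming):
  ceph --admin-daemon /var/run/ceph/ceph-mon.c.asok perf dump
  ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok perf dump
Diffing the two outputs, together with ceph tell mon.c heap release to see how much of that 180 MiB tcmalloc will actually hand back, should help narrow down whether mon.c is really holding more state or just not releasing freed memory.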
Re: [ceph-users] RBD back
On Thu, Nov 7, 2013 at 1:26 AM, lixuehui lixue...@chinacloud.com.cn wrote: Hi all Ceph Object Store service can spans geographical locals . Now ceph also provides FS and RBD .IF our applications need the RBD service .Can we provide backup and disaster recovery for it via gateway through some transfermation ? In fact the cluster stored RBD data as objects in pools(default rbd), for another words, can we accomplish that backup some pools in a ceph cluster (without s3 )via the gateway . There's not any way to do raw RADOS geo-replication at this time. You *can* do RBD disaster recovery using snapshots and incremental snapshot exports, though. Check out the export-diff/import-diff and associated commands in the rbd tool. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
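A minimal sketch of that snapshot-based flow (pool, image and snapshot names are placeholders; the image on the backup cluster must exist with the same size before the first import-diff). On the source cluster:
  rbd snap create rbd/vm1@base
  rbd export-diff rbd/vm1@base vm1-base.diff
and later, for an incremental:
  rbd snap create rbd/vm1@day1
  rbd export-diff --from-snap base rbd/vm1@day1 vm1-day1.diff
then on the backup cluster:
  rbd create rbd/vm1 --size 10240
  rbd import-diff vm1-base.diff rbd/vm1
  rbd import-diff vm1-day1.diff rbd/vm1
import-diff recreates the end snapshot on the target image, so each incremental picks up exactly where the previous one left off.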
Re: [ceph-users] Kernel Panic / RBD Instability
Well, as you've noted you're getting some slow requests on the OSDs when they turn back on; and then the iSCSI gateway is panicking (probably because the block device write request is just hanging). We've gotten prior reports that iSCSI is a lot more sensitive to a few slow requests than most use cases, and OSDs coming back in can cause some slow requests, but if it's a common case for you then there's probably something that can be done to optimize that recovery. Have you checked into what's blocking the slow operations or why the PGs are taking so long to get ready? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 1:33 AM, James Wilkins james.wilk...@fasthosts.com wrote: Hello, Wondering if anyone else has come over an issue we're having with our POC CEPH Cluster at the moment. Some details about its setup; 6 x Dell R720 (20 x 1TB Drives, 4 xSSD CacheCade), 4 x 10GB Nics 4 x Generic white label server (24 x 2 4TB Disk Raid-0 ), 4 x 10GB Nics 3 x Dell R620 - Acting as ISCSI Heads (targetcli / Linux kernel ISCSI) - 4 x 10GB Nics. An RBD device is mounted and exported via targetcli, this is then mounted on a client device to push backup data. All machines are running Ubuntu 12.04.3 LTS and ceph 0.67.4 Machines are split over two racks (distinct layer 2 domains) using a leaf/spine model and we use ECMP/quagga on the ISCSI heads to reach the CEPH Cluster. Crush map has racks defined to spread data over 2 racks - I've attached the ceph.conf The cluster performs great normally, and we only have issues when simulating rack failure. The issue comes when the following steps are taken o) Initiate load against the cluster (backups going via ISCSI) o) ceph osd set noout o) Reboot 2 x Generic Servers / 3 x Dell Servers (basically all the nodes in 1 Rack) o) Cluster goes degraded, as expected cluster 55dcf929-fca5-49fe-99d0-324a19afd5b4 health HEALTH_WARN 7056 pgs degraded; 282 pgs stale; 2842 pgs stuck unclean; recovery 1286582/2700870 degraded (47.636%); 108/216 in osds are down; noout flag(s) set monmap e3: 5 mons at {fh-ceph01-mon-01=172.17.12.224:6789/0,fh-ceph01-mon-02=172.17.12.225:6789/0,fh-ceph01-mon-03=172.17.11.224:6789/0,fh-ceph01-mon-04=172.17.11.225:6789/0,fh-ceph01-mon-05=172.17.12.226:6789/0}, election epoch 74, quorum 0,1,2,3,4 fh-ceph01-mon-01,fh-ceph01-mon-02,fh-ceph01-mon-03,fh-ceph01-mon-04,fh-ceph01-mon-05 osdmap e4237: 216 osds: 108 up, 216 in pgmap v117686: 7328 pgs: 266 active+clean, 6 stale+active+clean, 6780 active+degraded, 276 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 1286582/2700870 degraded (47.636%) mdsmap e1: 0/0/1 up 2013-11-05 08:51:44.830393 mon.0 [INF] pgmap v117685: 7328 pgs: 1489 active+clean, 1289 stale+active+clean, 3215 active+degraded, 1335 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 1048742/2700870 degraded (38.830%); recovering 7 o/s, 28969KB/s o) As OSDS start returning 2013-11-05 08:52:42.019295 mon.0 [INF] osd.165 172.17.11.9:6864/6074 boot 2013-11-05 08:52:42.023055 mon.0 [INF] osd.154 172.17.11.9:6828/5943 boot 2013-11-05 08:52:42.024226 mon.0 [INF] osd.159 172.17.11.9:6816/5820 boot 2013-11-05 08:52:42.031996 mon.0 [INF] osd.161 172.17.11.9:6856/6059 boot o) We then see some slow requests; 2013-11-05 08:53:11.677044 osd.153 [WRN] 6 slow requests, 6 included below; oldest blocked for 30.409992 secs 2013-11-05 08:53:11.677052 osd.153 [WRN] slow request 30.409992 seconds old, received at 2013-11-05 08:52:41.266994: osd_op(client.16010.1:13441679 
rb.0.21ec.238e1f29.0012fa28 [write 2854912~4096] 3.516ef071 RETRY=-1 e4240) currently reached pg 2013-11-05 08:53:11.677056 osd.153 [WRN] slow request 30.423024 seconds old, received at 2013-11-05 08:52:41.253962: osd_op(client.15755.1:13437999 rb.0.21ec.238e1f29.0012fa28 [write 0~233472] 3.516ef071 RETRY=1 e4240) v4 currently reached pg o) A few minutes , the ISCSI heads start panicking Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664305] [ cut here ] Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664313] WARNING: at /build/buildd/linux-lts-raring-3.8.0/kernel/watchdog.c:246 wat chdog_overflow_callback+0x9a/0xc0() Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664315] Hardware name: PowerEdge R620 Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664317] Watchdog detected hard LOCKUP on cpu 6 Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664318] Modules linked in: ib_srpt(F) tcm_qla2xxx(F) tcm_loop(F) tcm_fc(F) iscsi_t arget_mod(F) target_core_pscsi(F) target_core_file(F) target_core_iblock(F) target_core_mod(F) rbd(F) libceph(F) ipmi_devintf(F) ipm i_si(F) ipmi_msghandler(F) qla2xxx(F) libfc(F) scsi_transport_fc(F) scsi_tgt(F) configfs(F) dell_rbu(F) ib_iser(F) rdma_cm(F) ib_cm( F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F)
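When it happens, the usual places to look for what is actually blocking are the cluster health detail and the admin socket of the OSD named in the slow request warnings (osd.153 is taken from the log above; socket paths assume the default naming):
  ceph health detail
  ceph --admin-daemon /var/run/ceph/ceph-osd.153.asok dump_ops_in_flight
  ceph --admin-daemon /var/run/ceph/ceph-osd.153.asok dump_historic_ops
The "currently reached pg" state in the slow request messages usually means the op is queued waiting on the PG itself, typically peering after the rebooted OSDs come back, rather than on the journal or the network, which points at tuning recovery/peering rather than the iSCSI side.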
[ceph-users] radosgw-agent sync object:state is error
Hi list We deployed a master zone and a slave zone in two clusters to test multi-location backup. The radosgw-agent syncs buckets successfully. We can find the same bucket info in the slave zone. But the running radosgw-agent throws out error info that objects failed to sync, just like this: ERROR:radosgw_agent.worker:failed to sync object bucket-test4/s3stor.py: state is error At first, I thought it was because I forgot to set the placement_pools param in the zone configuration. After correcting that, it still goes on. Zone configuration: { domain_root: .us-east.rgw.root, control_pool: .us-east.rgw.control, gc_pool: .us-east.rgw.gc, log_pool: .us-east.log, intent_log_pool: .us-east.intent-log, usage_log_pool: .us-east.usage, user_keys_pool: .us-east.users, user_email_pool: .us-east.users.email, user_swift_pool: .us-east.users.swift, user_uid_pool: .us-east.users.uid, system_key: { access_key: PSUXAQBOE0N60C0Y3QJ7, secret_key: l5peNL/nfTkAjl28uLw/WCKk2LSNa4hdS6VheJ6x}, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets} } ] } { domain_root: .us-west.rgw.root, control_pool: .us-west.rgw.control, gc_pool: .us-west.rgw.gc, log_pool: .us-west.log, intent_log_pool: .us-west.intent-log, usage_log_pool: .us-west.usage, user_keys_pool: .us-west.users, user_email_pool: .us-west.users.email, user_swift_pool: .us-west.users.swift, user_uid_pool: .us-west.users.uid, system_key: { access_key: WUHDCDMWBG4GMT9B7QL7, secret_key: RSaYh90tNIdaImcn9QoSyK\/EuIrZSeXdOoa6Fw7o}, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets} } ] } In the slave zone, the contents of .rgw.buckets look like this: .dir.us-east.4513.1 .dir.us-east.4513.2 .dir.us-east.4513.3 .dir.us-east.4513.4 There is nothing from the objects that are stored in the master zone. The .rgw.buckets.index of the slave zone is empty, while that of the master zone contains some content: .dir.default.4647.1 What could the problem be? We would appreciate any suggestions! lixuehui ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 2013-11-08 03:20, Haomai Wang wrote: On Fri, Nov 8, 2013 at 9:31 AM, Josh Durgin josh.dur...@inktank.com wrote: I just list commands below to help users to understand: cinder qos-create high_read_low_write consumer=front-end read_iops_sec=1000 write_iops_sec=10 Does this have any normalisation of the IO units, for example to 8K or something? In VMware we have had similar controls for ages but they're not useful, as a Windows server will throw out 4MB IOs and skew all the metrics. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Running on disks that lose their head
when one head out of ten fails: disks can keep working with the nine remaining heads... some info on this at last in the SATA-IO 3.2 Spec... Rebuild Assist... Some info on the command set (SAS SATA implementations): http://www.seagate.com/files/staticfiles/docs/pdf/whitepaper/tp620-1-1110us-reducing-raid-recovery.pdf ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Block Storage QoS
On 11/08/2013 03:13 PM, ja...@peacon.co.uk wrote: On 2013-11-08 03:20, Haomai Wang wrote: On Fri, Nov 8, 2013 at 9:31 AM, Josh Durgin josh.dur...@inktank.com wrote: I just list commands below to help users to understand: cinder qos-create high_read_low_write consumer=front-end read_iops_sec=1000 write_iops_sec=10 Does this have any normalisation of the IO units, for example to 8K or something? In VMware we have similar controls for ages but they're not useful, as a Windows server will through out 4MB IO's and skew all the metrics. I don't think it does any normalization, but you could have different limits for different volume types, and use one volume type for windows and one volume type for non-windows. This might not make sense for all deployments, but it may be a usable workaround for that issue. Josh ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com