[ceph-users] 1 mons down, ceph-create-keys
Hello,

Could you help me please?

  # ceph status
    cluster 4da1f6d8-ca10-4bfa-bff7-c3c1cdb3f888
     health HEALTH_WARN 229 pgs peering; 102 pgs stuck inactive; 236 pgs stuck unclean; 1 mons down, quorum 0,1 st1,st2
     monmap e3: 3 mons at {st1=109.233.57.226:6789/0,st2=91.224.140.229:6789/0,st3=176.9.250.166:6789/0}, election epoch 72432, quorum 0,1 st1,st2
     osdmap e714: 3 osds: 3 up, 3 in
      pgmap v1824: 292 pgs, 4 pools, 135 bytes data, 2 objects
            137 MB used, 284 GB / 284 GB avail
                   7 active
                  56 active+clean
                 188 peering
                  41 remapped+peering

I tried to restart the st3 monitor:

  # service ceph -a restart mon.st3
  # ps aux | grep ceph
  root  9642  1.7 19.8 785988 202260 ?     Ssl 12:16 0:11 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf
  root 21375  5.0  3.5 212996  35852 pts/0 Sl  12:27 0:00 /usr/bin/ceph-mon -i st3 --pid-file /var/run/ceph/mon.st3.pid -c /etc/ceph/ceph.conf
  root 21393  0.5  0.5  51308   6060 pts/0 S   12:27 0:00 python /usr/sbin/ceph-create-keys -i st3

The ceph-create-keys process is stuck and never finishes.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 1 mons down, ceph-create-keys
Hi,

perhaps your filesystem is too full?

  df -k
  du -hs /var/lib/ceph/mon/ceph-st3/store.db

What output/error message do you get if you start the mon in the foreground?

  ceph-mon -i st3 -d -c /etc/ceph/ceph.conf

Udo

On 15.02.2014 09:30, Vadim Vatlin wrote:
> [original message quoted in full; trimmed]
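Udo's disk-space check can also be scripted; a minimal sketch that reports filesystem usage for a given path (the store path shown is the one from the thread and will differ on other installs):

```python
import os

def fs_usage_percent(path):
    """Percent of the filesystem holding `path` that is in use."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bavail * st.f_frsize
    return 100.0 * (total - free) / total

# On the mon host one would point this at the monitor store, e.g.:
#   fs_usage_percent("/var/lib/ceph/mon/ceph-st3/store.db")
print(round(fs_usage_percent("/"), 1))
```

A monitor whose store filesystem is (nearly) full cannot make progress, which would also leave ceph-create-keys waiting forever.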
[ceph-users] Problem starting RADOS Gateway
Dear all,

I am following this guide http://ceph.com/docs/master/radosgw/config/ to set up Object Storage on CentOS 6.5. My problem is that when I try to start the service as indicated at http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway I get nothing:

  # service ceph-radosgw start
  Starting radosgw instance(s)...

and if I check whether the service is running, it obviously is not:

  # service ceph-radosgw status
  /usr/bin/radosgw is not running.

If I try to start it manually, without using the service command, I get the following:

  # /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
  2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 24619
  2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't support curl_multi_wait()
  2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone / region transfer performance may be affected
  2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
  2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
  2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
  2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin initialization error (2) No such file or directory
  2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown :/1024619
  2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 mark_down_all
  2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: waiting for dispatch queue
  2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 reaper_entry start
  2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
  2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
  2014-02-15 16:03:38.714764 7fb65ba64820 10 -- :/1024619 wait: dispatch queue is stopped
  2014-02-15 16:03:38.714786 7fb64b5fe700 10 -- :/1024619 reaper_entry done
  2014-02-15 16:03:38.714819 7fb65ba64820 10 -- :/1024619 wait: closing pipes
  2014-02-15 16:03:38.714826 7fb65ba64820 10 -- :/1024619 reaper
  2014-02-15 16:03:38.714828 7fb65ba64820 10 -- :/1024619 reaper done
  2014-02-15 16:03:38.714830 7fb65ba64820 10 -- :/1024619 wait: waiting for pipes to close
  2014-02-15 16:03:38.714832 7fb65ba64820 10 -- :/1024619 wait: done.
  2014-02-15 16:03:38.714833 7fb65ba64820  1 -- :/1024619 shutdown complete.
  2014-02-15 16:03:38.714916 7fb65ba64820 -1 Couldn't init storage provider (RADOS)

Obviously the problem is some missing keyring, but which one, and how can I solve this problem? Furthermore, why is this happening, since I am following the guide to the letter? Is something missing?

Best,

G.
Re: [ceph-users] Problem starting RADOS Gateway
Hi,

does ceph -s also get stuck on a missing keyring?

Do you have a keyring like:

  cat /etc/ceph/keyring
  [client.admin]
          key = AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

Or do you have another keyring defined in ceph.conf (global section - keyring = /etc/ceph/keyring)?

The key is in ceph - see:

  ceph auth get-key client.admin
  AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

or "ceph auth list" for all keys.

Key generation is done by get-or-create-key, like this (but in this case for bootstrap-osd):

  ceph auth get-or-create-key client.bootstrap-osd mon 'allow profile bootstrap-osd'

Udo

On 15.02.2014 15:35, Georgios Dimitrakakis wrote:
> [original message quoted in full; trimmed]
Re: [ceph-users] Problem starting RADOS Gateway
1) ceph -s is working as expected:

  # ceph -s
    cluster c465bdb2-e0a5-49c8-8305-efb4234ac88a
     health HEALTH_OK
     monmap e1: 1 mons at {master=192.168.0.10:6789/0}, election epoch 1, quorum 0 master
     mdsmap e111: 1/1/1 up {0=master=up:active}
     osdmap e114: 2 osds: 2 up, 2 in
      pgmap v414: 1200 pgs, 14 pools, 10596 bytes data, 67 objects
            500 GB used, 1134 GB / 1722 GB avail
                1200 active+clean

2) In /etc/ceph I have the following files:

  # ls -l
  total 20
  -rw-r--r-- 1 root root  64 Feb 14 17:10 ceph.client.admin.keyring
  -rw-r--r-- 1 root root 401 Feb 15 16:57 ceph.conf
  -rw-r--r-- 1 root root 196 Feb 14 20:26 ceph.log
  -rw-r--r-- 1 root root 120 Feb 15 11:08 keyring.radosgw.gateway
  -rwxr-xr-x 1 root root  92 Dec 21 00:47 rbdmap

3) The ceph.conf content is the following:

  # cat ceph.conf
  [global]
  auth_service_required = cephx
  filestore_xattr_use_omap = true
  auth_client_required = cephx
  auth_cluster_required = cephx
  mon_host = 192.168.0.10
  mon_initial_members = master
  fsid = c465bdb2-e0a5-49c8-8305-efb4234ac88a

  [client.radosgw.gateway]
  host = master
  keyring = /etc/ceph/keyring.radosgw.gateway
  rgw socket path = /tmp/radosgw.sock
  log file = /var/log/ceph/radosgw.log

4) And all the keys that exist are the following:

  # ceph auth list
  installed auth entries:
  mds.master
          key: xx==
          caps: [mds] allow
          caps: [mon] allow profile mds
          caps: [osd] allow rwx
  osd.0
          key: xx==
          caps: [mon] allow profile osd
          caps: [osd] allow *
  osd.1
          key: xx==
          caps: [mon] allow profile osd
          caps: [osd] allow *
  client.admin
          key: xx==
          caps: [mds] allow
          caps: [mon] allow *
          caps: [osd] allow *
  client.bootstrap-mds
          key: xx==
          caps: [mon] allow profile bootstrap-mds
  client.bootstrap-osd
          key: AQBWLf5SGBAyBRAAzLwi5OXsAuR5vdo8hs+2zw==
          caps: [mon] allow profile bootstrap-osd
  client.radosgw.gateway
          key: xx==
          caps: [mon] allow rw
          caps: [osd] allow rwx

I still don't get what is wrong...

G.

On Sat, 15 Feb 2014 16:27:41 +0100, Udo Lembke wrote:
> [previous messages quoted in full; trimmed]
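One detail worth noting in this thread (my inference, not confirmed in the messages): started by hand without `-n client.radosgw.gateway`, radosgw authenticates as client.admin, so the keyring line in the [client.radosgw.gateway] section is never consulted. A rough sketch of that per-client scoping:

```python
# Sketch of how a per-client keyring setting in ceph.conf is scoped:
# the keyring under [client.radosgw.gateway] only applies when the
# daemon runs *as* that name; any other client name falls back to
# the default keyring path.
import configparser

CEPH_CONF = """
[global]
mon_host = 192.168.0.10

[client.radosgw.gateway]
keyring = /etc/ceph/keyring.radosgw.gateway
"""

# Assumed default admin keyring path, for illustration only:
DEFAULT_KEYRING = "/etc/ceph/ceph.client.admin.keyring"

def keyring_for(conf_text, name):
    """Return the keyring a client named `name` would use."""
    cfg = configparser.ConfigParser()
    cfg.read_string(conf_text)
    if cfg.has_section(name) and cfg.has_option(name, "keyring"):
        return cfg.get(name, "keyring")
    return DEFAULT_KEYRING

print(keyring_for(CEPH_CONF, "client.radosgw.gateway"))  # gateway keyring
print(keyring_for(CEPH_CONF, "client.admin"))            # falls back to default
```

So a manual test run that is meant to exercise the gateway's own keyring would need to be started with the matching client name.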
[ceph-users] Block Devices and OpenStack
Hi Cephers,

I am trying to configure ceph rbd as a backend for cinder and glance by following the steps mentioned in http://ceph.com/docs/master/rbd/rbd-openstack/

Before I start, all openstack services are running normally and ceph cluster health shows HEALTH_OK. But once I am done with all the steps and restart the openstack services, cinder-volume fails to start and throws an error:

  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Traceback (most recent call last):
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 262, in check_for_setup_error
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd     with RADOSClient(self):
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 234, in __init__
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd     self.cluster, self.ioctx = driver._connect_to_rados(pool)
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 282, in _connect_to_rados
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd     client.connect()
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd   File "/usr/lib/python2.7/dist-packages/rados.py", line 185, in connect
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd     raise make_ex(ret, "error calling connect")
  2014-02-16 00:01:42.582 TRACE cinder.volume.drivers.rbd Error: error calling connect: error code 95
  2014-02-16 00:01:42.591 ERROR cinder.volume.manager [req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Error encountered during initialization of driver: RBDDriver
  2014-02-16 00:01:42.592 ERROR cinder.volume.manager [req-8134a4d7-53f8-4ada-b4b5-4d96d7cad4bc None None] Bad or unexpected response from the storage volume backend API: error connecting to ceph cluster
  2014-02-16 00:01:42.592 TRACE cinder.volume.manager Traceback (most recent call last):
  2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File "/opt/stack/cinder/cinder/volume/manager.py", line 190, in init_host
  2014-02-16 00:01:42.592 TRACE cinder.volume.manager     self.driver.check_for_setup_error()
  2014-02-16 00:01:42.592 TRACE cinder.volume.manager   File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 267, in check_for_setup_error
  2014-02-16 00:01:42.592 TRACE cinder.volume.manager     raise exception.VolumeBackendAPIException(data=msg)
  2014-02-16 00:01:42.592 TRACE cinder.volume.manager VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: error connecting to ceph cluster

Here is the content of my /etc/ceph on the openstack node:

  ashish@ubuntu:/etc/ceph$ ls -lrt
  total 16
  -rw-r--r-- 1 cinder cinder 229 Feb 15 23:45 ceph.conf
  -rw-r--r-- 1 glance glance  65 Feb 15 23:46 ceph.client.glance.keyring
  -rw-r--r-- 1 cinder cinder  65 Feb 15 23:47 ceph.client.cinder.keyring
  -rw-r--r-- 1 cinder cinder  72 Feb 15 23:47 ceph.client.cinder-backup.keyring

I am really stuck and have tried a lot. What could I possibly be doing wrong? HELP.

Thanks and Regards
Ashish Chandra
[ceph-users] Poor performance with 2 million files flat
I have a performance problem I would like advice on. I have the following sub-optimal setup:

  * 2 servers (WTFM008, WTFM009)
    * HP ProLiant DL180
    * SmartArray G6 P410 RAID controller
    * 4x 500 GB RAID5 (seq writes = 230 MB/s)
    * CentOS 6.5 x86_64
  * 2,000,000 files (ms-word), with no directory structure
  * Ceph
    * ceph-deploy mon create WTFM008 WTFM009
    * ceph-deploy mds create WTFM008 WTFM009
    * ceph-deploy osd activate WTFM008:/var/lib/ceph/osd/ceph-0 WTFM009:/var/lib/ceph/osd/ceph-1 (osd is using the root fs)
    * ceph-fuse /mnt/ceph

I am currently trying to copy 2 million ms-word documents into ceph. When I started it was doing about 10 files per second. Now, 1 week later, it has done about 500,000 files and has slowed down to 1 file per 10 seconds.

How can I improve this terrible performance?

  * The hardware is a fixed configuration; I cannot add (SSD) disks or change the RAID.
  * I could not find the cephfs kernel module, so I had to use ceph-fuse.
  * I could have started with a degraded setup (1 OSD) for the initial load - would that have helped the performance? (Ceph not having to do the distribution part.)
  * There is no load on the systems at all (not cpu, not mem, not disk i/o).

Below is my crush map.

Regards,

Samuel Terburg
Panther-IT BV

  # begin crush map

  # devices
  device 0 osd.0
  device 1 osd.1

  # types
  type 0 osd
  type 1 host
  type 2 rack
  type 3 row
  type 4 room
  type 5 datacenter
  type 6 root

  # buckets
  host WTFM008 {
          id -2           # do not change unnecessarily
          # weight 1.340
          alg straw
          hash 0  # rjenkins1
          item osd.0 weight 1.340
  }
  host WTFM009 {
          id -3           # do not change unnecessarily
          # weight 1.340
          alg straw
          hash 0  # rjenkins1
          item osd.1 weight 1.340
  }
  root default {
          id -1           # do not change unnecessarily
          # weight 2.680
          alg straw
          hash 0  # rjenkins1
          item WTFM008 weight 1.340
          item WTFM009 weight 1.340
  }

  # rules
  rule data {
          ruleset 0
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type host
          step emit
  }
  rule metadata {
          ruleset 1
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type host
          step emit
  }
  rule rbd {
          ruleset 2
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type host
          step emit
  }

  # end crush map

  # ceph -w
    cluster 4f7bcb26-0cee-4472-abca-c200a999b686
     health HEALTH_OK
     monmap e1: 2 mons at {WTFM008=192.168.0.1:6789/0,WTFM009=192.168.0.2:6789/0}, election epoch 4, quorum 0,1 WTFM008,WTFM009
     mdsmap e5: 1/1/1 up {0=WTFM008=up:active}, 1 up:standby
     osdmap e14: 2 osds: 2 up, 2 in
      pgmap v151668: 192 pgs, 3 pools, 31616 MB data, 956 kobjects
            913 GB used, 1686 GB / 2738 GB avail
                 192 active+clean
  client io 40892 kB/s rd, 7370 B/s wr, 1 op/s
Re: [ceph-users] Poor performance with 2 million files flat
On Sat, 15 Feb 2014, Samuel Terburg - Panther-IT BV wrote:
> [original message quoted in full; trimmed]

You probably need to add

  mds frag = true

in the [mds] section.

sage
[ceph-users] slow requests from rados bench with small writes
Dear Ceph experts,

We've found that a single client running rados bench can drive other users, e.g. RBD users, into slow requests. Starting with a cluster that is not particularly busy:

  2014-02-15 23:14:33.714085 mon.0 xx:6789/0 725224 : [INF] pgmap v6561996: 27952 pgs: 27952 active+clean; 66303 GB data, 224 TB used, 2850 TB / 3075 TB avail; 4880 kB/s rd, 28632 kB/s wr, 271 op/s

we then start a rados bench writing many small objects:

  rados bench -p test 60 write -t 500 -b 1024 --no-cleanup

which gives these results (note the 66 s max latency!!):

  Total time run:         86.351424
  Total writes made:      91425
  Write size:             1024
  Bandwidth (MB/sec):     1.034
  Stddev Bandwidth:       1.26486
  Max bandwidth (MB/sec): 7.14941
  Min bandwidth (MB/sec): 0
  Average Latency:        0.464847
  Stddev Latency:         3.04961
  Max latency:            66.4363
  Min latency:            0.003188

30 seconds into this bench we start seeing slow requests, not only from bench writes but also from some poor RBD clients, e.g.:

  2014-02-15 23:16:02.820507 osd.483 xx:6804/46799 2201 : [WRN] slow request 30.195634 seconds old, received at 2014-02-15 23:15:32.624641: osd_sub_op(client.18535427.0:3922272 4.d42 4eb00d42/rbd_data.11371325138b774.6577/head//4 [] v 42083'71453 snapset=0=[]:[] snapc=0=[]) v7 currently commit sent

During a longer, many-hour instance of this small-write test, some of these slow RBD writes became very user-visible, with disk flushes being blocked long enough (120 s) for the VM kernels to start complaining.

A rados bench from a 10GigE client writing 4 MB objects doesn't have the same long tail of latency:

  # rados bench -p test 60 write -t 500 --no-cleanup
  ...
  Total time run:         62.811466
  Total writes made:      8553
  Write size:             4194304
  Bandwidth (MB/sec):     544.678
  Stddev Bandwidth:       173.163
  Max bandwidth (MB/sec): 1000
  Min bandwidth (MB/sec): 0
  Average Latency:        3.50719
  Stddev Latency:         0.309876
  Max latency:            8.04493
  Min latency:            0.166138

and there are zero slow requests, at least during this 60 s run.

While the vast majority of small writes complete with a reasonable sub-second latency, what is causing the very long tail - 60-120 s!! - seen by a few writes? Can someone advise us where to look in the perf dump, etc., to find which resource/queue is being exhausted during these tests?

Oh yeah, we're running the latest dumpling stable, 0.67.5, on the servers.

Best Regards, Thanks in advance!

Dan

--
Dan van der Ster || Data Storage Services || CERN IT Department
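The first bench's summary numbers are at least self-consistent: with `-t 500` the client keeps roughly 500 writes in flight, so by Little's law the average latency should be about concurrency divided by throughput - which matches the reported 0.465 s average, leaving only the extreme tail unexplained:

```python
# Sanity-check the 1 KB rados bench output with Little's law:
# avg latency ~= in-flight ops / throughput.
total_writes = 91425       # "Total writes made"
runtime_s = 86.351424      # "Total time run"
concurrency = 500          # the -t 500 setting

throughput = total_writes / runtime_s          # ops per second
predicted_latency = concurrency / throughput   # seconds per op
print(round(throughput), round(predicted_latency, 3))  # → 1059 0.472
```

The predicted 0.472 s is within 2% of the measured 0.465 s average, so the mystery is purely in the max latency, not the mean.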
[ceph-users] change order of an rbd image ?
Hi,

I created a 1 TB rbd image formatted with vmfs (vmware) for an ESX server - but with a wrong order (25 instead of 22 ...). The rbd man page tells me that for export/import/cp, rbd will use the order of the source image.

Is there a way to change the order of an rbd image by doing some conversion?

Ok - one idea could be to 'dd' the 1 TB mapped rbd device to the same mounted filesystem - but is this the only way?

best regards

Danny
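For context, the order is the base-2 logarithm of the image's object size and is fixed at creation time - which is why the data has to be copied into a freshly created image with the desired `--order` (or dd'd between mapped devices) rather than changed in place:

```python
# An rbd image's "order" is log2 of its object size, so the two
# orders in question differ by a factor of 8 in object size.
def object_size_bytes(order):
    return 1 << order

print(object_size_bytes(22) // 2**20)  # → 4   (4 MiB, the rbd default)
print(object_size_bytes(25) // 2**20)  # → 32  (32 MiB, the accidental value)
```

So the accidental order-25 image stripes in 32 MiB objects instead of the default 4 MiB.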
Re: [ceph-users] Block Devices and OpenStack
Hi Jean,

Here is the output of ceph auth list for client.cinder:

  client.cinder
          key: AQCKaP9ScNgiMBAAwWjFnyL69rBfMzQRSHOfoQ==
          caps: [mon] allow r
          caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images

Here is the output of ceph -s:

  ashish@ceph-client:~$ ceph -s
    cluster afa13fcd-f662-4778-8389-85047645d034
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-node1=10.0.1.11:6789/0}, election epoch 1, quorum 0 ceph-node1
     osdmap e37: 3 osds: 3 up, 3 in
      pgmap v84: 576 pgs, 6 pools, 0 bytes data, 0 objects
            106 MB used, 9076 MB / 9182 MB avail
                 576 active+clean

I created all the keyrings and copied them as suggested by the guide.

On Sun, Feb 16, 2014 at 3:08 AM, Jean-Charles LOPEZ <jc.lo...@inktank.com> wrote:
> Hi,
>
> what do you get when you run a 'ceph auth list' command for the user name
> (client.cinder) you created for cinder? Are the caps and the key for this
> user correct? No typo in the hostname in the cinder.conf file (host=)?
> Did you copy the keyring to the node running cinder (can't really say from
> your output, and there is no ceph -s command to check the monitor names)?
> It could just be a typo in the ceph auth get-or-create command that's
> causing it.
>
> Rgds
> JC
>
> [original message quoted in full; trimmed]
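For comparison, the rbd-openstack guide the original poster followed has cinder.conf carry roughly the following rbd settings (the exact option set varies by OpenStack release, and the secret UUID - which must match the libvirt secret - is left as a placeholder):

```
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = <uuid of the libvirt secret>
```

A mismatch between rbd_user here and the client name whose keyring sits in /etc/ceph is a common cause of connect failures at driver init.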
Re: [ceph-users] keyring generation
(2014/02/16 3:06), Kei.masumoto wrote:

(2014/02/11 23:02), Alfredo Deza wrote:

On Tue, Feb 11, 2014 at 7:57 AM, Kei.masumoto <kei.masum...@gmail.com> wrote:

(2014/02/10 23:33), Alfredo Deza wrote:

On Sat, Feb 8, 2014 at 7:56 AM, Kei.masumoto <kei.masum...@gmail.com> wrote:

(2014/02/05 23:49), Alfredo Deza wrote:

On Mon, Feb 3, 2014 at 11:28 AM, Kei.masumoto <kei.masum...@gmail.com> wrote:

Hi Alfredo,

Thanks for your reply! I think I pasted all the logs from ceph.log, but anyway, I re-executed "ceph-deploy mon create-initial" again. Does that make sense? It seems like stack traces were added...

Those seem bad enough. There is a ticket open for these types of tracebacks, which should be gone with the upcoming release of ceph-deploy. Your monitor does seem to be in a good state. Have you checked the monitor logs to see if they are complaining about something? I would also raise the log level in ceph.conf for the monitors specifically to:

  debug mon = 10

Thanks for your reply. I set "debug mon = 10", but I could not find any error logs in /var/log/ceph/ceph-mon.ceph1.log. So I tried to make ceph-create-keys log to files, and inserted logging myself for debugging purposes. Then I found that get_key() in ceph-create-keys complains like below (the first line was inserted by me):

  INFO:ceph-create-keys: ceph --cluster=ceph --name=mon. --keyring=/var/lib/ceph/mon/ceph-ceph1/keyring auth get-or-create client.admin mon 'allow *' osd 'allow *' mds 'allow'
  INFO:ceph-create-keys:Talking to monitor...
  INFO:ceph-create-keys:Cannot get or create admin key, permission denied

How did you start ceph-create-keys? With root, or with the ceph user? That process is usually fired up by the init script, which is usually called by root.

I just ran "ceph-deploy --overwrite-conf mon create-initial" as user ceph. Before doing that, I ran "start ceph-all; stop ceph-all" as root.

That is odd - if this is a new cluster, why are you starting and stopping?

It seems that you are at a point where you have tried a few things and the cluster setup might not be in a good state. Can you try setting it up from scratch and make sure you keep logs and output? If you can replicate your issues consistently (I have tried and cannot), then it might indicate an issue, and all the logs and how you got there would be super useful.

I tried from scratch and the logs are attached. Currently, 4 hosts exist in my test environment: ceph5 (remote host), ceph4 (mon), ceph3 (osd), ceph2 (osd). I tried to follow the instructions, although the hostnames are a little different: http://ceph.com/docs/master/start/quick-start-preflight/#ceph-node-setup

Please see the end of cons...@ceph5.log. After executing "ceph-deploy mon create-initial", I got the same error. Although I will check in a little more detail, I appreciate any hints.

I understand my problem now. Regarding the instructions at http://ceph.com/docs/master/start/quick-start-preflight/#ceph-node-setup - when I write "public network" in ceph.conf, mon_host has to be included in the subnet described by "public network". I didn't realize such a pre-condition; I have to learn more.

After I changed /usr/bin/ceph like below,

  conf_defaults = {
  -    'log_to_stderr':'true',
  -    'err_to_stderr':'true',
  +    'log_to_syslog':'true',
       'log_flush_on_exit':'true',
  }

I found a log in /var/log/syslog:

  2014-02-15 23:31:50.417381 7f22f8626700 0 -- :/1009957 192.168.40.136:6789/0 pipe(0x7f22e8019850 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f22e8000c00).faul

The listened-on IP address is different from what netstat shows. Thanks for your help so far.

BTW, I inserted debugging logs into /usr/bin/ceph and found the log below:

  INFO:debug:Exception error calling connect
  INFO:debug:Exception error calling connect TimedOut

Those logs are generated by cluster_handle.connect(), e.g. rados.Rados.connect():

  try:
      if childargs and childargs[0] == 'ping':
          return ping_monitor(cluster_handle, childargs[1])
      cluster_handle.connect(timeout=timeout)

Any hints where to check? 6789 is listened on by the mon.

So I tried:

  root@ceph1:~/my-cluster# chown -R ceph:ceph /var/lib/ceph/mon/
  root@ceph1:~/my-cluster# start ceph-all; stop ceph-all
  ceph-all start/running
  ceph-all stop/waiting

Then I re-tried:

  ceph@ceph1:~/my-cluster$ ceph-deploy --overwrite-conf mon create-initial

After that, I found some files are still owned by root. Is this the correct behavior?

  root@ceph1:~/my-cluster# ls -l /var/lib/ceph/mon/ceph-ceph1/store.db
  total 1184
  -rw-r--r-- 1 ceph ceph 1081168 Feb  8 02:25 000133.sst
  -rw-r--r-- 1 ceph ceph   25530 Feb  8 02:38 000135.sst
  -rw-r--r-- 1 ceph ceph   25530 Feb  8 02:38 000138.sst
  -rw-r--r-- 1 root root   25530 Feb  8 02:44 000141.sst
  -rw-r--r-- 1 root root   65536 Feb  8 02:44 000142.log
  -rw-r--r-- 1 root root      16 Feb  8 02:44 CURRENT
  -rw-r--r-- 1 ceph ceph       0 Jan 26 05:50 LOCK
  -rw-r--r-- 1 ceph ceph     315 Jan 26 06:28 LOG
  -rw-r--r-- 1 ceph ceph      57 Jan 26 05:50 LOG.old
  -rw-r--r-- 1 root root   65536 Feb  8 02:44
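The stray root-owned store files in the listing above can be spotted programmatically; a sketch (demonstrated here on a throwaway directory owned by the current user, but the function could equally be pointed at /var/lib/ceph/mon):

```python
# Walk a directory tree and report files not owned by the given user -
# useful after a chown -R to verify nothing was recreated as root.
import os
import pwd
import tempfile

def files_not_owned_by(path, user):
    """Return paths under `path` whose owner differs from `user`."""
    uid = pwd.getpwnam(user).pw_uid
    mismatched = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            full = os.path.join(root, name)
            if os.stat(full).st_uid != uid:
                mismatched.append(full)
    return mismatched

# Demo on a temp directory; files we just created are owned by us:
me = pwd.getpwuid(os.getuid()).pw_name
d = tempfile.mkdtemp()
open(os.path.join(d, "000142.log"), "w").close()
print(files_not_owned_by(d, me))  # → []
```

Files reappearing as root after a chown usually just mean a daemon ran as root again in between, which matches the start/stop sequence shown above.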