[ceph-users] ceph.com 403 forbidden
Hi all. When connecting to ceph.com I get a "403 Forbidden" message. If I use a US proxy server it works fine. Please fix this problem. Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
As I said, 107K with IOs serving from memory, not hitting the disk..

From: Jian Zhang [mailto:amberzhan...@gmail.com]
Sent: Sunday, August 31, 2014 8:54 PM
To: Somnath Roy
Cc: Haomai Wang; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Somnath, on the small workload performance: 107k is higher than the theoretical IOPS of 520, any idea why?

Single client is ~14K iops, but scaling as number of clients increases. 10 clients ~107K iops. ~25 cpu cores are used.

2014-09-01 11:52 GMT+08:00 Jian Zhang amberzhan...@gmail.com:
Somnath, on the small workload performance,

2014-08-29 14:37 GMT+08:00 Somnath Roy somnath@sandisk.com:
Thanks Haomai! Here is some of the data from my setup.

Set up:
32 core cpu with HT enabled, 128 GB RAM, one SSD (both journal and data) - one OSD. 5 client machines with 12 core cpu, each running two instances of ceph_smalliobench (10 clients total). Network is 10GbE.

Workload:
Small workload – 20K objects of 4K size, and io_size is also 4K RR. The intent is to serve the ios from memory so that it can uncover the performance problems within a single OSD.

Results from Firefly:
Single client throughput is ~14K iops, but as the number of clients increases the aggregated throughput is not increasing. 10 clients ~15K iops. ~9-10 cpu cores are used.

Result with latest master:
Single client is ~14K iops, but scaling as number of clients increases. 10 clients ~107K iops. ~25 cpu cores are used.

More realistic workload: let's see how it performs while 90% of the ios are served from disks.

Setup:
40 cpu core server as a cluster node (single node cluster) with 64 GB RAM. 8 SSDs - 8 OSDs. One similar node for monitor and rgw. Another node for the client running fio/vdbench. 4 rbds are configured with the 'noshare' option. 40 GbE network.

Workload:
8 SSDs are populated, so, 8 * 800GB = ~6.4 TB of data. Io_size = 4K RR.
Results from Firefly:
Aggregated output while 4 rbd clients stress the cluster in parallel is ~20-25K IOPS; cpu cores used ~8-10 (may be less, can't remember precisely).

Results from latest master:
Aggregated output while 4 rbd clients stress the cluster in parallel is ~120K IOPS; cpu is 7% idle, i.e. ~37-38 cpu cores.

Hope this helps.

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Thursday, August 28, 2014 8:01 PM
To: Somnath Roy
Cc: Andrey Korolyov; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Hi Roy, I have already scanned your merged code for the fdcache and the lfn_find/lfn_open optimizations; could you give some performance-improvement data for it? I fully agree with your direction, do you have any update on it? As for the messenger level, I have some very early work on it (https://github.com/yuyuyu101/ceph/tree/msg-event); it contains a new messenger implementation which supports different event mechanisms. It looks like at least one more week to make it work.

On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy somnath@sandisk.com wrote:
Yes, what I saw is that the messenger-level bottleneck is still huge! Hopefully the RDMA messenger will resolve that, and the performance gain will be significant for reads (on SSDs). For writes we need to uncover the OSD bottlenecks first to take advantage of the improved upstream. What I experienced is that until you remove the very last bottleneck, the performance improvement will not be visible, and that can be confusing because you might think that the upstream improvement you made is not valid (which it is).
Thanks & Regards
Somnath

-----Original Message-----
From: Andrey Korolyov [mailto:and...@xdel.ru]
Sent: Thursday, August 28, 2014 12:57 PM
To: Somnath Roy
Cc: David Moreau Simard; Mark Nelson; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy somnath@sandisk.com wrote:
Nope, this will not be backported to Firefly I
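Somnath's "served from memory" point in the thread above is easy to sanity-check against the numbers in his mail: the small-workload data set is tiny compared to the node's RAM, while the "realistic" data set is two orders of magnitude larger than it. A quick calculation (my own arithmetic, not from the thread):

```python
# Small workload from the thread: 20K objects of 4 KiB, node has 128 GiB RAM.
objects = 20_000
object_size = 4 * 1024                 # 4 KiB
ram = 128 * 1024 ** 3                  # 128 GiB

working_set = objects * object_size
print(working_set)                     # 81920000 bytes, i.e. ~78 MiB
print(working_set / ram)               # ~0.0006 -> everything fits in page cache

# "Realistic" workload: 8 SSDs populated with ~800 GB each, 64 GB RAM,
# so nearly all of the 4K random reads must come from the SSDs.
data_set = 8 * 800 * 10 ** 9           # ~6.4 TB
print(data_set / (64 * 10 ** 9))       # ~100x the RAM
```

This is why the first benchmark isolates OSD code-path overhead (no disk latency involved) while the second exercises the whole IO path.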
[ceph-users] ceph cluster inconsistency keyvaluestore
Hi, I reinstalled the cluster with 0.84, and tried again running rados bench on an EC-coded pool on keyvaluestore. Nothing crashed this time, but when I check the status:

health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too few pgs per osd (15 < min 20)
monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003
osdmap e174: 78 osds: 78 up, 78 in
pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects
1753 GB used, 129 TB / 131 TB avail
1088 active+clean
128 active+clean+inconsistent

The 128 inconsistent pgs are ALL the pgs of the EC KV store (the others are on Filestore). The only thing I can see in the logs is that after the rados tests, it starts scrubbing, and for each KV pg I get something like this:

2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR] 2.3s0 scrub stat mismatch, got 28164/29291 objects, 0/0 clones, 28164/29291 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 118128377856/122855358464 bytes.

What could be the problem here? Thanks again!!
Kenneth

----- Message from Haomai Wang haomaiw...@gmail.com -----
Date: Tue, 26 Aug 2014 17:11:43 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be
Cc: ceph-users@lists.ceph.com

Hmm, it looks like you hit this bug (http://tracker.ceph.com/issues/9223). Sorry for the late message, I forgot that this fix was only merged into 0.84. Thanks for your patience :-)

On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:
Hi, in the meantime I already tried upgrading the cluster to 0.84 to see if that made a difference, and it seems it does. I can't reproduce the crashing osds by doing a 'rados -p ecdata ls' anymore.
But now the cluster detects it is inconsistent:

cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too few pgs per osd (4 < min 20); mon.ceph002 low disk space
monmap e3: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
osdmap e145384: 78 osds: 78 up, 78 in
pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
1502 GB used, 129 TB / 131 TB avail
279 active+clean
40 active+clean+inconsistent
1 active+clean+scrubbing+deep

I tried to do ceph pg repair for all the inconsistent pgs:

cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub errors; too few pgs per osd (4 < min 20); mon.ceph002 low disk space
monmap e3: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
osdmap e146452: 78 osds: 78 up, 78 in
pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
1503 GB used, 129 TB / 131 TB avail
279 active+clean
39 active+clean+inconsistent
1 active+clean+scrubbing+deep
1 active+clean+scrubbing+deep+inconsistent+repair

I let it recover through the night, but this morning the mons were all gone, nothing to see in the log files.. The osds were all still up!
cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub errors; too few pgs per osd (4 < min 20)
monmap e7: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003
mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
osdmap e203410: 78 osds: 78 up, 78 in
pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects
1547 GB used, 129 TB / 131 TB avail
1 active+clean+scrubbing+deep+inconsistent+repair
284 active+clean
35 active+clean+inconsistent

I restarted the monitors now, I will let you know when I see something more..

----- Message from Haomai Wang haomaiw...@gmail.com -----
Date: Sun, 24 Aug 2014 12:51:41 +0800
From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
To: Kenneth Waegeman kenneth.waege...@ugent.be, ceph-users@lists.ceph.com

It's really strange! I wrote a test program according to the key ordering you provided and parsed the corresponding value. It's true! I have no idea now. If free,
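The scrub error pasted earlier in this thread is internally consistent in an interesting way: the object delta and the byte delta line up exactly with 4 MiB objects, which happens to be the default object size `rados bench` writes. A quick check (my own arithmetic, not from the thread):

```python
# Figures from the scrub error:
# "got 28164/29291 objects ... 118128377856/122855358464 bytes"
got_objects, expected_objects = 28164, 29291
got_bytes, expected_bytes = 118128377856, 122855358464

missing_objects = expected_objects - got_objects
missing_bytes = expected_bytes - got_bytes
print(missing_objects)                   # 1127
print(missing_bytes // missing_objects)  # 4194304 == 4 MiB exactly
print(missing_bytes % missing_objects)   # 0
```

So the stat mismatch corresponds exactly to 1127 unaccounted 4 MiB bench objects in the OSD's bookkeeping.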
[ceph-users] Paris Ceph meetup : September 18th, 2014
Hi Ceph, The next Paris Ceph meetup is scheduled immediately after the Ceph day. http://www.meetup.com/Ceph-in-Paris/events/204412892/ I'll be there and hope to discuss the Giant features on this occasion :-) Cheers
--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] [Ceph-community] Paris Ceph meetup : September 18th, 2014
This reminds me that we should also schedule some sort of meetup during the OpenStack summit, which is also in Paris!
--
David Moreau Simard

On 2014-09-01, 8:06 AM, "Loic Dachary" l...@dachary.org wrote:
Hi Ceph, The next Paris Ceph meetup is scheduled immediately after the Ceph day. http://www.meetup.com/Ceph-in-Paris/events/204412892/ I'll be there and hope to discuss the Giant features on this occasion :-) Cheers
--
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] ceph cluster inconsistency keyvaluestore
Hmm, could you please list your instructions, including how long the cluster has existed and all relevant ops? I want to reproduce it.

On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman kenneth.waege...@ugent.be wrote:
Hi, I reinstalled the cluster with 0.84, and tried again running rados bench on an EC-coded pool on keyvaluestore. [...]
Re: [ceph-users] ceph cluster inconsistency keyvaluestore
Hi, The cluster was installed with quattor, which uses ceph-deploy for installing the daemons, writes the config file, and installs the crushmap. I have 3 hosts, each with 12 disks; each disk has a large KV partition (3.6T) for the ECdata pool and a small cache partition (50G) for the cache. I manually did this:

ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set profile11 k=8 m=3 ruleset-failure-domain=osd
ceph osd pool create ecdata 128 128 erasure profile11
ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))

(But the previous time I had the problem already without the cache part.)

Cluster live since 2014-08-29 15:34:16.

Config file on host ceph001:

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.143.8.0/24
filestore_xattr_use_omap = 1
fsid = 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
mon_cluster_log_to_syslog = 1
mon_host = ceph001.cubone.os, ceph002.cubone.os, ceph003.cubone.os
mon_initial_members = ceph001, ceph002, ceph003
osd_crush_update_on_start = 0
osd_journal_size = 10240
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 512
osd_pool_default_pgp_num = 512
osd_pool_default_size = 3
public_network = 10.141.8.0/24

[osd.11]
osd_objectstore = keyvaluestore-dev
[osd.13]
osd_objectstore = keyvaluestore-dev
[osd.15]
osd_objectstore = keyvaluestore-dev
[osd.17]
osd_objectstore = keyvaluestore-dev
[osd.19]
osd_objectstore = keyvaluestore-dev
[osd.21]
osd_objectstore = keyvaluestore-dev
[osd.23]
osd_objectstore = keyvaluestore-dev
[osd.25]
osd_objectstore = keyvaluestore-dev
[osd.3]
osd_objectstore = keyvaluestore-dev
[osd.5]
osd_objectstore = keyvaluestore-dev
[osd.7]
osd_objectstore = keyvaluestore-dev
[osd.9]
osd_objectstore = keyvaluestore-dev

OSDs:

# id  weight  type name               up/down  reweight
-12   140.6   root default-cache
-9    46.87     host ceph001-cache
2     3.906       osd.2                up       1
4     3.906       osd.4                up       1
6     3.906       osd.6                up       1
8     3.906       osd.8                up       1
10    3.906       osd.10               up       1
12    3.906       osd.12               up       1
14    3.906       osd.14               up       1
16    3.906       osd.16               up       1
18    3.906       osd.18               up       1
20    3.906       osd.20               up       1
22    3.906       osd.22               up       1
24    3.906       osd.24               up       1
-10   46.87     host ceph002-cache
28    3.906       osd.28               up       1
30    3.906       osd.30               up       1
32    3.906       osd.32               up       1
34    3.906       osd.34               up       1
36    3.906       osd.36               up       1
38    3.906       osd.38               up       1
40    3.906       osd.40               up       1
42    3.906       osd.42               up       1
44    3.906       osd.44               up       1
46    3.906       osd.46               up       1
48    3.906       osd.48               up       1
50    3.906       osd.50               up       1
-11   46.87     host ceph003-cache
54    3.906       osd.54               up       1
56    3.906       osd.56               up       1
58    3.906       osd.58               up       1
60    3.906       osd.60               up       1
62    3.906       osd.62               up       1
64    3.906       osd.64               up       1
66    3.906       osd.66               up       1
68    3.906       osd.68               up       1
70    3.906       osd.70               up       1
72    3.906       osd.72               up       1
74    3.906       osd.74               up       1
76    3.906       osd.76               up       1
-8    140.6   root default-ec
-5    46.87     host ceph001-ec
3     3.906       osd.3                up       1
5     3.906       osd.5                up       1
7     3.906       osd.7                up       1
9     3.906       osd.9                up       1
11    3.906       osd.11               up       1
13    3.906       osd.13               up       1
15    3.906       osd.15               up       1
17    3.906       osd.17               up       1
19    3.906       osd.19               up       1
21    3.906       osd.21               up       1
23    3.906       osd.23               up       1
25    3.906       osd.25               up       1
-6    46.87     host ceph002-ec
29    3.906       osd.29               up       1
31    3.906       osd.31               up       1
33    3.906       osd.33               up       1
35    3.906
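For reference, the shell arithmetic in the `target_max_bytes` command above caps the cache pool at 280 GiB; the expansion (a trivial check, not from the thread):

```python
# target_max_bytes passed above as $((280*1024*1024*1024)): 280 GiB in bytes.
gib = 1024 ** 3
target_max_bytes = 280 * gib
print(target_max_bytes)  # 300647710720
```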
[ceph-users] kvm guests with rbd disks are unaccessible approx. 3h after one OSD node fails
Hi list, on the weekend one of our five OSD nodes failed (hung with kernel panic). The cluster degraded (12 of 60 osds down), but our monitoring host sets the noout flag in this case. Around three hours later, however, the kvm guests which use storage on the ceph cluster (and do writes) became unaccessible. After restarting the failed ceph node the cluster was healthy again, but the VMs had to be restarted to work again. In ceph.conf I had defined osd_pool_default_min_size = 1, so I don't understand why this happened. Which parameter must be changed/set so that the kvm clients keep working on an unhealthy cluster? The ceph version is 0.72.2, pool replication 2. Thanks for a hint.
Udo
[ceph-users] I fail to add a monitor in a ceph cluster
Hello, I am currently testing ceph to make a replicated block device for a project that would involve 2 data servers accessing this block device, so that if one fails or crashes, the data can still be used and the cluster can be rebuilt. This project requires that both machines run an OSD and a monitor, and that a 3rd monitor is run somewhere else, so that there is no single point of failure. I know it is not the best thing to run an OSD and a monitor on the same machine, but I cannot really find a better solution. My problem is that, after having read the documentation several times and followed it, I cannot manage to add a second monitor. I have bootstrapped a first monitor, added 2 OSDs (one on the machine with the monitor, one on the other), and I try to add a second monitor but it doesn't work. I think I misunderstood something. Here's what I did:

On the first machine, named grenier:

# set up the configuration file /etc/ceph/ceph.conf (see content further down)
# bootstrap the monitor:
$ ceph-authtool --create-keyring /var/tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
$ sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
$ sudo chown myuser /etc/ceph/ceph.client.admin.keyring
$ ceph-authtool /var/tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
$ monmaptool --create --add grenier 172.16.1.11 --fsid $monuuid $tmp/monmap
$ sudo mkdir -p /var/lib/ceph/mon/ceph-grenier
$ sudo chown $ID -R /var/lib/ceph/mon/ceph-grenier
$ ceph-mon --mkfs -i grenier --monmap /var/tmp/monmap --keyring /var/tmp/ceph.mon.keyring
# start the monitor:
$ sudo start ceph-mon id=grenier
# add an OSD:
$ sudo ceph osd create $osduuid
$ sudo mkdir -p /var/lib/ceph/osd/ceph-0
$ sudo ceph-osd -i 0 --mkfs --mkkey --osd-uuid $osduuid
$ sudo ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-0/keyring
$ ceph osd crush add-bucket grenier host
$ ceph osd crush move grenier root=default
$ ceph osd crush add osd.0 1.0 host=grenier
# start this OSD
$ sudo ceph-osd -i 0
# copy /etc/ceph/ceph.conf, /etc/ceph/ceph.client.admin.keyring, /var/tmp/ceph/ceph.mon.keyring and /var/tmp/ceph/monmap from grenier to the second node, named gail
# add and start an OSD on the second node:
$ sudo ceph osd create $newosduuid
$ sudo mkdir -p /var/lib/ceph/osd/ceph-1
$ sudo ceph-osd -i 1 --mkfs --mkkey --osd-uuid $newosduuid
$ sudo ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-1/keyring
$ ceph osd crush add-bucket gail host
$ ceph osd crush move gail root=default
$ ceph osd crush add osd.1 1.0 host=gail
# start this OSD
$ sudo ceph-osd -i 1

There, everything works correctly: I can create and map a block device, then write on it, and the data is replicated on both nodes.
When I perform a ceph -s I get:

cluster a98faf65-b105-4ec7-913c-f8a33a4db4d1
health HEALTH_OK
monmap e1: 1 mons at {grenier=172.16.1.11:6789/0}, election epoch 2, quorum 0 grenier
osdmap e13: 2 osds: 2 up, 2 in
pgmap v47: 192 pgs, 3 pools, 0 bytes data, 0 objects
18400 MB used, 105 GB / 129 GB avail
192 active+clean

And here is what I do when trying to add a second monitor on gail:

$ sudo mkdir -p /var/lib/ceph/mon/ceph-gail
$ ceph mon getmap -o /var/tmp/monmap
$ sudo ceph-mon -i gail --mkfs --monmap /var/tmp/monmap --keyring /var/tmp/ceph.mon.keyring

which prints:

ceph-mon: set fsid to a98faf65-b105-4ec7-913c-f8a33a4db4d1
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-gail for mon.gail

which seems correct (same uuid as in ceph.conf).

$ sudo ceph-mon add gail 172.16.1.12

This command prints:

2014-09-01 17:07:26.033688 7f5538ada700 0 monclient: hunting for new mon

and hangs. Then I would like to do this:

$ sudo ceph-mon -i gail --public-addr 172.16.1.12

but it is useless as the previous command failed. Can anybody guess what I am doing wrong? I use ceph 0.80 on Ubuntu trusty. My ceph.conf is as follows:

[global]
fsid = a98faf65-b105-4ec7-913c-f8a33a4db4d1
mon initial members = grenier
mon host = 172.16.1.11
public network = 172.16.0.0/16
auth cluster required = none
auth service required = none
auth client required = none
osd journal size = 1024
filestore xattr use omap = true
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 333
osd pool default pgp num = 333
osd crush chooseleaf type = 1
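One plausible reading of the hang (an educated guess, not a confirmed diagnosis from this thread): when a cluster goes from one monitor to two, the updated monmap makes quorum require both monitors, so the add-monitor command cannot be acknowledged until the second ceph-mon daemon is actually running; the apparent hang would then be expected to resolve once `ceph-mon -i gail --public-addr 172.16.1.12` is started anyway. Once gail has joined, ceph.conf on both nodes would normally list both monitors, roughly like this (a sketch using the names and addresses from the message above):

```ini
[global]
fsid = a98faf65-b105-4ec7-913c-f8a33a4db4d1
mon initial members = grenier, gail
mon host = 172.16.1.11, 172.16.1.12
public network = 172.16.0.0/16
```

Note also that two monitors cannot maintain quorum if either one fails, which is why the poster's plan for a third monitor elsewhere matters.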
Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS
Mark and all, Ceph IOPS performance has definitely improved with Giant. With this version: ceph version 0.84-940-g3215c52 (3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with kernel 3.14-0, I got 6340 IOPS on a single OSD SSD (journal and data on the same partition). So basically twice the IOPS I was getting with Firefly. Rand reads 4k went from 12431 to 10201, so I'm a bit disappointed there. The SSD is still under-utilised:

Device:  rrqm/s  wrqm/s  r/s   w/s      rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdp1     0.00    540.37  0.00  5902.30  0.00   47.14  16.36     0.87      0.15   0.00     0.15     0.07   40.15
sdp2     0.00    0.00    0.00  4454.67  0.00   49.16  22.60     0.31      0.07   0.00     0.07     0.07   30.61

Thanks a ton for all your comments and assistance, guys :). One last question for Sage (or others that might know): what's the status of the F2FS implementation? (Or maybe we are waiting for F2FS to provide atomic transactions?) I tried to run the OSD on f2fs, however ceph-osd mkfs got stuck on an xattr test:

fremovexattr(10, user.test@5848273) = 0

On 01 Sep 2014, at 11:13, Sebastien Han sebastien@enovance.com wrote:
Mark, thanks a lot for experimenting with this for me. I'm gonna try master soon and will tell you how much I can get. It's interesting to see that using 2 SSDs brings more performance, even though both SSDs are under-utilized… They should be able to sustain both loads at the same time (journal and osd data).

On 01 Sep 2014, at 09:51, Somnath Roy somnath@sandisk.com wrote:
As I said, 107K with IOs serving from memory, not hitting the disk..

From: Jian Zhang [mailto:amberzhan...@gmail.com]
Sent: Sunday, August 31, 2014 8:54 PM
To: Somnath Roy
Cc: Haomai Wang; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

Somnath, on the small workload performance: 107k is higher than the theoretical IOPS of 520, any idea why?
Single client is ~14K iops, but scaling as number of clients increases. 10 clients ~107K iops. ~25 cpu cores are used.
2014-09-01 11:52 GMT+08:00 Jian Zhang amberzhan...@gmail.com:
Somnath, on the small workload performance,

2014-08-29 14:37 GMT+08:00 Somnath Roy somnath@sandisk.com:
Thanks Haomai! Here is some of the data from my setup. [...]
[ceph-users] Questions regarding Crush Map
Hi, I have some general questions regarding the crush map. It would be helpful if someone could help me out by clarifying them.

1. I saw that a bucket 'host' is always created for the crush maps which are automatically generated by ceph. If I am manually creating a crushmap, do I always need to add a bucket called 'host'? As I was looking through the source code, I didn't see any need for this. If it is not necessary, can osds of the same host be split into multiple buckets? E.g., say host 1 has four osds (osd.0, osd.1, osd.2, osd.3) and host 2 has four osds (osd.4, osd.5, osd.6, osd.7), and I create two buckets:

HostGroup bucket1 {osd.0, osd.1, osd.4, osd.5}
HostGroup bucket2 {osd.2, osd.3, osd.6, osd.7}

where HostGroup is a new bucket type instead of the default 'host' type. Is this configuration possible or invalid? If it is possible, I can group the SSDs of all hosts into one bucket and the HDDs into the other.

2. I have read in the Ceph docs that the same osd is not advised to be part of two buckets (two pools). Is there any reason for this? I couldn't find this limitation in the source code. E.g., osd.0 is in bucket1 and bucket2. Is this configuration possible or invalid? If it is possible, I have the flexibility to group data which is written to different pools.

3. Is it possible to exclude or include a particular osd/host/rack in the crush mapping? E.g., I need to have the third replica always in rack3 (a specified row/rack/host based on requirements); the first two can be chosen randomly. If possible, how can I configure this?

4. It is said that osd weights must be configured based on the storage. Say I have an SSD of 512 GB and an HDD of 1 TB, and I configure 0.5 and 1 respectively; am I treating both SSD and HDD equally? How do I prioritize the SSD over the HDD?

5. Continuing from 4), if I have a mix of SSDs and HDDs in the same host, what are the best ways to utilize the SSD capabilities in the ceph cluster?
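On question 4, a quick illustration of what those weights actually do (my own sketch, not an answer from this thread): CRUSH weights are conventionally set proportional to capacity (roughly 1.0 per TB), so 0.5 and 1.0 make data placement proportional to size; they set a capacity share, not an access priority:

```python
# CRUSH weights are conventionally proportional to capacity (~1.0 per TB).
# They control each device's share of the data, not access priority.
weights = {"ssd_512G": 0.5, "hdd_1T": 1.0}
total = sum(weights.values())

shares = {dev: w / total for dev, w in weights.items()}
print(shares["ssd_512G"])  # ~0.333: the SSD stores 1/3 of the data
print(shares["hdd_1T"])    # ~0.667: the HDD stores 2/3

# Relative to capacity, both devices end up equally full, so the SSD is
# not "prioritized". Steering hot data to SSDs is normally done with
# separate CRUSH hierarchies and rules (one for SSDs, one for HDDs) and
# separate pools, which also relates to questions 1 and 5 above.
```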
Looking forward to your help,
Thanks,