optimize librbd for iops
Hello list,

are there any plans to optimize librbd for IOPS? Right now I'm able to get 50,000 IOPS via iSCSI and 100,000 IOPS using multipathing with iSCSI. With librbd I'm stuck at around 18,000 IOPS. As this scales with more hosts but not with more disks in a VM, it must be limited by the RBD implementation in KVM / librbd.

Greets,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org. More majordomo info at http://vger.kernel.org/majordomo-info.html
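To make librbd-vs-iSCSI numbers comparable, the usual way to measure this kind of small-block IOPS inside the guest is a 4k random-write fio job; the following job file is a sketch (the device path and job name are assumptions, not from the original message):

```
; hypothetical fio job: 4k random writes against the VM's rbd-backed disk
[global]
ioengine=libaio
direct=1
bs=4k
rw=randwrite
iodepth=32
numjobs=1
runtime=60
time_based=1

[rbd-disk]
filename=/dev/vdb   ; the rbd-backed virtio disk inside the guest (assumed)
```

Running the same job against an iSCSI-backed and an rbd-backed disk isolates the client-side implementation as the variable.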
Re: Disabling journal
On Sun, 11 Nov 2012, Stefan Priebe wrote:
> Hi Sage,
>> With btrfs, yes, although this isn't something we have tested in a while.
> I'm not using btrfs as long as the devs claim it is not ready for prod.

In that case, the journal is needed for consistency of the fs; we rely on writeahead journaling. It can't be turned off. Putting it on a ramdisk in this case is interesting for performance, but it means that a crash/reboot/powerloss event leaves the fs in an inconsistent and unusable state.

The only time tmpfs is potentially useful in production is when you're using btrfs *and* have independent backup power sources for replicas (and can thus avoid worrying about a site-wide power failure and loss of journal). (Or have relaxed requirements for the durability of recent writes.)

sage
Re: Disabling journal
Am 12.11.2012 15:42, schrieb Sage Weil:
> In that case, the journal is needed for consistency of the fs; we rely on
> writeahead journaling. It can't be turned off. Putting it on a ramdisk in
> this case is interesting for performance, but it means that a
> crash/reboot/powerloss event leaves the fs in an inconsistent and
> unusable state.

But only if, with two replicas, both nodes crash / have a power loss?

> The only time tmpfs is potentially useful in production is when you're
> using btrfs *and* have independent backup power sources for replicas (and
> can thus avoid worrying about a site-wide power failure and loss of
> journal). (Or have relaxed requirements for the durability of recent
> writes.)

What happens with XFS and two replicas when ONE host has a power loss? The other replica / journal should still be there.

I have no idea where to put the journal. I have 8 SSDs per host, one per OSD, each with a write speed of 45,000 IOPS, for a total write speed of 360,000 IOPS per node. Which journal device can handle this? And if I put the journal on the same disk as the OSD, it has to copy the data around.

Greets,
Stefan
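The arithmetic behind "which journal device can handle this?" can be made explicit. A rough sketch, assuming the 4 KB writes typical of the fio workloads discussed in this thread (the write size is an assumption, not stated in the message):

```python
# Rough estimate of the write bandwidth a writeahead journal device must
# absorb to keep up with all OSD data devices on one node.
# Assumes 4 KB writes (assumption; matches typical 4k fio benchmarks).

ssds_per_host = 8
iops_per_ssd = 45_000
write_size_bytes = 4 * 1024

node_iops = ssds_per_host * iops_per_ssd        # aggregate IOPS per node
node_bw_mb = node_iops * write_size_bytes / 1e6  # MB/s the journal must sustain

print(node_iops, node_bw_mb)
```

At 360,000 IOPS the journal would have to sustain roughly 1.5 GB/s of small writes, which is why a single SSD journal per node cannot keep up and per-OSD journals or NVRAM devices come up in the discussion.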
Re: Disabling journal
On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
> Am 12.11.2012 15:42, schrieb Sage Weil:
>> In that case, the journal is needed for consistency of the fs; we rely
>> on writeahead journaling. It can't be turned off. Putting it on a
>> ramdisk in this case is interesting for performance, but it means that
>> a crash/reboot/powerloss event leaves the fs in an inconsistent and
>> unusable state.
> But only if, with two replicas, both nodes crash / have a power loss?

Then you're okay.. but the one that lost the journal effectively also lost the contents of the SSD. Also, manual intervention is currently needed to reinitialize the osd (since this is not a normal failure mode).

> What happens with XFS and two replicas when ONE host has a power loss?
> The other replica / journal should still be there.
>
> I have no idea where to put the journal. I have 8 SSDs per host, one per
> OSD, each with a write speed of 45,000 IOPS, for a total write speed of
> 360,000 IOPS per node. Which journal device can handle this? And if I
> put the journal on the same disk as the OSD, it has to copy the data
> around.

I think you have two choices. Either put the journals on SSDs (perhaps a journal partition on an existing one), or use a higher-end NVRAM-based device. There are several of these out there, although I'm blanking on product names at the moment. The best are probably the battery-backed DRAM ones with a bit of flash for when the battery gets low. Lots of RAID controllers also have some onboard NVRAM that can often be finagled into being useful, at least with spinning disks; I'm not sure how they perform with SSDs.

sage
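The "journal partition on an existing SSD" option Sage describes maps to the per-OSD journal settings in ceph.conf. A sketch, where the partition paths and OSD ids are assumptions:

```
[osd]
; journal size in MB (value is an example, tune to your write burst size)
osd journal size = 1024

[osd.0]
; hypothetical dedicated journal partition for this OSD
osd journal = /dev/disk/by-partlabel/journal-osd0

[osd.1]
osd journal = /dev/disk/by-partlabel/journal-osd1
```

With one journal partition per OSD on the same (or a peer) SSD, journal bandwidth scales with the number of OSDs instead of bottlenecking on a single shared device.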
ceph cluster hangs when rebooting one node
Hello list,

I was checking what happens if I reboot a ceph node. Sadly, if I reboot one node, the whole ceph cluster hangs and no I/O is possible.

ceph -w looks like this:

2012-11-12 16:03:58.191106 mon.0 [INF] pgmap v19013: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:08.365557 mon.0 [INF] mon.a calling new monitor election
2012-11-12 16:04:13.422682 mon.0 [INF] mon.a@0 won leader election with quorum 0,2
2012-11-12 16:04:13.708045 mon.0 [INF] pgmap v19014: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:13.708059 mon.0 [INF] mdsmap e1: 0/0/1 up
2012-11-12 16:04:13.708070 mon.0 [INF] osdmap e4582: 20 osds: 20 up, 20 in
2012-11-12 16:04:08.242688 mon.2 [INF] mon.c calling new monitor election
2012-11-12 16:04:13.708089 mon.0 [INF] monmap e1: 3 mons at {a=10.255.0.100:6789/0,b=10.255.0.101:6789/0,c=10.255.0.102:6789/0}
2012-11-12 16:04:14.070593 mon.0 [INF] pgmap v19015: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:15.283954 mon.0 [INF] pgmap v19016: 7032 pgs: 7032 active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:18.506812 mon.0 [INF] osd.21 10.255.0.101:6800/5049 failed (3 reports from 3 peers after 20.339769 >= grace 20.00)
2012-11-12 16:04:18.890003 mon.0 [INF] osdmap e4583: 20 osds: 19 up, 20 in
2012-11-12 16:04:19.137936 mon.0 [INF] pgmap v19017: 7032 pgs: 6720 active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:20.024595 mon.0 [INF] osdmap e4584: 20 osds: 19 up, 20 in
2012-11-12 16:04:20.330149 mon.0 [INF] pgmap v19018: 7032 pgs: 6720 active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:21.535471 mon.0 [INF] pgmap v19019: 7032 pgs: 6720 active+clean, 312 stale+active+clean; 91615 MB data, 174 GB used, 4294 GB / 4469 GB avail
2012-11-12 16:04:24.181292 mon.0 [INF] osd.22 10.255.0.101:6803/5153 failed (3 reports from 3 peers after 23.013550 >= grace 20.00)
2012-11-12 16:04:24.182208 mon.0 [INF] osd.23 10.255.0.101:6806/5276 failed (3 reports from 3 peers after 21.000834 >= grace 20.00)
2012-11-12 16:04:24.671373 mon.0 [INF] pgmap v19020: 7032 pgs: 6637 active+clean, 208 stale+active+clean, 187 incomplete; 91615 MB data, 174 GB used, 4295 GB / 4469 GB avail
2012-11-12 16:04:24.829022 mon.0 [INF] osdmap e4585: 20 osds: 17 up, 20 in
2012-11-12 16:04:24.870969 mon.0 [INF] osd.24 10.255.0.101:6809/5397 failed (3 reports from 3 peers after 20.688672 >= grace 20.00)
2012-11-12 16:04:25.522333 mon.0 [INF] pgmap v19021: 7032 pgs: 5912 active+clean, 933 stale+active+clean, 187 incomplete; 91615 MB data, 174 GB used, 4295 GB / 4469 GB avail
2012-11-12 16:04:25.596927 mon.0 [INF] osd.24 10.255.0.101:6809/5397 failed (3 reports from 3 peers after 21.708444 >= grace 20.00)
2012-11-12 16:04:26.077545 mon.0 [INF] osdmap e4586: 20 osds: 16 up, 20 in
2012-11-12 16:04:26.606475 mon.0 [INF] pgmap v19022: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 173 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:27.162034 mon.0 [INF] osdmap e4587: 20 osds: 16 up, 20 in
2012-11-12 16:04:27.656974 mon.0 [INF] pgmap v19023: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 173 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:30.229958 mon.0 [INF] pgmap v19024: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 172 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:31.411989 mon.0 [INF] pgmap v19025: 7032 pgs: 5394 active+clean, 1094 stale+active+clean, 544 incomplete; 91615 MB data, 172 GB used, 4296 GB / 4469 GB avail
2012-11-12 16:04:32.617576 mon.0 [INF] pgmap v19026: 7032 pgs: 4660 active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB / 4469 GB avail
2012-11-12 16:04:35.172861 mon.0 [INF] pgmap v19027: 7032 pgs: 4660 active+clean, 2372 incomplete; 91615 MB data, 171 GB used, 4298 GB / 4469 GB avail
2012-11-12 16:04:30.505872 osd.53 [WRN] 6 slow requests, 6 included below; oldest blocked for 30.247691 secs
2012-11-12 16:04:30.505875 osd.53 [WRN] slow request 30.247691 seconds old, received at 2012-11-12 16:04:00.258118: osd_op(client.131626.0:771962 rb.0.107a.734602d5.0bce [write 2478080~4096] 3.562a9efc) v4 currently reached pg
2012-11-12 16:04:30.505879 osd.53 [WRN] slow request 30.238016 seconds old, received at 2012-11-12 16:04:00.267793: osd_op(client.131626.0:772116 rb.0.107a.734602d5.1608 [write 262144~4096] 3.a47890e) v4 currently reached pg
2012-11-12 16:04:30.505881 osd.53 [WRN] slow request 30.236572 seconds old, received at 2012-11-12 16:04:00.269237: osd_op(client.131626.0:772141 rb.0.107a.734602d5.1777 [write 798720~4096] 3.547bc855) v4 currently reached pg
2012-11-12 16:04:30.505883 osd.53 [WRN] slow request
Re: [BUG] ceph-mon crashes
Am 12.11.2012 15:58, schrieb Joao Eduardo Luis:
> Hi Stefan,
>
> Any chance you can get me a larger chunk of the log from the monitor
> that was the leader by the time you issued those commands until the
> point the monitor crashed (from the excerpt you provided, that should
> be mon.b)?

Sure: https://www.dropbox.com/s/8e604bihk56m0yd/ceph-mon.b.log.1.gz

Greets,
Stefan
Re: ceph cluster hangs when rebooting one node
On Mon, 12 Nov 2012, Stefan Priebe - Profihost AG wrote:
> Hello list,
>
> I was checking what happens if I reboot a ceph node. Sadly, if I reboot
> one node, the whole ceph cluster hangs and no I/O is possible.

If you are using the current master, the new 'min_size' may be biting you;

  ceph osd dump | grep ^pool

and see if you see min_size for your pools. You can change that back to the normal behavior with

  ceph osd pool set poolname min_size 1

sage

> ceph -w looks like this:
> [quoted ceph -w log snipped; see the original message above]
Re: ceph cluster hangs when rebooting one node
Am 12.11.2012 16:11, schrieb Sage Weil:
> If you are using the current master, the new 'min_size' may be biting you;
>
>   ceph osd dump | grep ^pool
>
> and see if you see min_size for your pools.

No, I don't see any min_size:

# ceph osd dump | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1344 pgp_num 1344 last_change 1 owner 0
pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 3000 pgp_num 3000 last_change 958 owner 0

> You can change that back to the normal behavior with
>
>   ceph osd pool set poolname min_size 1

Yes, this helps! But min_size is still not shown in ceph osd dump. Also, when I reboot a node it takes 10-20s until all OSDs from that node are marked failed and the I/O starts again. Should I issue a ceph osd out command first?

But I already had min_size 1 / max_size 2 set for every rule in my crushmap.

Stefan
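The pool lines from `ceph osd dump` can be checked mechanically for the `min_size` field. A small sketch over the exact output pasted above (the `min_size 1` variant is hypothetical, showing what a newer monitor might print):

```python
import re

# One pool line exactly as printed by `ceph osd dump` in the message above.
line = ("pool 3 'kvmpool1' rep size 2 crush_ruleset 0 object_hash rjenkins "
        "pg_num 3000 pgp_num 3000 last_change 958 owner 0")

# Older monitors omit the min_size field entirely, so a search finds nothing.
m = re.search(r"min_size (\d+)", line)
min_size = int(m.group(1)) if m else None

# Hypothetical line from a monitor version that does print the field.
with_field = line + " min_size 1"
m2 = re.search(r"min_size (\d+)", with_field)
min_size_new = int(m2.group(1)) if m2 else None

print(min_size, min_size_new)
```

The absent field on older monitors explains why `ceph osd pool set ... min_size 1` takes effect even though the dump never shows it.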
Re: [pve-devel] less cores more iops / speed
Adding this to ceph.conf on the kvm host adds another 2,000 IOPS (20,000 IOPS with one VM). I'm sure most of them are useless on a client kvm / rbd host, but I don't know which ones make sense ;-)

[global]
debug ms = 0/0
debug rbd = 0/0
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

[client]
(same settings as above)

Stefan

Am 12.11.2012 15:35, schrieb Alexandre DERUMIER:
> Another idea: have you tried to put
>
> debug lockdep = 0/0
> debug context = 0/0
> debug crush = 0/0
> debug buffer = 0/0
> debug timer = 0/0
> debug journaler = 0/0
> debug osd = 0/0
> debug optracker = 0/0
> debug objclass = 0/0
> debug filestore = 0/0
> debug journal = 0/0
> debug ms = 0/0
> debug monc = 0/0
> debug tp = 0/0
> debug auth = 0/0
> debug finisher = 0/0
> debug heartbeatmap = 0/0
> debug perfcounter = 0/0
> debug asok = 0/0
> debug throttle = 0/0
>
> in a ceph.conf on your kvm host?

----- Mail original -----
De: Alexandre DERUMIER aderum...@odiso.com
À: Stefan Priebe - Profihost AG s.pri...@profihost.ag
Cc: pve-de...@pve.proxmox.com
Envoyé: Lundi 12 Novembre 2012 15:26:36
Objet: Re: [pve-devel] less cores more iops / speed

Maybe some tracing on the kvm process could give us clues to find where the cpu is used? Also, another idea: can you try with auth supported=none? Maybe there is some overhead from ceph authentication?

----- Mail original -----
De: Alexandre DERUMIER aderum...@odiso.com
À: Stefan Priebe - Profihost AG s.pri...@profihost.ag
Cc: pve-de...@pve.proxmox.com
Envoyé: Lundi 12 Novembre 2012 15:20:07
Objet: Re: [pve-devel] less cores more iops / speed

Ok, thanks. It seems to use a lot of cpu vs nfs, iscsi ... I hope the ceph devs will work on this soon!

----- Mail original -----
De: Stefan Priebe - Profihost AG s.pri...@profihost.ag
À: Alexandre DERUMIER aderum...@odiso.com
Cc: eric e...@netwalk.com, pve-de...@pve.proxmox.com
Envoyé: Lundi 12 Novembre 2012 15:05:08
Objet: Re: [pve-devel] less cores more iops / speed

Am 12.11.2012 13:49, schrieb Alexandre DERUMIER:
>> One VM on one Host: 18.000 IOP/s
>> Two VMs on one Host: 2x 11.000 IOP/s
>> Three VMs on one Host: 3x 7.000 IOP/s
> And host cpu is 100%?

No. For three VMs yes; for one and two, no. I think the librbd / rbd implementation in kvm is the bottleneck here.

Stefan

----- Mail original -----
De: Stefan Priebe - Profihost AG s.pri...@profihost.ag
À: Alexandre DERUMIER aderum...@odiso.com
Cc: eric e...@netwalk.com, pve-de...@pve.proxmox.com
Envoyé: Lundi 12 Novembre 2012 12:58:35
Objet: Re: [pve-devel] less cores more iops / speed

Am 12.11.2012 08:51, schrieb Alexandre DERUMIER:
>> Right now RBD in KVM is limited by CPU speed.
> Good to know; so it seems to be a lack of threading, or maybe some locks
> (so a faster cpu gives more iops). If you launch parallel fio runs on the
> same host in different guests, do you get more total iops? (For me it
> scales.)

One VM on one Host: 18.000 IOP/s
Two VMs on one Host: 2x 11.000 IOP/s
Three VMs on one Host: 3x 7.000 IOP/s

> If you launch 2 parallel fio runs in the same guest (on different disks),
> do you get more iops? (For me it doesn't scale, so raid0 in the guest
> doesn't help.)

No, it doesn't scale.

Stefan

----- Mail original -----
De: Stefan Priebe s.pri...@profihost.ag
À: Alexandre DERUMIER aderum...@odiso.com
Cc: eric e...@netwalk.com, pve-de...@pve.proxmox.com
Envoyé: Dimanche 11 Novembre 2012 13:07:36
Objet: Re: [pve-devel] less cores more iops / speed

Am 11.11.2012 12:12, schrieb Alexandre DERUMIER:
> If I remember right, Stefan can achieve 100,000 iops with iscsi on the
> same kvm host.

Correct, but this was always with scsi-generic and I/O multipathing on the host. rbd does not support scsi-generic ;-(

> I have checked the ceph mailing list; Stefan seems to have resolved his
> dual-core problem with a bios update!

Correct. So speed on the dual Xeon is now 14,000 IOPS and 18,000 IOPS on the single Xeon. But the difference is an issue of CPU speed: 3.6GHz single Xeon vs.
improve speed with auth supported=none
Hello list,

I'm still trying to improve ceph speed. Disabling logging on the host and the rbd client gives me an additional 5,000 IOPS, which is great. But I also wanted to try disabling authentication using:

  auth supported = none

How does this work? Do I just have to place this line in the [global] section of ceph.conf?

Greets,
Stefan
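For the releases discussed in this thread (pre-0.55), disabling cephx was typically a single line in the [global] section of ceph.conf on every node and client, followed by a daemon restart. A sketch, not a security recommendation:

```
[global]
; disable cephx entirely (pre-0.55 option name)
auth supported = none
```

All monitors, OSDs, and clients need the same setting, or authenticated and unauthenticated peers will refuse to talk to each other.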
changed rbd cp behavior in 0.53
Hi,

As of this version, rbd cp assumes that the destination pool is the same as the source pool, rather than 'rbd', if the pool in the destination path is omitted:

# rbd cp install/img testimg
# rbd ls install
img
testimg

Is this change permanent?

Thanks!
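Spelling out the destination pool sidesteps the default entirely, under either behavior; a hypothetical transcript (pool and image names follow the example above):

```
# old behavior: omitted destination pool defaulted to 'rbd'
# new behavior (0.53): omitted destination pool defaults to the source pool
rbd cp install/img testimg        # 0.53: copies into pool 'install'
rbd cp install/img rbd/testimg    # explicit pool: copies into 'rbd' everywhere
```

Scripts that relied on the old 'rbd' default should name the destination pool explicitly to stay version-independent.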
Re: [BUG] ceph-mon crashes
On 11/12/2012 03:10 PM, Stefan Priebe - Profihost AG wrote:
> Sure: https://www.dropbox.com/s/8e604bihk56m0yd/ceph-mon.b.log.1.gz
>
> Greets,
> Stefan

Hi Stefan,

Thanks for the log. Can you please confirm that sometime between you issuing the out command and mon.b failing, you had yet another monitor (maybe mon.a) that was the leader, but for some reason was down by the time mon.b failed? If so, could you provide the log for that monitor as well, given this log doesn't have some of the info I'm looking for?

-Joao
pull request: ceph-qa-suite branch wip-java
This patch adds a yaml file to add the libcephfs-java tests to the nightly qa test set.

Best,
-Joe Buck
Re: improve speed with auth supported=none
I guess you can refer to this link on the list: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/9776

By the way, do you get the extra 5,000 IOPS with the rbd kernel client or on a VM disk?

Cheers.

On Mon, Nov 12, 2012 at 4:37 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote:
> I'm still trying to improve ceph speed. Disabling logging on the host
> and the rbd client gives me an additional 5,000 IOPS, which is great.
> But I also wanted to try disabling authentication using:
>
>   auth supported = none
>
> How does this work? Do I just have to place this line in the [global]
> section of ceph.conf?
Re: [BUG] ceph-mon crashes
Hi Joao,

Am 12.11.2012 18:05, schrieb Joao Eduardo Luis:
> Can you please confirm that sometime between you issuing the out command
> and mon.b failing, you had yet another monitor (maybe mon.a) that was
> the leader, but for some reason was down by the time mon.b failed? If
> so, could you provide the log for that monitor as well, given this log
> doesn't have some of the info I'm looking for?

Not sure, but here are the logs of the other two mons:

https://www.dropbox.com/s/jztsedvj1b2kjje/ceph-mon.a.log.1.gz
https://www.dropbox.com/s/62jkfbbbgvs5o25/ceph-mon.c.log.1.gz

Thanks,
Stefan
Re: [pve-devel] less cores more iops / speed
On 11/12/2012 07:33 AM, Stefan Priebe - Profihost AG wrote:
> Adding this to ceph.conf on the kvm host adds another 2,000 IOPS
> (20,000 IOPS with one VM). I'm sure most of them are useless on a
> client kvm / rbd host, but I don't know which ones make sense ;-)
>
> [quoted config and thread history snipped]

For the client side, you'd want these settings to disable all debug logging:

[client]
debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug filer = 0/0
debug objecter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug objectcacher = 0/0
debug client = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

Josh
Re: [pve-devel] less cores more iops / speed
Hi Josh,

> For the client side, you'd want these settings to disable all debug
> logging:
> ...

Thanks!
Stefan
Re: [BUG] ceph-mon crashes
On 11/12/2012 06:30 PM, Stefan Priebe wrote:
> Not sure, but here are the logs of the other two mons:
>
> https://www.dropbox.com/s/jztsedvj1b2kjje/ceph-mon.a.log.1.gz
> https://www.dropbox.com/s/62jkfbbbgvs5o25/ceph-mon.c.log.1.gz

Thanks Stefan, I'll be looking into this.

For future reference, I created issue #3477 on the tracker: http://tracker.newdream.net/issues/3477

-Joao
Re: Build regressions/improvements in v3.7-rc5
On Mon, Nov 12, 2012 at 9:58 PM, Geert Uytterhoeven ge...@linux-m68k.org wrote: JFYI, when comparing v3.7-rc5 to v3.7-rc4[3], the summaries are: - build errors: +14/-4 14 regressions: + drivers/virt/fsl_hypervisor.c: error: 'MSR_GS' undeclared (first use in this function): = 799:93 + error: No rule to make target drivers/scsi/aic7xxx/aicasm/*.[chyl]: = N/A + net/ceph/ceph_common.c: error: dereferencing pointer to incomplete type: = 272:13 + net/ceph/ceph_common.c: error: implicit declaration of function 'request_key' [-Werror=implicit-function-declaration]: = 249:2 + net/ceph/crypto.c: error: dereferencing pointer to incomplete type: = 463:19, 434:46, 452:5, 448:52, 429:23, 467:36, 447:18 + net/ceph/crypto.c: error: implicit declaration of function 'key_payload_reserve' [-Werror=implicit-function-declaration]: = 437:2 + net/ceph/crypto.c: error: implicit declaration of function 'register_key_type' [-Werror=implicit-function-declaration]: = 481:2 + net/ceph/crypto.c: error: implicit declaration of function 'unregister_key_type' [-Werror=implicit-function-declaration]: = 485:2 + net/ceph/crypto.c: error: unknown field 'destroy' specified in initializer: = 477:2 + net/ceph/crypto.c: error: unknown field 'instantiate' specified in initializer: = 475:2 + net/ceph/crypto.c: error: unknown field 'match' specified in initializer: = 476:2 + net/ceph/crypto.c: error: unknown field 'name' specified in initializer: = 474:2 + net/ceph/crypto.c: error: variable 'key_type_ceph' has initializer but incomplete type: = 473:8 powerpc-randconfig + error: relocation truncated to fit: R_PPC64_REL24 against symbol `._mcount' defined in .text section in arch/powerpc/kernel/entry_64.o: (.text+0x1ff9eb8) = (.text+0x1ffa274), (.text+0x1ff7840) powerpc-allyesconfig [1] http://kisskb.ellerman.id.au/kisskb/head/5614/ (all 117 configs) [3] http://kisskb.ellerman.id.au/kisskb/head/5600/ (all 117 configs) Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that. -- Linus Torvalds
Re: [BUG] ceph-mon crashes
Thanks, I'm subscribed to the tracker now. Stefan On 12.11.2012 21:40, Joao Eduardo Luis wrote: On 11/12/2012 06:30 PM, Stefan Priebe wrote: Hi Joao, On 12.11.2012 18:05, Joao Eduardo Luis wrote: Can you please confirm that sometime between you issuing the out command and mon.b failing, you had yet another monitor (maybe mon.a) that was the leader but for some reason was down by the time mon.b failed? If so, could you provide the log for that monitor as well, given this log doesn't have some info I'm looking for? Not sure, but here are the logs of the other two mons: https://www.dropbox.com/s/jztsedvj1b2kjje/ceph-mon.a.log.1.gz https://www.dropbox.com/s/62jkfbbbgvs5o25/ceph-mon.c.log.1.gz Thanks, Stefan Thanks Stefan, I'll be looking into this. For future reference, I created issue #3477 on the tracker: http://tracker.newdream.net/issues/3477 -Joao
Re: improve speed with auth supported=none
Thanks, this gives another burst for iops. I'm now at 23.000 iops ;-) So for random 4k iops, ceph auth and especially the logging are a lot of overhead. Greets, Stefan On 12.11.2012 19:26, Sébastien Han wrote: I guess you can refer to that link on the list: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/9776 btw do you get 5000 iop/s on the rbd kernel or on a vm disk? cheers. On Mon, Nov 12, 2012 at 4:37 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hello list, I'm still trying to improve ceph speed. Disabling logging on host and rbd client gives me an additional 5000 iop/s, which is great. But I also wanted to try disabling authentication using: auth supported=none How does this work? Do I just have to place this line under the global section in ceph.conf? Greets, Stefan
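To answer the question in the quoted mail: yes, the line goes in the [global] section of ceph.conf. A minimal sketch, assuming the 0.48-era option name used in the question (the option affects all daemons and clients, so they need the same setting and a restart to pick it up):

```ini
[global]
    auth supported = none
```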
Re: rbd map command hangs for 15 minutes during system start up
After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it seems we no longer have this hang. On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin josh.dur...@inktank.com wrote: On 11/08/2012 02:10 PM, Mandell Degerness wrote: We are seeing a somewhat random, but frequent hang on our systems during startup. The hang happens at the point where an rbd map rbdvol command is run. I've attached the ceph logs from the cluster. The map command happens at Nov 8 18:41:09 on server 172.18.0.15. The process which hung can be seen in the log as 172.18.0.15:0/1143980479. It appears as if the TCP socket is opened to the OSD, but then times out 15 minutes later, the process gets data when the socket is closed on the client server and it retries. Please help. We are using ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe). We are using a 3.5.7 kernel with the following list of patches applied: 1-libceph-encapsulate-out-message-data-setup.patch 2-libceph-dont-mark-footer-complete-before-it-is.patch 3-libceph-move-init-of-bio_iter.patch 4-libceph-dont-use-bio_iter-as-a-flag.patch 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch 8-libceph-protect-ceph_con_open-with-mutex.patch 9-libceph-reset-connection-retry-on-successfully-negotiation.patch 10-rbd-only-reset-capacity-when-pointing-to-head.patch 11-rbd-set-image-size-when-header-is-updated.patch 12-libceph-fix-crypto-key-null-deref-memory-leak.patch 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch 17-libceph-check-for-invalid-mapping.patch 18-ceph-propagate-layout-error-on-osd-request-creation.patch 19-rbd-BUG-on-invalid-layout.patch 
20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch 21-ceph-avoid-32-bit-page-index-overflow.patch 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch Any suggestions? The log shows your monitors don't have time synchronized enough among them to make much progress (including authenticating new connections). That's probably the real issue. 0.2s is pretty large clock drift. One thought is that the following patch (which we could not apply) is what is required: 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch This is certainly useful too, but I don't think it's the cause of the delay in this case. Josh
Re: changed rbd cp behavior in 0.53
On 11/12/2012 08:30 AM, Andrey Korolyov wrote: Hi, For this version, rbd cp assumes that the destination pool is the same as the source, not 'rbd', if the pool in the destination path is omitted. rbd cp install/img testimg rbd ls install img testimg Is this change permanent? Thanks! This is a regression. The previous behavior will be restored for 0.54. I added http://tracker.newdream.net/issues/3478 to track it. Josh
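The report above can be restated as the commands involved (pool and image names taken from the report; the explicit-destination workaround is a suggestion, not from the thread):

```shell
# 0.53 regression: with no pool named in the destination, the copy
# lands in the source pool ('install') instead of the default 'rbd' pool.
rbd cp install/img testimg
rbd ls install          # lists: img, testimg

# Until 0.54 restores the old behavior, naming the destination pool
# explicitly should sidestep the ambiguity.
rbd cp install/img rbd/testimg
```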
Re: rbd map command hangs for 15 minutes during system start up
On Mon, 12 Nov 2012, Nick Bartos wrote: After removing 8-libceph-protect-ceph_con_open-with-mutex.patch, it seems we no longer have this hang. Hmm, that's a bit disconcerting. Did this series come from our old 3.5 stable series? I recently prepared a new one that backports *all* of the fixes from 3.6 to 3.5 (and 3.4); see wip-3.5 in ceph-client.git. I would be curious if you see problems with that. So far, with these fixes in place, we have not seen any unexplained kernel crashes in this code. I take it you're going back to a 3.5 kernel because you weren't able to get rid of the sync problem with 3.6? sage On Thu, Nov 8, 2012 at 5:43 PM, Josh Durgin josh.dur...@inktank.com wrote: On 11/08/2012 02:10 PM, Mandell Degerness wrote: We are seeing a somewhat random, but frequent hang on our systems during startup. The hang happens at the point where an rbd map rbdvol command is run. I've attached the ceph logs from the cluster. The map command happens at Nov 8 18:41:09 on server 172.18.0.15. The process which hung can be seen in the log as 172.18.0.15:0/1143980479. It appears as if the TCP socket is opened to the OSD, but then times out 15 minutes later, the process gets data when the socket is closed on the client server and it retries. Please help. We are using ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe). 
We are using a 3.5.7 kernel with the following list of patches applied: 1-libceph-encapsulate-out-message-data-setup.patch 2-libceph-dont-mark-footer-complete-before-it-is.patch 3-libceph-move-init-of-bio_iter.patch 4-libceph-dont-use-bio_iter-as-a-flag.patch 5-libceph-resubmit-linger-ops-when-pg-mapping-changes.patch 6-libceph-re-initialize-bio_iter-on-start-of-message-receive.patch 7-ceph-close-old-con-before-reopening-on-mds-reconnect.patch 8-libceph-protect-ceph_con_open-with-mutex.patch 9-libceph-reset-connection-retry-on-successfully-negotiation.patch 10-rbd-only-reset-capacity-when-pointing-to-head.patch 11-rbd-set-image-size-when-header-is-updated.patch 12-libceph-fix-crypto-key-null-deref-memory-leak.patch 13-ceph-tolerate-and-warn-on-extraneous-dentry-from-mds.patch 14-ceph-avoid-divide-by-zero-in-__validate_layout.patch 15-rbd-drop-dev-reference-on-error-in-rbd_open.patch 16-ceph-Fix-oops-when-handling-mdsmap-that-decreases-max_mds.patch 17-libceph-check-for-invalid-mapping.patch 18-ceph-propagate-layout-error-on-osd-request-creation.patch 19-rbd-BUG-on-invalid-layout.patch 20-ceph-return-EIO-on-invalid-layout-on-GET_DATALOC-ioctl.patch 21-ceph-avoid-32-bit-page-index-overflow.patch 23-ceph-fix-dentry-reference-leak-in-encode_fh.patch Any suggestions? The log shows your monitors don't have time synchronized enough among them to make much progress (including authenticating new connections). That's probably the real issue. 0.2s is pretty large clock drift. One thought is that the following patch (which we could not apply) is what is required: 22-rbd-reset-BACKOFF-if-unable-to-re-queue.patch This is certainly useful too, but I don't think it's the cause of the delay in this case. 
Josh
ceph osd crush set command under 0.53
Did the syntax and behavior of the ceph osd crush set ... command change between 0.48 and 0.53? When trying out ceph 0.53, I get the following in my log when trying to add the first OSD to a new cluster (similar behavior for osds 2 and 3). It appears that the ceph osd crush command fails, but still marks the OSDs as up and in: Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 [2012-11-12 23:19:05.759] 908/MainThread savage/INFO: execute(['ceph', 'osd', 'crush', 'set', '0', 'osd.0', '1.0', 'host=172.20.0.13', 'rack=0', 'pool=default']) Nov 12 23:19:05 node-172-20-0-14/172.20.0.14 ceph-mon: 2012-11-12 23:19:05.804080 7ffd761fe700 0 mon.1@1(peon) e1 handle_command mon_command(osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default v 0) v1 Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:05.772215 7fad40911700 0 mon.0@0(leader) e1 handle_command mon_command(osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default v 0) v1 Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:05.772248 7fad40911700 0 mon.0@0(leader).osd e2 adding/updating crush item id 0 name 'osd.0' weight 1 at location {host=172.20.0.13,pool=default,rack=0} Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:05.772323 7fad40911700 1 error: didn't find anywhere to add item 0 in {host=172.20.0.13,pool=default,rack=0} Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 [2012-11-12 23:19:05.783] 908/MainThread savage/CRITICAL: Logging uncaught exception Traceback (most recent call last): File /usr/bin/sv-fred.py, line 9, in module load_entry_point('savage==.2101.118c3ebc8c0843f87e82eb047de043c8a70086bd', 'console_scripts', 'sv-fred.py')() File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 811, in main File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 798, in run File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 291, in step File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 
252, in step File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 231, in _newstate File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 219, in _newstate File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 563, in action_firstboot_full File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 768, in handle_message File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 750, in start_phase File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 164, in start File /usr/lib64/python2.6/site-packages/savage/utils/__init__.py, line 275, in _wrap File /usr/lib64/python2.6/site-packages/savage/command/commands/ceph.py, line 50, in crush_myself File /usr/lib64/python2.6/site-packages/savage/utils/__init__.py, line 244, in execute File /usr/lib64/python2.6/site-packages/savage/utils/__init__.py, line 130, in collect_subprocess ExecutionError: Command failed: ceph osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default return_code: 1 stdout: (22) Invalid argument stderr: Nov 12 23:19:06 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:06.491514 7fad40911700 1 mon.0@0(leader).osd e3 e3: 3 osds: 1 up, 1 in Nov 12 23:19:06 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:06.494461 7fad40911700 0 log [INF] : osdmap e3: 3 osds: 1 up, 1 in Nov 12 23:19:06 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:06.494463 mon.0 172.20.0.13:6789/0 16 : [INF] osdmap e3: 3 osds: 1 up, 1 in
Re: ceph osd crush set command under 0.53
On Mon, 12 Nov 2012, Mandell Degerness wrote: Did the syntax and behavior of the ceph osd crush set ... command change between 0.48 and 0.53? When trying out ceph 0.53, I get the following in my log when trying to add the first OSD to a new cluster (similar behavior for osds 2 and 3). It appears that the ceph osd crush command fails, but still marks the OSDs as up and in: The 'pool=default' is changed to 'root=default', as in the root of the crush hierarchy. 'pool' was confusing because there are also rados pools, which are something else entirely. (You can also omit the first '0' (i.e., just 'osd.123' and not [..., '123', 'osd.123', ...]), but both the old and new syntax are supported.) sage Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 [2012-11-12 23:19:05.759] 908/MainThread savage/INFO: execute(['ceph', 'osd', 'crush', 'set', '0', 'osd.0', '1.0', 'host=172.20.0.13', 'rack=0', 'pool=default']) Nov 12 23:19:05 node-172-20-0-14/172.20.0.14 ceph-mon: 2012-11-12 23:19:05.804080 7ffd761fe700 0 mon.1@1(peon) e1 handle_command mon_command(osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default v 0) v1 Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:05.772215 7fad40911700 0 mon.0@0(leader) e1 handle_command mon_command(osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default v 0) v1 Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:05.772248 7fad40911700 0 mon.0@0(leader).osd e2 adding/updating crush item id 0 name 'osd.0' weight 1 at location {host=172.20.0.13,pool=default,rack=0} Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:05.772323 7fad40911700 1 error: didn't find anywhere to add item 0 in {host=172.20.0.13,pool=default,rack=0} Nov 12 23:19:05 node-172-20-0-13/172.20.0.13 [2012-11-12 23:19:05.783] 908/MainThread savage/CRITICAL: Logging uncaught exception Traceback (most recent call last): File /usr/bin/sv-fred.py, line 9, in module 
load_entry_point('savage==.2101.118c3ebc8c0843f87e82eb047de043c8a70086bd', 'console_scripts', 'sv-fred.py')() File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 811, in main File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 798, in run File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 291, in step File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 252, in step File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 231, in _newstate File /usr/lib64/python2.6/site-packages/savage/utils/nfa.py, line 219, in _newstate File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 563, in action_firstboot_full File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 768, in handle_message File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 750, in start_phase File /usr/lib64/python2.6/site-packages/savage/services/fred.py, line 164, in start File /usr/lib64/python2.6/site-packages/savage/utils/__init__.py, line 275, in _wrap File /usr/lib64/python2.6/site-packages/savage/command/commands/ceph.py, line 50, in crush_myself File /usr/lib64/python2.6/site-packages/savage/utils/__init__.py, line 244, in execute File /usr/lib64/python2.6/site-packages/savage/utils/__init__.py, line 130, in collect_subprocess ExecutionError: Command failed: ceph osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default return_code: 1 stdout: (22) Invalid argument stderr: Nov 12 23:19:06 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:06.491514 7fad40911700 1 mon.0@0(leader).osd e3 e3: 3 osds: 1 up, 1 in Nov 12 23:19:06 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:06.494461 7fad40911700 0 log [INF] : osdmap e3: 3 osds: 1 up, 1 in Nov 12 23:19:06 node-172-20-0-13/172.20.0.13 ceph-mon: 2012-11-12 23:19:06.494463 mon.0 172.20.0.13:6789/0 16 : [INF] osdmap e3: 3 osds: 1 up, 1 in
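Sage's answer can be summarized as a before/after of the failing command (weight and location arguments taken from the log above):

```shell
# 0.48 (argonaut) syntax, as issued in the log -- rejected by 0.53:
ceph osd crush set 0 osd.0 1.0 host=172.20.0.13 rack=0 pool=default

# 0.53 syntax: 'pool' is renamed to 'root' (the root of the crush
# hierarchy), and the leading numeric id may now be omitted.
ceph osd crush set osd.0 1.0 host=172.20.0.13 rack=0 root=default
```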
[Help] Use Ceph RBD as primary storage in CloudStack 4.0
Hi, All Has anybody used Ceph RBD in CloudStack as primary storage? I see that among the new features of CS 4.0, RBD is supported for KVM. So I tried using RBD as primary storage but ran into some problems. I use a CentOS 6.3 server as host. First I removed qemu-kvm (0.12.1) and libvirt (0.9.10) because their versions are too low (qemu on the hypervisor has to be compiled with RBD enabled, and the libvirt version on the hypervisor has to be at least 0.10 with RBD enabled). Then I downloaded the latest qemu (1.2.0) and libvirt (1.0.0) source code, compiled and installed them. But when compiling the qemu source code: #wget http://wiki.qemu-project.org/download/qemu-1.2.0.tar.bz2 #tar jxvf qemu-1.2.0.tar.bz2 # cd qemu-1.2.0 # ./configure --enable-rbd the following errors occur: ERROR: User requested feature rados block device ERROR: configure was not able to find it But on Ubuntu 12.04 compiling the qemu source code succeeded. Now I am very confused. How do I use Ceph RBD as primary storage in CloudStack on CentOS 6.3? Can anyone help me? Best Regards, Alex
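That configure error means qemu's probe could not find the librados/librbd development headers and libraries on the CentOS host (on the Ubuntu box they were presumably already installed). A hedged sketch of what to try before re-running configure (the package name is an assumption and depends on which ceph repository is configured on the machine):

```shell
# qemu's --enable-rbd probe compiles a test program against librbd;
# install the ceph development files first (package name may differ
# per repository, e.g. ceph-devel from the ceph.com el6 packages).
yum install -y ceph-devel

# then re-run the probe
cd qemu-1.2.0
./configure --enable-rbd
```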
Re: improve speed with auth supported=none
On 11/12/2012 01:57 PM, Stefan Priebe wrote: Thanks, this gives another burst for iops. I'm now at 23.000 iops ;-) So for random 4k iops, ceph auth and especially the logging are a lot of overhead. How much difference did disabling auth make vs only disabling logging? Josh Greets, Stefan On 12.11.2012 19:26, Sébastien Han wrote: I guess you can refer to that link on the list: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/9776 btw do you get 5000 iop/s on the rbd kernel or on a vm disk? cheers. On Mon, Nov 12, 2012 at 4:37 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Hello list, I'm still trying to improve ceph speed. Disabling logging on host and rbd client gives me an additional 5000 iop/s, which is great. But I also wanted to try disabling authentication using: auth supported=none How does this work? Do I just have to place this line under the global section in ceph.conf? Greets, Stefan
Re: optmize librbd for iops
On 11/12/2012 05:50 AM, Stefan Priebe - Profihost AG wrote: Hello list, are there any plans to optimize librbd for iops? Right now I'm able to get 50.000 iop/s via iscsi and 100.000 iop/s using multipathing with iscsi. With librbd I'm stuck at around 18.000 iops. As this scales with more hosts but not with more disks in a vm, it must be limited by the rbd implementation in kvm / librbd. It'd be interesting to see which layers are most limiting in this case - qemu/kvm, librados, or librbd. How does rados bench with 4k writes and then 4k reads with many concurrent IOs do? Unfortunately there's no librbd read benchmark yet. Josh
Re: improve speed with auth supported=none
On 13.11.2012 08:42, Josh Durgin wrote: On 11/12/2012 01:57 PM, Stefan Priebe wrote: Thanks, this gives another burst for iops. I'm now at 23.000 iops ;-) So for random 4k iops, ceph auth and especially the logging are a lot of overhead. How much difference did disabling auth make vs only disabling logging? Disabling debug logging: 3000 iops. Disabling auth: 2000 iops. Is anybody on the ceph team also interested in a call graph of kvm while the VM is doing random 4k write io? Greets, Stefan
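One common way to capture the call graph offered above is perf on the host (a sketch, not from the thread; it assumes perf is installed and that a single qemu-kvm process is running):

```shell
# Sample the qemu-kvm process with call-graph recording for 30s while
# the guest runs the 4k random-write workload, then inspect hot paths.
perf record -g -p "$(pidof qemu-kvm)" -- sleep 30
perf report --sort dso,symbol
```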
Re: optmize librbd for iops
On 13.11.2012 08:51, Josh Durgin wrote: On 11/12/2012 05:50 AM, Stefan Priebe - Profihost AG wrote: Hello list, are there any plans to optimize librbd for iops? Right now I'm able to get 50.000 iop/s via iscsi and 100.000 iop/s using multipathing with iscsi. With librbd I'm stuck at around 18.000 iops. As this scales with more hosts but not with more disks in a vm, it must be limited by the rbd implementation in kvm / librbd. It'd be interesting to see which layers are most limiting in this case - qemu/kvm, librados, or librbd. How does rados bench with 4k writes and then 4k reads with many concurrent IOs do? Right now I'm using qemu-kvm with librbd and fio inside the guest. How does rados bench work? Stefan
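rados bench writes (and then reads back) objects directly through librados, bypassing qemu and librbd, which is why Josh suggests it for isolating the layers. A sketch of the two runs he describes (pool name, duration and concurrency are illustrative; flags may differ slightly on 0.48-era builds):

```shell
# Phase 1: 4k writes for 60s with 16 concurrent ops; keep the objects
# around so they can be read back afterwards.
rados -p rbd bench 60 write -b 4096 -t 16 --no-cleanup

# Phase 2: read the objects written above with 16 concurrent ops.
rados -p rbd bench 60 seq -t 16
```

Comparing these numbers against the fio-in-guest numbers separates librados/OSD limits from qemu/librbd overhead.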