RE: [ceph-users] Does CEPH rely on any multicasting?
Recall that Ceph already incorporates its own cluster-management framework, and the various Ceph daemons already operate in a clustered manner. Sure. But I guess it could reduce the 'ceph' code size if an existing framework were used. We (Proxmox VE) run corosync by default on all nodes, so it would also make configuration easier.
RE: [ceph-users] Does CEPH rely on any multicasting?
Does CEPH rely on any multicasting? Appreciate the feedback. Nope! All networking is point-to-point. Besides, it would be great if Ceph could use existing cluster stacks like corosync, ... Is there any plan to support that?
RE: Proxmox and ceph integration
We will update our test repository later today (we are just doing final bug fixing ...). I will send a note when it is online. I found https://git.proxmox.com/?p=pve-manager.git;a=blob;f=PVE/CephTools.pm;h=f7f11ce2dc515cfd4fb423dbcf6dbbe4877b8afa;hb=3f5368bb67146d89f7ed098dc0c036c2217ed193 but I'm not familiar with proxmox :-) On 24/01/2014 09:33, Loic Dachary wrote: Hi, I'm told today that the Ceph proxmox integration has recently been improved. I'm curious to take a look, would you be so kind as to point me to the URL where I could read about it? Cheers -- Loïc Dachary, Artisan Logiciel Libre
RE: Proxmox and ceph integration
I'm told today that the Ceph proxmox integration has recently been improved. I'm curious to take a look, would you be so kind as to point me to the URL where I could read about it? Martin just posted the links to the user list: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007562.html
API json format question
The command signature for osd pool create is:

    'cmd119' => {
        'sig' => [
            'osd', 'pool', 'create',
            { 'name' => 'pool', 'type' => 'CephPoolname' },
            { 'range' => '0', 'name' => 'pg_num', 'type' => 'CephInt' },
            { 'req' => 'false', 'range' => '0', 'name' => 'pgp_num', 'type' => 'CephInt' },
            { 'n' => 'N', 'req' => 'false', 'goodchars' => '[A-Za-z0-9-_.=]',
              'name' => 'properties', 'type' => 'CephString' } ],

But it seems that 'properties' are ignored. I tried:

    2014-01-17 09:39:13.010675 2b3a6975e700 0 mon.0@0(leader) e1 handle_command
    mon_command({"format":"json","pg_num":32,"pool":"test3",
    "properties":["size=3","min_size=2","crush_ruleset=1"],
    "prefix":"osd pool create"} v 0) v1

Is something wrong with the format? (I do not get any error message.)
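As a possible workaround (my assumption; I have not checked whether the monitor treats this differently from properties at create time), the same settings can be applied after creation with 'osd pool set':

    ceph osd pool create test3 32 32
    ceph osd pool set test3 size 3
    ceph osd pool set test3 min_size 2
    ceph osd pool set test3 crush_ruleset 1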
RE: ceph cli delay when one mon is down
Caching the last successfully connected mon isn't a bad idea either... care to open a feature ticket? http://tracker.ceph.com/issues/7150
ceph cli delay when one mon is down
Hi all, I use a few helper scripts to automate things, and those scripts call the 'ceph' command multiple times, like:

    #!/bin/sh
    ceph do something ...
    ceph do something else ...

I get a delay when one monitor is down, until a working mon is found. That is OK so far. But I get that delay each time I run a 'ceph' command. It would be great if ceph remembered the last successful mon connection, so that such delays are avoided.
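A stopgap that works today, as far as I can tell, is to pin each call to a monitor known to be up via the '-m' option (the address below is just an example), so the client skips probing the dead mon:

    #!/bin/sh
    MON=192.168.3.11:6789   # example address of a mon known to be up
    ceph -m $MON do something ...
    ceph -m $MON do something else ...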
RE: ceph cli delay when one mon is down
You can avoid this, and speed things up in general, by using the interactive mode:

    #!/bin/sh
    ceph <<EOM
    do something
    do something else
    EOM

The above is a bit clumsy. To be honest, I want to do things with perl, so I guess it is better to use perl bindings for librados. Are perl bindings already available?
missing health status details
Just playing around, and detected this:

    # ceph health -f json-pretty

    { "summary": [],
      "timechecks": { "epoch": 34, "round": 40, "round_status": "finished",
        "mons": [
          { "name": "1", "skew": "0.00", "latency": "0.00", "health": "HEALTH_OK" },
          { "name": "0", "skew": "-0.000330", "latency": "0.001454", "health": "HEALTH_OK" },
          { "name": "2", "skew": "0.054170", "latency": "0.000816", "health": "HEALTH_WARN",
            "details": "clock skew 0.0541703s > max 0.05s" } ] },
      "health": { "health_services": [
        { "mons": [
          { "name": "1", "kb_total": 2064208, "kb_used": 1424964, "kb_avail": 534388,
            "avail_percent": 25, "last_updated": "2013-11-22 11:14:47.500872",
            "store_stats": { "bytes_total": 9524208, "bytes_sst": 8999282,
              "bytes_log": 458752, "bytes_misc": 66174, "last_updated": "0.00" },
            "health": "HEALTH_WARN", "health_detail": "low disk space!" },
          { "name": "0", "kb_total": 2064208, "kb_used": 1249116, "kb_avail": 710236,
            "avail_percent": 34, "last_updated": "2013-11-22 11:14:53.498853",
            "store_stats": { "bytes_total": 10048528, "bytes_sst": 8999314,
              "bytes_log": 983040, "bytes_misc": 66174, "last_updated": "0.00" },
            "health": "HEALTH_OK" },
          { "name": "2", "kb_total": 2064208, "kb_used": 1247084, "kb_avail": 712268,
            "avail_percent": 34, "last_updated": "2013-11-22 11:14:28.720368",
            "store_stats": { "bytes_total": 10048528, "bytes_sst": 8999314,
              "bytes_log": 983040, "bytes_misc": 66174, "last_updated": "0.00" },
            "health": "HEALTH_OK" } ] } ] },
      "overall_status": "HEALTH_WARN",
      "detail": [ "mon.2 addr 192.168.3.13:6789\/0 clock skew 0.0541703s > max 0.05s (latency 0.000815989s)" ] }

Why is "low disk space!" not included inside 'detail'?
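For comparison, the plain-text detail output does list per-mon problems, so one can cross-check what the JSON formatter omits:

    ceph health detail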
possible bug in init-ceph.in
http://ceph.com/git/?p=ceph.git;a=blob;f=src/init-ceph.in;h=7399abb8f85855f2248c4afb22bf94f2e2f080a2;hb=HEAD

line 320:

    if [ ${update_crush:-1} = 1 -o {$update_crush:-1} = true ]; then

looks strange to me. Maybe that should be:

    - if [ ${update_crush:-1} = 1 -o {$update_crush:-1} = true ]; then
    + if [ ${update_crush:-1} = 1 -o ${update_crush:-1} = true ]; then
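A quick shell demo of what the typo does (the braces stay literal text, so the second test can never match):

    update_crush=
    echo ${update_crush:-1}    # prints "1"     (intended default)
    echo {$update_crush:-1}    # prints "{:-1}" (never equals "true")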
RE: v0.67 Dumpling released
Another three months have gone by, and the next stable release of Ceph is ready: Dumpling! Thank you to everyone who has contributed to this release! Seems there is a new dependency in the ceph-common debian package: python-ceph. Is that really required (the previous release did not depend on it)?
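To confirm, the dependency is visible with standard apt tooling:

    apt-cache depends ceph-common | grep -i python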
RE: incremental rbd export / sparse files?
Step 2 is to export the incremental changes. The hangup there is figuring out a generic and portable file format to represent those incremental changes; we'd rather not invent something ourselves that is ceph-specific. Suggestions welcome! AFAIK, both 'zfs' and 'btrfs' already have such a format.
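For reference, roughly how those two expose it (snapshot names are illustrative):

    # zfs: incremental stream between two snapshots
    zfs send -i tank/vol@snap1 tank/vol@snap2 > incr.zfs
    # btrfs: incremental stream relative to a parent snapshot
    btrfs send -p /mnt/@snap1 /mnt/@snap2 > incr.btrfs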
RE: Limited IOP/s on Dual Xeon KVM Host
Got it fixed by a BIOS update... crazy. So you can't see any regression in IOPS with more cores? Can you speed things up with process pinning? Just want to make sure that NUMA does not play any role here.
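For the pinning test, something along these lines should do (core and node numbers are illustrative):

    # pin the VM's threads to the cores of one socket
    taskset -c 0-7 qemu-system-x86_64 ...
    # or, NUMA-aware: keep cpu and memory on the same node
    numactl --cpunodebind=0 --membind=0 qemu-system-x86_64 ...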
RE: less cores more iops / speed
I've noticed something really interesting. I get 5000 iops / VM for random 4k writes while assigning 4 cores on a 2.5 GHz Xeon. When I move this VM to another kvm host with 3.6 GHz I get 8000 iops (still 8 cores); when I then LOWER the assigned cores from 8 to 4 I get 14,500 iops. If I assign only 2 cores I get 16,000 iops... Why do fewer kvm cores mean more speed? There is a serious bug in the kvm vhost code. Do you use virtio-net with vhost? See: http://lists.nongnu.org/archive/html/qemu-devel/2012-11/msg00579.html Please test using the e1000 driver instead. Or update the guest kernel (what guest kernel do you use?). AFAIK 3.x kernels do not trigger the bug.
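To rule vhost out, the guest NIC model can be switched on the qemu command line; a sketch (the netdev options are illustrative):

    # suspect configuration: virtio-net backed by vhost
    qemu-system-x86_64 ... -netdev tap,id=net0,vhost=on -device virtio-net-pci,netdev=net0
    # comparison run with e1000 (no vhost involved)
    qemu-system-x86_64 ... -netdev tap,id=net0 -device e1000,netdev=net0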
RE: less cores more iops / speed
Why is the vhost net driver involved here at all? The kvm guest only uses ssh here. I thought you were testing things (rbd) which depend on KVM network speed?
RE: RBD trim / unmap support?
hw/ide/qdev.c: error_report("discard_granularity must be 512 for ide");

On Friday, 02 November 2012 09:20, Stefan Priebe - Profihost AG wrote:
On 02.11.2012 00:36, Josh Durgin wrote:
On 11/01/2012 04:33 PM, Stefan Priebe wrote: Hello list, does rbd support trim / unmap? Or is it planned to support it? Greets, Stefan
librbd (and thus qemu) support it. The rbd kernel module does not yet. See http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim
Thanks! Is there any recommended value for discard_granularity? With fstrim and iscsi I use 128k. Stefan
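Per my reading of that error message (not a tested configuration; the image path is illustrative), a matching command line would look like this, with ide fixed at 512 while scsi devices accept other granularities:

    qemu-system-x86_64 ... \
        -drive file=rbd:data/image,format=raw,if=none,id=drive1 \
        -device ide-hd,drive=drive1,discard_granularity=512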
RE: slow fio random read benchmark, need help
I do not really understand that network latency argument. If one can get 40K iops with iSCSI, why can't I get the same with rados/ceph? Note: network latency is the same in both cases. What do I miss?

On Wednesday, 31 October 2012 18:27, Alexandre DERUMIER wrote:
Thanks Marcus, indeed gigabit ethernet. Note that my iscsi results (40k) were with multipath, so multiple gigabit links. I have also done tests with a netapp array, with nfs, single link; I'm around 13000 iops. I will do more tests with multiple vms, from different hosts, and with --numjobs. I'll keep you in touch. Thanks for the help. Regards, Alexandre

On Wednesday, 31 October 2012 18:08, Marcus Sorensen wrote:
5000 is actually really good, if you ask me. Assuming everything is connected via gigabit. If you get 40k iops locally, you add the latency of tcp, as well as that of the ceph services and VM layer, and that's what you get. On my network I get about a .1ms round trip on gigabit over the same switch, which by definition can only do 10,000 iops. Then if you have storage on the other end capable of 40k iops, you add the latencies together (.1ms + .025ms) and you're at 8k iops. Then add the small latency of the application servicing the io (NFS, Ceph, etc), and the latency introduced by your VM layer, and 5k sounds about right. The good news is that you probably aren't taxing the storage; you can likely do many simultaneous tests from several VMs and get the same results. You can try adding --numjobs to your fio to parallelize the specific test you're doing, or launching a second VM and doing the same test at the same time. This would be a good indicator if it's latency.

On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER wrote:
Have you tried increasing the iodepth? Yes, I have tried with 100 and 200, same results. I have also tried directly from the host, with /dev/rbd1, and I get the same result. I have also tried with 3 different hosts, with different cpu models. (Note: I can reach around 40,000 iops with the same fio config on a zfs iscsi array.) My test ceph cluster nodes' cpus are old (xeon E5420), but they are around 10% usage, so I think it's ok. Do you have an idea if I can trace something? Thanks, Alexandre

On Wednesday, 31 October 2012 16:57, Sage Weil wrote:
On Wed, 31 Oct 2012, Alexandre DERUMIER wrote: Hello, I'm doing some tests with fio from a qemu 1.2 guest (virtio disk, cache=none), randread, with 4K block size on a small size of 1G (so it can be handled by the buffer cache on the ceph cluster):

    fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 \
        --group_reporting --name=file1 --ioengine=libaio --direct=1

I can't get more than 5000 iops.

Have you tried increasing the iodepth?
sage

RBD cluster is:
3 nodes, with each node:
- 6 x osd 15k drives (xfs), journal on tmpfs, 1 mon
- cpu: 2x 4 cores intel xeon E5420@2.5GHZ
rbd 0.53

ceph.conf:
    journal dio = false
    filestore fiemap = false
    filestore flusher = false
    osd op threads = 24
    osd disk threads = 24
    filestore op threads = 6

kvm host is: 4 x 12 cores opteron

During the bench, on the ceph nodes:
- cpu is around 10% used
- iostat shows no disk activity on osds (so I think the 1G file is handled in the linux buffer)
On the kvm host:
- cpu is around 20% used

I really don't see where the bottleneck is. Any ideas, hints? Regards, Alexandre
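The parallel test Marcus suggests would be the same job fanned out, e.g. (the numjobs value is arbitrary):

    fio --filename=/dev/vdb --rw=randread --bs=4K --size=1000M --iodepth=40 \
        --numjobs=4 --group_reporting --name=file1 --ioengine=libaio --direct=1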
RE: slow fio random read benchmark, need help
For the record, I'm not saying that it's the entire reason why the performance is lower (obviously, since iscsi is better); I'm just saying that when you're talking about high iops, adding 100us (best case gigabit) to each request and response is significant. iSCSI also uses the network (it also adds 100us to each request), so that simply can't be the reason. I always thought a distributed block storage could do such things faster (or at least as fast as) a single centralized store? - Dietmar
RE: rbd export to stdout ?
On 10/22/2012 12:41 PM, Alexandre DERUMIER wrote: Hi, I'm looking to use rbd export to stdout and not a file. I thought we should extend 'qemu-img', adding import/export there? That way it would work for all qemu storage drivers?
RE: rbd export to stdout ?
I thought we should extend 'qemu-img', adding import/export there? That way it would work for all qemu storage drivers? Seems to be already possible: http://ceph.com/wiki/QEMU-RBD

    qemu-img convert -f qcow2 -O rbd /data/debian_lenny_amd64_small.qcow2 rbd:data/lenny

Oh - I missed that. Thanks for the hint.
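The reverse direction looks like it should work the same way, though I have not tested it (paths and the rbd-as-source format are my assumption):

    qemu-img convert -f rbd -O qcow2 rbd:data/lenny /data/lenny_export.qcow2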
RE: RBD performance - tuning hints
RBD waits for the data to be on disk on all replicas. It's pretty easy to relax this to "in memory on all replicas", but there's no option for that right now. I thought that would be dangerous, because you could lose data?