RE: [ceph-users] Does CEPH rely on any multicasting?

2014-05-16 Thread Dietmar Maurer
 Recall that Ceph already incorporates its own cluster-management framework,
 and the various Ceph daemons already operate in a clustered manner.

Sure. But I guess it could reduce the 'ceph' code size if it used an existing
framework.

We (Proxmox VE) run corosync by default on all nodes, so it would also make
configuration easier.



RE: [ceph-users] Does CEPH rely on any multicasting?

2014-05-15 Thread Dietmar Maurer
  Does CEPH rely on any multicasting?  Appreciate the feedback..
 
 Nope! All networking is point-to-point.

Besides, it would be great if ceph could use existing cluster stacks like 
corosync, ...
Is there any plan to support that?



RE: Proxmox and ceph integration

2014-01-24 Thread Dietmar Maurer
We will update our test repository later today (we are just doing final bug fixing ...)

I will send a note when it is online.

 I found https://git.proxmox.com/?p=pve-
 manager.git;a=blob;f=PVE/CephTools.pm;h=f7f11ce2dc515cfd4fb423dbcf6dbbe
 4877b8afa;hb=3f5368bb67146d89f7ed098dc0c036c2217ed193 but I'm not
 familiar with proxmox :-)
 
 On 24/01/2014 09:33, Loic Dachary wrote:
  Hi,
 
  I'm told today that the Ceph proxmox integration has recently been improved.
 I'm curious to take a look, would you be so kind as to point me to the URL 
 where
 I could read about it ?
 
  Cheers
 
 
 --
 Loïc Dachary, Artisan Logiciel Libre




RE: Proxmox and ceph integration

2014-01-24 Thread Dietmar Maurer
 I'm told today that the Ceph proxmox integration has recently been improved.
 I'm curious to take a look, would you be so kind as to point me to the URL 
 where
 I could read about it ?

Martin just posted the links to the user list:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007562.html





API json format question

2014-01-17 Thread Dietmar Maurer
The command signature for osd pool create is:  

   'cmd119' => {
     'sig' => [
        'osd',
        'pool',
        'create',
        {
          'name' => 'pool',
          'type' => 'CephPoolname'
        },
        {
          'range' => '0',
          'name' => 'pg_num',
          'type' => 'CephInt'
        },
        {
          'req' => 'false',
          'range' => '0',
          'name' => 'pgp_num',
          'type' => 'CephInt'
        },
        {
          'n' => 'N',
          'req' => 'false',
          'goodchars' => '[A-Za-z0-9-_.=]',
          'name' => 'properties',
          'type' => 'CephString'
        }
      ],
  
But it seems that 'properties' is ignored. I tried:

2014-01-17 09:39:13.010675 2b3a6975e700  0 mon.0@0(leader) e1 handle_command
mon_command({"format": "json", "pg_num": 32, "pool": "test3",
"properties": ["size=3","min_size=2","crush_ruleset=1"], "prefix": "osd pool create"} v 0) v1

Is something wrong with the format? (I do not get any error message).
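
For what it's worth, the same command can also be sent through librados'
mon_command interface, which at least returns a status string that may show
whether 'properties' was accepted. A minimal sketch with the Python bindings
(python-ceph); the pool name and property list are taken from the log line
above, and the conffile path is an assumption:

import json
import rados

# Assumes a local /etc/ceph/ceph.conf and a usable admin keyring.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

cmd = {
    "prefix": "osd pool create",
    "pool": "test3",             # values from the log line above
    "pg_num": 32,
    "properties": ["size=3", "min_size=2", "crush_ruleset=1"],
    "format": "json",
}
# mon_command returns (return code, output buffer, status string)
ret, outbuf, outs = cluster.mon_command(json.dumps(cmd), b'')
print(ret, outs)

cluster.shutdown()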



RE: ceph cli delay when one mon is down

2014-01-15 Thread Dietmar Maurer
 Caching the last successfully connected mon isn't a bad idea either...
 care to open a feature ticket?

http://tracker.ceph.com/issues/7150



ceph cli delay when one mon is down

2014-01-14 Thread Dietmar Maurer
Hi all,

I use a few helper scripts to automate things, and those scripts call the 'ceph'
command multiple times, like:


#!/bin/sh

ceph do something
...
ceph do something else
...
---

I get a delay when one monitor is down, until a working mon is found.
That is OK so far.

But I get that delay each time I run a 'ceph' command. It would
be great if ceph remembered the last successful mon connection, so that such
delays could be avoided.
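
A workaround in the meantime is to keep one client session open for the whole
script instead of paying the mon probe per command. A rough sketch using the
Python librados bindings, assuming python-ceph is installed and
/etc/ceph/ceph.conf points at the cluster:

import rados

# Connect once; every later call reuses the same mon session instead of
# re-probing the monitor list (and timing out on the dead one) per command.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    print(cluster.list_pools())          # first "command"
    print(cluster.get_cluster_stats())   # second "command", no new mon probe
finally:
    cluster.shutdown()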




RE: ceph cli delay when one mon is down

2014-01-14 Thread Dietmar Maurer
 You can avoid this, and speed things up in general, by using the interactive
 mode:
 
 #!/bin/sh
 ceph <<EOM
 do something
 do something else
 EOM

The above is a bit clumsy. To be honest, I want to do things with Perl, so
I guess it is better to use Perl bindings for librados.

Are perl bindings already available? 
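
Whatever language the wrapper scripts use, one interim approach is to call the
CLI once with machine-readable output and parse that, rather than scraping text
or shelling out repeatedly. A small illustrative sketch in Python; the same
pattern should work from Perl with a JSON module:

import json
import subprocess

# One CLI invocation, parsed as JSON instead of text scraping.
out = subprocess.check_output(['ceph', '-s', '--format', 'json'])
status = json.loads(out)
print(status.get('health'))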



missing health status details

2013-11-22 Thread Dietmar Maurer
Just playing around, and noticed this:

# ceph health -f json-pretty

{ "summary": [],
  "timechecks": { "epoch": 34,
      "round": 40,
      "round_status": "finished",
      "mons": [
        { "name": "1",
          "skew": 0.00,
          "latency": 0.00,
          "health": "HEALTH_OK"},
        { "name": "0",
          "skew": -0.000330,
          "latency": 0.001454,
          "health": "HEALTH_OK"},
        { "name": "2",
          "skew": 0.054170,
          "latency": 0.000816,
          "health": "HEALTH_WARN",
          "details": "clock skew 0.0541703s > max 0.05s"}]},
  "health": { "health_services": [
      { "mons": [
          { "name": "1",
            "kb_total": 2064208,
            "kb_used": 1424964,
            "kb_avail": 534388,
            "avail_percent": 25,
            "last_updated": "2013-11-22 11:14:47.500872",
            "store_stats": { "bytes_total": 9524208,
                "bytes_sst": 8999282,
                "bytes_log": 458752,
                "bytes_misc": 66174,
                "last_updated": 0.00},
            "health": "HEALTH_WARN",
            "health_detail": "low disk space!"},
          { "name": "0",
            "kb_total": 2064208,
            "kb_used": 1249116,
            "kb_avail": 710236,
            "avail_percent": 34,
            "last_updated": "2013-11-22 11:14:53.498853",
            "store_stats": { "bytes_total": 10048528,
                "bytes_sst": 8999314,
                "bytes_log": 983040,
                "bytes_misc": 66174,
                "last_updated": 0.00},
            "health": "HEALTH_OK"},
          { "name": "2",
            "kb_total": 2064208,
            "kb_used": 1247084,
            "kb_avail": 712268,
            "avail_percent": 34,
            "last_updated": "2013-11-22 11:14:28.720368",
            "store_stats": { "bytes_total": 10048528,
                "bytes_sst": 8999314,
                "bytes_log": 983040,
                "bytes_misc": 66174,
                "last_updated": 0.00},
            "health": "HEALTH_OK"}]}]},
  "overall_status": "HEALTH_WARN",
  "detail": [
    "mon.2 addr 192.168.3.13:6789\/0 clock skew 0.0541703s > max 0.05s (latency 0.000815989s)"]}


Why is "low disk space!" not included inside 'details'?
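
Until that changes, the per-mon warning can still be dug out of the nested
structure shown above. A small sketch that walks the same JSON (key names taken
from the output above):

import json
import subprocess

# Fetch the same structure as above and print every per-mon health detail,
# including the "low disk space!" string that never makes it into 'detail'.
health = json.loads(subprocess.check_output(
    ['ceph', 'health', '--format', 'json']))

for service in health.get('health', {}).get('health_services', []):
    for mon in service.get('mons', []):
        if mon.get('health') != 'HEALTH_OK':
            print(mon['name'], mon.get('health'), mon.get('health_detail', ''))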



possible bug in init-ceph.in

2013-11-20 Thread Dietmar Maurer
http://ceph.com/git/?p=ceph.git;a=blob;f=src/init-ceph.in;h=7399abb8f85855f2248c4afb22bf94f2e2f080a2;hb=HEAD

line 320:

  if [ "${update_crush:-1}" = "1" -o "{$update_crush:-1}" = "true" ]; then

looks strange to me. Maybe that should be:

-   if [ "${update_crush:-1}" = "1" -o "{$update_crush:-1}" = "true" ]; then
+   if [ "${update_crush:-1}" = "1" -o "${update_crush:-1}" = "true" ]; then





RE: v0.67 Dumpling released

2013-08-14 Thread Dietmar Maurer
 Another three months have gone by, and the next stable release of Ceph is
 ready: Dumpling!  Thank you to everyone who has contributed to this release!

Seems there is a new dependency in the ceph-common Debian package: python-ceph

Is that really required (the previous release did not depend on it)?



RE: incremental rbd export / sparse files?

2012-11-22 Thread Dietmar Maurer
 Step 2 is to export the incremental changes.  The hangup there is figuring out
 a generic and portable file format to represent those incremental changes;
 we'd rather not invent something ourselves that is ceph-specific.
 Suggestions welcome!

AFAIK, both 'zfs' and 'btrfs' already have such a format (their send/receive streams).
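
For RBD itself, later librbd releases grew a changed-extent iterator that could
feed whatever stream format gets chosen. A rough sketch with the Python rbd
bindings, assuming a version that exposes Image.diff_iterate; pool, image and
snapshot names are placeholders:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')        # placeholder pool name
image = rbd.Image(ioctx, 'myimage')      # placeholder image name

extents = []
def record(offset, length, exists):
    # exists is False for regions that were discarded since the snapshot
    extents.append((offset, length, exists))

# Enumerate every extent that changed between snapshot 'snap1' and the head.
image.diff_iterate(0, image.size(), 'snap1', record)
print(extents)

image.close()
ioctx.close()
cluster.shutdown()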



RE: Limited IOP/s on Dual Xeon KVM Host

2012-11-11 Thread Dietmar Maurer
 Got it fixed by Bios update... crazy.

So you can't see any regression in IOPS with more cores? 
Can you speed up things with process pinning?

Just want to make sure that NUMA does not play any role here.

RE: less cores more iops / speed

2012-11-07 Thread Dietmar Maurer
  I've noticed something really interesting.
 
  I get 5000 iops / VM for rand. 4k writes while assigning 4 cores on a
  2.5 Ghz Xeon.
 
  When i move this VM to another kvm host with 3.6Ghz i get 8000 iops
  (still 8
  cores) when i then LOWER the assigned cores from 8 to 4 i get
  14.500 iops. If i assign only 2 cores i get 16.000 iops...
 
  Why does less kvm cores mean more speed?
 
 There is a serious bug in the kvm vhost code. Do you use virtio-net with
 vhost?
 
 see: http://lists.nongnu.org/archive/html/qemu-devel/2012-
 11/msg00579.html
 
 Please test using the e1000 driver instead.

Or update the guest kernel (what guest kernel do you use?). AFAIK 3.x kernels
do not trigger the bug.



RE: less cores more iops / speed

2012-11-07 Thread Dietmar Maurer
 Why is vhost net driver involved here at all? Kvm guest only uses ssh here.

I thought you were testing things (rbd) that depend on KVM network speed?



RE: RBD trim / unmap support?

2012-11-02 Thread Dietmar Maurer
hw/ide/qdev.c:    error_report("discard_granularity must be 512 for ide");

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Stefan Priebe - Profihost AG
 Sent: Friday, 02 November 2012 09:20
 To: Josh Durgin
 Cc: ceph-devel@vger.kernel.org
 Subject: Re: RBD trim / unmap support?
 
 On 02.11.2012 00:36, Josh Durgin wrote:
  On 11/01/2012 04:33 PM, Stefan Priebe wrote:
  Hello list,
 
  does rbd support trim / unmap? Or is it planned to support it?
 
  Greets,
  Stefan
 
  librbd (and thus qemu) support it. The rbd kernel module does not yet.
  See http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim
 
 Thanks! Is there any recommended value for discard_granularity? With
 fstrim and iscsi i use 128kb.
 
 Stefan




RE: slow fio random read benchmark, need help

2012-11-01 Thread Dietmar Maurer
I do not really understand that network latency argument.

If one can get 40K iops with iSCSI, why can't I get the same with rados/ceph?

Note: network latency is the same in both cases

What am I missing?
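
The arithmetic behind the replies quoted below, spelled out: when requests are
effectively handled one at a time, the per-request latency alone caps the
achievable IOPS. A tiny worked example using the illustrative figures from
further down in the thread:

def iops_ceiling(per_request_latency_s, in_flight=1):
    # With in_flight requests outstanding, throughput is capped at
    # in_flight / per-request latency.
    return in_flight / per_request_latency_s

# ~0.1 ms gigabit round trip, ~0.025 ms backend service time (illustrative).
print(iops_ceiling(0.000100))             # network alone:     ~10000 IOPS
print(iops_ceiling(0.000100 + 0.000025))  # network + backend:  ~8000 IOPS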

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
 ow...@vger.kernel.org] On Behalf Of Alexandre DERUMIER
 Sent: Wednesday, 31 October 2012 18:27
 To: Marcus Sorensen
 Cc: Sage Weil; ceph-devel
 Subject: Re: slow fio random read benchmark, need help
 
 Thanks Marcus,
 
 indeed gigabit ethernet.
 
 note that my iscsi results  (40k)was with multipath, so multiple gigabit 
 links.
 
 I have also done tests with a netapp array, with nfs, single link, I'm around
 13000 iops
 
 I will do more tests with multiples vms, from differents hosts, and with --
 numjobs.
 
 I'll keep you in touch,
 
 Thanks for help,
 
 Regards,
 
 Alexandre
 
 
 - Original Message -
 
 From: Marcus Sorensen shadow...@gmail.com
 To: Alexandre DERUMIER aderum...@odiso.com
 Cc: Sage Weil s...@inktank.com, ceph-devel ceph-
 de...@vger.kernel.org
 Sent: Wednesday, 31 October 2012 18:08:11
 Subject: Re: slow fio random read benchmark, need help
 
 5000 is actually really good, if you ask me. Assuming everything is connected
 via gigabit. If you get 40k iops locally, you add the latency of tcp, as well 
 as
 that of the ceph services and VM layer, and that's what you get. On my
 network I get about a .1ms round trip on gigabit over the same switch, which
 by definition can only do 10,000 iops. Then if you have storage on the other
 end capable of 40k iops, you add the latencies together (.1ms + .025ms) and
 you're at 8k iops.
 Then add the small latency of the application servicing the io (NFS, Ceph, 
 etc),
 and the latency introduced by your VM layer, and 5k sounds about right.
 
 The good news is that you probably aren't taxing the storage, you can likely
 do many simultaneous tests from several VMs and get the same results.
 
 You can try adding --numjobs to your fio to parallelize the specific test 
 you're
 doing, or launching a second VM and doing the same test at the same time.
 This would be a good indicator if it's latency.
 
 On Wed, Oct 31, 2012 at 10:29 AM, Alexandre DERUMIER
 aderum...@odiso.com wrote:
 Have you tried increasing the iodepth?
  Yes, I have try with 100 and 200, same results.
 
  I have also try directly from the host, with /dev/rbd1, and I have same
 result.
  I have also try with 3 differents hosts, with differents cpus models.
 
  (note: I can reach around 40.000 iops with same fio config on a zfs
  iscsi array)
 
  My test ceph cluster nodes cpus are old (xeon E5420), but they are around
 10% usage, so I think it's ok.
 
 
  Do you have an idea if I can trace something ?
 
  Thanks,
 
  Alexandre
 
  - Original Message -
 
  From: Sage Weil s...@inktank.com
  To: Alexandre DERUMIER aderum...@odiso.com
  Cc: ceph-devel ceph-devel@vger.kernel.org
  Sent: Wednesday, 31 October 2012 16:57:05
  Subject: Re: slow fio random read benchmark, need help
 
  On Wed, 31 Oct 2012, Alexandre DERUMIER wrote:
  Hello,
 
  I'm doing some tests with fio from a qemu 1.2 guest (virtio
  disk,cache=none), randread, with 4K block size on a small size of 1G
  (so it can be handle by the buffer cache on ceph cluster)
 
 
  fio --filename=/dev/vdb -rw=randread --bs=4K --size=1000M
  --iodepth=40 --group_reporting --name=file1 --ioengine=libaio
  --direct=1
 
 
  I can't get more than 5000 iops.
 
  Have you tried increasing the iodepth?
 
  sage
 
 
 
  RBD cluster is :
  ---
  3 nodes,with each node :
  -6 x osd 15k drives (xfs), journal on tmpfs, 1 mon
  -cpu: 2x 4 cores intel xeon E5420@2.5GHZ rbd 0.53
 
  ceph.conf
 
  journal dio = false
  filestore fiemap = false
  filestore flusher = false
  osd op threads = 24
  osd disk threads = 24
  filestore op threads = 6
 
  kvm host is : 4 x 12 cores opteron
  
 
 
  During the bench:
 
  on ceph nodes:
  - cpu is around 10% used
  - iostat show no disks activity on osds. (so I think that the 1G file
  is handle in the linux buffer)
 
 
  on kvm host:
 
  -cpu is around 20% used
 
 
  I really don't see where is the bottleneck
 
  Any Ideas, hints ?
 
 
  Regards,
 
  Alexandre



RE: slow fio random read benchmark, need help

2012-11-01 Thread Dietmar Maurer

 For the record, I'm not saying that it's the entire reason why the performance
 is lower (obviously since iscsi is better), I'm just saying that when you're
 talking about high iops, adding 100us (best case gigabit) to each request and
 response is significant

iSCSI also uses the network (it also adds 100us to each request), so that simply
can't be the reason.

I always thought a distributed block store could do such things
faster than (or at least as fast as) a single centralized store?

- Dietmar



RE: rbd export to stdout ?

2012-10-22 Thread Dietmar Maurer
 Subject: Re: rbd export to stdout ?
 
 On 10/22/2012 12:41 PM, Alexandre DERUMIER wrote:
  Hi,
 
  I'm looking to use 'rbd export' to stdout and not a file.

I thought we should extend 'qemu-img', adding import/export there? That
way it would work for all qemu storage drivers?
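
In the meantime, streaming an image to stdout can also be done straight against
librbd. A rough sketch with the Python bindings; the pool name 'rbd' and image
name 'myimage' are placeholders:

import sys
import rados
import rbd

CHUNK = 4 * 1024 * 1024   # read in 4 MiB pieces

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')       # placeholder pool name
image = rbd.Image(ioctx, 'myimage')     # placeholder image name
try:
    size = image.size()
    offset = 0
    while offset < size:
        length = min(CHUNK, size - offset)
        sys.stdout.buffer.write(image.read(offset, length))
        offset += length
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()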


RE: rbd export to stdout ?

2012-10-22 Thread Dietmar Maurer
 I thought we should extend 'qemu-img', adding import/export there?
 That way it would work for all qemu storage drivers?
 
 seem to be already possible:
 
 http://ceph.com/wiki/QEMU-RBD
 
 qemu-img convert -f qcow2 -O rbd
 /data/debian_lenny_amd64_small.qcow2 rbd:data/lenny

Oh -  I missed that. Thanks for the hint.

RE: RBD performance - tuning hints

2012-08-31 Thread Dietmar Maurer
 RBD waits for the data to be on disk on all replicas. It's pretty easy
 to relax this to in memory on all replicas, but there's no option for
 that right now.

I thought that is dangerous, because you can lose data?
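
For reference, the two acknowledgment points being discussed here ("in memory
on all replicas" vs. "on disk on all replicas") are exposed separately by the
librados async API. A minimal sketch with the Python bindings against a
placeholder pool and object name; on newer Ceph releases the two callbacks may
fire at effectively the same time:

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')       # placeholder pool name

def on_ack(completion):
    # Replicas hold the write in memory: fast, but lost if they all lose power.
    print('acked (in memory on all replicas)')

def on_safe(completion):
    # Replicas have the write on stable storage: this is what RBD waits for.
    print('committed (on disk on all replicas)')

comp = ioctx.aio_write('testobj', b'payload', 0,
                       oncomplete=on_ack, onsafe=on_safe)
comp.wait_for_safe()

ioctx.close()
cluster.shutdown()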