rbd command to display free space in a cluster ?

2012-10-15 Thread Alexandre DERUMIER
Hi,

I'm looking for a way to retrieve the free space from an rbd cluster with the
rbd command.

Any hint?

(something like ceph -w status, but without the need to parse the result)

Regards,

Alexandre


Re: OSD::mkfs: couldn't mount FileStore: error -22

2012-10-15 Thread Adam Nielsen

> current/ is a btrfs subvolume.. 'btrfs sub delete current' will remove it.

Ah, that worked, thanks.  Unfortunately mkcephfs still fails with the same
error.

> The warning in the previous email suggests you're running a fairly old
> kernel.. there is probably something handled incorrectly during the fs
> init process.  Exactly which kernel are you running?

2.6.32 - apparently the latest in Debian stable.  I figured this was workable
since ceph.com offers packages for Debian stable.

> In any case, btrfs isn't going to work particularly well on something that
> old; I suggest running something newer (3.5 or 3.6) or switching to XFS.

Ok fair enough.  I'll have to see how practical it is to get a more recent
kernel going, otherwise I'll go down the XFS route.


Thanks again,
Adam.




[PATCH] rbd: zero return code in rbd_dev_image_id()

2012-10-15 Thread Alex Elder
There is a call in rbd_dev_image_id() to rbd_req_sync_exec()
to get the image id for an image.  Despite the get_id class
method only returning 0 on success, I am getting back a positive
value (I think the number of bytes returned with the call).

That may or may not be how rbd_req_sync_exec() is supposed to
behave, but zeroing the return value if successful makes it moot
and makes this path through the code work as desired.

Do the same in rbd_dev_v2_object_prefix().

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |2 ++
 1 file changed, 2 insertions(+)

Index: b/drivers/block/rbd.c
===
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2207,6 +2207,7 @@ static int rbd_dev_v2_object_prefix(stru
 	dout("%s: rbd_req_sync_exec returned %d\n", __func__, ret);
 	if (ret < 0)
 		goto out;
+	ret = 0;	/* rbd_req_sync_exec() can return positive */
 
 	p = reply_buf;
 	rbd_dev->header.object_prefix = ceph_extract_encoded_string(p,
@@ -2900,6 +2901,7 @@ static int rbd_dev_image_id(struct rbd_d
 	dout("%s: rbd_req_sync_exec returned %d\n", __func__, ret);
 	if (ret < 0)
 		goto out;
+	ret = 0;	/* rbd_req_sync_exec() can return positive */
 
 	p = response;
 	rbd_dev->image_id = ceph_extract_encoded_string(p,


[PATCH] rbd: kill rbd_device->rbd_opts

2012-10-15 Thread Alex Elder
The rbd_device structure has an embedded rbd_options structure.
Such a structure is needed to work with the generic ceph argument
parsing code, but there's no need to keep it around once argument
parsing is done.

Use a local variable to hold the rbd options used in parsing in
rbd_get_client(), and just transfer its content (it's just a
read_only flag) into the field in the rbd_mapping sub-structure
that requires that information.

Signed-off-by: Alex Elder el...@inktank.com
---
 drivers/block/rbd.c |   14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

Index: b/drivers/block/rbd.c
===
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -181,7 +181,6 @@ struct rbd_device {
 	struct gendisk		*disk;		/* blkdev's gendisk and rq */
 
 	u32			image_format;	/* Either 1 or 2 */
-	struct rbd_options	rbd_opts;
 	struct rbd_client	*rbd_client;
 
 	char			name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
@@ -453,18 +452,24 @@ static int parse_rbd_opts_token(char *c,
 static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
 				size_t mon_addr_len, char *options)
 {
-	struct rbd_options *rbd_opts = &rbd_dev->rbd_opts;
+	struct rbd_options rbd_opts;
 	struct ceph_options *ceph_opts;
 	struct rbd_client *rbdc;
 
-	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
+	/* Initialize all rbd options to the defaults */
+
+	rbd_opts.read_only = RBD_READ_ONLY_DEFAULT;
 
 	ceph_opts = ceph_parse_options(options, mon_addr,
 					mon_addr + mon_addr_len,
-					parse_rbd_opts_token, rbd_opts);
+					parse_rbd_opts_token, &rbd_opts);
 	if (IS_ERR(ceph_opts))
 		return PTR_ERR(ceph_opts);
 
+	/* Record the parsed rbd options */
+
+	rbd_dev->mapping.read_only = rbd_opts.read_only;
+
 	rbdc = rbd_client_find(ceph_opts);
 	if (rbdc) {
 		/* using an existing client */
@@ -672,7 +677,6 @@ static int rbd_dev_set_mapping(struct rb
 		rbd_dev->mapping.size = rbd_dev->header.image_size;
 		rbd_dev->mapping.features = rbd_dev->header.features;
 		rbd_dev->mapping.snap_exists = false;
-		rbd_dev->mapping.read_only = rbd_dev->rbd_opts.read_only;
 		ret = 0;
 	} else {
 		ret = snap_by_name(rbd_dev, snap_name);


Re: rbd command to display free space in a cluster ?

2012-10-15 Thread Sage Weil
On Mon, 15 Oct 2012, Alexandre DERUMIER wrote:
 Hi,
 
 I'm looking for a way to retrieve the free space from a rbd cluster with rbd 
 command.
 
 Any hint ?
 
 (something like ceph -w status, but without need to parse the result)

 rados df

is the closest.

sage
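For scripting, the summary lines at the bottom of "rados df" can be scraped
directly. A minimal sketch in Python, assuming the output ends with the usual
"total used" / "total avail" / "total space" rows carrying KB values (the
helper name below is made up):

import subprocess

def cluster_totals():
    # Parse the "total used/avail/space" summary rows of `rados df`.
    out = subprocess.check_output(["rados", "df"]).decode("utf-8")
    totals = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "total" and fields[2].isdigit():
            totals[fields[1]] = int(fields[2]) * 1024   # KB -> bytes
    return totals

if __name__ == "__main__":
    t = cluster_totals()
    print("%.1f GB free of %.1f GB" % (t["avail"] / 2.0**30, t["space"] / 2.0**30))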



 
 Regards,
 
 Alexandre


Ceph benchmark high wait on journal device

2012-10-15 Thread Martin Mailand

Hi,

Inspired by the performance tests Mark did, I tried to compile my own one.
I have four OSD processes on one node; each process has an Intel 710 SSD
for its journal and 4 SAS disks via an LSI 9266-8i in RAID 0.
If I test the SSDs with fio they are quite fast and the w_wait time is
quite low.
But if I run rados bench on the cluster, the w_wait times for the
journal devices are quite high (around 20-40ms).

I thought the SSDs would perform better; any ideas what happened here?

 -martin

Logs:

/dev/sd{c,d,e,f}
Intel SSD 710 200G

/dev/sd{g,h,i,j}
each 4 x SAS on LSI 9266-8i Raid 0

fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2 
-ioengine libaio -direct 1 -bs 256k


Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util

- snip -
sdc 0,00 0,00 0,00 809,20 0,00 202,30 512,00 0,96 1,19 0,00 1,19 1,18 95,84

- snap -



rados bench -p rbd 300 write -t 16

2012-10-15 17:53:17.058383 min lat: 0.035382 max lat: 0.469604 avg lat: 0.189553

   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   300      16     25329     25313   337.443       324  0.274815  0.189553
 Total time run: 300.169843
Total writes made:  25329
Write size: 4194304
Bandwidth (MB/sec): 337.529

Stddev Bandwidth:   25.1568
Max bandwidth (MB/sec): 372
Min bandwidth (MB/sec): 0
Average Latency:0.189597
Stddev Latency: 0.0641609
Max latency:0.469604
Min latency:0.035382


during the rados bench test.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  20,380,00   16,208,870,00   54,55

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 41,20 0,00 12,40 0,00 0,35 57,42 0,00 0,31 0,00 0,31 0,31 0,38
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 332,80 0,00 139,67 859,53 7,36 22,09 0,00 22,09 2,12 70,42
sdd 0,00 0,00 0,00 391,60 0,00 175,84 919,62 15,59 39,62 0,00 39,62 2,40 93,80
sde 0,00 0,00 0,00 342,00 0,00 147,39 882,59 8,54 24,89 0,00 24,89 2,18 74,58
sdf 0,00 0,00 0,00 362,20 0,00 162,72 920,05 15,35 42,50 0,00 42,50 2,60 94,20
sdg 0,00 0,00 0,00 522,00 0,00 139,20 546,13 0,28 0,54 0,00 0,54 0,10 5,26
sdh 0,00 0,00 0,00 672,00 0,00 179,20 546,13 9,67 14,42 0,00 14,42 0,61 41,18
sdi 0,00 0,00 0,00 555,00 0,00 148,00 546,13 0,32 0,57 0,00 0,57 0,10 5,46
sdj 0,00 0,00 0,00 582,00 0,00 155,20 546,13 0,51 0,87 0,00 0,87 0,12 6,96


100 seconds later

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  22,920,00   19,579,250,00   48,25

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0,00 40,80 0,00 15,60 0,00 0,36 47,08 0,00 0,22 0,00 0,22 0,22 0,34
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 386,60 0,00 168,33 891,70 12,11 31,08 0,00 31,08 2,25 86,86
sdd 0,00 0,00 0,00 405,00 0,00 183,06 925,68 15,68 38,70 0,00 38,70 2,34 94,90
sde 0,00 0,00 0,00 411,00 0,00 185,06 922,15 15,58 38,09 0,00 38,09 2,33 95,92
sdf 0,00 0,00 0,00 387,00 0,00 168,33 890,79 12,19 31,48 0,00 31,48 2,26 87,48
sdg 0,00 0,00 0,00 646,20 0,00 171,22 542,64 0,42 0,65 0,00 0,65 0,10 6,70
sdh 0,00 85,60 0,40 797,00 0,01 192,97 495,65 10,95 13,73 32,50 13,72 0,55 44,22
sdi 0,00 0,00 0,00 678,20 0,00 180,01 543,59 0,45 0,67 0,00 0,67 0,10 6,76
sdj 0,00 0,00 0,00 639,00 0,00 169,61 543,61 0,36 0,57 0,00 0,57 0,10 6,32


 --admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump

Help...MDS Continuously Segfaulting

2012-10-15 Thread Nick Couchman
Well, both of my MDSs seem to be down right now, and then continually segfault 
(every time I try to start them) with the following:

ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
starting mds.b at :/0
*** Caught signal (Segmentation fault) **
 in thread 7fbe0d61d700
 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation 
fault) **
 in thread 7fbe0d61d700

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

 0 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal 
(Segmentation fault) **
 in thread 7fbe0d61d700

 ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
 1: ceph-mds() [0x7ef83a]
 2: (()+0xfd00) [0x7fbe15a0cd00]
 3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
 4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
 5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
 6: (()+0x7f05) [0x7fbe15a04f05]
 7: (clone()+0x6d) [0x7fbe14bc410d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

Segmentation fault

Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
attempt to upgrade to 0.48.2 and see if that helps, but I figured I'd ask
whether anyone can offer any insight into getting the replay to run without
segfaulting.







Re: Help...MDS Continuously Segfaulting

2012-10-15 Thread Gregory Farnum
Something in the MDS log is bad or is poking at a bug in the code. Can
you turn on MDS debugging and restart a daemon and put that log
somewhere accessible?
debug mds = 20
debug journaler = 20
debug ms = 1
-Greg

On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman nick.couch...@seakr.com wrote:
 Well, both of my MDSs seem to be down right now, and then continually 
 segfault (every time I try to start them) with the following:

 ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
 starting mds.b at :/0
 *** Caught signal (Segmentation fault) **
  in thread 7fbe0d61d700
  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation 
 fault) **
  in thread 7fbe0d61d700

  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

  0 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal 
 (Segmentation fault) **
  in thread 7fbe0d61d700

  ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

 Segmentation fault

 Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can 
 attempt to upgrade to 0.48.2 and see if that helps, but I figured if anyone 
 can offer any insight as to what to do to get the replay to run without 
 segfaulting?



 


Re: Help...MDS Continuously Segfaulting

2012-10-15 Thread Nick Couchman
Anywhere in particular I should make it available?  It's a little over a 
million lines of debug in the file - I can put it on a pastebin, if that works, 
or perhaps zip it up and throw it somewhere?

-Nick

 On 2012/10/15 at 11:26, Gregory Farnum g...@inktank.com wrote: 
 Something in the MDS log is bad or is poking at a bug in the code. Can
 you turn on MDS debugging and restart a daemon and put that log
 somewhere accessible?
 debug mds = 20
 debug journaler = 20
 debug ms = 1
 -Greg
 
 On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman nick.couch...@seakr.com 
 wrote:
 Well, both of my MDSs seem to be down right now, and then continually 
 segfault (every time I try to start them) with the following:

 ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
 starting mds.b at :/0
 *** Caught signal (Segmentation fault) **
  in thread 7fbe0d61d700
  ceph version 0.48.1argonaut 
 (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation 
 fault) **
  in thread 7fbe0d61d700

  ceph version 0.48.1argonaut 
 (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

  0 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal 
 (Segmentation fault) **
  in thread 7fbe0d61d700

  ceph version 0.48.1argonaut 
 (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

 Segmentation fault

 Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can 
 attempt to upgrade to 0.48.2 and see if that helps, but I figured if anyone 
 can offer any insight as to what to do to get the replay to run without 
 segfaulting?



 







Re: Help...MDS Continuously Segfaulting

2012-10-15 Thread Gregory Farnum
Yeah, zip it and post — somebody's going to have to download it and do
fun things. :)
-Greg

On Mon, Oct 15, 2012 at 10:43 AM, Nick Couchman nick.couch...@seakr.com wrote:
 Anywhere in particular I should make it available?  It's a little over a 
 million lines of debug in the file - I can put it on a pastebin, if that 
 works, or perhaps zip it up and throw it somewhere?

 -Nick

 On 2012/10/15 at 11:26, Gregory Farnum g...@inktank.com wrote:
 Something in the MDS log is bad or is poking at a bug in the code. Can
 you turn on MDS debugging and restart a daemon and put that log
 somewhere accessible?
 debug mds = 20
 debug journaler = 20
 debug ms = 1
 -Greg

 On Mon, Oct 15, 2012 at 10:02 AM, Nick Couchman nick.couch...@seakr.com
 wrote:
 Well, both of my MDSs seem to be down right now, and then continually
 segfault (every time I try to start them) with the following:

 ceph-mdsmon-a:~ # ceph-mds -n mds.b -c /etc/ceph/ceph.conf -f
 starting mds.b at :/0
 *** Caught signal (Segmentation fault) **
  in thread 7fbe0d61d700
  ceph version 0.48.1argonaut
 (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal (Segmentation
 fault) **
  in thread 7fbe0d61d700

  ceph version 0.48.1argonaut
 (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to
 interpret this.

  0 2012-10-15 10:57:35.449161 7fbe0d61d700 -1 *** Caught signal
 (Segmentation fault) **
  in thread 7fbe0d61d700

  ceph version 0.48.1argonaut
 (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
  1: ceph-mds() [0x7ef83a]
  2: (()+0xfd00) [0x7fbe15a0cd00]
  3: (ESession::replay(MDS*)+0x3ea) [0x4dcfea]
  4: (MDLog::_replay_thread()+0x6b6) [0x6a2446]
  5: (MDLog::ReplayThread::entry()+0xd) [0x4cf5ed]
  6: (()+0x7f05) [0x7fbe15a04f05]
  7: (clone()+0x6d) [0x7fbe14bc410d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to
 interpret this.

 Segmentation fault

 Anyone have any hints on recovering?  I'm running 0.48.1argonaut - I can
 attempt to upgrade to 0.48.2 and see if that helps, but I figured if anyone
 can offer any insight as to what to do to get the replay to run without
 segfaulting?



 



 



New branch: Python packaging integrated into automake

2012-10-15 Thread Tommi Virtanen
Hi. While working on the external journal stuff, for a while I thought
I needed more python code than I ended up needing. To support that
code, I put in the skeleton of "import ceph.foo" support. While I
ultimately didn't need it, I didn't want to throw away the results. If
you later need to have more python stuff in core ceph, use this branch
as the base.

It's intentionally separate from the python-ceph Debian package; that
one is about providing python programmers APIs to use RADOS, RBD etc
(compare to librados-dev etc); this is about Python code used by core
Ceph itself.

https://github.com/ceph/ceph/tree/automake-python


Re: Two questions about client writes update to Ceph

2012-10-15 Thread Samuel Just
Hi Alex,

1) When a replica goes down, the write won't complete until the
replica is detected as down.  At that point, the write can complete
without the down replica.  Shortly thereafter, if the down replica
does not come back, a new replica will replace it bringing the
replication count to what it should be.  In the scenario you
described, the transaction will be re-replicated to the replicas when
they come back up (or to new replicas if they don't).

2) The clients don't have a local, durable journal.  The client side
state acts like the volatile cache on a spinning disk: flushing the
disk io will force the data to become durable on an OSD (either in the
journal or synced to the filesystem).

-Sam
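To make (2) concrete, the durability point is visible through the librados
completion callbacks: "complete" fires once the replicas have the write in
memory, "safe" once they have committed it. A rough python-rados sketch (the
pool and object names are made up, and the callback details are an assumption
about the bindings of this era):

import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("rbd")
try:
    def on_complete(completion):
        print("acked: in memory on the OSDs")     # like a volatile disk cache

    def on_safe(completion):
        print("safe: committed to journal/disk on the OSDs")

    comp = ioctx.aio_write("test-object", "some payload",
                           oncomplete=on_complete, onsafe=on_safe)
    comp.wait_for_safe()    # a client that needs durability waits for this
finally:
    ioctx.close()
    cluster.shutdown()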

On Mon, Oct 15, 2012 at 1:46 AM, Alex Jiang alex.jiang@gmail.com wrote:
 Hi,all
 I'm very interested in Ceph. Recently I deployed a Ceph cluster in
 lab environment, and it works very well. Now I am studying the
 principle of Ceph RBD. But I am confused about two questions.
 1) When a client writes an update to Ceph RBD, it contacts the primary
 OSD directly. Then the primary OSD forwards the update to the replicas. If
 the replicas are down and the update is not committed to the replicas' disks,
 but the update has been committed to the primary OSD's disk, will Ceph
 roll back the transaction?
 2) To ensure availability, do journals exist on the client side to log
 I/O history? If so, are the journals durable on disk?

   Regards,

   Alex


Re: Ceph benchmark high wait on journal device

2012-10-15 Thread Mark Nelson

Hi Martin,

I haven't tested the 9266-8i specifically, but it may behave similarly 
to the 9265-8i.  This is just a theory, but I get the impression that 
the controller itself introduces some latency getting data to disk, and 
that it may get worse as more data is pushed across the controller.
That seems to be the case even if the data is not going to the disk in
question.  Are you using a single controller with expanders?  On some of 
our nodes that use a single controller with lots of expanders, I've 
noticed high IO wait times, especially when doing lots of small writes.


Mark

On 10/15/2012 11:12 AM, Martin Mailand wrote:

Hi,

inspired from the performance test Mark did, I tried to compile my own one.
I have four OSD processes on one Node, each process has a Intel 710 SSD
for its journal and 4 SAS Disk via an Lsi 9266-8i in Raid 0.
If I test the SSD with fio they are quite fast and the w_wait time is
quite low.
But if I run rados bench on the cluster, the w_wait times for the
journal devices are quite high (around 20-40ms).
I thought the SSD would be better, any ideas what happend here?

-martin

Logs:

/dev/sd{c,d,e,f}
Intel SSD 710 200G

/dev/sd{g,h,i,j}
each 4 x SAS on LSI 9266-8i Raid 0

fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2
-ioengine libaio -direct 1 -bs 256k

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
- snip -
sdc 0,00 0,00 0,00 809,20 0,00 202,30 512,00 0,96 1,19 0,00 1,19 1,18 95,84
- snap -



rados bench -p rbd 300 write -t 16

2012-10-15 17:53:17.058383min lat: 0.035382 max lat: 0.469604 avg lat:
0.189553
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
300 16 25329 25313 337.443 324 0.274815 0.189553
Total time run: 300.169843
Total writes made: 25329
Write size: 4194304
Bandwidth (MB/sec): 337.529

Stddev Bandwidth: 25.1568
Max bandwidth (MB/sec): 372
Min bandwidth (MB/sec): 0
Average Latency: 0.189597
Stddev Latency: 0.0641609
Max latency: 0.469604
Min latency: 0.035382


during the rados bench test.

avg-cpu: %user %nice %system %iowait %steal %idle
20,38 0,00 16,20 8,87 0,00 54,55

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
sda 0,00 41,20 0,00 12,40 0,00 0,35 57,42 0,00 0,31 0,00 0,31 0,31 0,38
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 332,80 0,00 139,67 859,53 7,36 22,09 0,00 22,09 2,12
70,42
sdd 0,00 0,00 0,00 391,60 0,00 175,84 919,62 15,59 39,62 0,00 39,62 2,40
93,80
sde 0,00 0,00 0,00 342,00 0,00 147,39 882,59 8,54 24,89 0,00 24,89 2,18
74,58
sdf 0,00 0,00 0,00 362,20 0,00 162,72 920,05 15,35 42,50 0,00 42,50 2,60
94,20
sdg 0,00 0,00 0,00 522,00 0,00 139,20 546,13 0,28 0,54 0,00 0,54 0,10 5,26
sdh 0,00 0,00 0,00 672,00 0,00 179,20 546,13 9,67 14,42 0,00 14,42 0,61
41,18
sdi 0,00 0,00 0,00 555,00 0,00 148,00 546,13 0,32 0,57 0,00 0,57 0,10 5,46
sdj 0,00 0,00 0,00 582,00 0,00 155,20 546,13 0,51 0,87 0,00 0,87 0,12 6,96

100 seconds later

avg-cpu: %user %nice %system %iowait %steal %idle
22,92 0,00 19,57 9,25 0,00 48,25

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
sda 0,00 40,80 0,00 15,60 0,00 0,36 47,08 0,00 0,22 0,00 0,22 0,22 0,34
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 386,60 0,00 168,33 891,70 12,11 31,08 0,00 31,08 2,25
86,86
sdd 0,00 0,00 0,00 405,00 0,00 183,06 925,68 15,68 38,70 0,00 38,70 2,34
94,90
sde 0,00 0,00 0,00 411,00 0,00 185,06 922,15 15,58 38,09 0,00 38,09 2,33
95,92
sdf 0,00 0,00 0,00 387,00 0,00 168,33 890,79 12,19 31,48 0,00 31,48 2,26
87,48
sdg 0,00 0,00 0,00 646,20 0,00 171,22 542,64 0,42 0,65 0,00 0,65 0,10 6,70
sdh 0,00 85,60 0,40 797,00 0,01 192,97 495,65 10,95 13,73 32,50 13,72
0,55 44,22
sdi 0,00 0,00 0,00 678,20 0,00 180,01 543,59 0,45 0,67 0,00 0,67 0,10 6,76
sdj 0,00 0,00 0,00 639,00 0,00 169,61 543,61 0,36 0,57 0,00 0,57 0,10 6,32

--admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump
{filestore:{journal_queue_max_ops:500,journal_queue_ops:0,journal_ops:34653,journal_queue_max_bytes:104857600,journal_queue_bytes:0,journal_bytes:86821481160,journal_latency:{avgcount:34653,sum:3458.68},journal_wr:19372,journal_wr_bytes:{avgcount:19372,sum:87026655232},op_queue_max_ops:500,op_queue_ops:126,ops:34653,op_queue_max_bytes:104857600,op_queue_bytes:167023,bytes:86821143225,apply_latency:{avgcount:34527,sum:605.768},committing:0,commitcycle:19,commitcycle_interval:{avgcount:19,sum:572.674},commitcycle_latency:{avgcount:19,sum:2.62279},journal_full:0},osd:{opq:0,op_wip:4,op:15199,op_in_bytes:36140461079,op_out_bytes:0,op_latency:{avgcount:15199,sum:1811.57},op_r:0,op_r_out_bytes:0,op_r_latency:{avgcount:0,sum:0},op_w:15199,op_w_in_bytes:36140461079,op_w_rlat:{avgcount:15199,sum:177.327},op_w_latency:{avgcount:15199,sum:1811.57},op_rw:0,op_rw_in_bytes:0,op!

_rw_out_

Re: osd crash in ReplicatedPG::add_object_context_to_pg_stat(ReplicatedPG::ObjectContext*, pg_stat_t*)

2012-10-15 Thread Samuel Just
Do you have a coredump for the crash?  Can you reproduce the crash with:

debug filestore = 20
debug osd = 20

and post the logs?

As far as the incomplete pg goes, can you post the output of

ceph pg <pgid> query

where pgid is the pgid of the incomplete pg (e.g. 1.34)?

Thanks
-Sam
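If more than one PG is stuck, a quick way to gather all the query output is to
walk "ceph pg dump" and query every PG whose state mentions "incomplete". A
rough sketch in Python (assuming the pgid is the first column of the dump):

import subprocess

def query_incomplete_pgs():
    dump = subprocess.check_output(["ceph", "pg", "dump"]).decode("utf-8")
    for line in dump.splitlines():
        fields = line.split()
        # skip headers and any PG whose state does not mention "incomplete"
        if not fields or not any("incomplete" in f for f in fields):
            continue
        pgid = fields[0]
        print("==== %s ====" % pgid)
        print(subprocess.check_output(
            ["ceph", "pg", pgid, "query"]).decode("utf-8"))

if __name__ == "__main__":
    query_incomplete_pgs()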

On Thu, Oct 11, 2012 at 3:17 PM, Yann Dupont yann.dup...@univ-nantes.fr wrote:
 Hello everybody.

 I'm currently having problem with 1 of my OSD, crashing with  this trace :

 ceph version 0.52 (commit:e48859474c4944d4ff201ddc9f5fd400e8898173)
  1: /usr/bin/ceph-osd() [0x737879]
  2: (()+0xf030) [0x7f43f0af0030]
  3:
 (ReplicatedPG::add_object_context_to_pg_stat(ReplicatedPG::ObjectContext*,
 pg_stat_t*)+0x292) [0x555262]
  4: (ReplicatedPG::recover_backfill(int)+0x1c1a) [0x55c93a]
  5: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x26a)
 [0x563c1a]
  6: (OSD::do_recovery(PG*)+0x39d) [0x5d3c9d]
  7: (OSD::RecoveryWQ::_process(PG*)+0xd) [0x6119fd]
  8: (ThreadPool::worker()+0x82b) [0x7c176b]
  9: (ThreadPool::WorkThread::entry()+0xd) [0x5f609d]
  10: (()+0x6b50) [0x7f43f0ae7b50]
  11: (clone()+0x6d) [0x7f43ef81b78d]

 Restarting gives the same message after some seconds.
 I've been watching the bug tracker but I don't see something related.

 Some informations : kernel is 3.6.1, with standard debian packages from
 ceph.com

 My ceph cluster was running well and stable on 6 osd since june (3
 datacenters, 2 with 2 nodes, 1 with 4 nodes, a replication of 2, and
 adjusted weights to try to balance data evenly). I began with the
 then-up-to-date version, then 0.48, 49, 50, 51... The data store is on XFS.

 I'm currently in the process of growing my ceph from 6 nodes to 12 nodes. 11
 nodes are currently in ceph, for a 130 TB total. Declaring new osd was OK,
 the data moved quite OK (in fact I had some OSD crashes - not
 permanent, the OSDs restarted OK - maybe related to an error in my new nodes'
 network configuration that I discovered afterwards. More on that later; I can
 find the traces, but I'm not sure it's related)

 When ceph was finally stable again, with HEALTH_OK, I decided to reweight
 the osd (that was tuesday). Operation went quite OK, but near the end of
 operation (0,085% left), 1 of my OSD crashed, and won't start again.

 More problematic, with this osd down, I have 1 incomplete PG :

 ceph -s
health HEALTH_WARN 86 pgs backfill; 231 pgs degraded; 4 pgs down; 15 pgs
 incomplete; 4 pgs peering; 134 pgs recovering; 19 pgs stuck inactive; 455
 pgs stuck unclean; recovery 2122878/23181946 degraded (9.157%);
 2321/11590973 unfound (0.020%); 1 near full osd(s)
monmap e1: 3 mons at
 {chichibu=172.20.14.130:6789/0,glenesk=172.20.14.131:6789/0,karuizawa=172.20.14.133:6789/0},
 election epoch 20, quorum 0,1,2 chichibu,glenesk,karuizawa
osdmap e13184: 11 osds: 10 up, 10 in
 pgmap v2399093: 1728 pgs: 165 active, 1270 active+clean, 8
 active+recovering+degraded, 41 active+recovering+degraded+remapped+backfill,
 4 down+peering, 137 active+degraded, 3 active+clean+scrubbing, 15
 incomplete, 40 active+recovering, 45 active+recovering+degraded+backfill;
 44119 GB data, 84824 GB used, 37643 GB / 119 TB avail; 2122878/23181946
 degraded (9.157%); 2321/11590973 unfound (0.020%)
mdsmap e321: 1/1/1 up {0=karuizawa=up:active}, 2 up:standby

 How is that possible, given I have a replication factor of 2?

 Is it a known problem ?

 Cheers,



Re: rbd command to display free space in a cluster ?

2012-10-15 Thread Dan Mick

Nothing like that exists at the moment; see
http://tracker.newdream.net/issues/3283 for the other side of it.

On 10/15/2012 12:52 AM, Alexandre DERUMIER wrote:

Hi,

I'm looking for a way to retrieve the free space from a rbd cluster with rbd 
command.

Any hint ?

(something like ceph -w status, but without need to parse the result)

Regards,

Alexandre


Re: Ceph benchmark high wait on journal device

2012-10-15 Thread Martin Mailand

Hi Mark,

I think there are no differences between the 9266-8i and the 9265-8i,
except for the cache vault and the angle of the SAS connectors.
In the last test, which I posted, the SSDs were connected to the
onboard SATA ports. Further tests showed that if I reduce the object size
(the -b option) to 1M, 512k or 256k, the latency almost vanishes.

With 256k the w_wait was around 1ms.
So my observation is almost the opposite of yours.

I use a single controller with a dual expander backplane.

That's the baby.

http://85.214.49.87/ceph/testlab/IMAG0018.jpg

btw.

Is there a nice way to format the output of ceph --admin-daemon 
ceph-osd.0.asok perf_dump?



-martin

On 15.10.2012 21:50, Mark Nelson wrote:

Hi Martin,

I haven't tested the 9266-8i specifically, but it may behave similarly
to the 9265-8i.  This is just a theory, but I get the impression that
the controller itself introduces some latency getting data to disk, and
that it may get worse as the more data is pushed across the controller.
That seems to be the case even of the data is not going to the disk in
question.  Are you using a single controller with expanders?  On some of
our nodes that use a single controller with lots of expanders, I've
noticed high IO wait times, especially when doing lots of small writes.

Mark

On 10/15/2012 11:12 AM, Martin Mailand wrote:

Hi,

inspired from the performance test Mark did, I tried to compile my own
one.
I have four OSD processes on one Node, each process has a Intel 710 SSD
for its journal and 4 SAS Disk via an Lsi 9266-8i in Raid 0.
If I test the SSD with fio they are quite fast and the w_wait time is
quite low.
But if I run rados bench on the cluster, the w_wait times for the
journal devices are quite high (around 20-40ms).
I thought the SSD would be better, any ideas what happend here?

-martin

Logs:

/dev/sd{c,d,e,f}
Intel SSD 710 200G

/dev/sd{g,h,i,j}
each 4 x SAS on LSI 9266-8i Raid 0

fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2
-ioengine libaio -direct 1 -bs 256k

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
- snip -
sdc 0,00 0,00 0,00 809,20 0,00 202,30 512,00 0,96 1,19 0,00 1,19 1,18
95,84
- snap -



rados bench -p rbd 300 write -t 16

2012-10-15 17:53:17.058383min lat: 0.035382 max lat: 0.469604 avg lat:
0.189553
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
300 16 25329 25313 337.443 324 0.274815 0.189553
Total time run: 300.169843
Total writes made: 25329
Write size: 4194304
Bandwidth (MB/sec): 337.529

Stddev Bandwidth: 25.1568
Max bandwidth (MB/sec): 372
Min bandwidth (MB/sec): 0
Average Latency: 0.189597
Stddev Latency: 0.0641609
Max latency: 0.469604
Min latency: 0.035382


during the rados bench test.

avg-cpu: %user %nice %system %iowait %steal %idle
20,38 0,00 16,20 8,87 0,00 54,55

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
sda 0,00 41,20 0,00 12,40 0,00 0,35 57,42 0,00 0,31 0,00 0,31 0,31 0,38
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 332,80 0,00 139,67 859,53 7,36 22,09 0,00 22,09 2,12
70,42
sdd 0,00 0,00 0,00 391,60 0,00 175,84 919,62 15,59 39,62 0,00 39,62 2,40
93,80
sde 0,00 0,00 0,00 342,00 0,00 147,39 882,59 8,54 24,89 0,00 24,89 2,18
74,58
sdf 0,00 0,00 0,00 362,20 0,00 162,72 920,05 15,35 42,50 0,00 42,50 2,60
94,20
sdg 0,00 0,00 0,00 522,00 0,00 139,20 546,13 0,28 0,54 0,00 0,54 0,10
5,26
sdh 0,00 0,00 0,00 672,00 0,00 179,20 546,13 9,67 14,42 0,00 14,42 0,61
41,18
sdi 0,00 0,00 0,00 555,00 0,00 148,00 546,13 0,32 0,57 0,00 0,57 0,10
5,46
sdj 0,00 0,00 0,00 582,00 0,00 155,20 546,13 0,51 0,87 0,00 0,87 0,12
6,96

100 seconds later

avg-cpu: %user %nice %system %iowait %steal %idle
22,92 0,00 19,57 9,25 0,00 48,25

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
r_await w_await svctm %util
sda 0,00 40,80 0,00 15,60 0,00 0,36 47,08 0,00 0,22 0,00 0,22 0,22 0,34
sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
sdc 0,00 0,00 0,00 386,60 0,00 168,33 891,70 12,11 31,08 0,00 31,08 2,25
86,86
sdd 0,00 0,00 0,00 405,00 0,00 183,06 925,68 15,68 38,70 0,00 38,70 2,34
94,90
sde 0,00 0,00 0,00 411,00 0,00 185,06 922,15 15,58 38,09 0,00 38,09 2,33
95,92
sdf 0,00 0,00 0,00 387,00 0,00 168,33 890,79 12,19 31,48 0,00 31,48 2,26
87,48
sdg 0,00 0,00 0,00 646,20 0,00 171,22 542,64 0,42 0,65 0,00 0,65 0,10
6,70
sdh 0,00 85,60 0,40 797,00 0,01 192,97 495,65 10,95 13,73 32,50 13,72
0,55 44,22
sdi 0,00 0,00 0,00 678,20 0,00 180,01 543,59 0,45 0,67 0,00 0,67 0,10
6,76
sdj 0,00 0,00 0,00 639,00 0,00 169,61 543,61 0,36 0,57 0,00 0,57 0,10
6,32

--admin-daemon /var/run/ceph/ceph-osd.1.asok perf dump

Re: Ceph benchmark high wait on journal device

2012-10-15 Thread Sage Weil
On Mon, 15 Oct 2012, Travis Rhoden wrote:
 Martin,
 
  btw.
 
  Is there a nice way to format the output of ceph --admin-daemon
  ceph-osd.0.asok perf_dump?
 
 I use:
 
 ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok perf dump | python 
 -mjson.tool

There is also ceph.git/src/script/perf-watch.py -s foo.asok <list of vars or var prefixes>

s
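Since the dump is plain JSON, it is also easy to compute values from it rather
than just pretty-print it. A rough sketch that reports the average journal
latency per op (the counter names filestore/journal_latency with sum and
avgcount come from the perf dump quoted earlier in this thread; the socket
path is just an example):

import json
import subprocess

def avg_journal_latency(asok="/var/run/ceph/ceph-osd.1.asok"):
    # Pull the perf counters off the OSD admin socket and average them.
    raw = subprocess.check_output(
        ["ceph", "--admin-daemon", asok, "perf", "dump"]).decode("utf-8")
    lat = json.loads(raw)["filestore"]["journal_latency"]
    if lat["avgcount"] == 0:
        return 0.0
    return lat["sum"] / lat["avgcount"]     # seconds per journaled op

if __name__ == "__main__":
    print("avg journal latency: %.1f ms" % (avg_journal_latency() * 1000.0))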
 
  - Travis
 
 
 On Mon, Oct 15, 2012 at 4:38 PM, Martin Mailand mar...@tuxadero.com wrote:
  Hi Mark,
 
  I think there is no differences between the 9266-8i and the 9265-8i, except
  for the cache vault and the angel of the SAS connectors.
  In the last test, which I posted, the SSDs where connected to the onboard
  SATA ports. Further test showed if I reduce the the object size (the -b
  option) to 1M, 512k, 256k the latency almost vanished.
  With 256k the w_wait was around 1ms.
  So my observation shows almost the different of yours.
 
  I use a singel controller with a dual expander backplane.
 
  That's the baby.
 
  http://85.214.49.87/ceph/testlab/IMAG0018.jpg
 
  btw.
 
  Is there a nice way to format the output of ceph --admin-daemon
  ceph-osd.0.asok perf_dump?
 
 
  -martin
 
  On 15.10.2012 21:50, Mark Nelson wrote:
 
  Hi Martin,
 
  I haven't tested the 9266-8i specifically, but it may behave similarly
  to the 9265-8i.  This is just a theory, but I get the impression that
  the controller itself introduces some latency getting data to disk, and
  that it may get worse as the more data is pushed across the controller.
  That seems to be the case even of the data is not going to the disk in
  question.  Are you using a single controller with expanders?  On some of
  our nodes that use a single controller with lots of expanders, I've
  noticed high IO wait times, especially when doing lots of small writes.
 
  Mark
 
  On 10/15/2012 11:12 AM, Martin Mailand wrote:
 
  Hi,
 
  inspired from the performance test Mark did, I tried to compile my own
  one.
  I have four OSD processes on one Node, each process has a Intel 710 SSD
  for its journal and 4 SAS Disk via an Lsi 9266-8i in Raid 0.
  If I test the SSD with fio they are quite fast and the w_wait time is
  quite low.
  But if I run rados bench on the cluster, the w_wait times for the
  journal devices are quite high (around 20-40ms).
  I thought the SSD would be better, any ideas what happend here?
 
  -martin
 
  Logs:
 
  /dev/sd{c,d,e,f}
  Intel SSD 710 200G
 
  /dev/sd{g,h,i,j}
  each 4 x SAS on LSI 9266-8i Raid 0
 
  fio -name iops -rw=write -size=10G -iodepth 1 -filename /dev/sdc2
  -ioengine libaio -direct 1 -bs 256k
 
  Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
  r_await w_await svctm %util
  - snip -
  sdc 0,00 0,00 0,00 809,20 0,00 202,30 512,00 0,96 1,19 0,00 1,19 1,18
  95,84
  - snap -
 
 
 
  rados bench -p rbd 300 write -t 16
 
  2012-10-15 17:53:17.058383min lat: 0.035382 max lat: 0.469604 avg lat:
  0.189553
  sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
  300 16 25329 25313 337.443 324 0.274815 0.189553
  Total time run: 300.169843
  Total writes made: 25329
  Write size: 4194304
  Bandwidth (MB/sec): 337.529
 
  Stddev Bandwidth: 25.1568
  Max bandwidth (MB/sec): 372
  Min bandwidth (MB/sec): 0
  Average Latency: 0.189597
  Stddev Latency: 0.0641609
  Max latency: 0.469604
  Min latency: 0.035382
 
 
  during the rados bench test.
 
  avg-cpu: %user %nice %system %iowait %steal %idle
  20,38 0,00 16,20 8,87 0,00 54,55
 
  Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
  r_await w_await svctm %util
  sda 0,00 41,20 0,00 12,40 0,00 0,35 57,42 0,00 0,31 0,00 0,31 0,31 0,38
  sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
  sdc 0,00 0,00 0,00 332,80 0,00 139,67 859,53 7,36 22,09 0,00 22,09 2,12
  70,42
  sdd 0,00 0,00 0,00 391,60 0,00 175,84 919,62 15,59 39,62 0,00 39,62 2,40
  93,80
  sde 0,00 0,00 0,00 342,00 0,00 147,39 882,59 8,54 24,89 0,00 24,89 2,18
  74,58
  sdf 0,00 0,00 0,00 362,20 0,00 162,72 920,05 15,35 42,50 0,00 42,50 2,60
  94,20
  sdg 0,00 0,00 0,00 522,00 0,00 139,20 546,13 0,28 0,54 0,00 0,54 0,10
  5,26
  sdh 0,00 0,00 0,00 672,00 0,00 179,20 546,13 9,67 14,42 0,00 14,42 0,61
  41,18
  sdi 0,00 0,00 0,00 555,00 0,00 148,00 546,13 0,32 0,57 0,00 0,57 0,10
  5,46
  sdj 0,00 0,00 0,00 582,00 0,00 155,20 546,13 0,51 0,87 0,00 0,87 0,12
  6,96
 
  100 seconds later
 
  avg-cpu: %user %nice %system %iowait %steal %idle
  22,92 0,00 19,57 9,25 0,00 48,25
 
  Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await
  r_await w_await svctm %util
  sda 0,00 40,80 0,00 15,60 0,00 0,36 47,08 0,00 0,22 0,00 0,22 0,22 0,34
  sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
  sdc 0,00 0,00 0,00 386,60 0,00 168,33 891,70 12,11 31,08 0,00 31,08 2,25
  86,86
  sdd 0,00 0,00 0,00 405,00 0,00 183,06 925,68 15,68 38,70 0,00 38,70 2,34
  94,90
  sde 0,00 0,00 0,00 411,00 0,00 185,06 922,15 15,58 38,09 0,00 38,09 2,33
  95,92
  sdf 0,00 0,00 0,00 387,00 0,00 168,33 890,79