Re: [ceph-users] Rugged data distribution on OSDs

2013-09-17 Thread Mihály Árva-Tóth
Hello Greg,

Output of 'ceph osd tree':

# id    weight  type name       up/down reweight
-1      27.3    root default
-2      9.1             host stor1
0       3.64                    osd.0   up      1
1       3.64                    osd.1   up      1
2       1.82                    osd.2   up      1
-3      9.1             host stor2
3       3.64                    osd.3   up      1
4       1.82                    osd.4   up      1
6       3.64                    osd.6   up      1
-4      9.1             host stor3
7       3.64                    osd.7   up      1
8       3.64                    osd.8   up      1
9       1.82                    osd.9   up      1

(osd.5 is missing because of an earlier test in which I removed an HDD from a
working cluster, but I think this is not relevant now)

root@stor3:~# ceph osd pool get .rgw.buckets pg_num
pg_num: 250
root@stor3:~# ceph osd pool get .rgw.buckets pgp_num
pgp_num: 250

pgmap v129814: 514 pgs: 514 active; 818 GB data, 1682 GB used

Thank you,
Mihaly

2013/9/16 Gregory Farnum g...@inktank.com

 What is your PG count and what's the output of ceph osd tree? It's
 possible that you've just got a slightly off distribution since there
 still isn't much data in the cluster (probabilistic placement and all
 that), but let's cover the basics first.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Mon, Sep 16, 2013 at 2:08 AM, Mihály Árva-Tóth
 mihaly.arva-t...@virtual-call-center.eu wrote:
  Hello,
 
  I made some tests on a 3-node Ceph cluster: I uploaded 3 million 50 KiB
  objects to a single container. Speed and performance were okay, but the data
  is not distributed evenly. Every node has two 4 TB HDDs and one 2 TB HDD.
 
  osd.0 41 GB (4 TB)
  osd.1 47 GB (4 TB)
  osd.3 16 GB (2 TB)
  osd.4 40 GB (4 TB)
  osd.5 49 GB (4 TB)
  osd.6 17 GB (2 TB)
  osd.7 48 GB (4 TB)
  osd.8 42 GB (4 TB)
  osd.9 18 GB (2 TB)
 
  All of the 4 TB and 2 TB HDDs are from the same vendor and are the same model (WD RE SATA).
 
  I monitored IOPS with Zabbix during the test; you can see the graph here:
  http://ctrlv.in/237368
  (sda and sdb are system HDDs.) The graph is the same on all three nodes.
 
  Does anyone have an idea what's wrong, or what I should be seeing?
 
  I'm using ceph-0.67.3 on Ubuntu 12.04.3 x86_64.
 
  Thank you,
  Mihaly
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph instead of RAID

2013-09-17 Thread Guangliang Zhao
On Tue, Aug 13, 2013 at 10:41:53AM -0500, Mark Nelson wrote:

Hi Mark,

 On 08/13/2013 02:56 AM, Dmitry Postrigan wrote:
 I am currently installing some backup servers with 6x3TB drives in them. I 
 played with RAID-10 but I was not
 impressed at all with how it performs during a recovery.
 
 Anyway, I thought what if instead of RAID-10 I use ceph? All 6 disks will 
 be local, so I could simply create
 6 local OSDs + a monitor, right? Is there anything I need to watch out for 
 in such configuration?
 
 You can do that. Although it's nice to play with and everything, I
 wouldn't recommend doing it. It will give you more pain than pleasure.
 
 Any specific reason? I just got it up and running, and after simulating some 
 failures, I like it much better than
 mdraid. Again, this only applies to large arrays (6x3TB in my case). I would 
 not use ceph to replace a RAID-1
 array of course, but it looks like a good idea to replace a large RAID10 
 array with a local ceph installation.
 
 The only thing I do not enjoy about ceph is performance. Probably need to do 
 more tweaking, but so far numbers
 are not very impressive. I have two exactly same servers running same OS, 
 kernel, etc. Each server has 6x 3TB
 drives (same model and firmware #).
 
 Server 1 runs ceph (2 replicas)
 Server 2 runs mdraid (raid-10)
 
 I ran some very basic benchmarks on both servers:
 
 dd if=/dev/zero of=/storage/test.bin bs=1M count=10
 Ceph: 113 MB/s
 mdraid: 467 MB/s
 
 
 dd if=/storage/test.bin of=/dev/null bs=1M
 Ceph: 114 MB/s
 mdraid: 550 MB/s
 
 
 As you can see, mdraid is by far faster than ceph. It could be by design, 
 or perhaps I am not doing it
 right. Even despite such difference in speed, I would still go with ceph 
 because *I think* it is more reliable.
 
 couple of things:
 
 1) Ceph is doing full data journal writes, so it is going to eat (at
 least) half of your write performance right there.
 
 2) Ceph tends to like lots of concurrency.  You'll probably see
 higher numbers with multiple dd reads/writes going at once (see the
 sketch after this list).
 
 3) Ceph is a lot more complex than something like mdraid.  It gives
 you a lot more power and flexibility but the cost is greater
 complexity. There are probably things you can tune to get your
 numbers up, but it could take some work.
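To illustrate point 2, here is a minimal sketch of generating concurrent dd write streams (paths and counts are illustrative; oflag=direct bypasses the page cache):

    for i in 1 2 3 4; do
        dd if=/dev/zero of=/storage/test.$i.bin bs=1M count=1000 oflag=direct &
    done
    wait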
 
 Having said all of this, my primary test box is a single server and
 I can get 90MB/s+ per drive out of Ceph (with 24 drives!), but if I

Could you share the configurations and parameters you have modified, or tell
me where I could find the associated documents?

 was building a production box and never planned to expand to
 multiple servers, I'd certainly be looking into zfs or btrfs RAID.
 
 Mark
 
 
 Dmitry
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Best regards,
Guangliang
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help with radosGW

2013-09-17 Thread Alexis GÜNST HORN
Hello to all,

I've a big issue with Ceph RadosGW.
I did a PoC some days ago with radosgw. It worked well.

Ceph version 0.67.3 under CentOS 6.4

Now I'm installing a new cluster, but I can't get it to work, and I do not understand why.
Here are some elements:

ceph.conf:

[global]
filestore_xattr_use_omap = true
mon_host = 192.168.0.1,192.168.0.2,192.168.0.3
fsid = f261d4c5-2a93-43dc-85a9-85211ec7100f
mon_initial_members = mon-1, mon-2, mon-3
auth_supported = cephx
osd_journal_size = 10240

[osd]
cluster_network = 192.168.0.0/24
public_network = 192.168.1.0/24


[client.radosgw.gateway]
host = gw-1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log
rgw print continue = false



I followed this doc to install radosgw :
http://ceph.com/docs/next/install/rpm/#installing-ceph-object-storage

I start httpd :
/etc/init.d/httpd start

I start radosgw :
[root@gw-1]# /etc/init.d/ceph-radosgw start
Starting radosgw instance(s)...
2013-09-17 08:07:11.954248 7f835d7fb820 -1 WARNING: libcurl doesn't
support curl_multi_wait()
2013-09-17 08:07:11.954253 7f835d7fb820 -1 WARNING: cross zone /
region transfer performance may be affected

I create a user :
radosgw-admin user create --uid=alexis

It works.
Fine.

So now I connect to the gateway via a client (CyberDuck).
I can create a bucket: test.
Then I try to upload a file, and it does not work:
I get a timeout after about 30 seconds.

And, of course, the file is not uploaded. A rados df on .rgw.buckets
shows that there are no objects inside.

Here are some logs.

radosgw.log:
http://pastebin.com/6NNuczC5
(the last lines are because I stop radosgw, not to pollute the logs)

and httpd.log :
[Tue Sep 17 08:02:15 2013] [error] [client 46.231.147.8] FastCGI: comm
with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Tue Sep 17 08:02:15 2013] [error] [client 46.231.147.8] FastCGI:
incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
[Tue Sep 17 08:02:45 2013] [error] [client 46.231.147.8] FastCGI: comm
with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Tue Sep 17 08:02:45 2013] [error] [client 46.231.147.8] FastCGI:
incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
[Tue Sep 17 08:08:42 2013] [error] [client 46.231.147.8] FastCGI: comm
with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Tue Sep 17 08:08:46 2013] [error] [client 46.231.147.8] FastCGI:
incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
[Tue Sep 17 08:12:35 2013] [error] [client 46.231.147.8] FastCGI: comm
with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
[Tue Sep 17 08:12:35 2013] [error] [client 46.231.147.8] FastCGI:
incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
[Tue Sep 17 08:13:02 2013] [error] [client 46.231.147.8] FastCGI:
incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi



I'm really disappointed because I can't understand where the issue is.
Thanks A LOT for your help.

Alexis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gandalf Corvotempesta
Hi to all.
Let's assume a Ceph cluster used to store VM disk images.
VMs will be booted directly from the RBD.

What will happens in case of OSD failure if the failed OSD is the
primary where VM is reading from ?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rbd cp empty block

2013-09-17 Thread 王根意
Yeah,rbd clone works well, thanks a lot!


2013/9/16 Sage Weil s...@inktank.com

 On Mon, 16 Sep 2013, Chris Dunlop wrote:
  On Mon, Sep 16, 2013 at 09:20:29AM +0800, ??? wrote:
   Hi all:
  
   I have a 30G RBD block device as a virtual machine disk, with Ubuntu 12.04
   already installed. About 1G of space is used.
  
   When I want to deploy a VM, I make an rbd cp. Then the problem comes: it
   copies 30G of data instead of 1G, and this action takes a lot of time.
  
   Any ideas? I just want to make it faster to deploy VMs.
 
  It's a bug:
 
  http://tracker.ceph.com/issues/6257

 Instead of cp, you can use rbd clone; this is copy-on-write and will
 always be faster than rbd cp.
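 For reference, a minimal clone workflow is sketched below (pool and image names are illustrative; clones require a format 2 image and a protected snapshot):

    rbd snap create rbd/ubuntu-base@golden         # snapshot the prepared base image
    rbd snap protect rbd/ubuntu-base@golden        # clones can only be made from protected snapshots
    rbd clone rbd/ubuntu-base@golden rbd/vm-disk-01   # copy-on-write clone for a new VM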

 sage
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
OPS 王根意
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to use the admin api (get user info)

2013-09-17 Thread
 hi
   I followed the admin API document
http://ceph.com/docs/master/radosgw/adminops/ ,
but when I get user info, it returns 405 Method Not Allowed.
My command is:

curl -XGET http://kp/admin/user?format=json -d'{uid:user1}'
-H'Authorization:AWS **:**' -H'Date:**' -i -v

The result is:
405 Method Not Allowed, {'code':MethodNotAllowed}

The same command works when I get usage; that command is:
curl -XGET http://kp/admin/usage?format=json -d'{uid:user1}'
-H'Authorization:AWS **:**' -H'Date:**' -i -v

It returns 200 OK, {entries:[], summary}

My Ceph version is 0.56.3.
Thank you for your patience!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Objects get via s3api FastCGI incomplete headers and hanging up

2013-09-17 Thread Mihály Árva-Tóth
Hello,

I'm trying to download objects from one container (which contains 3 million
objects, with file sizes between 16K and 1024K) with 10 parallel threads. I'm
using the s3 binary that comes with libs3. I'm monitoring download times; 80%
of responses take less than 50-80 ms. But sometimes a download hangs, for up to
17 seconds, and Apache returns error code 500. The Apache error log shows a lot of:

[Tue Sep 17 11:33:11 2013] [error] [client 194.38.106.67] FastCGI: comm
with server /var/www/radosgw.fcgi aborted: idle timeout (30 sec)
[Tue Sep 17 11:33:11 2013] [error] [client 194.38.106.67] FastCGI:
incomplete headers (0 bytes) received from server /var/www/radosgw.fcgi
[Tue Sep 17 11:33:11 2013] [error] [client 194.38.106.67] Handler for
fastcgi-script returned invalid result code 1

I tried both the native Ubuntu apache2/fastcgi packages and the Ceph-built
apache2/fastcgi. When I turn on rgw print continue = true with the modified
build, the result is a little bit better (fewer hangs). FastCgiWrapper is
Off, of course.

And if I set the parallel GET requests to only 3 (instead of 10), the result is
much better; the longest hang is only 1500 ms. So I think this depends on
some resource management, but I have no idea what.

Using ceph-0.67.4 with Ubuntu 12.04 x86_64.

I found the following issue (more than a year old):
http://tracker.ceph.com/issues/2027

But it was closed as unable to reproduce. I can reproduce it every time.

Thank you,
Mihaly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Decrease radosgw logging level

2013-09-17 Thread Joao Eduardo Luis

On 09/13/2013 01:02 PM, Mihály Árva-Tóth wrote:

Hello,

How can I decrease the logging level of radosgw? I uploaded 400k objects and
my radosgw log grew to 2 GiB. Current settings:

rgw_enable_usage_log = true
rgw_usage_log_tick_interval = 30
rgw_usage_log_flush_threshold = 1024
rgw_usage_max_shards = 32
rgw_usage_max_user_shards = 1
rgw_print_continue = false
rgw_enable_ops_log = false
rgw_ops_log_rados = false
log_file =
log_to_syslog = true


If you mean output from rgw itself to its own log, try adjusting 'debug 
rgw'.  Default is 1, so check if you have it set to some higher value. 
You can always set it to 0 too (debug rgw = 0)
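For example, a minimal sketch in ceph.conf (the section name should match your gateway's client section; a radosgw restart picks it up):

    [client.radosgw.gateway]
        debug rgw = 0

If the gateway exposes an admin socket, something like the following may also work without a restart (the socket path here is an assumption):

    ceph --admin-daemon /var/run/ceph/client.radosgw.gateway.asok config set debug_rgw 0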


  -Joao


--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-17 Thread Alfredo Deza
On Mon, Sep 16, 2013 at 8:30 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:
-Original Message-
From: Alfredo Deza [mailto:alfredo.d...@inktank.com]
Subject: Re: [ceph-users] problem with ceph-deploy hanging

ceph-deploy will run as the user you are currently executing it as. That is why, if
you are calling ceph-deploy as root, it will log in remotely as root.

So by a different user I mean something like user `ceph` executing ceph-
deploy (yes, that same user needs to exist remotely too, with the correct
permissions).

 This is interesting.  Since the preflight has us set up passwordless SSH with 
 a default ceph user I assumed it didn't really matter what user I was logged 
 in as on the admin system.  Good to know.

Well, it is (for now) a crappy workaround. We have fixed this in the
upcoming release :)

 Unfortunately, logging in as my ceph user on the admin system (with a 
 matching user on the target system) does not affect my result.  The 
 ceph-deploy install still hangs here:

 [cephtest02][INFO  ] Running command: wget -q -O- 
 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
 add -

 It has been suggested that this could be due to our firewall.  I have the 
 proxies configured in /etc/environment and when I run a wget myself (as the 
 ceph user, either directly on cephtest02 or via SSH command to cephtest02 
 from the admin system) it resolves the proxy and succeeds.  Is there any 
 reason the wget might behave differently when run by ceph-deploy and fail to 
 resolve the proxy?  Is there anywhere I might need to set proxy information 
 besides /etc/environment?

I was about to ask if you had tried running that command through SSH,
but you did and had correct behavior. This is puzzling for me because
that is exactly what ceph-deploy does :/

When you say 'via SSH command' you mean something like:

ssh cephtest02 sudo wget -q -O-
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' |
apt-key add -

Right?

The firewall might have something to do with it. How do you have your
proxies configured in /etc/environment ?

Again, in this next coming release, you will be able to tell
ceph-deploy to just install the packages without mangling your repos
(or installing keys)



 Or, any other thoughts on how to debug this further?

 Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd stuck creating a block device

2013-09-17 Thread Wido den Hollander

On 09/16/2013 11:29 AM, Nico Massenberg wrote:

Am 16.09.2013 um 11:25 schrieb Wido den Hollander w...@42on.com:


On 09/16/2013 11:18 AM, Nico Massenberg wrote:

Hi there,

I have successfully setup a ceph cluster with a healthy status.
When trying to create a rbd block device image I am stuck with an error which I 
have to ctrl+c:


ceph@vl0181:~/konkluster$ rbd create imagefoo --size 5120 --pool kontrastpool
2013-09-16 10:59:06.838235 7f3bcb9eb700  0 -- 192.168.111.109:0/1013698  
192.168.111.10:6806/3750 pipe(0x1fdfb00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x1fdfd60).fault


Any ideas anyone?


Is the Ceph cluster healthy?


Yes it is.



What does 'ceph -s' say?


ceph@vl0181:~/konkluster$ ceph -s
   cluster 3dad736b-a9fc-42bf-a2fb-399cb8cbb880
health HEALTH_OK
monmap e3: 3 mons at 
{ceph01=192.168.111.10:6789/0,ceph02=192.168.111.11:6789/0,ceph03=192.168.111.12:6789/0},
 election epoch 52, quorum 0,1,2 ceph01,ceph02,ceph03
osdmap e230: 12 osds: 12 up, 12 in
 pgmap v3963: 292 pgs: 292 active+clean; 0 bytes data, 450 MB used, 6847 GB 
/ 6847 GB avail
mdsmap e1: 0/0/1 up



If the cluster is healthy it seems like this client can't contact the Ceph 
cluster.


I have no problems contacting any node/monitor from the admin machine via ping 
or telnet.



It seems like the first monitor (ceph01) is not responding properly, is 
that one reachable?


And if you leave the rbd command running for some time, will it work 
eventually?
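One quick way to check from the client whether the endpoints involved are reachable (a sketch using the addresses shown above; the thread already mentions telnet) is a plain TCP connect:

    telnet 192.168.111.10 6789   # monitor ceph01
    telnet 192.168.111.10 6806   # the OSD address shown in the pipe fault message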


Wido




Thanks, Nico
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem with ceph-deploy hanging

2013-09-17 Thread Gilles Mocellin

On 17/09/2013 14:48, Alfredo Deza wrote:

On Mon, Sep 16, 2013 at 8:30 PM, Gruher, Joseph R
joseph.r.gru...@intel.com wrote:
[...]
Unfortunately, logging in as my ceph user on the admin system (with a matching user on 
the target system) does not affect my result.  The ceph-deploy install still 
hangs here:

[cephtest02][INFO  ] Running command: wget -q -O- 
'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key 
add -

It has been suggested that this could be due to our firewall.  I have the 
proxies configured in /etc/environment and when I run a wget myself (as the 
ceph user, either directly on cephtest02 or via SSH command to cephtest02 from 
the admin system) it resolves the proxy and succeeds.  Is there any reason the 
wget might behave differently when run by ceph-deploy and fail to resolve the 
proxy?  Is there anywhere I might need to set proxy information besides 
/etc/environment?


[...]

Just a thought, as it concern a proxy server.

On Debian, and so perhaps also on Ubuntu, sudo resets almost all
environment variables, and it definitely does so for the http_proxy ones.
As ceph-deploy runs sudo on the other end, perhaps /etc/environment
(deprecated) is loaded for the normal user and then reset by sudo.


I don't know the best way of solving this.
Perhaps just add to the doc that, when creating a user with sudo
rights, you should also add the options not to reset the http_proxy variables...


Extract from the sudoers man page:

    By default, the env_reset option is enabled.  This causes commands to be
    executed with a new, minimal environment.  On AIX (and Linux systems
    without PAM), the environment is initialized with the contents of the
    /etc/environment file.  The new environment contains the TERM, PATH, HOME,
    MAIL, SHELL, LOGNAME, USER, USERNAME and SUDO_* variables in addition to
    variables from the invoking process permitted by the env_check and
    env_keep options.  This is effectively a whitelist for environment
    variables.


So you can add something like this to /etc/sudoers on all Ceph nodes (use
visudo):


Defaults env_keep += "http_proxy https_proxy ftp_proxy no_proxy"

Hope it can help.
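A quick way to verify that the change took effect, reusing the host name from this thread, is to check what sudo actually sees:

    ssh cephtest02 sudo env | grep -i proxy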

--
Gilles Mocellin
Nuage Libre


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Jason Villalta
Hello all,
I am new to the list.

I have a single machine set up for testing Ceph.  It has dual 6-core
processors (12 cores total) and 128GB of RAM.  I also have 3 Intel 520
240GB SSDs, with an OSD set up on each disk and the OSD data and journal in
separate partitions formatted with ext4.

My goal here is to prove just how fast Ceph can go and what kind of
performance to expect when using it as back-end storage for virtual
machines, mostly Windows.  I would also like to try to understand how it
will scale IO by removing one of the three disks and redoing the benchmark
tests.  But that is secondary.  So far, here are my results.  I am aware
this is all sequential; I just want to know how fast it can go.

DD IO test of SSD disks:  I am testing 8K blocks since that is the default
block size of windows.
 dd of=ddbenchfile if=/dev/zero bs=8K count=100
819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

RADOS bench test with 3 SSD disks and 4MB object size(Default):
rados --no-cleanup bench -p pbench 30 write
Total writes made:  2061
Write size: 4194304
Bandwidth (MB/sec): 273.004

Stddev Bandwidth:   67.5237
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 0
Average Latency:0.234199
Stddev Latency: 0.130874
Max latency:0.867119
Min latency:0.039318
-
rados bench -p pbench 30 seq
Total reads made: 2061
Read size:4194304
Bandwidth (MB/sec):956.466

Average Latency:   0.0666347
Max latency:   0.208986
Min latency:   0.011625

This all looks like I would expect from using three disks.  The problems
appear to come with the 8K blocks/object size.

RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
rados --no-cleanup bench -b 8192 -p pbench 30 write
Total writes made:  13770
Write size: 8192
Bandwidth (MB/sec): 3.581

Stddev Bandwidth:   1.04405
Max bandwidth (MB/sec): 6.19531
Min bandwidth (MB/sec): 0
Average Latency:0.0348977
Stddev Latency: 0.0349212
Max latency:0.326429
Min latency:0.0019
--
rados bench -b 8192 -p pbench 30 seq
Total reads made: 13770
Read size:8192
Bandwidth (MB/sec):52.573

Average Latency:   0.00237483
Max latency:   0.006783
Min latency:   0.000521

So are these performance numbers correct, or is there something I missed in
the testing procedure?  The RADOS bench numbers with the 8K block size are the
same as we see when testing performance in a VM with SQLIO.  Does anyone know
of any configuration changes that are needed to get Ceph performance closer to
native performance with 8K blocks?

Thanks in advance.



-- 
-- 
*Jason Villalta*
Co-founder
[image: Inline image 1]
800.799.4407x1230 | www.RubixTechnology.comhttp://www.rubixtechnology.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gregory Farnum
The VM read will hang until a replica gets promoted and the VM resends the
read. In a healthy cluster with default settings this will take about 15
seconds.
-Greg
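The failure-detection window is governed mainly by the OSD heartbeat settings; a sketch of shortening it is below (the default grace is 20 seconds, and lowering it trades faster failover for more false positives, so this is an illustration rather than a recommendation):

    [osd]
        # peers report an OSD as failed after this many seconds without a heartbeat
        osd heartbeat grace = 10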

On Tuesday, September 17, 2013, Gandalf Corvotempesta wrote:

 Hi to all.
 Let's assume a Ceph cluster used to store VM disk images.
 VMs will be booted directly from the RBD.

 What will happens in case of OSD failure if the failed OSD is the
 primary where VM is reading from ?
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com javascript:;
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Gregory Farnum
Your 8k-block dd test is not nearly the same as your 8k-block rados bench
or SQL tests. Both rados bench and SQL require the write to be committed to
disk before moving on to the next one; dd is simply writing into the page
cache. So you're not going to get 460 or even 273MB/s with sync 8k
writes regardless of your settings.

However, I think you should be able to tune your OSDs into somewhat better
numbers -- that rados bench is giving you ~300IOPs on every OSD (with a
small pipeline!), and an SSD-based daemon should be going faster. What kind
of logging are you running with and what configs have you set?

Hopefully you can get Mark or Sam or somebody who's done some performance
tuning to offer some tips as well. :)
-Greg

On Tuesday, September 17, 2013, Jason Villalta wrote:

 Hello all,
 I am new to the list.

 I have a single machines setup for testing Ceph.  It has a dual proc 6
 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly windows.  I would also like to try to understand how it
 will scale IO by removing one disk of the three and doing the benchmark
 tests.  But that is secondary.  So far here are my results.  I am aware
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the default
 block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
 819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

 RADOS bench test with 3 SSD disks and 4MB object size(Default):
 rados --no-cleanup bench -p pbench 30 write
 Total writes made:  2061
 Write size: 4194304
 Bandwidth (MB/sec): 273.004

 Stddev Bandwidth:   67.5237
 Max bandwidth (MB/sec): 352
 Min bandwidth (MB/sec): 0
 Average Latency:0.234199
 Stddev Latency: 0.130874
 Max latency:0.867119
 Min latency:0.039318
 -
 rados bench -p pbench 30 seq
 Total reads made: 2061
 Read size:4194304
 Bandwidth (MB/sec):956.466

 Average Latency:   0.0666347
 Max latency:   0.208986
 Min latency:   0.011625

 This all looks like I would expect from using three disks.  The problems
 appear to come with the 8K blocks/object size.

 RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
 rados --no-cleanup bench -b 8192 -p pbench 30 write
 Total writes made:  13770
 Write size: 8192
 Bandwidth (MB/sec): 3.581

 Stddev Bandwidth:   1.04405
 Max bandwidth (MB/sec): 6.19531
 Min bandwidth (MB/sec): 0
 Average Latency:0.0348977
 Stddev Latency: 0.0349212
 Max latency:0.326429
 Min latency:0.0019
 --
 rados bench -b 8192 -p pbench 30 seq
 Total reads made: 13770
 Read size:8192
 Bandwidth (MB/sec):52.573

 Average Latency:   0.00237483
 Max latency:   0.006783
 Min latency:   0.000521

 So are these performance correct or is this something I missed with the
 testing procedure?  The RADOS bench number with 8K block size are the same
 we see when testing performance in an VM with SQLIO.  Does anyone know of
 any configure changes that are needed to get the Ceph performance closer to
 native performance with 8K blocks?

 Thanks in advance.



 --
 --
 *Jason Villalta*
 Co-founder
 [image: Inline image 1]
 800.799.4407x1230 | www.RubixTechnology.comhttp://www.rubixtechnology.com/



-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Campbell, Bill
Windows default (NTFS) is a 4k block. Are you changing the allocation unit to 
8k as a default for your configuration? 

- Original Message -

From: Gregory Farnum g...@inktank.com 
To: Jason Villalta ja...@rubixnet.com 
Cc: ceph-users@lists.ceph.com 
Sent: Tuesday, September 17, 2013 10:40:09 AM 
Subject: Re: [ceph-users] Ceph performance with 8K blocks. 

Your 8k-block dd test is not nearly the same as your 8k-block rados bench or 
SQL tests. Both rados bench and SQL require the write to be committed to disk 
before moving on to the next one; dd is simply writing into the page cache. So 
you're not going to get 460 or even 273MB/s with sync 8k writes regardless of 
your settings. 

However, I think you should be able to tune your OSDs into somewhat better 
numbers -- that rados bench is giving you ~300IOPs on every OSD (with a small 
pipeline!), and an SSD-based daemon should be going faster. What kind of 
logging are you running with and what configs have you set? 

Hopefully you can get Mark or Sam or somebody who's done some performance 
tuning to offer some tips as well. :) 
-Greg 

On Tuesday, September 17, 2013, Jason Villalta wrote: 



Hello all, 
I am new to the list. 

I have a single machines setup for testing Ceph. It has a dual proc 6 
cores(12core total) for CPU and 128GB of RAM. I also have 3 Intel 520 240GB 
SSDs and an OSD setup on each disk with the OSD and Journal in separate 
partitions formatted with ext4. 

My goal here is to prove just how fast Ceph can go and what kind of performance 
to expect when using it as a back-end storage for virtual machines mostly 
windows. I would also like to try to understand how it will scale IO by 
removing one disk of the three and doing the benchmark tests. But that is 
secondary. So far here are my results. I am aware this is all sequential, I 
just want to know how fast it can go. 

DD IO test of SSD disks: I am testing 8K blocks since that is the default block 
size of windows. 
dd of=ddbenchfile if=/dev/zero bs=8K count=100 
819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s 

dd if=ddbenchfile of=/dev/null bs=8K 
819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s 

RADOS bench test with 3 SSD disks and 4MB object size(Default): 
rados --no-cleanup bench -p pbench 30 write 
Total writes made: 2061 
Write size: 4194304 
Bandwidth (MB/sec): 273.004 

Stddev Bandwidth: 67.5237 
Max bandwidth (MB/sec): 352 
Min bandwidth (MB/sec): 0 
Average Latency: 0.234199 
Stddev Latency: 0.130874 
Max latency: 0.867119 
Min latency: 0.039318 
- 
rados bench -p pbench 30 seq 
Total reads made: 2061 
Read size: 4194304 
Bandwidth (MB/sec): 956.466 

Average Latency: 0.0666347 
Max latency: 0.208986 
Min latency: 0.011625 

This all looks like I would expect from using three disks. The problems appear 
to come with the 8K blocks/object size. 

RADOS bench test with 3 SSD disks and 8K object size(8K blocks): 
rados --no-cleanup bench -b 8192 -p pbench 30 write 
Total writes made: 13770 
Write size: 8192 
Bandwidth (MB/sec): 3.581 

Stddev Bandwidth: 1.04405 
Max bandwidth (MB/sec): 6.19531 
Min bandwidth (MB/sec): 0 
Average Latency: 0.0348977 
Stddev Latency: 0.0349212 
Max latency: 0.326429 
Min latency: 0.0019 
-- 
rados bench -b 8192 -p pbench 30 seq 
Total reads made: 13770 
Read size: 8192 
Bandwidth (MB/sec): 52.573 

Average Latency: 0.00237483 
Max latency: 0.006783 
Min latency: 0.000521 

So are these performance correct or is this something I missed with the testing 
procedure? The RADOS bench number with 8K block size are the same we see when 
testing performance in an VM with SQLIO. Does anyone know of any configure 
changes that are needed to get the Ceph performance closer to native 
performance with 8K blocks? 

Thanks in advance. 



-- 
-- 
Jason Villalta 
Co-founder 
800.799.4407x1230 | www.RubixTechnology.com 





-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Gregory Farnum
Oh, and you should run some local sync benchmarks against these drives to
figure out what sort of performance they can deliver with two write streams
going on, too. Sometimes the drives don't behave the way one would expect.
-Greg
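A rough sketch of such a local sync test (file paths are illustrative; oflag=dsync forces a flush on every write, and the second variant runs two streams at once to mimic journal and data writes hitting the same SSD):

    dd if=/dev/zero of=/mnt/osd0/synctest bs=8k count=10000 oflag=dsync

    dd if=/dev/zero of=/mnt/osd0/a bs=8k count=10000 oflag=dsync &
    dd if=/dev/zero of=/mnt/osd0/b bs=8k count=10000 oflag=dsync &
    wait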

On Tuesday, September 17, 2013, Gregory Farnum wrote:

 Your 8k-block dd test is not nearly the same as your 8k-block rados bench
 or SQL tests. Both rados bench and SQL require the write to be committed to
 disk before moving on to the next one; dd is simply writing into the page
 cache. So you're not going to get 460 or even 273MB/s with sync 8k
 writes regardless of your settings.

 However, I think you should be able to tune your OSDs into somewhat better
 numbers -- that rados bench is giving you ~300IOPs on every OSD (with a
 small pipeline!), and an SSD-based daemon should be going faster. What kind
 of logging are you running with and what configs have you set?

 Hopefully you can get Mark or Sam or somebody who's done some performance
 tuning to offer some tips as well. :)
 -Greg

 On Tuesday, September 17, 2013, Jason Villalta wrote:

 Hello all,
 I am new to the list.

 I have a single machines setup for testing Ceph.  It has a dual proc 6
 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly windows.  I would also like to try to understand how it
 will scale IO by removing one disk of the three and doing the benchmark
 tests.  But that is secondary.  So far here are my results.  I am aware
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the
 default block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
 819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

 RADOS bench test with 3 SSD disks and 4MB object size(Default):
 rados --no-cleanup bench -p pbench 30 write
 Total writes made:  2061
 Write size: 4194304
 Bandwidth (MB/sec): 273.004

 Stddev Bandwidth:   67.5237
 Max bandwidth (MB/sec): 352
 Min bandwidth (MB/sec): 0
 Average Latency:0.234199
 Stddev Latency: 0.130874
 Max latency:0.867119
 Min latency:0.039318
 -
 rados bench -p pbench 30 seq
 Total reads made: 2061
 Read size:4194304
 Bandwidth (MB/sec):956.466

 Average Latency:   0.0666347
 Max latency:   0.208986
 Min latency:   0.011625

 This all looks like I would expect from using three disks.  The problems
 appear to come with the 8K blocks/object size.

 RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
 rados --no-cleanup bench -b 8192 -p pbench 30 write
 Total writes made:  13770
 Write size: 8192
 Bandwidth (MB/sec): 3.581

 Stddev Bandwidth:   1.04405
 Max bandwidth (MB/sec): 6.19531
 Min bandwidth (MB/sec): 0
 Average Latency:0.0348977
 Stddev Latency: 0.0349212
 Max latency:0.326429
 Min latency:0.0019
 --
 rados bench -b 8192 -p pbench 30 seq
 Total reads made: 13770
 Read size:8192
 Bandwidth (MB/sec):52.573

 Average Latency:   0.00237483
 Max latency:   0.006783
 Min latency:   0.000521

 So are these performance correct or is this something I missed with the
 testing procedure?  The RADOS bench number with 8K block size are the same
 we see when testing performance in an VM with SQLIO.  Does anyone know of
 any configure changes that are needed to get the Ceph performance closer to
 native performance with 8K blocks?

 Thanks in advance.



 --
 --
 *Jason Villalta*
 Co-founder
 [image: Inline image 1]
 800.799.4407x1230 | www.RubixTechnology.comhttp://www.rubixtechnology.com/



 --
 Software Engineer #42 @ http://inktank.com | http://ceph.com



-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM storage and OSD Ceph failures

2013-09-17 Thread Gandalf Corvotempesta
2013/9/17 Gregory Farnum g...@inktank.com:
 The VM read will hang until a replica gets promoted and the VM resends the
 read. In a healthy cluster with default settings this will take about 15
 seconds.

Thank you.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Pause i/o from time to time

2013-09-17 Thread Mike Dawson
You could be suffering from a known, but unfixed issue [1] where spindle 
contention from scrub and deep-scrub causes periodic stalls in RBD. You 
can try to disable scrub and deep-scrub with:


# ceph osd set noscrub
# ceph osd set nodeep-scrub

If your problem stops, Issue #6278 is likely the cause. To re-enable 
scrub and deep-scrub:


# ceph osd unset noscrub
# ceph osd unset nodeep-scrub

Because you seem to only have two OSDs, you may also be saturating your 
disks even without scrub or deep-scrub.
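A simple way to check for that while the stalls are happening (assuming the sysstat package is installed) is to watch per-disk utilization:

    iostat -x 2    # a %util column pinned near 100% suggests the disks are saturated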


[1] http://tracker.ceph.com/issues/6278

Cheers,
Mike Dawson


On 9/16/2013 12:30 PM, Timofey wrote:

I use Ceph for an HA cluster.
Sometimes Ceph RBD pauses its work (I/O operations stop). Sometimes this happens
when one OSD responds slowly to requests; sometimes it is my own mistake
(xfs_freeze -f on one of the OSD drives).
I have 2 storage servers with one OSD on each. These pauses can last a few minutes.

1. Is there any setting to quickly switch the primary OSD if the current OSD is
working badly (slow, not responding)?
2. Can I use an RBD device in a software RAID array together with a local drive,
so that the local drive is used instead of Ceph if the Ceph cluster fails?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Jason Villalta
Thanks for your feedback, it is helpful.

I may have been wrong about the default Windows block size.  What would be
the best tests to compare native performance of the SSD disks at 4K blocks
vs Ceph performance with 4K blocks?  It just seems there is a huge
difference in the results.


On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
bcampb...@axcess-financial.com wrote:

 Windows default (NTFS) is a 4k block.  Are you changing the allocation
 unit to 8k as a default for your configuration?

 --
 *From: *Gregory Farnum g...@inktank.com
 *To: *Jason Villalta ja...@rubixnet.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Tuesday, September 17, 2013 10:40:09 AM
 *Subject: *Re: [ceph-users] Ceph performance with 8K blocks.


 Your 8k-block dd test is not nearly the same as your 8k-block rados bench
 or SQL tests. Both rados bench and SQL require the write to be committed to
 disk before moving on to the next one; dd is simply writing into the page
 cache. So you're not going to get 460 or even 273MB/s with sync 8k
 writes regardless of your settings.

 However, I think you should be able to tune your OSDs into somewhat better
 numbers -- that rados bench is giving you ~300IOPs on every OSD (with a
 small pipeline!), and an SSD-based daemon should be going faster. What kind
 of logging are you running with and what configs have you set?

 Hopefully you can get Mark or Sam or somebody who's done some performance
 tuning to offer some tips as well. :)
 -Greg

 On Tuesday, September 17, 2013, Jason Villalta wrote:

 Hello all,
 I am new to the list.

 I have a single machines setup for testing Ceph.  It has a dual proc 6
 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly windows.  I would also like to try to understand how it
 will scale IO by removing one disk of the three and doing the benchmark
 tests.  But that is secondary.  So far here are my results.  I am aware
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the
 default block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
 819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

 RADOS bench test with 3 SSD disks and 4MB object size(Default):
 rados --no-cleanup bench -p pbench 30 write
 Total writes made:  2061
 Write size: 4194304
 Bandwidth (MB/sec): 273.004

 Stddev Bandwidth:   67.5237
 Max bandwidth (MB/sec): 352
 Min bandwidth (MB/sec): 0
 Average Latency:0.234199
 Stddev Latency: 0.130874
 Max latency:0.867119
 Min latency:0.039318
 -
 rados bench -p pbench 30 seq
 Total reads made: 2061
 Read size:4194304
 Bandwidth (MB/sec):956.466

 Average Latency:   0.0666347
 Max latency:   0.208986
 Min latency:   0.011625

 This all looks like I would expect from using three disks.  The problems
 appear to come with the 8K blocks/object size.

 RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
 rados --no-cleanup bench -b 8192 -p pbench 30 write
 Total writes made:  13770
 Write size: 8192
 Bandwidth (MB/sec): 3.581

 Stddev Bandwidth:   1.04405
 Max bandwidth (MB/sec): 6.19531
 Min bandwidth (MB/sec): 0
 Average Latency:0.0348977
 Stddev Latency: 0.0349212
 Max latency:0.326429
 Min latency:0.0019
 --
 rados bench -b 8192 -p pbench 30 seq
 Total reads made: 13770
 Read size:8192
 Bandwidth (MB/sec):52.573

 Average Latency:   0.00237483
 Max latency:   0.006783
 Min latency:   0.000521

 So are these performance correct or is this something I missed with the
 testing procedure?  The RADOS bench number with 8K block size are the same
 we see when testing performance in an VM with SQLIO.  Does anyone know of
 any configure changes that are needed to get the Ceph performance closer to
 native performance with 8K blocks?

 Thanks in advance.



 --
 --
 *Jason Villalta*
 Co-founder
 [image: Inline image 1]
 800.799.4407x1230 | www.RubixTechnology.comhttp://www.rubixtechnology.com/



 --
 Software Engineer #42 @ http://inktank.com | http://ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 *NOTICE: Protect the information in this message in accordance with the
 company's security policies. If you received this message in error,
 immediately notify the sender and destroy all copies.*




-- 
-- 
*Jason 

Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Jason Villalta
Ahh thanks I will try the test again with that flag and post the results.
On Sep 17, 2013 11:38 AM, Campbell, Bill bcampb...@axcess-financial.com
wrote:

 As Gregory mentioned, your 'dd' test looks to be reading from the cache
 (you are writing 8GB in, and then reading that 8GB out, so the reads are
 all cached reads) so the performance is going to seem good.  You can add
 the 'oflag=direct' to your dd test to try and get a more accurate reading
 from that.

 RADOS performance from what I've seen is largely going to hinge on replica
 size and journal location.  Are your journals on separate disks or on the
 same disk as the OSD?  What is the replica size of your pool?
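A sketch of what such a dd test could look like (paths and counts are illustrative; oflag=direct bypasses the page cache, oflag=dsync additionally syncs every write, and iflag=direct does the same for the read side):

    dd if=/dev/zero of=/storage/test.bin bs=8K count=100000 oflag=direct
    dd if=/dev/zero of=/storage/test.bin bs=8K count=100000 oflag=dsync
    dd if=/storage/test.bin of=/dev/null bs=8K iflag=direct

The replica size of the pool used above can be checked with:

    ceph osd pool get pbench size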

 --
 *From: *Jason Villalta ja...@rubixnet.com
 *To: *Bill Campbell bcampb...@axcess-financial.com
 *Cc: *Gregory Farnum g...@inktank.com, ceph-users 
 ceph-users@lists.ceph.com
 *Sent: *Tuesday, September 17, 2013 11:31:43 AM
 *Subject: *Re: [ceph-users] Ceph performance with 8K blocks.

 Thanks for you feed back it is helpful.

 I may have been wrong about the default windows block size.  What would be
 the best tests to compare native performance of the SSD disks at 4K blocks
 vs Ceph performance with 4K blocks?  It just seems their is a huge
 difference in the results.


 On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
 bcampb...@axcess-financial.com wrote:

 Windows default (NTFS) is a 4k block.  Are you changing the allocation
 unit to 8k as a default for your configuration?

 --
 *From: *Gregory Farnum g...@inktank.com
 *To: *Jason Villalta ja...@rubixnet.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Tuesday, September 17, 2013 10:40:09 AM
 *Subject: *Re: [ceph-users] Ceph performance with 8K blocks.


 Your 8k-block dd test is not nearly the same as your 8k-block rados bench
 or SQL tests. Both rados bench and SQL require the write to be committed to
 disk before moving on to the next one; dd is simply writing into the page
 cache. So you're not going to get 460 or even 273MB/s with sync 8k
 writes regardless of your settings.

 However, I think you should be able to tune your OSDs into somewhat
 better numbers -- that rados bench is giving you ~300IOPs on every OSD
 (with a small pipeline!), and an SSD-based daemon should be going faster.
 What kind of logging are you running with and what configs have you set?

 Hopefully you can get Mark or Sam or somebody who's done some performance
 tuning to offer some tips as well. :)
 -Greg

 On Tuesday, September 17, 2013, Jason Villalta wrote:

 Hello all,
 I am new to the list.

 I have a single machines setup for testing Ceph.  It has a dual proc 6
 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly windows.  I would also like to try to understand how it
 will scale IO by removing one disk of the three and doing the benchmark
 tests.  But that is secondary.  So far here are my results.  I am aware
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the
 default block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
 819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

 RADOS bench test with 3 SSD disks and 4MB object size(Default):
 rados --no-cleanup bench -p pbench 30 write
 Total writes made:  2061
 Write size: 4194304
 Bandwidth (MB/sec): 273.004

 Stddev Bandwidth:   67.5237
 Max bandwidth (MB/sec): 352
 Min bandwidth (MB/sec): 0
 Average Latency:0.234199
 Stddev Latency: 0.130874
 Max latency:0.867119
 Min latency:0.039318
 -
 rados bench -p pbench 30 seq
 Total reads made: 2061
 Read size:4194304
 Bandwidth (MB/sec):956.466

 Average Latency:   0.0666347
 Max latency:   0.208986
 Min latency:   0.011625

 This all looks like I would expect from using three disks.  The problems
 appear to come with the 8K blocks/object size.

 RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
 rados --no-cleanup bench -b 8192 -p pbench 30 write
 Total writes made:  13770
 Write size: 8192
 Bandwidth (MB/sec): 3.581

 Stddev Bandwidth:   1.04405
 Max bandwidth (MB/sec): 6.19531
 Min bandwidth (MB/sec): 0
 Average Latency:0.0348977
 Stddev Latency: 0.0349212
 Max latency:0.326429
 Min latency:0.0019
 --
 rados bench -b 8192 -p pbench 30 seq
 Total reads made: 13770
 Read size:8192
 Bandwidth (MB/sec):52.573

 Average Latency:   0.00237483
 Max latency:  

Re: [ceph-users] Rugged data distribution on OSDs

2013-09-17 Thread Gregory Farnum
Well, that all looks good to me. I'd just keep writing and see if the
distribution evens out some.
You could also double or triple the number of PGs you're using in that
pool; it's not atrocious but it's a little low for 9 OSDs.
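For example, a sketch of bumping the pool from this thread to 512 placement groups (the value is illustrative; pgp_num should be raised to match pg_num afterwards):

    ceph osd pool set .rgw.buckets pg_num 512
    ceph osd pool set .rgw.buckets pgp_num 512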
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 17, 2013 at 12:06 AM, Mihály Árva-Tóth
mihaly.arva-t...@virtual-call-center.eu wrote:
 Hello Greg,

 Output of 'ceph osd tree':

 # id    weight  type name       up/down reweight
 -1      27.3    root default
 -2      9.1             host stor1
 0       3.64                    osd.0   up      1
 1       3.64                    osd.1   up      1
 2       1.82                    osd.2   up      1
 -3      9.1             host stor2
 3       3.64                    osd.3   up      1
 4       1.82                    osd.4   up      1
 6       3.64                    osd.6   up      1
 -4      9.1             host stor3
 7       3.64                    osd.7   up      1
 8       3.64                    osd.8   up      1
 9       1.82                    osd.9   up      1

 (osd.5 is missing because of an earlier test in which I removed an HDD from a
 working cluster, but I think this is not relevant now)

 root@stor3:~# ceph osd pool get .rgw.buckets pg_num
 pg_num: 250
 root@stor3:~# ceph osd pool get .rgw.buckets pgp_num
 pgp_num: 250

 pgmap v129814: 514 pgs: 514 active; 818 GB data, 1682 GB used

 Thank you,
 Mihaly

 2013/9/16 Gregory Farnum g...@inktank.com

 What is your PG count and what's the output of ceph osd tree? It's
 possible that you've just got a slightly off distribution since there
 still isn't much data in the cluster (probabilistic placement and all
 that), but let's cover the basics first.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Mon, Sep 16, 2013 at 2:08 AM, Mihály Árva-Tóth
 mihaly.arva-t...@virtual-call-center.eu wrote:
  Hello,
 
  I made some tests on 3 node Ceph cluster: upload 3 million 50 KiB object
  to
  single container. Speed and performance were okay. But data does not
  distributed correctly. Every node has got 2 pcs. 4 TB and 1 pc. 2 TB
  HDD.
 
  osd.0 41 GB (4 TB)
  osd.1 47 GB (4 TB)
  osd.3 16 GB (2 TB)
  osd.4 40 GB (4 TB)
  osd.5 49 GB (4 TB)
  osd.6 17 GB (2 TB)
  osd.7 48 GB (4 TB)
  osd.8 42 GB (4 TB)
  osd.9 18 GB (2 TB)
 
  Every 4 TB and 2 TB HDDs are from same vendor and same type. (WD RE
  SATA)
 
  I monitored iops with Zabbix under test, you can see here:
  http://ctrlv.in/237368
  (sda and sdb are system HDDs) This graph are same on every three nodes.
 
  Is there any idea what's wrong or what should I see?
 
  I'm using ceph-0.67.3 on Ubuntu 12.04.3 x86_64.
 
  Thank you,
  Mihaly
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help with radosGW

2013-09-17 Thread Yehuda Sadeh
On Tue, Sep 17, 2013 at 1:29 AM, Alexis GÜNST HORN
alexis.gunsth...@outscale.com wrote:
 Hello to all,

 I've a big issue with Ceph RadosGW.
 I did a PoC some days ago with radosgw. It worked well.

 Ceph version 0.67.3 under CentOS 6.4

 Now I'm installing a new cluster, but I can't get it to work, and I do not
 understand why.
 Here is some elements :

 ceph.conf:

 [global]
 filestore_xattr_use_omap = true
 mon_host = 192.168.0.1,192.168.0.2,192.168.0.3
 fsid = f261d4c5-2a93-43dc-85a9-85211ec7100f
 mon_initial_members = mon-1, mon-2, mon-3
 auth_supported = cephx
 osd_journal_size = 10240

 [osd]
 cluster_network = 192.168.0.0/24
 public_network = 192.168.1.0/24


 [client.radosgw.gateway]
 host = gw-1
 keyring = /etc/ceph/keyring.radosgw.gateway
 rgw socket path = /tmp/radosgw.sock
 log file = /var/log/ceph/radosgw.log
 rgw print continue = false



 I followed this doc to install radosgw :
 http://ceph.com/docs/next/install/rpm/#installing-ceph-object-storage

 I start httpd :
 /etc/init.d/httpd start

 I start radosgw :
 [root@gw-1]# /etc/init.d/ceph-radosgw start
 Starting radosgw instance(s)...
 2013-09-17 08:07:11.954248 7f835d7fb820 -1 WARNING: libcurl doesn't
 support curl_multi_wait()
 2013-09-17 08:07:11.954253 7f835d7fb820 -1 WARNING: cross zone /
 region transfer performance may be affected

 I create a user :
 radosgw-admin user create --uid=alexis

 It works.
 Fine.

 So now, I connect to the gateway via a client (CyberDuck).
 I can create a bucket : test.
 Then, I try to upload a file = does not work.
 I have a time out after about 30 secs.

 And, of course, the file is not uploaded. A rados df on .rgw.buckets
 show that there is no objects inside.

 Here are some logs.

 radosgw.log:
 http://pastebin.com/6NNuczC5
 (the last lines are because I stop radosgw, not to pollute the logs)

 and httpd.log :
 [Tue Sep 17 08:02:15 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:02:15 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:02:45 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:02:45 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:08:42 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:08:46 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:12:35 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:12:35 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:13:02 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi


Are you using the correct fastcgi apache module?

Yehuda



 I'm really disappointed because I can't understand where the issue is.
 Thanks A LOT for your help.

 Alexis
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Disk partition and replicas

2013-09-17 Thread Jordi Arcas
Hi!
I have a remote server with a single disk, on which Ubuntu is installed. I
can't create another partition on the disk for an OSD because it is mounted.
Is there another way to install an OSD? Maybe in a folder?

And another question... Can I configure Ceph to place a particular replica on
a particular OSD? For example, imagine that I'm interested in having a replica
of a file on a server that runs faster than the others.

Thanks!!


-
Jordi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help with radosGW

2013-09-17 Thread John Wilkins
I see that you added your public and cluster networks under an [osd]
section. All daemons use the public network, and OSDs use the cluster
network. Consider moving those settings to [global].
http://ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-networks

Also, I do believe I had a doc bug to fix here.
http://tracker.ceph.com/issues/6182  It is now resolved. The s3gw.fcgi
file should be in /var/www as suggested. However, my chmod instruction
pointed to an incorrect directory. Can you take a look at that and see
if that helps?
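That is, something like the following sketch, with the network values copied from the ceph.conf above:

    [global]
        public_network = 192.168.1.0/24
        cluster_network = 192.168.0.0/24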

On Tue, Sep 17, 2013 at 1:29 AM, Alexis GÜNST HORN
alexis.gunsth...@outscale.com wrote:
 Hello to all,

 I've a big issue with Ceph RadosGW.
 I did a PoC some days ago with radosgw. It worked well.

 Ceph version 0.67.3 under CentOS 6.4

 Now I'm installing a new cluster, but I can't get it to work, and I do not
 understand why.
 Here is some elements :

 ceph.conf:

 [global]
 filestore_xattr_use_omap = true
 mon_host = 192.168.0.1,192.168.0.2,192.168.0.3
 fsid = f261d4c5-2a93-43dc-85a9-85211ec7100f
 mon_initial_members = mon-1, mon-2, mon-3
 auth_supported = cephx
 osd_journal_size = 10240

 [osd]
 cluster_network = 192.168.0.0/24
 public_network = 192.168.1.0/24


 [client.radosgw.gateway]
 host = gw-1
 keyring = /etc/ceph/keyring.radosgw.gateway
 rgw socket path = /tmp/radosgw.sock
 log file = /var/log/ceph/radosgw.log
 rgw print continue = false



 I followed this doc to install radosgw :
 http://ceph.com/docs/next/install/rpm/#installing-ceph-object-storage

 I start httpd :
 /etc/init.d/httpd start

 I start radosgw :
 [root@gw-1]# /etc/init.d/ceph-radosgw start
 Starting radosgw instance(s)...
 2013-09-17 08:07:11.954248 7f835d7fb820 -1 WARNING: libcurl doesn't
 support curl_multi_wait()
 2013-09-17 08:07:11.954253 7f835d7fb820 -1 WARNING: cross zone /
 region transfer performance may be affected

 I create a user :
 radosgw-admin user create --uid=alexis

 It works.
 Fine.

 So now, I connect to the gateway via a client (CyberDuck).
 I can create a bucket : test.
 Then, I try to upload a file = does not work.
 I have a time out after about 30 secs.

 And, of course, the file is not uploaded. A rados df on .rgw.buckets
 show that there is no objects inside.

 Here are some logs.

 radosgw.log:
 http://pastebin.com/6NNuczC5
 (the last lines are because I stop radosgw, not to pollute the logs)

 and httpd.log :
 [Tue Sep 17 08:02:15 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:02:15 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:02:45 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:02:45 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:08:42 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:08:46 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:12:35 2013] [error] [client 46.231.147.8] FastCGI: comm
 with server /var/www/s3gw.fcgi aborted: idle timeout (30 sec)
 [Tue Sep 17 08:12:35 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi
 [Tue Sep 17 08:13:02 2013] [error] [client 46.231.147.8] FastCGI:
 incomplete headers (0 bytes) received from server /var/www/s3gw.fcgi



 I'm really disappointed because I can't understand where the issue is.
 Thanks A LOT for your help.

 Alexis
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
John Wilkins
Senior Technical Writer
Intank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OpenStack user survey

2013-09-17 Thread Sage Weil
If you use OpenStack, you should fill out the user survey:

https://www.openstack.org/user-survey/Login

In particular, it helps us to know how openstack users consume their 
storage, and it helps the larger community to know what kind of storage 
systems are being deployed.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Campbell, Bill
As Gregory mentioned, your 'dd' test looks to be reading from the cache (you are
writing 8GB in, and then reading that 8GB out, so the reads are all cached reads)
so the performance is going to seem good. You can add the 'oflag=direct' to your
dd test to try and get a more accurate reading from that.

RADOS performance from what I've seen is largely going to hinge on replica size
and journal location. Are your journals on separate disks or on the same disk as
the OSD? What is the replica size of your pool?

From: "Jason Villalta" ja...@rubixnet.com
To: "Bill Campbell" bcampb...@axcess-financial.com
Cc: "Gregory Farnum" g...@inktank.com, "ceph-users" ceph-users@lists.ceph.com
Sent: Tuesday, September 17, 2013 11:31:43 AM
Subject: Re: [ceph-users] Ceph performance with 8K blocks.

Thanks for your feedback, it is helpful.

I may have been wrong about the default windows block size. What would be the
best tests to compare native performance of the SSD disks at 4K blocks vs Ceph
performance with 4K blocks? It just seems there is a huge difference in the
results.

On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill
bcampb...@axcess-financial.com wrote:

Windows default (NTFS) is a 4k block. Are you changing the allocation unit to 8k
as a default for your configuration?

From: "Gregory Farnum" g...@inktank.com
To: "Jason Villalta" ja...@rubixnet.com
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, September 17, 2013 10:40:09 AM
Subject: Re: [ceph-users] Ceph performance with 8K blocks.

Your 8k-block dd test is not nearly the same as your 8k-block rados bench or SQL
tests. Both rados bench and SQL require the write to be committed to disk before
moving on to the next one; dd is simply writing into the page cache. So you're
not going to get 460 or even 273MB/s with sync 8k writes regardless of your
settings.

However, I think you should be able to tune your OSDs into somewhat better
numbers -- that rados bench is giving you ~300IOPs on every OSD (with a small
pipeline!), and an SSD-based daemon should be going faster. What kind of logging
are you running with and what configs have you set?

Hopefully you can get Mark or Sam or somebody who's done some performance tuning
to offer some tips as well. :)
-Greg

On Tuesday, September 17, 2013, Jason Villalta wrote:

Hello all,
I am new to the list.

I have a single machine setup for testing Ceph. It has a dual proc 6
cores(12core total) for CPU and 128GB of RAM. I also have 3 Intel 520 240GB SSDs
and an OSD setup on each disk with the OSD and Journal in separate partitions
formatted with ext4.

My goal here is to prove just how fast Ceph can go and what kind of performance
to expect when using it as a back-end storage for virtual machines, mostly
windows. I would also like to try to understand how it will scale IO by removing
one disk of the three and doing the benchmark tests. But that is secondary. So
far here are my results. I am aware this is all sequential, I just want to know
how fast it can go.

DD IO test of SSD disks: I am testing 8K blocks since that is the default block
size of windows.
dd of=ddbenchfile if=/dev/zero bs=8K count=100
819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

RADOS bench test with 3 SSD disks and 4MB object size (default):
rados --no-cleanup bench -p pbench 30 write
Total writes made:      2061
Write size:             4194304
Bandwidth (MB/sec):     273.004
Stddev Bandwidth:       67.5237
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 0
Average Latency:        0.234199
Stddev Latency:         0.130874
Max latency:            0.867119
Min latency:            0.039318
-
rados bench -p pbench 30 seq
Total reads made:       2061
Read size:              4194304
Bandwidth (MB/sec):     956.466
Average Latency:        0.0666347
Max latency:            0.208986
Min latency:            0.011625

This all looks like I would expect from using three disks. The problems appear
to come with the 8K blocks/object size.

RADOS bench test with 3 SSD disks and 8K object size (8K blocks):
rados --no-cleanup bench -b 8192 -p pbench 30 write
Total writes made:      13770
Write size:             8192
Bandwidth (MB/sec):     3.581
Stddev Bandwidth:       1.04405
Max bandwidth (MB/sec): 6.19531
Min bandwidth (MB/sec): 0
Average Latency:        0.0348977
Stddev Latency:         0.0349212
Max latency:            0.326429
Min latency:            0.0019
--
rados bench -b 8192 -p pbench 30 seq
Total reads made:       13770
Read size:              8192
Bandwidth (MB/sec):     52.573
Average Latency:        0.00237483
Max latency:            0.006783
Min latency:            0.000521

So are these performance numbers correct, or is this something I missed in the
testing procedure? The RADOS bench numbers with 8K block size are the same we
see when testing performance in a VM with SQLIO. Does anyone know of any
configuration changes that are needed to get the Ceph performance closer to
native performance with 8K blocks?

Thanks in advance.
--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com

--
Software Engineer #42 @

Re: [ceph-users] problem with ceph-deploy hanging

2013-09-17 Thread Gruher, Joseph R


-Original Message-
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
boun...@lists.ceph.com] On Behalf Of Gilles Mocellin

So you can add something like this in all ceph nodes' /etc/sudoers (use
visudo) :

Defaults env_keep += http_proxy https_proxy ftp_proxy no_proxy

Hope it can help.


Thanks for the suggestion!  However, no effect on the problem from this change.
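
For reference, a minimal sketch of the sudoers entry being discussed (edited with
visudo; a multi-word value is normally wrapped in double quotes):

# /etc/sudoers fragment - keep proxy variables when running commands via sudo
Defaults env_keep += "http_proxy https_proxy ftp_proxy no_proxy"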
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Jason Villalta
I will try both suggestions. Thank you for your input.


On Tue, Sep 17, 2013 at 5:06 PM, Josh Durgin josh.dur...@inktank.comwrote:

 Also enabling rbd writeback caching will allow requests to be merged,
 which will help a lot for small sequential I/O.


 On 09/17/2013 02:03 PM, Gregory Farnum wrote:

 Try it with oflag=dsync instead? I'm curious what kind of variation
 these disks will provide.

 Anyway, you're not going to get the same kind of performance with
 RADOS on 8k sync IO that you will with a local FS. It needs to
 traverse the network and go through work queues in the daemon; your
 primary limiter here is probably the per-request latency that you're
 seeing (average ~30 ms, looking at the rados bench results). The good
 news is that means you should be able to scale out to a lot of
 clients, and if you don't force those 8k sync IOs (which RBD won't,
 unless the application asks for them by itself using directIO or
 frequent fsync or whatever) your performance will go way up.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta ja...@rubixnet.com
 wrote:


 Here are the stats with direct io.

 dd of=ddbenchfile if=/dev/zero bs=8K count=100 oflag=direct
 819200 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

 These numbers are still over all much faster than when using RADOS bench.
 The replica is set to 2.  The Journals are on the same disk but separate
 partitions.

 I kept the block size the same 8K.




 On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill 
 bcampb...@axcess-financial.com
 wrote:


 As Gregory mentioned, your 'dd' test looks to be reading from the cache
 (you are writing 8GB in, and then reading that 8GB out, so the reads are
 all cached reads) so the performance is going to seem good.  You can add
 the 'oflag=direct' to your dd test to try and get a more accurate reading
 from that.

 RADOS performance from what I've seen is largely going to hinge on
 replica size and journal location.  Are your journals on separate disks or
 on the same disk as the OSD?  What is the replica size of your pool?

 From: Jason Villalta ja...@rubixnet.com
 To: Bill Campbell 
 bcampb...@axcess-financial.com
 
 Cc: Gregory Farnum g...@inktank.com, ceph-users 
 ceph-users@lists.ceph.com
 Sent: Tuesday, September 17, 2013 11:31:43 AM

 Subject: Re: [ceph-users] Ceph performance with 8K blocks.

 Thanks for your feedback, it is helpful.

 I may have been wrong about the default windows block size.  What would
 be the best tests to compare native performance of the SSD disks at 4K
 blocks vs Ceph performance with 4K blocks?  It just seems there is a huge
 difference in the results.


 On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
 bcampb...@axcess-financial.com
 wrote:


 Windows default (NTFS) is a 4k block.  Are you changing the allocation
 unit to 8k as a default for your configuration?

 From: Gregory Farnum g...@inktank.com
 To: Jason Villalta ja...@rubixnet.com
 Cc: ceph-users@lists.ceph.com
 Sent: Tuesday, September 17, 2013 10:40:09 AM
 Subject: Re: [ceph-users] Ceph performance with 8K blocks.


 Your 8k-block dd test is not nearly the same as your 8k-block rados
 bench or SQL tests. Both rados bench and SQL require the write to be
 committed to disk before moving on to the next one; dd is simply writing
 into the page cache. So you're not going to get 460 or even 273MB/s with
 sync 8k writes regardless of your settings.

 However, I think you should be able to tune your OSDs into somewhat
 better numbers -- that rados bench is giving you ~300IOPs on every OSD
 (with a small pipeline!), and an SSD-based daemon should be going faster.
 What kind of logging are you running with and what configs have you set?

 Hopefully you can get Mark or Sam or somebody who's done some
 performance tuning to offer some tips as well. :)
 -Greg

 On Tuesday, September 17, 2013, Jason Villalta wrote:


 Hello all,
 I am new to the list.

 I have a single machine setup for testing Ceph.  It has a dual proc
 6 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly windows.  I would also like to try to understand how it
 will scale IO by removing one disk of the three and doing the benchmark
 tests.  But that is secondary.  So far here are my results.  I am aware
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the
 default 

Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Gregory Farnum
Try it with oflag=dsync instead? I'm curious what kind of variation
these disks will provide.

Anyway, you're not going to get the same kind of performance with
RADOS on 8k sync IO that you will with a local FS. It needs to
traverse the network and go through work queues in the daemon; your
primary limiter here is probably the per-request latency that you're
seeing (average ~30 ms, looking at the rados bench results). The good
news is that means you should be able to scale out to a lot of
clients, and if you don't force those 8k sync IOs (which RBD won't,
unless the application asks for them by itself using directIO or
frequent fsync or whatever) your performance will go way up.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
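
One config area that often comes up alongside the tuning questions quoted below
is debug logging; a sketch of lowering OSD debug output in ceph.conf, with
illustrative values rather than anything recommended in this thread:

[osd]
debug ms = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0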


On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta ja...@rubixnet.com wrote:

 Here are the stats with direct io.

 dd of=ddbenchfile if=/dev/zero bs=8K count=100 oflag=direct
 819200 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

 These numbers are still over all much faster than when using RADOS bench.
 The replica is set to 2.  The Journals are on the same disk but separate 
 partitions.

 I kept the block size the same 8K.




 On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill 
 bcampb...@axcess-financial.com wrote:

 As Gregory mentioned, your 'dd' test looks to be reading from the cache (you 
 are writing 8GB in, and then reading that 8GB out, so the reads are all 
 cached reads) so the performance is going to seem good.  You can add the 
 'oflag=direct' to your dd test to try and get a more accurate reading from 
 that.

 RADOS performance from what I've seen is largely going to hinge on replica 
 size and journal location.  Are your journals on separate disks or on the 
 same disk as the OSD?  What is the replica size of your pool?

 
 From: Jason Villalta ja...@rubixnet.com
 To: Bill Campbell bcampb...@axcess-financial.com
 Cc: Gregory Farnum g...@inktank.com, ceph-users 
 ceph-users@lists.ceph.com
 Sent: Tuesday, September 17, 2013 11:31:43 AM

 Subject: Re: [ceph-users] Ceph performance with 8K blocks.

 Thanks for your feedback, it is helpful.

 I may have been wrong about the default windows block size.  What would be 
 the best tests to compare native performance of the SSD disks at 4K blocks 
 vs Ceph performance with 4K blocks?  It just seems there is a huge 
 difference in the results.


 On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
 bcampb...@axcess-financial.com wrote:

 Windows default (NTFS) is a 4k block.  Are you changing the allocation unit 
 to 8k as a default for your configuration?

 
 From: Gregory Farnum g...@inktank.com
 To: Jason Villalta ja...@rubixnet.com
 Cc: ceph-users@lists.ceph.com
 Sent: Tuesday, September 17, 2013 10:40:09 AM
 Subject: Re: [ceph-users] Ceph performance with 8K blocks.


 Your 8k-block dd test is not nearly the same as your 8k-block rados bench 
 or SQL tests. Both rados bench and SQL require the write to be committed to 
 disk before moving on to the next one; dd is simply writing into the page 
 cache. So you're not going to get 460 or even 273MB/s with sync 8k writes 
 regardless of your settings.

 However, I think you should be able to tune your OSDs into somewhat better 
 numbers -- that rados bench is giving you ~300IOPs on every OSD (with a 
 small pipeline!), and an SSD-based daemon should be going faster. What kind 
 of logging are you running with and what configs have you set?

 Hopefully you can get Mark or Sam or somebody who's done some performance 
 tuning to offer some tips as well. :)
 -Greg

 On Tuesday, September 17, 2013, Jason Villalta wrote:

 Hello all,
 I am new to the list.

 I have a single machine setup for testing Ceph.  It has a dual proc 6 
 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520 
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in 
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of 
 performance to expect when using it as a back-end storage for virtual 
 machines mostly windows.  I would also like to try to understand how it 
 will scale IO by removing one disk of the three and doing the benchmark 
 tests.  But that is secondary.  So far here are my results.  I am aware 
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the default 
 block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
 819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

 RADOS bench test with 3 SSD disks and 4MB object size(Default):
 rados --no-cleanup bench -p pbench 30 write
 Total writes made:  2061
 Write size: 4194304
 

Re: [ceph-users] Pause i/o from time to time

2013-09-17 Thread Timofey
I have examined the logs.
Yes, the first time it may have been scrubbing. It repaired itself to some extent.

I had 2 servers before the first problem: one dedicated to an OSD (osd.0), and a 
second with an OSD and websites (osd.1).
After the problem I added a third server dedicated to an OSD (osd.2) and ran
ceph osd set out osd.1 to move the data off it.

In ceph -s I saw a normal rebalancing process and everything worked well for about 
5-7 hours. Then I got many misdirected records (a few hundred per second):
osd.0 [WRN] client.359671  misdirected client.359671.1:220843 pg 2.3ae744c0 to 
osd.0 not [2,0] in e1040/1040
and errors in I/O operations.

Now I have about 20GB of ceph logs with these errors. (I don't work with the 
cluster now - I copied all the data out to an HDD and work from the HDD.)

Is there any way to have a local software RAID1 built from a ceph RBD and a local 
image (to keep working when ceph fails or is slow for any reason)?
I tried mdadm but it worked badly - the server hung up every few hours.
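
A rough sketch of the kind of setup described above (device and image names are
placeholders; this only illustrates the attempt, it is not a recommendation):

# map the RBD image; it shows up as e.g. /dev/rbd0
rbd map rbd/backup-image
# mirror a local partition with the RBD device, marking the RBD leg
# write-mostly so reads prefer the local disk
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
  /dev/sdb1 --write-mostly /dev/rbd0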

 You could be suffering from a known, but unfixed issue [1] where spindle 
 contention from scrub and deep-scrub cause periodic stalls in RBD. You can 
 try to disable scrub and deep-scrub with:
 
 # ceph osd set noscrub
 # ceph osd set nodeep-scrub
 
 If your problem stops, Issue #6278 is likely the cause. To re-enable scrub 
 and deep-scrub:
 
 # ceph osd unset noscrub
 # ceph osd unset nodeep-scrub
 
 Because you seem to only have two OSDs, you may also be saturating your disks 
 even without scrub or deep-scrub.
 
 http://tracker.ceph.com/issues/6278
 
 Cheers,
 Mike Dawson
 
 
 On 9/16/2013 12:30 PM, Timofey wrote:
 I use ceph for an HA cluster.
 Sometimes ceph rbd pauses its work (I/O operations stop). Sometimes it happens 
 when one of the OSDs responds slowly to requests. Sometimes it can be my 
 mistake (xfs_freeze -f on one of the OSD drives).
 I have 2 storage servers with one OSD on each. These pauses can last a few 
 minutes.
 
 1. Is there any setting to quickly switch the primary OSD if the current one is 
 working badly (slow, not responding)?
 2. Can I use a ceph RBD in a software RAID array together with a local drive, so 
 the local drive is used instead of ceph if the ceph cluster fails?
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Josh Durgin

Also enabling rbd writeback caching will allow requests to be merged,
which will help a lot for small sequential I/O.
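
For reference, the RBD writeback cache is enabled on the client side; a minimal
sketch of the ceph.conf settings involved (sizes shown are only illustrative):

[client]
rbd cache = true
rbd cache size = 33554432        # per-client cache, 32 MB
rbd cache max dirty = 25165824   # dirty bytes allowed before writeback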

On 09/17/2013 02:03 PM, Gregory Farnum wrote:

Try it with oflag=dsync instead? I'm curious what kind of variation
these disks will provide.

Anyway, you're not going to get the same kind of performance with
RADOS on 8k sync IO that you will with a local FS. It needs to
traverse the network and go through work queues in the daemon; your
primary limiter here is probably the per-request latency that you're
seeing (average ~30 ms, looking at the rados bench results). The good
news is that means you should be able to scale out to a lot of
clients, and if you don't force those 8k sync IOs (which RBD won't,
unless the application asks for them by itself using directIO or
frequent fsync or whatever) your performance will go way up.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta ja...@rubixnet.com wrote:


Here are the stats with direct io.

dd of=ddbenchfile if=/dev/zero bs=8K count=100 oflag=direct
819200 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

These numbers are still over all much faster than when using RADOS bench.
The replica is set to 2.  The Journals are on the same disk but separate 
partitions.

I kept the block size the same 8K.




On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill 
bcampb...@axcess-financial.com wrote:


As Gregory mentioned, your 'dd' test looks to be reading from the cache (you 
are writing 8GB in, and then reading that 8GB out, so the reads are all cached 
reads) so the performance is going to seem good.  You can add the 
'oflag=direct' to your dd test to try and get a more accurate reading from that.

RADOS performance from what I've seen is largely going to hinge on replica size 
and journal location.  Are your journals on separate disks or on the same disk 
as the OSD?  What is the replica size of your pool?


From: Jason Villalta ja...@rubixnet.com
To: Bill Campbell bcampb...@axcess-financial.com
Cc: Gregory Farnum g...@inktank.com, ceph-users 
ceph-users@lists.ceph.com
Sent: Tuesday, September 17, 2013 11:31:43 AM

Subject: Re: [ceph-users] Ceph performance with 8K blocks.

Thanks for your feedback, it is helpful.

I may have been wrong about the default windows block size.  What would be the 
best tests to compare native performance of the SSD disks at 4K blocks vs Ceph 
performance with 4K blocks?  It just seems there is a huge difference in the 
results.


On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
bcampb...@axcess-financial.com wrote:


Windows default (NTFS) is a 4k block.  Are you changing the allocation unit to 
8k as a default for your configuration?


From: Gregory Farnum g...@inktank.com
To: Jason Villalta ja...@rubixnet.com
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, September 17, 2013 10:40:09 AM
Subject: Re: [ceph-users] Ceph performance with 8K blocks.


Your 8k-block dd test is not nearly the same as your 8k-block rados bench or 
SQL tests. Both rados bench and SQL require the write to be committed to disk 
before moving on to the next one; dd is simply writing into the page cache. So 
you're not going to get 460 or even 273MB/s with sync 8k writes regardless of 
your settings.

However, I think you should be able to tune your OSDs into somewhat better 
numbers -- that rados bench is giving you ~300IOPs on every OSD (with a small 
pipeline!), and an SSD-based daemon should be going faster. What kind of 
logging are you running with and what configs have you set?

Hopefully you can get Mark or Sam or somebody who's done some performance 
tuning to offer some tips as well. :)
-Greg

On Tuesday, September 17, 2013, Jason Villalta wrote:


Hello all,
I am new to the list.

I have a single machine setup for testing Ceph.  It has a dual proc 6 
cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520 240GB 
SSDs and an OSD setup on each disk with the OSD and Journal in separate 
partitions formatted with ext4.

My goal here is to prove just how fast Ceph can go and what kind of performance 
to expect when using it as a back-end storage for virtual machines mostly 
windows.  I would also like to try to understand how it will scale IO by 
removing one disk of the three and doing the benchmark tests.  But that is 
secondary.  So far here are my results.  I am aware this is all sequential, I 
just want to know how fast it can go.

DD IO test of SSD disks:  I am testing 8K blocks since that is the default 
block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

RADOS bench test with 3 SSD disks and 4MB object size(Default):
rados 

Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Jason Villalta
Here are the stats with direct io.

dd of=ddbenchfile if=/dev/zero bs=8K count=100 oflag=direct
819200 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s

dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

These numbers are still over all much faster than when using RADOS bench.
The replica is set to 2.  The Journals are on the same disk but separate
partitions.

I kept the block size the same 8K.




On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill 
bcampb...@axcess-financial.com wrote:

 As Gregory mentioned, your 'dd' test looks to be reading from the cache
 (you are writing 8GB in, and then reading that 8GB out, so the reads are
 all cached reads) so the performance is going to seem good.  You can add
 the 'oflag=direct' to your dd test to try and get a more accurate reading
 from that.

 RADOS performance from what I've seen is largely going to hinge on replica
 size and journal location.  Are your journals on separate disks or on the
 same disk as the OSD?  What is the replica size of your pool?

 --
 *From: *Jason Villalta ja...@rubixnet.com
 *To: *Bill Campbell bcampb...@axcess-financial.com
 *Cc: *Gregory Farnum g...@inktank.com, ceph-users 
 ceph-users@lists.ceph.com
 *Sent: *Tuesday, September 17, 2013 11:31:43 AM

 *Subject: *Re: [ceph-users] Ceph performance with 8K blocks.

 Thanks for your feedback, it is helpful.

 I may have been wrong about the default windows block size.  What would be
 the best tests to compare native performance of the SSD disks at 4K blocks
 vs Ceph performance with 4K blocks?  It just seems there is a huge
 difference in the results.


 On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
 bcampb...@axcess-financial.com wrote:

 Windows default (NTFS) is a 4k block.  Are you changing the allocation
 unit to 8k as a default for your configuration?

 --
 *From: *Gregory Farnum g...@inktank.com
 *To: *Jason Villalta ja...@rubixnet.com
 *Cc: *ceph-users@lists.ceph.com
 *Sent: *Tuesday, September 17, 2013 10:40:09 AM
 *Subject: *Re: [ceph-users] Ceph performance with 8K blocks.


 Your 8k-block dd test is not nearly the same as your 8k-block rados bench
 or SQL tests. Both rados bench and SQL require the write to be committed to
 disk before moving on to the next one; dd is simply writing into the page
 cache. So you're not going to get 460 or even 273MB/s with sync 8k
 writes regardless of your settings.

 However, I think you should be able to tune your OSDs into somewhat
 better numbers -- that rados bench is giving you ~300IOPs on every OSD
 (with a small pipeline!), and an SSD-based daemon should be going faster.
 What kind of logging are you running with and what configs have you set?

 Hopefully you can get Mark or Sam or somebody who's done some performance
 tuning to offer some tips as well. :)
 -Greg

 On Tuesday, September 17, 2013, Jason Villalta wrote:

 Hello all,
 I am new to the list.

 I have a single machine setup for testing Ceph.  It has a dual proc 6
 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.

 My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly windows.  I would also like to try to understand how it
 will scale IO by removing one disk of the three and doing the benchmark
 tests.  But that is secondary.  So far here are my results.  I am aware
 this is all sequential, I just want to know how fast it can go.

 DD IO test of SSD disks:  I am testing 8K blocks since that is the
 default block size of windows.
  dd of=ddbenchfile if=/dev/zero bs=8K count=100
 819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s

 dd if=ddbenchfile of=/dev/null bs=8K
 819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

 RADOS bench test with 3 SSD disks and 4MB object size(Default):
 rados --no-cleanup bench -p pbench 30 write
 Total writes made:  2061
 Write size: 4194304
 Bandwidth (MB/sec): 273.004

 Stddev Bandwidth:   67.5237
 Max bandwidth (MB/sec): 352
 Min bandwidth (MB/sec): 0
 Average Latency:0.234199
 Stddev Latency: 0.130874
 Max latency:0.867119
 Min latency:0.039318
 -
 rados bench -p pbench 30 seq
 Total reads made: 2061
 Read size:4194304
 Bandwidth (MB/sec):956.466

 Average Latency:   0.0666347
 Max latency:   0.208986
 Min latency:   0.011625

 This all looks like I would expect from using three disks.  The problems
 appear to come with the 8K blocks/object size.

 RADOS bench test with 3 SSD disks and 8K object size(8K blocks):
 rados --no-cleanup bench -b 8192 -p pbench 30 write
 Total writes made:  13770
 Write size: 8192
 Bandwidth (MB/sec): 3.581

 Stddev Bandwidth:   1.04405
 

Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Jason Villalta
So what I am gleaning from this is that it is better to have more than 3 OSDs,
since each OSD seems to add additional processing overhead when using small blocks.

I will try to do some more testing by using the same three disks but with 6
or more OSDs.

If the OSD is limited by processing, is it safe to say it would make sense to
just use an SSD for the journal and a spindle disk for data and reads?
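
As a sketch of that journal-on-SSD layout (host and device names below are
placeholders, not from this thread), ceph-deploy lets the journal live on a
separate device from the data disk:

# data on spinning disks, journals on partitions of a shared SSD
ceph-deploy osd create node1:/dev/sdb:/dev/sdf1
ceph-deploy osd create node1:/dev/sdc:/dev/sdf2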


On Tue, Sep 17, 2013 at 5:12 PM, Jason Villalta ja...@rubixnet.com wrote:

 Here are the results:

 dd of=ddbenchfile if=/dev/zero bs=8K count=100 oflag=dsync
 819200 bytes (8.2 GB) copied, 266.873 s, 30.7 MB/s




 On Tue, Sep 17, 2013 at 5:03 PM, Gregory Farnum g...@inktank.com wrote:

 Try it with oflag=dsync instead? I'm curious what kind of variation
 these disks will provide.

 Anyway, you're not going to get the same kind of performance with
 RADOS on 8k sync IO that you will with a local FS. It needs to
 traverse the network and go through work queues in the daemon; your
 primary limiter here is probably the per-request latency that you're
 seeing (average ~30 ms, looking at the rados bench results). The good
 news is that means you should be able to scale out to a lot of
 clients, and if you don't force those 8k sync IOs (which RBD won't,
 unless the application asks for them by itself using directIO or
 frequent fsync or whatever) your performance will go way up.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Tue, Sep 17, 2013 at 1:47 PM, Jason Villalta ja...@rubixnet.com
 wrote:
 
  Here are the stats with direct io.
 
  dd of=ddbenchfile if=/dev/zero bs=8K count=100 oflag=direct
  819200 bytes (8.2 GB) copied, 68.4789 s, 120 MB/s
 
  dd if=ddbenchfile of=/dev/null bs=8K
  819200 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s
 
  These numbers are still over all much faster than when using RADOS
 bench.
  The replica is set to 2.  The Journals are on the same disk but
 separate partitions.
 
  I kept the block size the same 8K.
 
 
 
 
  On Tue, Sep 17, 2013 at 11:37 AM, Campbell, Bill 
 bcampb...@axcess-financial.com wrote:
 
  As Gregory mentioned, your 'dd' test looks to be reading from the
 cache (you are writing 8GB in, and then reading that 8GB out, so the reads
 are all cached reads) so the performance is going to seem good.  You can
 add the 'oflag=direct' to your dd test to try and get a more accurate
 reading from that.
 
  RADOS performance from what I've seen is largely going to hinge on
 replica size and journal location.  Are your journals on separate disks or
 on the same disk as the OSD?  What is the replica size of your pool?
 
  
  From: Jason Villalta ja...@rubixnet.com
  To: Bill Campbell bcampb...@axcess-financial.com
  Cc: Gregory Farnum g...@inktank.com, ceph-users 
 ceph-users@lists.ceph.com
  Sent: Tuesday, September 17, 2013 11:31:43 AM
 
  Subject: Re: [ceph-users] Ceph performance with 8K blocks.
 
  Thanks for your feedback, it is helpful.
 
  I may have been wrong about the default windows block size.  What
 would be the best tests to compare native performance of the SSD disks at
 4K blocks vs Ceph performance with 4K blocks?  It just seems there is a
 huge difference in the results.
 
 
  On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill 
 bcampb...@axcess-financial.com wrote:
 
  Windows default (NTFS) is a 4k block.  Are you changing the
 allocation unit to 8k as a default for your configuration?
 
  
  From: Gregory Farnum g...@inktank.com
  To: Jason Villalta ja...@rubixnet.com
  Cc: ceph-users@lists.ceph.com
  Sent: Tuesday, September 17, 2013 10:40:09 AM
  Subject: Re: [ceph-users] Ceph performance with 8K blocks.
 
 
  Your 8k-block dd test is not nearly the same as your 8k-block rados
 bench or SQL tests. Both rados bench and SQL require the write to be
 committed to disk before moving on to the next one; dd is simply writing
 into the page cache. So you're not going to get 460 or even 273MB/s with
 sync 8k writes regardless of your settings.
 
  However, I think you should be able to tune your OSDs into somewhat
 better numbers -- that rados bench is giving you ~300IOPs on every OSD
 (with a small pipeline!), and an SSD-based daemon should be going faster.
 What kind of logging are you running with and what configs have you set?
 
  Hopefully you can get Mark or Sam or somebody who's done some
 performance tuning to offer some tips as well. :)
  -Greg
 
  On Tuesday, September 17, 2013, Jason Villalta wrote:
 
  Hello all,
  I am new to the list.
 
  I have a single machine setup for testing Ceph.  It has a dual proc
 6 cores(12core total) for CPU and 128GB of RAM.  I also have 3 Intel 520
 240GB SSDs and an OSD setup on each disk with the OSD and Journal in
 separate partitions formatted with ext4.
 
  My goal here is to prove just how fast Ceph can go and what kind of
 performance to expect when using it as a back-end storage for virtual
 machines mostly 

[ceph-users] Scaling RBD module

2013-09-17 Thread Somnath Roy
Hi,
I am running Ceph on a 3-node cluster and each of my server nodes is running 10 
OSDs, one for each disk. I have one admin node and all the nodes are connected 
with 2 x 10G networks. One network is for the cluster and the other one is 
configured as the public network.

Here is the status of my cluster.

~/fio_test# ceph -s

  cluster b2e0b4db-6342-490e-9c28-0aadf0188023
   health HEALTH_WARN clock skew detected on mon. server-name-2, mon. 
server-name-3
   monmap e1: 3 mons at {server-name-1=xxx.xxx.xxx.xxx:6789/0, 
server-name-2=xxx.xxx.xxx.xxx:6789/0, 
server-name-3=xxx.xxx.xxx.xxx:6789/0}, election epoch 64, quorum 0,1,2 
server-name-1,server-name-2,server-name-3
   osdmap e391: 30 osds: 30 up, 30 in
pgmap v5202: 30912 pgs: 30912 active+clean; 8494 MB data, 27912 MB used, 
11145 GB / 11172 GB avail
   mdsmap e1: 0/0/1 up


I started with the rados bench command to benchmark the read performance of this 
cluster on a large pool (~10K PGs) and found that each rados client has a 
limitation. Each client can only drive up to a certain mark. Each server node's 
CPU utilization shows it is around 85-90% idle and the admin node (from where the 
rados client is running) is around ~80-85% idle. I am trying with a 4K object 
size.
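
For reference, a rados bench run of the kind described (pool name, duration and
concurrency below are placeholders) would look roughly like:

rados --no-cleanup bench -b 4096 -p testpool 60 write -t 32
rados bench -b 4096 -p testpool 60 seq -t 32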

Now, I started running more clients on the admin node and the performance scales 
until it hits the client CPU limit. The servers still have 30-35% CPU idle. With 
small object sizes I must say that ceph's per-OSD CPU utilization is not 
promising!

After this, I started testing the rados block interface with the kernel rbd module 
from my admin node.
I have created 8 images mapped on the pool with around 10K PGs, and I am not able 
to scale up the performance by running fio (either by creating a software RAID or 
running on individual /dev/rbd* instances). For example, when running multiple fio 
instances (one on /dev/rbd1 and the other on /dev/rbd2), the performance I get 
from each is half of what I get when running one instance. Here is my fio job 
script.

[random-reads]
ioengine=libaio
iodepth=32
filename=/dev/rbd1
rw=randread
bs=4k
direct=1
size=2G
numjobs=64

Let me know if I am following the proper procedure or not.
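
A small variation on the job file above, as a sketch (assuming /dev/rbd1 through
/dev/rbd4 are already mapped): fio can spread one job's I/O across several mapped
devices with a colon-separated filename list, which keeps a single job file while
touching more images:

[random-reads-multi]
ioengine=libaio
iodepth=32
filename=/dev/rbd1:/dev/rbd2:/dev/rbd3:/dev/rbd4
rw=randread
bs=4k
direct=1
size=2G
numjobs=16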

But, if my understanding is correct, the kernel rbd module is acting as a single 
client to the cluster, and on one admin node I can run only one such kernel 
instance. If so, I am then limited to the client bottleneck that I stated earlier. 
The CPU utilization on the server side is around 85-90% idle, so it is clear that 
the client is not driving the cluster.

My question is: is there any way to hit the cluster with more clients from a 
single box while testing the rbd module?

I would appreciate it if anybody could help me with this.

Thanks & Regards
Somnath





PLEASE NOTE: The information contained in this electronic mail message is 
intended only for the use of the designated recipient(s) named above. If the 
reader of this message is not the intended recipient, you are hereby notified 
that you have received this message in error and that any review, 
dissemination, distribution, or copying of this message is strictly prohibited. 
If you have received this communication in error, please notify the sender by 
telephone or e-mail (as shown above) immediately and destroy any and all copies 
of this message in your possession (whether hard copies or electronically 
stored copies).


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com