[ceph-users] ceph.com 403 forbidden

2014-09-01 Thread 박선규
Hi all.

When I connect to ceph.com, I get a 403 Forbidden error.
If I use a US proxy server, it works fine.
Please fix this problem.

Thanks.



Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

2014-09-01 Thread Somnath Roy
As I said, 107K with IOs served from memory, not hitting the disk.

From: Jian Zhang [mailto:amberzhan...@gmail.com]
Sent: Sunday, August 31, 2014 8:54 PM
To: Somnath Roy
Cc: Haomai Wang; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
IOPS

Somnath,
on the small workload performance, 107K is higher than the theoretical IOPS of 
520; any idea why?



Single client is ~14K iops, but it scales as the number of clients increases. 10 
clients ~107K iops. ~25 cpu cores are used.

2014-09-01 11:52 GMT+08:00 Jian Zhang 
amberzhan...@gmail.com:
Somnath,
on the small workload performance,


2014-08-29 14:37 GMT+08:00 Somnath Roy 
somnath@sandisk.com:


Thanks Haomai !

Here is some of the data from my setup.



--

Set up:





32-core CPU with HT enabled, 128 GB RAM, one SSD (both journal and data) - one 
OSD. 5 client machines with 12-core CPUs, each running two instances of 
ceph_smalliobench (10 clients total). Network is 10GbE.



Workload:

-



Small workload – 20K objects of 4K size, and io_size is also 4K RR. The intent 
is to serve the IOs from memory so that it can uncover the performance problems 
within a single OSD.



Results from Firefly:

--



Single client throughput is ~14K iops, but as the number of clients increases 
the aggregated throughput is not increasing. 10 clients ~15K iops. ~9-10 cpu 
cores are used.



Result with latest master:

--



Single client is ~14K iops, but it scales as the number of clients increases. 10 
clients ~107K iops. ~25 cpu cores are used.



--





More realistic workload:

-

Let’s see how it performs while 90% of the IOs are served from disks.

Setup:

---

40-CPU-core server as a cluster node (single-node cluster) with 64 GB RAM. 8 
SSDs - 8 OSDs. One similar node for monitor and rgw. Another node for the client 
running fio/vdbench. 4 rbds are configured with the ‘noshare’ option. 40GbE network.



Workload:





8 SSDs are populated, so 8 * 800 GB = ~6.4 TB of data. io_size = 4K RR.



Results from Firefly:





Aggregated output with 4 rbd clients stressing the cluster in parallel is 
~20-25K IOPS; cpu cores used ~8-10 (maybe less, can’t remember precisely).



Results from latest master:





Aggregated output with 4 rbd clients stressing the cluster in parallel is 
~120K IOPS; cpu is 7% idle, i.e. ~37-38 cpu cores used.



Hope this helps.



Thanks & Regards

Somnath


-Original Message-
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Thursday, August 28, 2014 8:01 PM
To: Somnath Roy
Cc: Andrey Korolyov; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
IOPS



Hi Roy,



I have already scanned your merged code for the fdcache and the 
lfn_find/lfn_open optimization; could you give some performance-improvement data 
for it? I fully agree with your direction; do you have any updates on it?



As for the messenger level, I have some very early work on it 
(https://github.com/yuyuyu101/ceph/tree/msg-event); it contains a new 
messenger implementation which supports different event mechanisms.

It looks like it will take at least one more week to make it work.



On Fri, Aug 29, 2014 at 5:48 AM, Somnath Roy 
somnath@sandisk.com wrote:

 Yes, from what I saw the messenger-level bottleneck is still huge!

 Hopefully the RDMA messenger will resolve that, and the performance gain will be 
 significant for reads (on SSDs). For writes we need to uncover the OSD 
 bottlenecks first to take advantage of the improved upstream.

 What I experienced is that until you remove the very last bottleneck, the 
 performance improvement will not be visible, and that can be confusing 
 because you might think that the upstream improvement you made is not valid 
 (which it is).



 Thanks & Regards

 Somnath

 -Original Message-

 From: Andrey Korolyov [mailto:and...@xdel.ru]

 Sent: Thursday, August 28, 2014 12:57 PM

 To: Somnath Roy

 Cc: David Moreau Simard; Mark Nelson; 
 ceph-users@lists.ceph.com

 Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go

 over 3, 2K IOPS



 On Thu, Aug 28, 2014 at 10:48 PM, Somnath Roy 
 somnath@sandisk.com wrote:

 Nope, this will not be back ported to Firefly I 

[ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-01 Thread Kenneth Waegeman

Hi,

I reinstalled the cluster with 0.84 and tried again running rados
bench on an EC-coded pool on keyvaluestore.

Nothing crashed this time, but when I check the status:

 health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too
few pgs per osd (15 < min 20)
 monmap e1: 3 mons at  
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2  
ceph001,ceph002,ceph003

 osdmap e174: 78 osds: 78 up, 78 in
  pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects
1753 GB used, 129 TB / 131 TB avail
1088 active+clean
 128 active+clean+inconsistent

the 128 inconsistent PGs are ALL the PGs of the EC KV store (the
others are on Filestore).
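
For reference, a minimal sketch of listing the inconsistent PGs and asking for a
repair (standard ceph CLI; the PG id is just the one from the scrub log below):

# list the PGs currently flagged inconsistent
ceph health detail | grep inconsistent
# trigger a re-scrub and repair of one of them
ceph pg repair 2.3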


The only thing I can see in the logs is that after the rados tests it
starts scrubbing, and for each KV PG I get something like this:


2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR]  
2.3s0 scrub stat mismatch, got 28164/29291 objects, 0/0 clones,  
28164/29291 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,  
118128377856/122855358464 bytes.


What could be the problem here?
Thanks again!!

Kenneth


- Message from Haomai Wang haomaiw...@gmail.com -
   Date: Tue, 26 Aug 2014 17:11:43 +0800
   From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
 To: Kenneth Waegeman kenneth.waege...@ugent.be
 Cc: ceph-users@lists.ceph.com



Hmm, it looks like you hit this bug (http://tracker.ceph.com/issues/9223).

Sorry for the late message; I forgot that this fix is merged into 0.84.

Thanks for your patience :-)

On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman
kenneth.waege...@ugent.be wrote:


Hi,

In the meantime I already tried upgrading the cluster to 0.84, to see
if that made a difference, and it seems it does.
I can't reproduce the crashing osds by doing a 'rados -p ecdata ls' anymore.

But now the cluster detects that it is inconsistent:

  cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
   health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too few pgs
per osd (4 < min 20); mon.ceph002 low disk space
   monmap e3: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
   mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
   osdmap e145384: 78 osds: 78 up, 78 in
pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
  1502 GB used, 129 TB / 131 TB avail
   279 active+clean
40 active+clean+inconsistent
 1 active+clean+scrubbing+deep


I tried to do ceph pg repair for all the inconsistent pgs:

  cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
   health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub errors;
too few pgs per osd (4 < min 20); mon.ceph002 low disk space
   monmap e3: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
   mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
   osdmap e146452: 78 osds: 78 up, 78 in
pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
  1503 GB used, 129 TB / 131 TB avail
   279 active+clean
39 active+clean+inconsistent
 1 active+clean+scrubbing+deep
 1 active+clean+scrubbing+deep+inconsistent+repair

I let it recover through the night, but this morning the mons were all
gone, with nothing to see in the log files. The OSDs were all still up!

cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
 health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub errors;
too few pgs per osd (4 < min 20)
 monmap e7: 3 mons at
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0},
election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003
 mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3 up:standby
 osdmap e203410: 78 osds: 78 up, 78 in
  pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects
1547 GB used, 129 TB / 131 TB avail
   1 active+clean+scrubbing+deep+inconsistent+repair
 284 active+clean
  35 active+clean+inconsistent

I have restarted the monitors now; I will let you know when I see something
more.




- Message from Haomai Wang haomaiw...@gmail.com -
 Date: Sun, 24 Aug 2014 12:51:41 +0800

 From: Haomai Wang haomaiw...@gmail.com
Subject: Re: [ceph-users] ceph cluster inconsistency?
   To: Kenneth Waegeman kenneth.waege...@ugent.be,
ceph-users@lists.ceph.com



It's really strange! I wrote a test program according to the key ordering
you provided and parsed the corresponding value, and it is correct!

I have no idea now. If free, 

[ceph-users] Paris Ceph meetup : september 18th, 2014

2014-09-01 Thread Loic Dachary
Hi Ceph,

The next Paris Ceph meetup is scheduled immediately after the Ceph day.

   http://www.meetup.com/Ceph-in-Paris/events/204412892/

I'll be there and hope to discuss the Giant features on this occasion :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [ceph-users] [Ceph-community] Paris Ceph meetup : september 18th, 2014

2014-09-01 Thread David Moreau Simard
This reminds me that we should also schedule some sort of meetup during
the OpenStack summit, which is also in Paris!

-- 
David Moreau Simard



On 2014-09-01, 8:06 AM, « Loic Dachary » l...@dachary.org wrote:

Hi Ceph,

The next Paris Ceph meetup is scheduled immediately after the Ceph day.

   http://www.meetup.com/Ceph-in-Paris/events/204412892/

I'll be there and hope to discuss the Giant features on this occasion :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-01 Thread Haomai Wang
Hmm, could you please list your instructions, including how long the cluster
has existed and all relevant ops? I want to reproduce it.


On Mon, Sep 1, 2014 at 4:45 PM, Kenneth Waegeman kenneth.waege...@ugent.be
wrote:

 Hi,

 I reinstalled the cluster with 0.84, and tried again running rados bench
 on a EC coded pool on keyvaluestore.
 Nothing crashed this time, but when I check the status:

  health HEALTH_ERR 128 pgs inconsistent; 128 scrub errors; too few pgs
 per osd (15 < min 20)
  monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,
 ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch
 8, quorum 0,1,2 ceph001,ceph002,ceph003
  osdmap e174: 78 osds: 78 up, 78 in
   pgmap v147680: 1216 pgs, 3 pools, 14758 GB data, 3690 kobjects
 1753 GB used, 129 TB / 131 TB avail
 1088 active+clean
  128 active+clean+inconsistent

 the 128 inconsistent pgs are ALL the pgs of the EC KV store ( the others
 are on Filestore)

 The only thing I can see in the logs is that after the rados tests, it
 start scrubbing, and for each KV pg I get something like this:

 2014-08-31 11:14:09.050747 osd.11 10.141.8.180:6833/61098 4 : [ERR] 2.3s0
 scrub stat mismatch, got 28164/29291 objects, 0/0 clones, 28164/29291
 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
 118128377856/122855358464 bytes.

 What could here be the problem?
 Thanks again!!

 Kenneth


 - Message from Haomai Wang haomaiw...@gmail.com -
Date: Tue, 26 Aug 2014 17:11:43 +0800
From: Haomai Wang haomaiw...@gmail.com
 Subject: Re: [ceph-users] ceph cluster inconsistency?
  To: Kenneth Waegeman kenneth.waege...@ugent.be
  Cc: ceph-users@lists.ceph.com


  Hmm, it looks like you hit this bug(http://tracker.ceph.com/issues/9223).

 Sorry for the late message, I forget that this fix is merged into 0.84.

 Thanks for your patience :-)

 On Tue, Aug 26, 2014 at 4:39 PM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:


 Hi,

 In the meantime I already tried with upgrading the cluster to 0.84, to
 see
 if that made a difference, and it seems it does.
 I can't reproduce the crashing osds by doing a 'rados -p ecdata ls'
 anymore.

 But now the cluster detect it is inconsistent:

   cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
    health HEALTH_ERR 40 pgs inconsistent; 40 scrub errors; too few pgs
 per osd (4 < min 20); mon.ceph002 low disk space
monmap e3: 3 mons at
 {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,
 ceph003=10.141.8.182:6789/0},
 election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
mdsmap e78951: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3
 up:standby
osdmap e145384: 78 osds: 78 up, 78 in
 pgmap v247095: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
   1502 GB used, 129 TB / 131 TB avail
279 active+clean
 40 active+clean+inconsistent
  1 active+clean+scrubbing+deep


 I tried to do ceph pg repair for all the inconsistent pgs:

   cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
    health HEALTH_ERR 40 pgs inconsistent; 1 pgs repair; 40 scrub errors;
 too few pgs per osd (4 < min 20); mon.ceph002 low disk space
monmap e3: 3 mons at
 {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,
 ceph003=10.141.8.182:6789/0},
 election epoch 30, quorum 0,1,2 ceph001,ceph002,ceph003
mdsmap e79486: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3
 up:standby
osdmap e146452: 78 osds: 78 up, 78 in
 pgmap v248520: 320 pgs, 4 pools, 15366 GB data, 3841 kobjects
   1503 GB used, 129 TB / 131 TB avail
279 active+clean
 39 active+clean+inconsistent
  1 active+clean+scrubbing+deep
  1 active+clean+scrubbing+deep+inconsistent+repair

 I let it recovering through the night, but this morning the mons were all
 gone, nothing to see in the log files.. The osds were all still up!

 cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
  health HEALTH_ERR 36 pgs inconsistent; 1 pgs repair; 36 scrub errors;
 too few pgs per osd (4 < min 20)
  monmap e7: 3 mons at
 {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,
 ceph003=10.141.8.182:6789/0},
 election epoch 44, quorum 0,1,2 ceph001,ceph002,ceph003
  mdsmap e109481: 1/1/1 up {0=ceph003.cubone.os=up:active}, 3
 up:standby
  osdmap e203410: 78 osds: 78 up, 78 in
   pgmap v331747: 320 pgs, 4 pools, 15251 GB data, 3812 kobjects
 1547 GB used, 129 TB / 131 TB avail
1 active+clean+scrubbing+deep+inconsistent+repair
  284 active+clean
   35 active+clean+inconsistent

 I restarted the monitors now, I will let you know when I see something
 more..




 - Message from Haomai Wang haomaiw...@gmail.com -
  Date: Sun, 24 Aug 2014 12:51:41 +0800

  From: Haomai Wang haomaiw...@gmail.com
 Subject: 

Re: [ceph-users] ceph cluster inconsistency keyvaluestore

2014-09-01 Thread Kenneth Waegeman

Hi,


The cluster was installed with Quattor, which uses ceph-deploy for the
installation of daemons, writes the config file and installs the
crush map.
I have 3 hosts, each with 12 disks; each disk has a large KV partition (3.6T)
for the ECdata pool and a small cache partition (50G) for the cache pool.


I manually did this:

ceph osd pool create cache 1024 1024
ceph osd pool set cache size 2
ceph osd pool set cache min_size 1
ceph osd erasure-code-profile set profile11 k=8 m=3 ruleset-failure-domain=osd
ceph osd pool create ecdata 128 128 erasure profile11
ceph osd tier add ecdata cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay ecdata cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes $((280*1024*1024*1024))

(But the previous time I had the problem already without the cache part)
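
For completeness, the rados bench runs against the EC pool were of this general
form (a sketch; the duration and thread count are assumptions, not the exact
invocation used here):

# default 4 MB object writes to the EC-backed pool
rados bench -p ecdata 600 write -t 16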



Cluster live since 2014-08-29 15:34:16

Config file on host ceph001:

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.143.8.0/24
filestore_xattr_use_omap = 1
fsid = 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
mon_cluster_log_to_syslog = 1
mon_host = ceph001.cubone.os, ceph002.cubone.os, ceph003.cubone.os
mon_initial_members = ceph001, ceph002, ceph003
osd_crush_update_on_start = 0
osd_journal_size = 10240
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 512
osd_pool_default_pgp_num = 512
osd_pool_default_size = 3
public_network = 10.141.8.0/24

[osd.11]
osd_objectstore = keyvaluestore-dev

[osd.13]
osd_objectstore = keyvaluestore-dev

[osd.15]
osd_objectstore = keyvaluestore-dev

[osd.17]
osd_objectstore = keyvaluestore-dev

[osd.19]
osd_objectstore = keyvaluestore-dev

[osd.21]
osd_objectstore = keyvaluestore-dev

[osd.23]
osd_objectstore = keyvaluestore-dev

[osd.25]
osd_objectstore = keyvaluestore-dev

[osd.3]
osd_objectstore = keyvaluestore-dev

[osd.5]
osd_objectstore = keyvaluestore-dev

[osd.7]
osd_objectstore = keyvaluestore-dev

[osd.9]
osd_objectstore = keyvaluestore-dev


OSDs:
# id    weight  type name               up/down reweight
-12 140.6   root default-cache
-9  46.87   host ceph001-cache
2   3.906   osd.2   up  1
4   3.906   osd.4   up  1
6   3.906   osd.6   up  1
8   3.906   osd.8   up  1
10  3.906   osd.10  up  1
12  3.906   osd.12  up  1
14  3.906   osd.14  up  1
16  3.906   osd.16  up  1
18  3.906   osd.18  up  1
20  3.906   osd.20  up  1
22  3.906   osd.22  up  1
24  3.906   osd.24  up  1
-10 46.87   host ceph002-cache
28  3.906   osd.28  up  1
30  3.906   osd.30  up  1
32  3.906   osd.32  up  1
34  3.906   osd.34  up  1
36  3.906   osd.36  up  1
38  3.906   osd.38  up  1
40  3.906   osd.40  up  1
42  3.906   osd.42  up  1
44  3.906   osd.44  up  1
46  3.906   osd.46  up  1
48  3.906   osd.48  up  1
50  3.906   osd.50  up  1
-11 46.87   host ceph003-cache
54  3.906   osd.54  up  1
56  3.906   osd.56  up  1
58  3.906   osd.58  up  1
60  3.906   osd.60  up  1
62  3.906   osd.62  up  1
64  3.906   osd.64  up  1
66  3.906   osd.66  up  1
68  3.906   osd.68  up  1
70  3.906   osd.70  up  1
72  3.906   osd.72  up  1
74  3.906   osd.74  up  1
76  3.906   osd.76  up  1
-8  140.6   root default-ec
-5  46.87   host ceph001-ec
3   3.906   osd.3   up  1
5   3.906   osd.5   up  1
7   3.906   osd.7   up  1
9   3.906   osd.9   up  1
11  3.906   osd.11  up  1
13  3.906   osd.13  up  1
15  3.906   osd.15  up  1
17  3.906   osd.17  up  1
19  3.906   osd.19  up  1
21  3.906   osd.21  up  1
23  3.906   osd.23  up  1
25  3.906   osd.25  up  1
-6  46.87   host ceph002-ec
29  3.906   osd.29  up  1
31  3.906   osd.31  up  1
33  3.906   osd.33  up  1
35  3.906 

[ceph-users] kvm guests with rbd disks are unaccessible approx. 3h after one OSD node fails

2014-09-01 Thread Udo Lembke
Hi list,
over the weekend one of our five OSD nodes failed (hung with a kernel panic).
The cluster degraded (12 of 60 osds down), but our monitoring host sets the
noout flag in this case.

But around three hours later the kvm guests which use storage on the
ceph cluster (and do writes) became unaccessible. After restarting the
failed ceph node the ceph cluster is healthy again, but the VMs needed to
be restarted to work again.

In the ceph.conf I had defined osd_pool_default_min_size = 1,
so I don't understand why this happens.
Which parameter must be changed/set so that the kvm clients keep
working on the unhealthy cluster?

Ceph version is 0.72.2 - pool replication 2.
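
For reference, min_size is a per-pool setting, and osd_pool_default_min_size only
applies to pools created after it is set; a sketch of checking and lowering it
(the pool name 'rbd' is an assumption):

# check what the pool actually uses
ceph osd pool get rbd min_size
# allow I/O to continue with a single surviving replica
ceph osd pool set rbd min_size 1
# keep OSDs from being marked out during the outage (already done here by the monitoring host)
ceph osd set noout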


Thanks for a hint.

Udo



[ceph-users] I fail to add a monitor in a ceph cluster

2014-09-01 Thread Pascal GREGIS
Hello,

I am currently testing ceph to make a replicated block device for a project 
that would involve 2 data servers accessing this block device, so that if one 
fails or crashes, the data can still be used and the cluster can be rebuilt.

This project requires that both machines run an OSD and a monitor, and that a 
3rd monitor is run somewhere else, so that there is not a single point of 
failure.
I know it is not the best thing to run an OSD and a monitor on the same 
machine, but I cannot really find a better solution.

My problem is that, after having read the documentation several times and 
followed it, I cannot manage to add a second monitor.

I have bootstrapped a first monitor, added 2 OSDs (one on the machine with the 
monitor, one on the other), and I try to add a second monitor but it doesn't 
work.
I think I misunderstood something.

Here's what I did :

On the first machine named grenier:
# setup the configuration file /etc/ceph/ceph.conf (see content further)
# bootstrap monitor:
$ ceph-authtool --create-keyring /var/tmp/ceph.mon.keyring --gen-key -n mon. 
--cap mon 'allow *' 
$ sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring 
--gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' 
--cap mds 'allow'
$ sudo chown myuser /etc/ceph/ceph.client.admin.keyring
$ ceph-authtool /var/tmp/ceph.mon.keyring --import-keyring 
/etc/ceph/ceph.client.admin.keyring
$ monmaptool --create --add grenier 172.16.1.11 --fsid $monuuid $tmp/monmap
$ sudo mkdir -p /var/lib/ceph/mon/ceph-grenier
$ sudo chown $ID -R /var/lib/ceph/mon/ceph-grenier
$ ceph-mon --mkfs -i grenier --monmap /var/tmp/monmap --keyring 
/var/tmp/ceph.mon.keyring
# start monitor:
$ sudo start ceph-mon id=grenier
# add OSD:
$ sudo ceph osd create $osduuid
$ sudo mkdir -p /var/lib/ceph/osd/ceph-0
$ sudo ceph-osd -i 0 --mkfs --mkkey --osd-uuid $osduuid
$ sudo ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i 
/var/lib/ceph/osd/ceph-0/keyring
$ ceph osd crush add-bucket grenier host
$ ceph osd crush move grenier root=default
$ ceph osd crush add osd.0 1.0 host=grenier
# start this OSD
$ sudo ceph-osd -i 0

# copy /etc/ceph/ceph.conf, /etc/ceph/ceph.client.admin.keyring, 
/var/tmp/ceph/ceph.mon.keyring and /var/tmp/ceph/monmap from grenier to second 
node named gail:
# add and start OSD on the second node
$ sudo ceph osd create $newosduuid
$ sudo mkdir -p /var/lib/ceph/osd/ceph-1
$ sudo ceph-osd -i 1 --mkfs --mkkey --osd-uuid $newosduuid
$ sudo ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' -i 
/var/lib/ceph/osd/ceph-1/keyring
$ ceph osd crush add-bucket gail host
$ ceph osd crush move gail root=default
$ ceph osd crush add osd.1 1.0 host=gail
# start this OSD
$ sudo ceph-osd -i 1

There, everything works correctly, I can create and map a block device, and 
then write on it and the data is replicated on both nodes.
When I perform a ceph -s I get :
cluster a98faf65-b105-4ec7-913c-f8a33a4db4d1
 health HEALTH_OK
 monmap e1: 1 mons at {grenier=172.16.1.11:6789/0}, election epoch 2, 
quorum 0 grenier
 osdmap e13: 2 osds: 2 up, 2 in
  pgmap v47: 192 pgs, 3 pools, 0 bytes data, 0 objects
            18400 MB used, 105 GB / 129 GB avail
                 192 active+clean

And here is what I do when trying to add a second monitor on gail:
$ sudo mkdir -p /var/lib/ceph/mon/ceph-gail
$ ceph mon getmap -o /var/tmp/monmap
$ sudo ceph-mon -i gail --mkfs --monmap /var/tmp/monmap --keyring 
/var/tmp/ceph.mon.keyring
  which prints:
ceph-mon: set fsid to a98faf65-b105-4ec7-913c-f8a33a4db4d1
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-gail for mon.gail
  which seems correct (same uuid as in ceph.conf)
$ sudo ceph-mon add gail 172.16.1.12
  This command prints:
2014-09-01 17:07:26.033688 7f5538ada700  0 monclient: hunting for new mon
  and hangs

Then I would like to do this:
$ sudo ceph-mon -i gail --public-addr 172.16.1.12
  but it is useless as the previous command failed.
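
For comparison, the documented manual sequence for adding a second monitor is
roughly the following (a sketch, reusing the names/IPs above; note the monmap
command is 'ceph mon add', not the ceph-mon daemon binary):

$ ceph auth get mon. -o /var/tmp/ceph.mon.keyring
$ ceph mon getmap -o /var/tmp/monmap
$ sudo ceph-mon -i gail --mkfs --monmap /var/tmp/monmap --keyring /var/tmp/ceph.mon.keyring
$ ceph mon add gail 172.16.1.12:6789
$ sudo ceph-mon -i gail --public-addr 172.16.1.12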


Can anybody guess what I am doing wrong?

I use ceph 0.80 on an Ubuntu trusty.
My ceph.conf is as follows :
[global]
  fsid = a98faf65-b105-4ec7-913c-f8a33a4db4d1
  mon initial members = grenier
  mon host = 172.16.1.11
  public network = 172.16.0.0/16
  auth cluster required = none
  auth service required = none
  auth client required = none
  osd journal size = 1024
  filestore xattr use omap = true
  osd pool default size = 2
  osd pool default min size = 1
  osd pool default pg num = 333
  osd pool default pgp num = 333
  osd crush chooseleaf type = 1


Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3,2K IOPS

2014-09-01 Thread Sebastien Han
Mark and all, Ceph IOPS performance has definitely improved with Giant.
With this version: ceph version 0.84-940-g3215c52 
(3215c520e1306f50d0094b5646636c02456c9df4) on Debian 7.6 with Kernel 3.14-0.

I got 6340 IOPS on a single OSD SSD (journal and data on the same partition).
So basically twice the amount of IOPS that I was getting with Firefly.

Random 4K reads went from 12431 to 10201 IOPS, so I’m a bit disappointed here.
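
For reference, the 4K random tests in this thread take this general shape with
fio's rbd engine (a sketch; the pool/image names, queue depth and runtime are
assumptions, not the exact job used here):

fio --name=randwrite-4k --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=bench --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based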

The SSD is still under-utilised:

Device:         rrqm/s   wrqm/s     r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdp1              0.00   540.37    0.00  5902.30     0.00    47.14    16.36     0.87    0.15    0.00    0.15   0.07  40.15
sdp2              0.00     0.00    0.00  4454.67     0.00    49.16    22.60     0.31    0.07    0.00    0.07   0.07  30.61

Thanks a ton for all your comments and assistance guys :).

One last question for Sage (or others that might know): what’s the status of the 
F2FS implementation? (Or maybe we are waiting for F2FS to provide atomic 
transactions?)
I tried to run the OSD on f2fs; however, ceph-osd mkfs got stuck on an xattr test:

fremovexattr(10, user.test@5848273)   = 0

On 01 Sep 2014, at 11:13, Sebastien Han sebastien@enovance.com wrote:

 Mark, thanks a lot for experimenting with this for me.
 I’m gonna try master soon and will tell you how much I can get. 
 
 It’s interesting to see that using 2 SSDs brings more performance, even though 
 both SSDs are under-utilized…
 They should be able to sustain both loads at the same time (journal and osd 
 data).
 
 On 01 Sep 2014, at 09:51, Somnath Roy somnath@sandisk.com wrote:
 
 As I said, 107K with IOs served from memory, not hitting the disk.
 
 From: Jian Zhang [mailto:amberzhan...@gmail.com] 
 Sent: Sunday, August 31, 2014 8:54 PM
 To: Somnath Roy
 Cc: Haomai Wang; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 
 2K IOPS
 
 Somnath,
 on the small workload performance, 107k is higher than the theoretical IOPS 
 of 520, any idea why? 
 
 
 
 Single client is ~14K iops, but scaling as number of clients increases. 10 
 clients ~107K iops. ~25 cpu cores are used.
 
 
 2014-09-01 11:52 GMT+08:00 Jian Zhang amberzhan...@gmail.com:
 Somnath,
 on the small workload performance, 
 
 
 
 2014-08-29 14:37 GMT+08:00 Somnath Roy somnath@sandisk.com:
 
 Thanks Haomai !
 
 Here is some of the data from my setup.
 
 
 
 --
 
 Set up:
 
 
 
 
 
 32 core cpu with HT enabled, 128 GB RAM, one SSD (both journal and data) - 
 one OSD. 5 client m/c with 12 core cpu and each running two instances of 
 ceph_smalliobench (10 clients total). Network is 10GbE.
 
 
 
 Workload:
 
 -
 
 
 
 Small workload – 20K objects with 4K size and io_size is also 4K RR. The 
 intent is to serve the ios from memory so that it can uncover the 
 performance problems within single OSD.
 
 
 
 Results from Firefly:
 
 --
 
 
 
 Single client throughput is ~14K iops, but as the number of client increases 
 the aggregated throughput is not increasing. 10 clients ~15K iops. ~9-10 cpu 
 cores are used.
 
 
 
 Result with latest master:
 
 --
 
 
 
 Single client is ~14K iops, but scaling as number of clients increases. 10 
 clients ~107K iops. ~25 cpu cores are used.
 
 
 
 --
 
 
 
 
 
 More realistic workload:
 
 -
 
 Let’s see how it is performing while  90% of the ios are served from disks
 
 Setup:
 
 ---
 
 40 cpu core server as a cluster node (single node cluster) with 64 GB RAM. 8 
 SSDs - 8 OSDs. One similar node for monitor and rgw. Another node for 
 client running fio/vdbench. 4 rbds are configured with ‘noshare’ option. 40 
 GbE network
 
 
 
 Workload:
 
 
 
 
 
 8 SSDs are populated , so, 8 * 800GB = ~6.4 TB of data.  Io_size = 4K RR.
 
 
 
 Results from Firefly:
 
 
 
 
 
 Aggregated output while 4 rbd clients stressing the cluster in parallel is 
 ~20-25K IOPS , cpu cores used ~8-10 cores (may be less can’t remember 
 precisely)
 
 
 
 Results from latest master:
 
 
 
 
 
 Aggregated output while 4 rbd clients stressing the cluster in parallel is 
 ~120K IOPS , cpu is 7% idle i.e  ~37-38 cpu cores.
 
 
 
 Hope this helps.
 
 
 
 Thanks & Regards
 
 Somnath
 
 
 
 -Original Message-
 From: Haomai Wang [mailto:haomaiw...@gmail.com] 
 Sent: Thursday, August 28, 2014 8:01 PM
 To: Somnath Roy
 Cc: Andrey Korolyov; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] 

[ceph-users] Questions regarding Crush Map

2014-09-01 Thread Jakes John
Hi,
   I have some general questions regarding the CRUSH map. It would be
helpful if someone could help me out by clarifying them.

1.  I saw that a bucket 'host' is always created for the crush maps which
are automatically generated by Ceph. If I am manually creating a crush map,
do I always need to add a bucket called 'host'? As I was looking through
the source code, I didn't see any need for this. If it is not necessary, can
OSDs of the same host be split into multiple buckets?

e.g.: Say host 1 has four OSDs - osd.0, osd.1, osd.2, osd.3
      and host 2 has four OSDs - osd.4, osd.5, osd.6, osd.7

and create two buckets:

HostGroup bucket1 - {osd.0, osd.1, osd.4, osd.5}
HostGroup bucket2 - {osd.2, osd.3, osd.6, osd.7}

where HostGroup is a new bucket type instead of the default 'host' type.


Is this configuration possible or invalid? If it is possible, I can group the
SSDs of all hosts into one bucket and the HDDs into another (see the sketch below).
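
A sketch of what that could look like in a decompiled CRUSH map, assuming a
custom 'hostgroup' bucket type and made-up ids/weights (not tested against any
particular cluster):

# declared alongside the default types (osd, host, rack, root, ...)
type 11 hostgroup

hostgroup bucket1 {
        id -21
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 1.000
        item osd.1 weight 1.000
        item osd.4 weight 1.000
        item osd.5 weight 1.000
}

rule bucket1_only {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take bucket1
        step choose firstn 0 type osd
        step emit
}

The edited map would then be compiled with 'crushtool -c', injected with
'ceph osd setcrushmap -i', and a pool pointed at the rule via
'ceph osd pool set <pool> crush_ruleset 3'.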

2. I have read in the Ceph docs that the same OSD is not advised to be part of
two buckets (two pools). Is there any reason for that? I couldn't find this
limitation in the source code.


e.g.: osd.0 is in bucket1 and bucket2.

Is this configuration possible or invalid? If it is possible, I have the
flexibility to group data that is written to different pools.

3. Is it possible to exclude or include a particular osd/host/rack in the
CRUSH mapping?

e.g.: I need the third replica to always be in rack3 (a specified row/rack/host
based on requirements). The first two can be chosen randomly.

If possible, how can I configure it? (A sketch follows below.)
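
A sketch of a rule along those lines, assuming racks are defined in the map and
the target bucket is literally named 'rack3' (untested):

rule two_anywhere_third_in_rack3 {
        ruleset 4
        type replicated
        min_size 3
        max_size 3
        step take default
        step chooseleaf firstn 2 type rack
        step emit
        step take rack3
        step chooseleaf firstn 1 type host
        step emit
}

Caveat: the first step can also land in rack3, so a collision with the third
replica is possible; fully excluding rack3 from the first two copies needs a
separate root that does not contain it.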


4. It is said that OSD weights must be configured based on the storage
capacity. Say I have an SSD of 512 GB and an HDD of 1 TB, and I configure
weights of 0.5 and 1 respectively; am I treating the SSD and HDD equally? How
do I prioritize the SSD over the HDD?

5. Continuing from 4), if I have a mix of SSDs and HDDs in the same host,
what are the best ways to utilize the SSD capabilities in the Ceph cluster?
(See the sketch below.)
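
On 4) and 5), a sketch of weighting by capacity and steering primaries to the
SSD (the osd ids are examples; primary affinity needs 'mon osd allow primary
affinity = true' on Firefly and later):

# CRUSH weight roughly proportional to capacity
ceph osd crush reweight osd.0 0.5      # 512 GB SSD
ceph osd crush reweight osd.1 1.0      # 1 TB HDD
# prefer the SSD as primary so reads are served from it
ceph osd primary-affinity osd.0 1.0
ceph osd primary-affinity osd.1 0

The more common answer to 5) is the separate SSD/HDD hierarchy sketched under
1), with SSD-backed and HDD-backed pools pinned to their own rules via
'ceph osd pool set <pool> crush_ruleset <id>'.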


Looking forward to your help,

Thanks,