Re: [ceph-users] How to see which crush tunables are active in a ceph-cluster?

2014-12-20 Thread Udo Lembke
Hi,
some information for other Ceph users...

I switched from unknown crush tunables to firefly and it took 6 hours
(30.853% degraded) to finish on our production cluster (5 nodes, 60
OSDs, 10 GbE, 20% data used: pgmap v35678572: 3904 pgs, 4 pools, 21947
GB data, 5489 kobjects).

Should a change of chooseleaf_vary_r from 0 to 1 take roughly the same
time to finish?


Regards

Udo

On 04.12.2014 14:09, Udo Lembke wrote:
 Hi,
 to answer myself.

 With ceph osd crush show-tunables I see a little bit more, but I don't
 know how far away from the firefly tunables the production cluster is.

 New test cluster with profile optimal:
 ceph osd crush show-tunables
 { "choose_local_tries": 0,
   "choose_local_fallback_tries": 0,
   "choose_total_tries": 50,
   "chooseleaf_descend_once": 1,
   "profile": "firefly",
   "optimal_tunables": 1,
   "legacy_tunables": 0,
   "require_feature_tunables": 1,
   "require_feature_tunables2": 1}

 the production cluster:
  ceph osd crush show-tunables
 { "choose_local_tries": 0,
   "choose_local_fallback_tries": 0,
   "choose_total_tries": 50,
   "chooseleaf_descend_once": 0,
   "profile": "unknown",
   "optimal_tunables": 0,
   "legacy_tunables": 0,
   "require_feature_tunables": 1,
   "require_feature_tunables2": 0}

 Does this look like argonaut or bobtail?

 And how should I proceed with the update?
 Does it make sense to first go to profile bobtail and then to firefly?
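 For reference, a rough sketch of how such a profile change is usually
 applied with the standard CRUSH tunables command (the exact commands used
 on this cluster are not shown in the thread, and each step triggers data
 movement):

   ceph osd crush tunables bobtail    # optional intermediate profile
   ceph -s                            # watch the resulting rebalance finish
   ceph osd crush tunables firefly    # final target profile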


 Regards

 Udo

 On 01.12.2014 17:39, Udo Lembke wrote:
 Hi all,
 http://ceph.com/docs/master/rados/operations/crush-map/#crush-tunables
 described how to set the tunables to legacy, argonaut, bobtail, firefly
 or optimal.

 But how can I see which profile is active in a Ceph cluster?

 With ceph osd getcrushmap I don't get much info
 (only:
 tunable choose_local_tries 0
 tunable choose_local_fallback_tries 0
 tunable choose_total_tries 50)
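 A rough sketch of one way to inspect the tunables baked into the map by
 decompiling it with crushtool (file names here are only illustrative):

   ceph osd getcrushmap -o /tmp/crushmap.bin            # dump the binary map
   crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt  # decompile to text
   grep ^tunable /tmp/crushmap.txt                      # tunable lines sit at the top
   ceph osd crush show-tunables                         # or query the cluster directly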


 Udo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph-deploy install and pinning on Ubuntu 14.04

2014-12-20 Thread Giuseppe Civitella
Hi all,

I'm using ceph-deploy on Ubuntu 14.04. When I do a ceph-deploy install I
see packages getting installed from the Ubuntu repositories instead of
Ceph's own. Am I missing something? Do I need to do some pinning on the repositories?
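An untested sketch of an apt preferences pin that would prefer the ceph.com
packages, assuming the Ceph repository is already configured in sources.list
(the file name is arbitrary):

  # /etc/apt/preferences.d/ceph.pref
  Package: *
  Pin: origin ceph.com
  Pin-Priority: 1001

A priority above 1000 lets apt replace already-installed Ubuntu packages with
the pinned origin's versions; apt-cache policy ceph afterwards shows which
candidate wins.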

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to see which crush tunables are active in a ceph-cluster?

2014-12-20 Thread Udo Lembke
Hi Craig,
right! I had also posted a mail in that thread.

My question was whether the whole step to chooseleaf_vary_r 1 takes the
same amount of time as setting the tunables to firefly.

The funny thing: I had just decompiled the crushmap to start with
chooseleaf_vary_r 4, and saw that after the upgrade tonight
chooseleaf_vary_r is already at 1!

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
...

ceph osd crush show-tunables -f json-pretty

{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "profile": "firefly",
  "optimal_tunables": 1,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1}
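
For anyone wanting to set chooseleaf_vary_r by hand rather than via a
profile, the usual decompile/edit/recompile round trip looks roughly like
this (file names are illustrative, and injecting the new map triggers data
movement):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit the "tunable chooseleaf_vary_r ..." line in crushmap.txt
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new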


Udo

On 20.12.2014 17:53, Craig Lewis wrote:
 There was a tunables discussion on the ML a few months ago, with a lot
 of good suggestions.  Sage gave some suggestions on rolling out (and
 rolling back) chooseleaf_vary_r changes.  That reminds me... I
 intended to try those changes over the holidays...


 Found it; the subject was "ceph osd crush tunables optimal AND add new
 OSD at the same time".


 On Sat, Dec 20, 2014 at 3:26 AM, Udo Lembke ulem...@polarzone.de wrote:

 Hi,
 some information for other Ceph users...

 I switched from unknown crush tunables to firefly and it took 6 hours
 (30.853% degraded) to finish on our production cluster (5 nodes, 60
 OSDs, 10 GbE, 20% data used: pgmap v35678572: 3904 pgs, 4 pools, 21947
 GB data, 5489 kobjects).

 Should a change of chooseleaf_vary_r from 0 to 1 take roughly the same
 time to finish?


 Regards

 Udo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Have 2 different public networks

2014-12-20 Thread Alex Moore
Thought I'd share details of my setup, as I am effectively achieving this 
(i.e. making monitors accessible over multiple interfaces) with IP routing 
as follows:


My Ceph hosts each have a /32 IP address on a loopback interface. And 
that is the IP address that all their Ceph daemons are bound to. In 
ceph.conf I do that by setting all of the following values to the host's 
loopback IP: "mon addr" in the [mon.x] sections; "cluster addr" and 
"public addr" in the [mds.x] sections; and "cluster addr", "public 
addr", and "osd heartbeat addr" in the [osd.x] sections. Then I use IP 
routing to ensure that the hosts can all reach each other's loopback 
IPs, and that clients can reach them too, over the relevant networks. 
This also allows inter-host traffic to fail over to using alternate 
paths if the normal path is down for some reason.
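
A hypothetical ceph.conf fragment illustrating the binding described above
(the loopback IP, host name and daemon IDs are invented):

  [mon.a]
      host = node1
      mon addr = 10.99.0.1

  [mds.a]
      host = node1
      public addr = 10.99.0.1
      cluster addr = 10.99.0.1

  [osd.0]
      host = node1
      public addr = 10.99.0.1
      cluster addr = 10.99.0.1
      osd heartbeat addr = 10.99.0.1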


To put this in context, my cluster is just a small 3-node cluster, and I 
have a pair of layer 3 switches, with networking arranged like this:


node1:

has an 8 Gbps point-to-point routed link to node2 (and uses this link to 
communicate with node2 under normal circumstances)
has an 8 Gbps point-to-point routed link to node3 (and uses this link to 
communicate with node3 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which 
are not other nodes in the cluster use this link to communicate with 
node1 under normal circumstances, and traffic to nodes 2 and 3 will 
switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not 
used under normal circumstances - only if the 2 Gbps link goes down)


node2:

has an 8 Gbps point-to-point routed link to node3 (and uses this link to 
communicate with node3 under normal circumstances)
has an 8 Gbps point-to-point routed link to node1 (and uses this link to 
communicate with node1 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which 
are not other nodes in the cluster use this link to communicate with 
node2 under normal circumstances, and traffic to nodes 3 and 1 will 
switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not 
used under normal circumstances - only if the 2 Gbps link goes down)


node3:

has an 8 Gbps point-to-point routed link to node1 (and uses this link to 
communicate with node1 under normal circumstances)
has an 8 Gbps point-to-point routed link to node2 (and uses this link to 
communicate with node2 under normal circumstances)
has a 2 Gbps routed link to the primary layer 3 switch (clients which 
are not other nodes in the cluster use this link to communicate with 
node3 under normal circumstances, and traffic to nodes 1 and 2 will 
switch to using this if their respective point-to-point links go down)
has a 1 Gbps routed link to the secondary layer 3 switch (this is not 
used under normal circumstances - only if the 2 Gbps link goes down)


I have avoided using a proper routing protocol for this, as the failover 
still works automatically when links go down even with static routes. 
Although I do also have scripts running on the hosts that detect when a 
device at the other end of a link is not pingable even though the link 
is up, and dynamically removes/inserts the routes as necessary in such a 
situation. But adapting this approach to a larger cluster where 
point-to-point links between all hosts isn't viable might well warrant 
use of a routing protocol.
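
As an illustration only, static routes on node1 for such a layout might look
like the following (all addresses, interface names and metrics are invented,
and the failover relies on the kernel withdrawing routes whose link is down):

  # loopback IPs: 10.99.0.2 = node2, 10.99.0.3 = node3
  ip route add 10.99.0.2/32 via 192.168.12.2 dev eth1              # 8 Gbps p2p to node2
  ip route add 10.99.0.3/32 via 192.168.13.3 dev eth2              # 8 Gbps p2p to node3
  ip route add 10.99.0.2/32 via 192.168.1.254 dev eth0 metric 100  # fallback via primary switch
  ip route add 10.99.0.3/32 via 192.168.1.254 dev eth0 metric 100  # fallback via primary switch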


The end result being that I have more control over where the different 
traffic goes, and it allows me to mess around with the networking 
without any effect on the cluster.


Alex

On 20/12/2014 5:23 AM, Craig Lewis wrote:



On Fri, Dec 19, 2014 at 6:19 PM, Francois Lafont flafdiv...@free.fr wrote:



So, indeed, I have to use routing *or* maybe create 2 monitors
by server like this:

[mon.node1-public1]
host = ceph-node1
mon addr = 10.0.1.1

[mon.node1-public2]
host = ceph-node1
mon addr = 10.0.2.1

# etc...

But, in this case, the working directories of mon.node1-public1
and mon.node1-public2 will be in the same disk (I have no
choice). Is it a problem? Are monitors big consumers of I/O disk?


Interesting idea.  While you will have an even number of monitors, 
you'll still have an odd number of failure domains.  I'm not sure if 
it'll work though... make sure you test having the leader on both 
networks.  It might cause problems if the leader is on the 10.0.1.0/24 
network?


Monitors can be big consumers of disk IO if there is a lot of cluster 
activity.  Monitors record all of the cluster changes in LevelDB, and 
send copies to all of the daemons.  There have been posts to the ML 
about people running out of Disk IOps on the monitors, and the 
problems it causes.  The bigger the cluster, the more