Re: [ceph-users] How to see which crush tunables are active in a ceph-cluster?
Hi,
for the information of other Ceph users... I switched from unknown crush tunables to firefly, and it took 6 hours (30.853% degraded) to finish on our production cluster (5 nodes, 60 OSDs, 10GbE, 20% data used: pgmap v35678572: 3904 pgs, 4 pools, 21947 GB data, 5489 kobjects).
Should changing chooseleaf_vary_r from 0 to 1 take roughly the same amount of time to finish?

Regards,
Udo

On 04.12.2014 14:09, Udo Lembke wrote:
Hi,
to answer myself. With "ceph osd crush show-tunables" I see a little bit more, but I still don't know how far away from the firefly tunables the production cluster is.

New test cluster with profile optimal:
ceph osd crush show-tunables
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "profile": "firefly",
  "optimal_tunables": 1,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1}

The production cluster:
ceph osd crush show-tunables
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 0,
  "profile": "unknown",
  "optimal_tunables": 0,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 0}

Does this look like argonaut or bobtail? And how should I proceed with the update? Does it make sense to first go to profile bobtail and then to firefly?

Regards,
Udo

Am 01.12.2014 17:39, schrieb Udo Lembke:
Hi all,
http://ceph.com/docs/master/rados/operations/crush-map/#crush-tunables describes how to set the tunables to legacy, argonaut, bobtail, firefly or optimal.
But how can I see which profile is active in a Ceph cluster?
With "ceph osd getcrushmap" I don't get much info, only:
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50

Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
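Since the production cluster reports "profile: unknown", one way to answer the "does this look like argonaut or bobtail?" question is to compare the reported values against the per-profile defaults. A minimal Python sketch; note that the profile value tables below are my assumptions based on the CRUSH tunables documentation, not values stated anywhere in this thread:

```python
# Known default values for each named CRUSH tunables profile.
# ASSUMPTION: these tables come from the CRUSH tunables docs, not from
# this thread -- verify against your Ceph version before relying on them.
PROFILES = {
    "argonaut": {"choose_local_tries": 2, "choose_local_fallback_tries": 5,
                 "choose_total_tries": 19, "chooseleaf_descend_once": 0},
    "bobtail":  {"choose_local_tries": 0, "choose_local_fallback_tries": 0,
                 "choose_total_tries": 50, "chooseleaf_descend_once": 1},
    "firefly":  {"choose_local_tries": 0, "choose_local_fallback_tries": 0,
                 "choose_total_tries": 50, "chooseleaf_descend_once": 1,
                 "chooseleaf_vary_r": 1},
}

def matching_profiles(tunables):
    """List the named profiles whose default values the cluster matches."""
    return [name for name, vals in PROFILES.items()
            if all(tunables.get(k) == v for k, v in vals.items())]

# The production cluster's values from "ceph osd crush show-tunables":
prod = {"choose_local_tries": 0, "choose_local_fallback_tries": 0,
        "choose_total_tries": 50, "chooseleaf_descend_once": 0}
print(matching_profiles(prod) or "no exact match")
```

Feeding it the production values shows they match no named profile exactly: chooseleaf_descend_once 0 alongside choose_total_tries 50 mixes argonaut-style and bobtail-style settings, which would explain why the cluster reports "unknown".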
[ceph-users] Ceph-deploy install and pinning on Ubuntu 14.04
Hi all,
I'm using ceph-deploy on Ubuntu 14.04. When I do a "ceph-deploy install" I see packages getting installed from the Ubuntu repositories instead of Ceph's own ones. Am I missing something? Do I need to set up some pinning on the repositories?
Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
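For what it's worth, one common fix for this is an apt preferences file that pins the ceph.com repository above the Ubuntu archive, so apt prefers it even when Ubuntu ships packages with the same names. A sketch of /etc/apt/preferences.d/ceph.pref -- the origin string "ceph.com" is an assumption here, so check the Origin field reported by "apt-cache policy" for your configured repo:

```
Package: *
Pin: origin ceph.com
Pin-Priority: 1001
```

A priority above 1000 lets apt "downgrade" to the pinned repo's version even if Ubuntu's version number is higher; a value like 900 would be enough if the ceph.com packages are always newer.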
Re: [ceph-users] How to see which crush tunables are active in a ceph-cluster?
Hi Craig,
right! I had also posted a mail in that thread. My question was whether the whole step to chooseleaf_vary_r 1 takes about the same amount of time as setting the tunables to firefly.
The funny thing: I had just decompiled the crushmap to start with chooseleaf_vary_r 4, and saw that after the upgrade tonight chooseleaf_vary_r is already at 1!

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
...

ceph osd crush show-tunables -f json-pretty
{ "choose_local_tries": 0,
  "choose_local_fallback_tries": 0,
  "choose_total_tries": 50,
  "chooseleaf_descend_once": 1,
  "profile": "firefly",
  "optimal_tunables": 1,
  "legacy_tunables": 0,
  "require_feature_tunables": 1,
  "require_feature_tunables2": 1}

Udo

On 20.12.2014 17:53, Craig Lewis wrote:
There was a tunables discussion on the ML a few months ago, with a lot of good suggestions. Sage gave some suggestions on rolling out (and rolling back) chooseleaf_vary_r changes. That reminds me... I intended to try those changes over the holidays...
Found it; the subject was "ceph osd crush tunables optimal AND add new OSD at the same time".

On Sat, Dec 20, 2014 at 3:26 AM, Udo Lembke <ulem...@polarzone.de> wrote:
Hi,
for the information of other Ceph users... I switched from unknown crush tunables to firefly, and it took 6 hours (30.853% degraded) to finish on our production cluster (5 nodes, 60 OSDs, 10GbE, 20% data used: pgmap v35678572: 3904 pgs, 4 pools, 21947 GB data, 5489 kobjects).
Should changing chooseleaf_vary_r from 0 to 1 take roughly the same amount of time to finish?
Regards,
Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
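For anyone wanting to set chooseleaf_vary_r by hand rather than via a named profile, the usual round trip is: ceph osd getcrushmap -o map.bin; crushtool -d map.bin -o map.txt; edit the tunable line; crushtool -c map.txt -o map.new; ceph osd setcrushmap -i map.new. The edit step can be scripted; below is a hedged sketch that rewrites a tunable line in the decompiled text (set_tunable and the sample map are illustrative, not something from this thread):

```python
import re

def set_tunable(crushmap_text, name, value):
    """Rewrite 'tunable <name> <n>' in a decompiled crushmap, or insert
    the line after the last existing tunable if it is absent."""
    pattern = re.compile(r"^(tunable %s )\d+$" % re.escape(name), re.M)
    if pattern.search(crushmap_text):
        return pattern.sub(r"\g<1>%d" % value, crushmap_text)
    # Not present: insert after the last tunable line
    # (assumes the decompiled map has at least one tunable).
    lines = crushmap_text.splitlines()
    last = max(i for i, l in enumerate(lines) if l.startswith("tunable "))
    lines.insert(last + 1, "tunable %s %d" % (name, value))
    return "\n".join(lines) + "\n"

# Illustrative fragment of a decompiled crushmap:
sample = """# begin crush map
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 0
"""
print(set_tunable(sample, "chooseleaf_vary_r", 1))
```

The resulting text would then be recompiled with crushtool -c and injected with ceph osd setcrushmap; as Craig notes, Sage's advice in the "ceph osd crush tunables optimal AND add new OSD at the same time" thread covers rolling such a change out (and back) gradually.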
Re: [ceph-users] Have 2 different public networks
Thought I'd share details of my setup, as I am effectively achieving this (i.e. making monitors accessible over multiple interfaces) with IP routing, as follows:

My Ceph hosts each have a /32 IP address on a loopback interface, and that is the IP address that all their Ceph daemons are bound to. In ceph.conf I do that by setting all the following values to the host's loopback IP: "mon addr" in the [mon.x] sections; "cluster addr" and "public addr" in the [mds.x] sections; and "cluster addr", "public addr", and "osd heartbeat addr" in the [osd.x] sections.

Then I use IP routing to ensure that the hosts can all reach each other's loopback IPs, and that clients can reach them too, over the relevant networks. This also allows inter-host traffic to fail over to alternate paths if the normal path is down for some reason.

To put this in context, my cluster is just a small 3-node cluster, and I have a pair of layer 3 switches, with networking arranged like this:

node1:
- has an 8 Gbps point-to-point routed link to node2 (and uses this link to communicate with node2 under normal circumstances)
- has an 8 Gbps point-to-point routed link to node3 (and uses this link to communicate with node3 under normal circumstances)
- has a 2 Gbps routed link to the primary layer 3 switch (clients which are not other nodes in the cluster use this link to communicate with node1 under normal circumstances, and traffic to nodes 2 and 3 will switch to using this if their respective point-to-point links go down)
- has a 1 Gbps routed link to the secondary layer 3 switch (this is not used under normal circumstances - only if the 2 Gbps link goes down)

node2:
- has an 8 Gbps point-to-point routed link to node3 (and uses this link to communicate with node3 under normal circumstances)
- has an 8 Gbps point-to-point routed link to node1 (and uses this link to communicate with node1 under normal circumstances)
- has a 2 Gbps routed link to the primary layer 3 switch (clients which are not other nodes in the cluster use this link to communicate with node2 under normal circumstances, and traffic to nodes 3 and 1 will switch to using this if their respective point-to-point links go down)
- has a 1 Gbps routed link to the secondary layer 3 switch (this is not used under normal circumstances - only if the 2 Gbps link goes down)

node3:
- has an 8 Gbps point-to-point routed link to node1 (and uses this link to communicate with node1 under normal circumstances)
- has an 8 Gbps point-to-point routed link to node2 (and uses this link to communicate with node2 under normal circumstances)
- has a 2 Gbps routed link to the primary layer 3 switch (clients which are not other nodes in the cluster use this link to communicate with node3 under normal circumstances, and traffic to nodes 1 and 2 will switch to using this if their respective point-to-point links go down)
- has a 1 Gbps routed link to the secondary layer 3 switch (this is not used under normal circumstances - only if the 2 Gbps link goes down)

I have avoided using a proper routing protocol for this, as the failover still works automatically when links go down, even with static routes. I do also have scripts running on the hosts that detect when the device at the other end of a link is not pingable even though the link is up, and dynamically remove/insert the routes as necessary. But adapting this approach to a larger cluster, where point-to-point links between all hosts aren't viable, might well warrant the use of a routing protocol.

The end result is that I have more control over where the different traffic goes, and it allows me to mess around with the networking without any effect on the cluster.
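To illustrate the ceph.conf side of this, the loopback binding described above would look roughly like the following for one host. The section IDs, host name, and the 10.255.0.1 loopback address are made-up placeholders, and the [mds.x] sections get the same "cluster addr"/"public addr" treatment:

```
[mon.node1]
    host = node1
    mon addr = 10.255.0.1

[osd.0]
    host = node1
    cluster addr = 10.255.0.1
    public addr = 10.255.0.1
    osd heartbeat addr = 10.255.0.1
```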
Alex

On 20/12/2014 5:23 AM, Craig Lewis wrote:
On Fri, Dec 19, 2014 at 6:19 PM, Francois Lafont <flafdiv...@free.fr> wrote:

So, indeed, I have to use routing *or* maybe create 2 monitors per server, like this:

[mon.node1-public1]
host = ceph-node1
mon addr = 10.0.1.1

[mon.node1-public2]
host = ceph-node1
mon addr = 10.0.2.1
# etc...

But in this case the working directories of mon.node1-public1 and mon.node1-public2 will be on the same disk (I have no choice). Is that a problem? Are monitors big consumers of disk I/O?

Interesting idea. While you will have an even number of monitors, you'll still have an odd number of failure domains. I'm not sure if it'll work though... make sure you test having the leader on both networks. It might cause problems if the leader is on the 10.0.1.0/24 network?

Monitors can be big consumers of disk I/O if there is a lot of cluster activity. Monitors record all of the cluster changes in LevelDB, and send copies to all of the daemons. There have been posts to the ML about people running out of disk IOPS on the monitors, and the problems it causes. The bigger the cluster, the more