Hi Vlad,

I have not thoroughly tested my setup yet, but so far things look good. The only problem is that I have to manually activate the OSDs using the ceph-deploy command; manually mounting the OSD partition doesn't work.
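For reference, the manual activation looks something like this (hostname and partition are placeholders; the `osd activate` subcommand is from the pre-2.0 ceph-deploy releases, so the exact invocation may differ on newer versions):

    # Activate an already-prepared OSD on a given host and data partition
    ceph-deploy osd activate <hostname>:/dev/sdX1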
Thanks for replying.

Regards,
Rahul S

On 27 June 2018 at 14:15, Дробышевский, Владимир <v...@itgorod.ru> wrote:
> Hello, Rahul!
>
> Do you have your problem during initial cluster creation, or on every
> reboot/leadership transfer? If the former, try removing the floating IP
> while creating the mons, and temporarily transfer the leadership away
> from the server you are going to create the OSD on.
>
> We are using the same configuration without any issues (though we have a
> few more servers), but the Ceph cluster had been created before the
> OpenNebula setup.
>
> We have a number of physical/virtual interfaces on top of IPoIB _and_ an
> ethernet network (with bonding), so there are 3 interfaces for the
> internal communications:
>
> ib0.8003 - 10.103.0.0/16 - Ceph public network and OpenNebula Raft virtual IP
> ib0.8004 - 10.104.0.0/16 - Ceph cluster network
> br0 (on top of the ethernet bonding interface) - 10.101.0.0/16 -
> physical "management" network
>
> We also have a number of other virtual interfaces for per-tenant
> intra-VM networks (VXLAN on top of IP) and so on.
>
> In /etc/hosts we have only "fixed" IPs from the 10.103.0.0/16 network, like:
>
> 10.103.0.1 e001n01.dc1.xxxxxxxx.xx e001n01
>
> /etc/one/oned.conf:
>
> # Executed when a server transits from follower->leader
> RAFT_LEADER_HOOK = [
>     COMMAND   = "raft/vip.sh",
>     ARGUMENTS = "leader ib0.8003 10.103.255.254/16"
> ]
>
> # Executed when a server transits from leader->follower
> RAFT_FOLLOWER_HOOK = [
>     COMMAND   = "raft/vip.sh",
>     ARGUMENTS = "follower ib0.8003 10.103.255.254/16"
> ]
>
> /etc/ceph/ceph.conf:
>
> [global]
> public_network  = 10.103.0.0/16
> cluster_network = 10.104.0.0/16
>
> mon_initial_members = e001n01, e001n02, e001n03
> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>
> The cluster and mons were created with ceph-deploy; each OSD was added
> with a modified ceph-disk.py (as we have only 3 drive slots per server,
> we had to co-locate the system partition with the OSD partition on our
> SSDs), on a per-host/per-drive basis:
>
> admin@<host>:~$ sudo ./ceph-disk-mod.py -v prepare --dmcrypt \
>     --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --bluestore --cluster ceph \
>     --fs-type xfs -- /dev/sda
>
> And the current state on the leader:
>
> oneadmin@e001n02:~/remotes/tm$ onezone show 0
> ZONE 0 INFORMATION
> ID   : 0
> NAME : OpenNebula
>
> ZONE SERVERS
> ID NAME     ENDPOINT
>  0 e001n01  http://10.103.0.1:2633/RPC2
>  1 e001n02  http://10.103.0.2:2633/RPC2
>  2 e001n03  http://10.103.0.3:2633/RPC2
>
> HA & FEDERATION SYNC STATUS
> ID NAME     STATE     TERM  INDEX     COMMIT    VOTE  FED_INDEX
>  0 e001n01  follower  1571  68250418  68250417     1         -1
>  1 e001n02  leader    1571  68250418  68250418     1         -1
>  2 e001n03  follower  1571  68250418  68250417    -1         -1
> ...
>
> admin@e001n02:~$ ip addr show ib0.8003
> 9: ib0.8003@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP group default qlen 256
>     link/infiniband a0:00:03:00:fe:80:00:00:00:00:00:00:00:1e:67:03:00:47:c1:1b brd 00:ff:ff:ff:ff:12:40:1b:80:03:00:00:00:00:00:00:ff:ff:ff:ff
>     inet 10.103.0.2/16 brd 10.103.255.255 scope global ib0.8003
>        valid_lft forever preferred_lft forever
>     inet 10.103.255.254/16 scope global secondary ib0.8003
>        valid_lft forever preferred_lft forever
>     inet6 fe80::21e:6703:47:c11b/64 scope link
>        valid_lft forever preferred_lft forever
>
> admin@e001n02:~$ sudo netstat -anp | grep mon
> tcp  0  0  10.103.0.2:6789  0.0.0.0:*         LISTEN       168752/ceph-mon
> tcp  0  0  10.103.0.2:6789  10.103.0.2:44270  ESTABLISHED  168752/ceph-mon
> ...
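A side note on the RAFT hooks in oned.conf above: conceptually, the raft/vip.sh hook just adds or removes the floating IP on the given interface when the leadership changes. A minimal sketch of the idea, assuming standard ip(8)/arping tooling (this is an illustration, not the script OpenNebula ships):

    #!/bin/bash
    # Sketch of a leader/follower VIP hook.
    # Usage: vip.sh {leader|follower} <interface> <address/prefix>
    ACTION=$1
    IFACE=$2
    VIP=$3

    if [ "$ACTION" = "leader" ]; then
        # New leader: bring the floating IP up on the interface...
        sudo ip address add "$VIP" dev "$IFACE"
        # ...and send gratuitous ARP so peers refresh their caches.
        sudo arping -c 3 -U -I "$IFACE" "${VIP%%/*}" || true
    else
        # Demoted to follower: drop the floating IP.
        sudo ip address del "$VIP" dev "$IFACE" || true
    fi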
> admin@e001n02:~$ sudo netstat -anp | grep osd
> tcp  0  0  10.104.0.2:6800   0.0.0.0:*         LISTEN       6736/ceph-osd
> tcp  0  0  10.104.0.2:6801   0.0.0.0:*         LISTEN       6736/ceph-osd
> tcp  0  0  10.103.0.2:6801   0.0.0.0:*         LISTEN       6736/ceph-osd
> tcp  0  0  10.103.0.2:6802   0.0.0.0:*         LISTEN       6736/ceph-osd
> tcp  0  0  10.104.0.2:6801   10.104.0.6:42868  ESTABLISHED  6736/ceph-osd
> tcp  0  0  10.104.0.2:51788  10.104.0.1:6800   ESTABLISHED  6736/ceph-osd
> ...
>
> admin@e001n02:~$ sudo ceph -s
>   cluster:
>     id:     <uuid>
>     health: HEALTH_OK
>
> oneadmin@e001n02:~/remotes/tm$ onedatastore show 0
> DATASTORE 0 INFORMATION
> ID        : 0
> NAME      : system
> USER      : oneadmin
> GROUP     : oneadmin
> CLUSTERS  : 0
> TYPE      : SYSTEM
> DS_MAD    : -
> TM_MAD    : ceph_shared
> BASE PATH : /var/lib/one//datastores/0
> DISK_TYPE : RBD
> STATE     : READY
>
> ...
>
> DATASTORE TEMPLATE
> ALLOW_ORPHANS="YES"
> BRIDGE_LIST="e001n01 e001n02 e001n03"
> CEPH_HOST="e001n01 e001n02 e001n03"
> CEPH_SECRET="secret_uuid"
> CEPH_USER="libvirt"
> DEFAULT_DEVICE_PREFIX="sd"
> DISK_TYPE="RBD"
> DS_MIGRATE="NO"
> POOL_NAME="rbd-ssd"
> RESTRICTED_DIRS="/"
> SAFE_DIRS="/mnt"
> SHARED="YES"
> TM_MAD="ceph_shared"
> TYPE="SYSTEM_DS"
>
> ...
>
> oneadmin@e001n02:~/remotes/tm$ onedatastore show 1
> DATASTORE 1 INFORMATION
> ID        : 1
> NAME      : default
> USER      : oneadmin
> GROUP     : oneadmin
> CLUSTERS  : 0
> TYPE      : IMAGE
> DS_MAD    : ceph
> TM_MAD    : ceph_shared
> BASE PATH : /var/lib/one//datastores/1
> DISK_TYPE : RBD
> STATE     : READY
>
> ...
>
> DATASTORE TEMPLATE
> ALLOW_ORPHANS="YES"
> BRIDGE_LIST="e001n01 e001n02 e001n03"
> CEPH_HOST="e001n01 e001n02 e001n03"
> CEPH_SECRET="secret_uuid"
> CEPH_USER="libvirt"
> CLONE_TARGET="SELF"
> DISK_TYPE="RBD"
> DRIVER="raw"
> DS_MAD="ceph"
> LN_TARGET="NONE"
> POOL_NAME="rbd-ssd"
> SAFE_DIRS="/mnt /var/lib/one/datastores/tmp"
> STAGING_DIR="/var/lib/one/datastores/tmp"
> TM_MAD="ceph_shared"
> TYPE="IMAGE_DS"
>
> IMAGES
> ...
>
> Leadership transfers without any issues as well.
>
> BR
>
> 2018-06-26 13:17 GMT+05:00 Rahul S <saple.rahul.eightyth...@gmail.com>:
>
>> Hi! In my organisation we are using OpenNebula as our cloud platform.
>> Currently we are testing the High Availability (HA) feature with a Ceph
>> cluster as our storage backend. In our test setup we have 3 systems,
>> with front-end HA already successfully set up and configured with a
>> floating IP between them. We have our Ceph cluster (3 OSDs and 3 mons)
>> on these very 3 machines. However, when we try to deploy the Ceph
>> cluster, we get a successful quorum, but with the following issues on
>> the OpenNebula 'LEADER' node:
>>
>> 1) The mon daemon successfully starts, but takes up the floating IP
>> rather than the actual IP.
>>
>> 2) The osd daemon, on the other hand, goes down after a while with the
>> following error:
>>
>> log_channel(cluster) log [ERR] : map e29 had wrong cluster addr
>> (192.x.x.20:6801/10821 != my 192.x.x.245:6801/10821)
>>
>> 192.x.x.20 being the floating IP and 192.x.x.245 the actual IP.
>>
>> Apart from that, we are getting a HEALTH_WARN status when running
>> ceph -s, with many PGs in a degraded, unclean, undersized state.
>>
>> Also, if it matters, we have our OSDs on a separate partition rather
>> than a whole disk.
>>
>> We only need to get the cluster into a healthy state in our minimalistic
>> setup. Any idea on how to get past this?
>> Thanks and Regards,
>> Rahul S

> --
>
> Best regards,
> Дробышевский Владимир
> Компания "АйТи Город"
> +7 343 2222192
>
> IT consulting
> Turnkey project delivery
> IT services outsourcing
> IT infrastructure outsourcing
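P.S. On the wrong-cluster-addr symptom quoted above: since the floating IP sits inside the Ceph public network, a daemon can end up binding to it. One way to avoid that (a hedged suggestion using standard Ceph options, shown here with the addressing from the example setup above; adjust the daemon IDs and IPs to your own hosts) is to pin each daemon to its fixed address in ceph.conf:

    [mon.e001n01]
    # Pin the monitor to the host's fixed address, not the VIP
    mon addr = 10.103.0.1:6789

    [osd.0]
    # Pin the OSD's public and cluster addresses to the fixed IPs
    public addr  = 10.103.0.1
    cluster addr = 10.104.0.1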
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com