Rahul, if you are using whole drives for OSDs, then ceph-deploy is a good option in most cases.
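[Editor's note] For reference, a sketch of the whole-drive flow with ceph-deploy. Hostnames and device names are placeholders, and the exact subcommands depend on the ceph-deploy version in use (the older ceph-disk based releases split the work into prepare/activate, which matches the manual activation Rahul mentions below):

```
# ceph-deploy >= 2.0 (ceph-volume based): create an OSD from a whole drive
ceph-deploy osd create --data /dev/sdb node1

# older ceph-deploy (ceph-disk based): prepare the drive, then activate
# the resulting data partition
ceph-deploy osd prepare node1:/dev/sdb
ceph-deploy osd activate node1:/dev/sdb1
```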
2018-06-28 18:12 GMT+05:00 Rahul S <saple.rahul.eightyth...@gmail.com>:

> Hi Vlad,
>
> I have not thoroughly tested my setup, but so far things look good. The
> only problem is that I have to manually activate the OSDs using the
> ceph-deploy command; manually mounting the OSD partition doesn't work.
>
> Thanks for replying.
>
> Regards,
> Rahul S
>
> On 27 June 2018 at 14:15, Дробышевский, Владимир <v...@itgorod.ru> wrote:
>
>> Hello, Rahul!
>>
>> Do you hit this problem during the initial cluster creation, or on every
>> reboot/leadership transfer? If the former, try removing the floating IP
>> while creating the mons, and temporarily transfer the leadership away
>> from the server you are going to create the OSD on.
>>
>> We are using the same configuration without any issues (though with a
>> few more servers), but our Ceph cluster was created before the
>> OpenNebula setup.
>>
>> We have a number of physical/virtual interfaces on top of IPoIB _and_ an
>> ethernet network (with bonding).
>>
>> So there are 3 interfaces for the internal communications:
>>
>> ib0.8003 - 10.103.0.0/16 - Ceph public network and the OpenNebula raft
>>            virtual IP
>> ib0.8004 - 10.104.0.0/16 - Ceph cluster network
>> br0 (on top of the ethernet bonding interface) - 10.101.0.0/16 -
>>            physical "management" network
>>
>> We also have a number of other virtual interfaces for per-tenant
>> intra-VM networks (VXLAN on top of IP) and so on.
>> In /etc/hosts we have only the "fixed" IPs from the 10.103.0.0/16
>> network, like:
>>
>> 10.103.0.1 e001n01.dc1.xxxxxxxx.xx e001n01
>>
>> /etc/one/oned.conf:
>>
>> # Executed when a server transits from follower->leader
>> RAFT_LEADER_HOOK = [
>>     COMMAND   = "raft/vip.sh",
>>     ARGUMENTS = "leader ib0.8003 10.103.255.254/16"
>> ]
>>
>> # Executed when a server transits from leader->follower
>> RAFT_FOLLOWER_HOOK = [
>>     COMMAND   = "raft/vip.sh",
>>     ARGUMENTS = "follower ib0.8003 10.103.255.254/16"
>> ]
>>
>> /etc/ceph/ceph.conf:
>>
>> [global]
>> public_network  = 10.103.0.0/16
>> cluster_network = 10.104.0.0/16
>>
>> mon_initial_members = e001n01, e001n02, e001n03
>> mon_host = 10.103.0.1,10.103.0.2,10.103.0.3
>>
>> The cluster and mons were created with ceph-deploy; each OSD was added
>> via a modified ceph-disk.py (as we have only 3 drive slots per server,
>> we had to co-locate the system partition with the OSD partition on our
>> SSDs), one host/drive at a time:
>>
>> admin@<host>:~$ sudo ./ceph-disk-mod.py -v prepare --dmcrypt \
>>     --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --bluestore --cluster ceph \
>>     --fs-type xfs -- /dev/sda
>>
>> And the current state on the leader:
>>
>> oneadmin@e001n02:~/remotes/tm$ onezone show 0
>> ZONE 0 INFORMATION
>> ID   : 0
>> NAME : OpenNebula
>>
>> ZONE SERVERS
>> ID NAME     ENDPOINT
>>  0 e001n01  http://10.103.0.1:2633/RPC2
>>  1 e001n02  http://10.103.0.2:2633/RPC2
>>  2 e001n03  http://10.103.0.3:2633/RPC2
>>
>> HA & FEDERATION SYNC STATUS
>> ID NAME     STATE     TERM  INDEX     COMMIT    VOTE  FED_INDEX
>>  0 e001n01  follower  1571  68250418  68250417  1     -1
>>  1 e001n02  leader    1571  68250418  68250418  1     -1
>>  2 e001n03  follower  1571  68250418  68250417  -1    -1
>> ...
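[Editor's note] The raft/vip.sh hook wired into oned.conf above ships with OpenNebula; the script below is only a minimal sketch of its leader/follower logic, not the actual distribution script. The helper builds the ip(8) command rather than running it (the real hook would execute it with root privileges), so the logic is easy to inspect:

```shell
#!/bin/sh
# Sketch of what a raft VIP hook does: add the virtual IP on promotion
# to leader, remove it on demotion to follower.
# vip_cmd is a hypothetical helper that only *builds* the ip(8) command.

vip_cmd() {
    action="$1"; iface="$2"; vip="$3"
    case "$action" in
        leader)   echo "ip address add $vip dev $iface" ;;  # take the VIP
        follower) echo "ip address del $vip dev $iface" ;;  # release it
        *)        echo "usage: vip_cmd {leader|follower} iface ip/prefix" >&2
                  return 1 ;;
    esac
}

# Matching the ARGUMENTS from oned.conf above:
vip_cmd leader   ib0.8003 10.103.255.254/16   # on follower->leader
vip_cmd follower ib0.8003 10.103.255.254/16   # on leader->follower
```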
>> admin@e001n02:~$ ip addr show ib0.8003
>> 9: ib0.8003@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc mq state UP group default qlen 256
>>     link/infiniband a0:00:03:00:fe:80:00:00:00:00:00:00:00:1e:67:03:00:47:c1:1b brd 00:ff:ff:ff:ff:12:40:1b:80:03:00:00:00:00:00:00:ff:ff:ff:ff
>>     inet 10.103.0.2/16 brd 10.103.255.255 scope global ib0.8003
>>        valid_lft forever preferred_lft forever
>>     inet 10.103.255.254/16 scope global secondary ib0.8003
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::21e:6703:47:c11b/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> admin@e001n02:~$ sudo netstat -anp | grep mon
>> tcp  0  0  10.103.0.2:6789   0.0.0.0:*         LISTEN       168752/ceph-mon
>> tcp  0  0  10.103.0.2:6789   10.103.0.2:44270  ESTABLISHED  168752/ceph-mon
>> ...
>>
>> admin@e001n02:~$ sudo netstat -anp | grep osd
>> tcp  0  0  10.104.0.2:6800   0.0.0.0:*         LISTEN       6736/ceph-osd
>> tcp  0  0  10.104.0.2:6801   0.0.0.0:*         LISTEN       6736/ceph-osd
>> tcp  0  0  10.103.0.2:6801   0.0.0.0:*         LISTEN       6736/ceph-osd
>> tcp  0  0  10.103.0.2:6802   0.0.0.0:*         LISTEN       6736/ceph-osd
>> tcp  0  0  10.104.0.2:6801   10.104.0.6:42868  ESTABLISHED  6736/ceph-osd
>> tcp  0  0  10.104.0.2:51788  10.104.0.1:6800   ESTABLISHED  6736/ceph-osd
>> ...
>>
>> admin@e001n02:~$ sudo ceph -s
>>   cluster:
>>     id:     <uuid>
>>     health: HEALTH_OK
>>
>> oneadmin@e001n02:~/remotes/tm$ onedatastore show 0
>> DATASTORE 0 INFORMATION
>> ID        : 0
>> NAME      : system
>> USER      : oneadmin
>> GROUP     : oneadmin
>> CLUSTERS  : 0
>> TYPE      : SYSTEM
>> DS_MAD    : -
>> TM_MAD    : ceph_shared
>> BASE PATH : /var/lib/one//datastores/0
>> DISK_TYPE : RBD
>> STATE     : READY
>>
>> ...
>> DATASTORE TEMPLATE
>> ALLOW_ORPHANS="YES"
>> BRIDGE_LIST="e001n01 e001n02 e001n03"
>> CEPH_HOST="e001n01 e001n02 e001n03"
>> CEPH_SECRET="secret_uuid"
>> CEPH_USER="libvirt"
>> DEFAULT_DEVICE_PREFIX="sd"
>> DISK_TYPE="RBD"
>> DS_MIGRATE="NO"
>> POOL_NAME="rbd-ssd"
>> RESTRICTED_DIRS="/"
>> SAFE_DIRS="/mnt"
>> SHARED="YES"
>> TM_MAD="ceph_shared"
>> TYPE="SYSTEM_DS"
>>
>> ...
>>
>> oneadmin@e001n02:~/remotes/tm$ onedatastore show 1
>> DATASTORE 1 INFORMATION
>> ID        : 1
>> NAME      : default
>> USER      : oneadmin
>> GROUP     : oneadmin
>> CLUSTERS  : 0
>> TYPE      : IMAGE
>> DS_MAD    : ceph
>> TM_MAD    : ceph_shared
>> BASE PATH : /var/lib/one//datastores/1
>> DISK_TYPE : RBD
>> STATE     : READY
>>
>> ...
>>
>> DATASTORE TEMPLATE
>> ALLOW_ORPHANS="YES"
>> BRIDGE_LIST="e001n01 e001n02 e001n03"
>> CEPH_HOST="e001n01 e001n02 e001n03"
>> CEPH_SECRET="secret_uuid"
>> CEPH_USER="libvirt"
>> CLONE_TARGET="SELF"
>> DISK_TYPE="RBD"
>> DRIVER="raw"
>> DS_MAD="ceph"
>> LN_TARGET="NONE"
>> POOL_NAME="rbd-ssd"
>> SAFE_DIRS="/mnt /var/lib/one/datastores/tmp"
>> STAGING_DIR="/var/lib/one/datastores/tmp"
>> TM_MAD="ceph_shared"
>> TYPE="IMAGE_DS"
>>
>> IMAGES
>> ...
>>
>> Leadership transfers work without any issues as well.
>>
>> BR
>>
>> 2018-06-26 13:17 GMT+05:00 Rahul S <saple.rahul.eightyth...@gmail.com>:
>>
>>> Hi! In my organisation we are using OpenNebula as our cloud platform.
>>> Currently we are testing the High Availability (HA) feature with a Ceph
>>> cluster as our storage backend. In our test setup we have 3 systems
>>> with front-end HA already successfully set up and configured, with a
>>> floating IP between them. We have our Ceph cluster (3 OSDs and 3 mons)
>>> on these very same 3 machines. However, when we try to deploy the Ceph
>>> cluster, we get a successful quorum, but with the following issues on
>>> the OpenNebula 'LEADER' node:
>>>
>>> 1) The mon daemon successfully starts, but takes up the floating IP
>>> rather than the actual IP.
>>>
>>> 2) The OSD daemon, on the other hand, goes down after a while with the
>>> following error:
>>>
>>> log_channel(cluster) log [ERR] : map e29 had wrong cluster addr
>>> (192.x.x.20:6801/10821 != my 192.x.x.245:6801/10821)
>>>
>>> 192.x.x.20 being the floating IP
>>> 192.x.x.245 being the actual IP
>>>
>>> Apart from that, we get a HEALTH_WARN status when running ceph -s, with
>>> many PGs in a degraded, unclean, undersized state.
>>>
>>> Also, if it matters, we have our OSDs on a separate partition rather
>>> than a whole disk.
>>>
>>> We only need to get the cluster into a healthy state in our
>>> minimalistic setup. Any idea on how to get past this?
>>>
>>> Thanks and Regards,
>>> Rahul S

--
Best regards,
Vladimir Drobyshevskiy
"АйТи Город" (IT Gorod) company
+7 343 2222192

IT consulting
Turnkey project delivery
IT services outsourcing
IT infrastructure outsourcing
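[Editor's note] For readers who hit the same "wrong cluster addr" error: a commonly suggested workaround, not confirmed anywhere in this thread, is to pin each daemon to the host's fixed IP in ceph.conf so the mon and OSD never bind to the floating address. The section names and addresses below are placeholders following Rahul's 192.x.x.245 example:

```ini
# Hypothetical /etc/ceph/ceph.conf fragment: pin daemons to the fixed IP
# so they do not pick up the floating (raft) address.
[mon.hostname]
public addr = 192.x.x.245

[osd.0]
public addr = 192.x.x.245
cluster addr = 192.x.x.245
```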
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com