Whoops, I replied off-list. I also noticed that the generated corosync config is not valid, as there is no interface section; the generated file is included in full below.
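For reference, the kind of totem interface subsection I would have expected looks roughly like this (only a sketch on my part; the bindnetaddr is a guess based on our 10.100.30.x subnet and the port is the corosync default):

totem {
    ...
    interface {
        # ring number this interface belongs to
        ringnumber: 0
        # network (not host) address of the subnet corosync should bind to
        bindnetaddr: 10.100.30.0
        # UDP port used for cluster traffic, also with the udpu transport
        mcastport: 5405
    }
}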
The generated /etc/corosync/corosync.conf:

totem {
    version: 2
    secauth: off
    cluster_name: rd-ganesha-ha
    transport: udpu
}

nodelist {
    node {
        ring0_addr: cobalt
        nodeid: 1
    }
    node {
        ring0_addr: iron
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}

---------- Forwarded message ----------
From: Tiemen Ruiten <t.rui...@rdmedia.com>
Date: 21 September 2015 at 17:16
Subject: Re: [Gluster-users] nfs-ganesha HA with arbiter volume
To: Jiffin Tony Thottan <jthot...@redhat.com>

Could you point me to the latest documentation? I've been struggling to find something up-to-date. I believe I have all the prerequisites in place:

- the shared storage volume exists and is mounted
- all nodes are in the hosts files
- Gluster-NFS is disabled
- the corosync, pacemaker and nfs-ganesha RPMs are installed

Anything I missed? Everything has been installed from RPMs, so it is in the default locations:

/usr/libexec/ganesha/ganesha-ha.sh
/etc/ganesha/ganesha.conf (empty)
/etc/ganesha/ganesha-ha.conf

After I started the pcsd service manually, nfs-ganesha could be enabled successfully, but no virtual IP was present on the interfaces. Looking at the system log, I noticed that corosync failed to start.

- On the host where I issued the gluster nfs-ganesha enable command:

Sep 21 17:07:18 iron systemd: Starting NFS-Ganesha file server...
Sep 21 17:07:19 iron systemd: Started NFS-Ganesha file server.
Sep 21 17:07:19 iron rpc.statd[2409]: Received SM_UNMON_ALL request from iron.int.rdmedia.com while not monitoring any hosts
Sep 21 17:07:20 iron systemd: Starting Corosync Cluster Engine...
Sep 21 17:07:20 iron corosync[3426]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 21 17:07:20 iron corosync[3426]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] The network interface [10.100.30.38] is now up.
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: cmap
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: cfg
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: cpg
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Using quorum provider corosync_votequorum
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: votequorum
Sep 21 17:07:20 iron corosync[3427]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 21 17:07:20 iron corosync[3427]: [QB ] server name: quorum
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.38:104) was formed. Members joined: 1
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:20 iron corosync[3427]: [QUORUM] Members[1]: 1
Sep 21 17:07:20 iron corosync[3427]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 21 17:07:20 iron corosync[3427]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
Sep 21 17:08:21 iron corosync: Starting Corosync Cluster Engine (corosync): [FAILED]
Sep 21 17:08:21 iron systemd: corosync.service: control process exited, code=exited status=1
Sep 21 17:08:21 iron systemd: Failed to start Corosync Cluster Engine.
Sep 21 17:08:21 iron systemd: Unit corosync.service entered failed state.

- On the other host:

Sep 21 17:07:19 cobalt systemd: Starting Preprocess NFS configuration...
Sep 21 17:07:19 cobalt systemd: Starting RPC Port Mapper.
Sep 21 17:07:19 cobalt systemd: Reached target RPC Port Mapper.
Sep 21 17:07:19 cobalt systemd: Starting Host and Network Name Lookups.
Sep 21 17:07:19 cobalt systemd: Reached target Host and Network Name Lookups.
Sep 21 17:07:19 cobalt systemd: Starting RPC bind service...
Sep 21 17:07:19 cobalt systemd: Started Preprocess NFS configuration.
Sep 21 17:07:19 cobalt systemd: Started RPC bind service.
Sep 21 17:07:19 cobalt systemd: Starting NFS status monitor for NFSv2/3 locking....
Sep 21 17:07:19 cobalt rpc.statd[2662]: Version 1.3.0 starting
Sep 21 17:07:19 cobalt rpc.statd[2662]: Flags: TI-RPC
Sep 21 17:07:19 cobalt systemd: Started NFS status monitor for NFSv2/3 locking..
Sep 21 17:07:19 cobalt systemd: Starting NFS-Ganesha file server...
Sep 21 17:07:19 cobalt systemd: Started NFS-Ganesha file server.
Sep 21 17:07:19 cobalt kernel: warning: `ganesha.nfsd' uses 32-bit capabilities (legacy support in use)
Sep 21 17:07:19 cobalt logger: setting up rd-ganesha-ha
Sep 21 17:07:19 cobalt rpc.statd[2662]: Received SM_UNMON_ALL request from cobalt.int.rdmedia.com while not monitoring any hosts
Sep 21 17:07:19 cobalt logger: setting up cluster rd-ganesha-ha with the following cobalt iron
Sep 21 17:07:20 cobalt systemd: Stopped Pacemaker High Availability Cluster Manager.
Sep 21 17:07:20 cobalt systemd: Stopped Corosync Cluster Engine.
Sep 21 17:07:20 cobalt systemd: Reloading.
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: Reloading.
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/dm-event.socket:10] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: [/usr/lib/systemd/system/lvm2-lvmetad.socket:9] Unknown lvalue 'RemoveOnStop' in section 'Socket'
Sep 21 17:07:20 cobalt systemd: Starting Corosync Cluster Engine...
Sep 21 17:07:20 cobalt corosync[2816]: [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
Sep 21 17:07:20 cobalt corosync[2816]: [MAIN ] Corosync built-in features: dbus systemd xmlconf snmp pie relro bindnow
Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 21 17:07:20 cobalt corosync[2817]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] The network interface [10.100.30.37] is now up.
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: cmap
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: cfg
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: cpg
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Using quorum provider corosync_votequorum
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: votequorum
Sep 21 17:07:21 cobalt corosync[2817]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 21 17:07:21 cobalt corosync[2817]: [QB ] server name: quorum
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.37}
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] adding new UDPU member {10.100.30.38}
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:100) was formed. Members joined: 1
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
Sep 21 17:07:21 cobalt corosync[2817]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 21 17:07:21 cobalt corosync[2817]: [TOTEM ] A new membership (10.100.30.37:108) was formed. Members joined: 1
Sep 21 17:07:21 cobalt corosync[2817]: [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
Sep 21 17:07:21 cobalt corosync[2817]: [QUORUM] Members[1]: 1
Sep 21 17:07:21 cobalt corosync[2817]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 21 17:08:50 cobalt systemd: corosync.service operation timed out. Terminating.
Sep 21 17:08:50 cobalt corosync: Starting Corosync Cluster Engine (corosync):
Sep 21 17:08:50 cobalt systemd: Failed to start Corosync Cluster Engine.
Sep 21 17:08:50 cobalt systemd: Unit corosync.service entered failed state.
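(Side note: on cobalt it looks like systemd simply gave up on corosync while it was still waiting for the second vote. One thing I might try, though this is only a guess on my part, is raising the unit's start timeout with a systemd drop-in before retrying:

# /etc/systemd/system/corosync.service.d/timeout.conf  (hypothetical drop-in)
[Service]
TimeoutStartSec=180

followed by a systemctl daemon-reload before the next attempt.)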
Sep 21 17:08:55 cobalt logger: warning: pcs property set no-quorum-policy=ignore failed
Sep 21 17:08:55 cobalt logger: warning: pcs property set stonith-enabled=false failed
Sep 21 17:08:55 cobalt logger: warning: pcs resource create nfs_start ganesha_nfsd ha_vol_mnt=/var/run/gluster/shared_storage --clone failed
Sep 21 17:08:56 cobalt logger: warning: pcs resource delete nfs_start-clone failed
Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-mon ganesha_mon --clone failed
Sep 21 17:08:56 cobalt logger: warning: pcs resource create nfs-grace ganesha_grace --clone failed
Sep 21 17:08:57 cobalt logger: warning pcs resource create cobalt-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
Sep 21 17:08:57 cobalt logger: warning: pcs resource create cobalt-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add cobalt-cluster_ip-1 with cobalt-trigger_ip-1 failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint order cobalt-trigger_ip-1 then nfs-grace-clone failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint order nfs-grace-clone then cobalt-cluster_ip-1 failed
Sep 21 17:08:57 cobalt logger: warning pcs resource create iron-cluster_ip-1 ocf:heartbeat:IPaddr ip= cidr_netmask=32 op monitor interval=15s failed
Sep 21 17:08:57 cobalt logger: warning: pcs resource create iron-trigger_ip-1 ocf:heartbeat:Dummy failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint colocation add iron-cluster_ip-1 with iron-trigger_ip-1 failed
Sep 21 17:08:57 cobalt logger: warning: pcs constraint order iron-trigger_ip-1 then nfs-grace-clone failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint order nfs-grace-clone then iron-cluster_ip-1 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers iron=1000 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location cobalt-cluster_ip-1 prefers cobalt=2000 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 rule score=-INFINITY ganesha-active ne 1 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers cobalt=1000 failed
Sep 21 17:08:58 cobalt logger: warning: pcs constraint location iron-cluster_ip-1 prefers iron=2000 failed
Sep 21 17:08:58 cobalt logger: warning pcs cluster cib-push /tmp/tmp.nXTfyA1GMR failed
Sep 21 17:08:58 cobalt logger: warning: scp ganesha-ha.conf to cobalt failed

BTW, I'm using CentOS 7. There are multiple network interfaces on the servers; could that be a problem?
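In case it helps with the multi-NIC question, these are the commands I intend to run to see which interface corosync actually binds to and whether the membership ever completes, once the service stays up:

# show the local ring status and the address corosync bound to
corosync-cfgtool -s
# corosync node membership as pcs sees it
pcs status corosync
# overall cluster and resource status
pcs status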
On 21 September 2015 at 11:48, Jiffin Tony Thottan <jthot...@redhat.com> wrote:
>
> On 21/09/15 13:56, Tiemen Ruiten wrote:
>
> Hello Soumya, Kaleb, list,
>
> This Friday I created the gluster_shared_storage volume manually. I just tried it with the command you supplied, but both have the same result.
>
> From etc-glusterfs-glusterd.vol.log on the node where I issued the command:
>
> [2015-09-21 07:59:47.756845] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
> [2015-09-21 07:59:48.071755] I [MSGID: 106474] [glusterd-ganesha.c:349:is_ganesha_host] 0-management: ganesha host found Hostname is cobalt
> [2015-09-21 07:59:48.653879] E [MSGID: 106470] [glusterd-ganesha.c:264:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
>
> As far as I understand from the logs, it called setup_cluster() [which calls the `ganesha-ha.sh` script], but the script failed.
> Can u please provide the following details:
> - Location of the ganesha.sh file?
> - Location of the ganesha-ha.conf and ganesha.conf files?
>
> And also can u cross-check whether all the prerequisites for the HA setup are satisfied?
>
> --
> With Regards,
> Jiffin
>
> [2015-09-21 07:59:48.653912] E [MSGID: 106123] [glusterd-syncop.c:1404:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details
> [2015-09-21 07:59:45.402458] I [MSGID: 106006] [glusterd-svc-mgmt.c:323:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
> [2015-09-21 07:59:48.071578] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
>
> From etc-glusterfs-glusterd.vol.log on the other node:
>
> [2015-09-21 08:12:50.111877] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
> [2015-09-21 08:14:50.548087] E [MSGID: 106062] [glusterd-op-sm.c:3635:glusterd_op_ac_lock] 0-management: Unable to acquire volname
> [2015-09-21 08:14:50.654746] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
> [2015-09-21 08:14:50.655095] I [MSGID: 106474] [glusterd-ganesha.c:403:check_host_list] 0-management: ganesha host found Hostname is cobalt
> [2015-09-21 08:14:51.287156] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>
> From etc-glusterfs-glusterd.vol.log on the arbiter node:
>
> [2015-09-21 08:18:50.934713] E [MSGID: 101075] [common-utils.c:3127:gf_is_local_addr] 0-management: error in getaddrinfo: Name or service not known
> [2015-09-21 08:18:51.504694] E [MSGID: 106062] [glusterd-op-sm.c:3698:glusterd_op_ac_unlock] 0-management: Unable to acquire volname
>
> I have put the hostnames of all servers in my /etc/hosts file, including the arbiter node.
>
> On 18 September 2015 at 16:52, Soumya Koduri <skod...@redhat.com> wrote:
>
>> Hi Tiemen,
>>
>> One of the pre-requisites before setting up nfs-ganesha HA is to create and mount the shared_storage volume. Use the CLI below for that:
>>
>> "gluster volume set all cluster.enable-shared-storage enable"
>>
>> It shall create the volume and mount it on all the nodes (including the arbiter node). Note this volume shall be mounted on all the nodes of the gluster storage pool (though in this case it may not be part of the nfs-ganesha cluster).
>>
>> So instead of manually creating those directory paths, please use the above CLI and try re-configuring the setup.
>>
>> Thanks,
>> Soumya
>>
>> On 09/18/2015 07:29 PM, Tiemen Ruiten wrote:
>>
>>> Hello Kaleb,
>>>
>>> I don't:
>>>
>>> # Name of the HA cluster created.
>>> # must be unique within the subnet
>>> HA_NAME="rd-ganesha-ha"
>>> #
>>> # The gluster server from which to mount the shared data volume.
>>> HA_VOL_SERVER="iron"
>>> #
>>> # N.B. you may use short names or long names; you may not use IP addrs.
>>> # Once you select one, stay with it as it will be mildly unpleasant to
>>> # clean up if you switch later on. Ensure that all names - short and/or
>>> # long - are in DNS or /etc/hosts on all machines in the cluster.
>>> #
>>> # The subset of nodes of the Gluster Trusted Pool that form the ganesha
>>> # HA cluster. Hostname is specified.
>>> HA_CLUSTER_NODES="cobalt,iron"
>>> #HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
>>> #
>>> # Virtual IPs for each of the nodes specified above.
>>> VIP_server1="10.100.30.101"
>>> VIP_server2="10.100.30.102"
>>> #VIP_server1_lab_redhat_com="10.0.2.1"
>>> #VIP_server2_lab_redhat_com="10.0.2.2"
>>>
>>> The hosts cobalt and iron are the data nodes; the arbiter's IP/hostname (neon) isn't mentioned anywhere in this config file.
>>>
>>> On 18 September 2015 at 15:56, Kaleb S. KEITHLEY <kkeit...@redhat.com> wrote:
>>>
>>>     On 09/18/2015 09:46 AM, Tiemen Ruiten wrote:
>>>     > Hello,
>>>     >
>>>     > I have a Gluster cluster with a single replica 3, arbiter 1 volume (so two nodes with actual data, one arbiter node). I would like to set up NFS-Ganesha HA for this volume but I'm having some difficulties.
>>>     >
>>>     > - I needed to create a directory /var/run/gluster/shared_storage manually on all nodes, or the command 'gluster nfs-ganesha enable' would fail with the following error:
>>>     > [2015-09-18 13:13:34.690416] E [MSGID: 106032] [glusterd-ganesha.c:708:pre_setup] 0-THIS->name: mkdir() failed on path /var/run/gluster/shared_storage/nfs-ganesha, [No such file or directory]
>>>     >
>>>     > - Then I found out that the command connects to the arbiter node as well, but obviously I don't want to set up NFS-Ganesha there. Is it actually possible to set up NFS-Ganesha HA with an arbiter node? If it's possible, is there any documentation on how to do that?
>>>
>>>     Please send the /etc/ganesha/ganesha-ha.conf file you're using.
>>>
>>>     Probably you have included the arbiter in your HA config; that would be a mistake.
>>>
>>>     --
>>>     Kaleb
>>>
>>> --
>>> Tiemen Ruiten
>>> Systems Engineer
>>> R&D Media
>>
>
> --
> Tiemen Ruiten
> Systems Engineer
> R&D Media
>

--
Tiemen Ruiten
Systems Engineer
R&D Media
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users