Hi,

On 11/27/2015 01:58 PM, ml wrote:
Dear All,

I am trying to get an nfs-ganesha HA cluster running with three CentOS
Linux release 7.1.1503 nodes. I use the package
glusterfs-ganesha-3.7.6-1.el7.x86_64 to get the HA scripts. So far it
works fine: when I stop the nfs-ganesha service on one of the nodes, the
virtual IP moves to one of the other nodes, and the altai-dead_ip-1
resource is created properly:


        root@rnas2 ~# pcs status
        Cluster name: ganesha-cluster-dmath
        Last updated: Thu Nov 26 10:41:07 2015          Last change: Thu Nov 26 10:40:06 2015 by root via cibadmin on altai
        Stack: corosync
        Current DC: rnas2 (version 1.1.13-a14efad) - partition with quorum
        3 nodes and 13 resources configured

        Online: [ altai kaukasus rnas2 ]

        Full list of resources:

         Clone Set: nfs-mon-clone [nfs-mon]
             Started: [ altai kaukasus rnas2 ]
         Clone Set: nfs-grace-clone [nfs-grace]
             Started: [ altai kaukasus rnas2 ]
         kaukasus-cluster_ip-1  (ocf::heartbeat:IPaddr):        Started kaukasus
         kaukasus-trigger_ip-1  (ocf::heartbeat:Dummy): Started kaukasus
         altai-cluster_ip-1     (ocf::heartbeat:IPaddr):        Started kaukasus
         altai-trigger_ip-1     (ocf::heartbeat:Dummy): Started kaukasus
         rnas2-cluster_ip-1     (ocf::heartbeat:IPaddr):        Started rnas2
         rnas2-trigger_ip-1     (ocf::heartbeat:Dummy): Started rnas2
         altai-dead_ip-1        (ocf::heartbeat:Dummy): Started altai

        PCSD Status:
          kaukasus: Online
          altai: Online
          rnas2: Online

        Daemon Status:
          corosync: active/enabled
          pacemaker: active/enabled
          pcsd: active/enabled


But when I just disconnect the network on one of the nodes, in this case
altai (or power it off),


        root@altai ~# ifdown bond0


it takes down the whole cluster. I found the following message in the
logs:


        Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)


I wonder whether I just misconfigured something, or whether this is not
supported yet.


Since it is a 3-node cluster, quorum should be enabled. When any of
those machines (or its IP) goes down, quorum will be lost, resulting in
Pacemaker shutting down the entire cluster. If possible, could you check
the same scenario with a 4-node setup?
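One way to verify this on a surviving node is to inspect the corosync
quorum state and Pacemaker's configured reaction to quorum loss (a
sketch using standard corosync/pcs tooling; exact output varies by
version). Since the error you pasted is a 40s start timeout on
nfs-grace, it may also be worth raising that timeout as an experiment;
the 90s value below is just an example, not a recommendation:

```shell
# On a surviving node: show vote counts and whether the partition is quorate
corosync-quorumtool -s

# Pacemaker's reaction to losing quorum is controlled by no-quorum-policy
# (default "stop": all resources in the non-quorate partition are stopped)
pcs property list --all | grep no-quorum-policy

# As an experiment, raise the start timeout on the nfs-grace clone
# (90s is an example value)
pcs resource update nfs-grace op start timeout=90s

# Clear the accumulated start failures so the clone can be scheduled again
pcs resource cleanup nfs-grace-clone
```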

Thanks,
Soumya

Below is the log during the takedown:

        Nov 26 10:44:24 rnas2 corosync[8848]: [TOTEM ] A new membership (129.132.145.5:1048) was formed. Members left: 2
        Nov 26 10:44:24 rnas2 attrd[17253]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
        Nov 26 10:44:24 rnas2 attrd[17253]: notice: Removing all altai attributes for attrd_peer_change_cb
        Nov 26 10:44:25 rnas2 corosync[8848]: [QUORUM] Members[2]: 1 3
        Nov 26 10:44:25 rnas2 corosync[8848]: [MAIN  ] Completed service synchronization, ready to provide service.
        Nov 26 10:44:25 rnas2 cib[17250]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
        Nov 26 10:44:25 rnas2 cib[17250]: notice: Removing altai/2 from the membership list
        Nov 26 10:44:25 rnas2 cib[17250]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
        Nov 26 10:44:25 rnas2 pacemakerd[17249]: notice: Node altai[2] - state is now lost (was member)
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Node altai[2] - state is now lost (was member)
        Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: crm_update_peer_proc: Node altai[2] - state is now lost (was member)
        Nov 26 10:44:25 rnas2 crmd[17255]: warning: No match for shutdown action on 2
        Nov 26 10:44:25 rnas2 attrd[17253]: notice: Removing altai/2 from the membership list
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Stonith/shutdown of altai not matched
        Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: Removing altai/2 from the membership list
        Nov 26 10:44:25 rnas2 attrd[17253]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
        Nov 26 10:44:25 rnas2 stonith-ng[17251]: notice: Purged 1 peers with id=2 and/or uname=altai from the membership cache
        Nov 26 10:44:25 rnas2 crmd[17255]: warning: No match for shutdown action on 2
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Stonith/shutdown of altai not matched
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart nfs-grace:0        (Started kaukasus)
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart nfs-grace:1        (Started rnas2)
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart kaukasus-cluster_ip-1        (Started kaukasus)
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Start   altai-cluster_ip-1        (kaukasus)
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Start   altai-trigger_ip-1        (kaukasus)
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Restart rnas2-cluster_ip-1        (Started rnas2)
        Nov 26 10:44:25 rnas2 pengine[17254]: notice: Calculated Transition 85: /var/lib/pacemaker/pengine/pe-input-86.bz2
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 29: stop kaukasus-cluster_ip-1_stop_0 on kaukasus
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 35: start altai-trigger_ip-1_start_0 on kaukasus
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 37: stop rnas2-cluster_ip-1_stop_0 on rnas2 (local)
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 36: monitor altai-trigger_ip-1_monitor_10000 on kaukasus
        Nov 26 10:44:25 rnas2 IPaddr(rnas2-cluster_ip-1)[30797]: INFO: IP status = ok, IP_CIP=
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Operation rnas2-cluster_ip-1_stop_0: ok (node=rnas2, call=82, rc=0, cib-update=210, confirmed=true)
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 21: stop nfs-grace_stop_0 on kaukasus
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 23: stop nfs-grace_stop_0 on rnas2 (local)
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=84, rc=0, cib-update=211, confirmed=true)
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 22: start nfs-grace_start_0 on kaukasus
        Nov 26 10:44:25 rnas2 crmd[17255]: notice: Initiating action 24: start nfs-grace_start_0 on rnas2 (local)
        Nov 26 10:44:26 rnas2 ntpd[1700]: Deleting interface #27 bond0, 129.132.145.23#123, interface stats: received=0, sent=0, dropped=0, active_time=69258 secs
        Nov 26 10:45:05 rnas2 lrmd[17252]: warning: nfs-grace_start_0 process (PID 30810) timed out
        Nov 26 10:45:05 rnas2 lrmd[17252]: warning: nfs-grace_start_0:30810 - timed out after 40000ms
        Nov 26 10:45:05 rnas2 crmd[17255]: error: Operation nfs-grace_start_0: Timed Out (node=rnas2, call=85, timeout=40000ms)
        Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 24 (nfs-grace_start_0) on rnas2 failed (target: 0 vs. rc: 1): Error
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition aborted by nfs-grace_start_0 'modify' on rnas2: Event failed (magic=2:1;24:85:0:836713e1-c9d3-43f8-bffd-756e023eee8a,...event:381, 0)
        Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 24 (nfs-grace_start_0) on rnas2 failed (target: 0 vs. rc: 1): Error
        Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 22 (nfs-grace_start_0) on kaukasus failed (target: 0 vs. rc: 1): Error
        Nov 26 10:45:05 rnas2 crmd[17255]: warning: Action 22 (nfs-grace_start_0) on kaukasus failed (target: 0 vs. rc: 1): Error
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition 85 (Complete=13, Pending=0, Fired=0, Skipped=3, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-86.bz2): Stopped
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Recover nfs-grace:0        (Started kaukasus)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop    nfs-grace:1        (rnas2)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   kaukasus-cluster_ip-1        (kaukasus)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   altai-cluster_ip-1        (kaukasus)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   rnas2-cluster_ip-1        (rnas2)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Calculated Transition 86: /var/lib/pacemaker/pengine/pe-input-87.bz2
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:0 on kaukasus: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Processing failed op start for nfs-grace:1 on rnas2: unknown error (1)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from kaukasus after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from kaukasus after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from kaukasus after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: warning: Forcing nfs-grace-clone away from rnas2 after 1000000 failures (max=1000000)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop    nfs-grace:0        (kaukasus)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Stop    nfs-grace:1        (rnas2)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   kaukasus-cluster_ip-1        (kaukasus - blocked)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   altai-cluster_ip-1        (kaukasus - blocked)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Start   rnas2-cluster_ip-1        (rnas2 - blocked)
        Nov 26 10:45:05 rnas2 pengine[17254]: notice: Calculated Transition 87: /var/lib/pacemaker/pengine/pe-input-88.bz2
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: Initiating action 2: stop nfs-grace_stop_0 on kaukasus
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: Initiating action 6: stop nfs-grace_stop_0 on rnas2 (local)
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: Operation nfs-grace_stop_0: ok (node=rnas2, call=86, rc=0, cib-update=218, confirmed=true)
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: Transition 87 (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-88.bz2): Complete
        Nov 26 10:45:05 rnas2 crmd[17255]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

Yours,
Rigi
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
