On Sat, Feb 22, 2014 at 2:28 PM, Fabio M. Di Nitto <[email protected]> wrote:

> On 02/22/2014 11:10 AM, cluster lab wrote:
> > Hi,
> >
> > In the middle of cluster activity I received these messages (the
> > cluster has 3 nodes with a SAN ... GFS2 filesystem):
>
> OS? Version of the packages? cluster.conf?
>

OS: SL (Scientific Linux 6)

Packages:
kernel-2.6.32-71.29.1.el6.x86_64
rgmanager-3.0.12.1-12.el6.x86_64
cman-3.0.12-23.el6.x86_64
corosynclib-1.2.3-21.el6.x86_64
corosync-1.2.3-21.el6.x86_64

Cluster.conf:

<?xml version="1.0"?>
<cluster config_version="224" name="USBackCluster">
        <fence_daemon clean_start="0" post_fail_delay="10" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="USBack-prox1" nodeid="1" votes="1">
                        <fence>
                                <method name="ilo">
                                        <device name="USBack-prox1-ilo"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="USBack-prox2" nodeid="2" votes="1">
                        <fence>
                                <method name="ilo">
                                        <device name="USBack-prox2-ilo"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="USBack-prox3" nodeid="3" votes="1">
                        <fence>
                                <method name="ilo">
                                        <device name="USBack-prox3-ilo"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
                ... fence config ...
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="VMS-Area" nofailback="0" ordered="0" restricted="0">
                                <failoverdomainnode name="USBack-prox1" priority="1"/>
                                <failoverdomainnode name="USBack-prox2" priority="1"/>
                                <failoverdomainnode name="USBack-prox3" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
    ....
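
As a side note on the configuration above: with three nodes at one vote each, the cluster should stay quorate when a single node drops out. The sketch below works that out from a trimmed copy of the <clusternodes> section. It assumes the usual simple-majority rule (expected votes divided by two, plus one) and that expected votes equal the sum of the per-node votes; neither is stated in this thread, and the helper name quorum_from_votes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <clusternodes> section from the cluster.conf above
# (fence blocks left out, they do not affect the vote count).
CLUSTERNODES = """
<clusternodes>
  <clusternode name="USBack-prox1" nodeid="1" votes="1"/>
  <clusternode name="USBack-prox2" nodeid="2" votes="1"/>
  <clusternode name="USBack-prox3" nodeid="3" votes="1"/>
</clusternodes>
"""


def quorum_from_votes(xml_text):
    """Return (expected_votes, quorum).

    Assumption: expected votes are the sum of all node votes and quorum
    is a simple majority (expected // 2 + 1).
    """
    root = ET.fromstring(xml_text)
    expected = sum(int(node.get("votes", "1")) for node in root.iter("clusternode"))
    return expected, expected // 2 + 1


expected_votes, quorum = quorum_from_votes(CLUSTERNODES)
print("expected votes:", expected_votes, "quorum:", quorum)
```

Under those assumptions the two remaining nodes in "Members[2]: 2 3" still hold enough votes to keep quorum.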



>
> >
> > Log messages on USBack-prox2:
> >
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [TOTEM ] A processor
> > joined or left the membership and a new membership was formed.
> > Feb 21 13:06:41 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 DOWN
> > Feb 21 13:06:41 USBack-prox2 kernel: dlm: closing connection to node 1
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received
> > left_list: 1
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received
> > left_list: 1
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] chosen downlist
> > from node r(0) ip(--.--.--.22)
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [MAIN ] Completed service
> > synchronization, ready to provide service.
> > Feb 21 13:06:41 USBack-prox2 kernel: GFS2:
> > fsid=USBackCluster:VMStorage1.0: jid=1: Trying to acquire journal
> > lock...
> > Feb 21 13:06:41 USBack-prox2 kernel: GFS2:
> > fsid=USBackCluster:VMStorage2.0: jid=1: Trying to acquire journal
> > lock...
> > Feb 21 13:06:51 USBack-prox2 fenced[3957]: fencing node USBack-prox1
> > Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 dev 0.0
> > agent fence_ipmilan result: error from agent
> > Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 failed
> > Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
> > Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
>
> ^^^ good hint here. something is off.
>

?


>
> Fabio
>
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [TOTEM ] A processor
> > joined or left the membership and a new membership was formed.
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
> > Feb 21 13:06:55 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 UP
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received
> > left_list: 2
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received
> > left_list: 0
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received
> > left_list: 0
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] chosen downlist
> > from node r(0) ip(--.--.--.21)
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [MAIN ] Completed service
> > synchronization, ready to provide service.
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
> > error 12 handle 3a95f87400000000 protocol
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
> > error 12 handle 1e7ff52100000001 start
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
> > error 12 handle 22221a7000000002 start
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
> > error 12 handle 419ac24100000003 start
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
> > error 12 handle 3804823e00000004 start
> >
> >
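
To follow the membership changes in the node log above without reading every line, something like the following sketch could pull out the "[QUORUM] Members[...]" entries. It assumes the raw syslog lines are available unwrapped and without the "> >" quote prefixes; the regular expression and the function name membership_changes are illustrative only, not part of corosync.

```python
import re

# Matches lines such as:
#   Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3
MEMBERS_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ corosync\[\d+\]: "
    r"\[QUORUM\] Members\[(\d+)\]: ([\d ]+)$"
)


def membership_changes(lines):
    """Return [(timestamp, [node ids])], skipping consecutive repeats."""
    changes = []
    for line in lines:
        match = MEMBERS_RE.match(line.strip())
        if not match:
            continue
        stamp, _count, ids = match.groups()
        members = [int(node_id) for node_id in ids.split()]
        if changes and changes[-1][1] == members:
            continue  # same membership logged twice in a row
        changes.append((stamp, members))
    return changes


# Lines copied (unwrapped) from the log quoted above.
SAMPLE = [
    "Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3",
    "Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3",
    "Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3",
]

for stamp, members in membership_changes(SAMPLE):
    print(stamp, members)
```

On the sample this lists node 1 leaving at 13:06:41 and being back in the membership at 13:06:55.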
> > -------------------------------------------------
> > Then GFS2 generates error logs (activities blocked).
> >
> > Logs of the Cisco switch (time is UTC):
> >
> > Feb 21 09:37:02.375: %LINEPROTO-5-UPDOWN: Line protocol on Interface
> > GigabitEthernet0/11, changed state to down
> > Feb 21 09:37:02.459: %LINEPROTO-5-UPDOWN: Line protocol on Interface
> > GigabitEthernet0/4, changed state to down
> > Feb 21 09:37:03.382: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
> > changed state to down
> > Feb 21 09:37:03.541: %LINK-3-UPDOWN: Interface GigabitEthernet0/4,
> > changed state to down
> > Feb 21 09:37:07.283: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
> > changed state to up
> > Feb 21 09:37:07.350: %LINK-3-UPDOWN: Interface GigabitEthernet0/4,
> > changed state to up
> > Feb 21 09:37:08.289: %LINEPROTO-5-UPDOWN: Line protocol on Interface
> > GigabitEthernet0/11, changed state to up
> > Feb 21 09:37:09.472: %LINEPROTO-5-UPDOWN: Line protocol on Interface
> > GigabitEthernet0/4, changed state to up
> > Feb 21 09:40:20.045: %LINEPROTO-5-UPDOWN: Line protocol on Interface
> > GigabitEthernet0/11, changed state to down
> > Feb 21 09:40:21.043: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
> > changed state to down
> > Feb 21 09:40:23.401: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
> > changed state to up
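
To see how long each port was physically down, the %LINK-3-UPDOWN entries above can be paired up per interface. The following is a rough sketch: it assumes unwrapped log lines without quote prefixes, takes the year 2014 from the date of this thread (the switch log itself carries no year), and the name link_outages is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Feb 21 09:37:03.382: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down
LINK_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d\.\d{3}): %LINK-3-UPDOWN: "
    r"Interface (\S+), changed state to (up|down)$"
)


def link_outages(lines, year=2014):
    """Return [(interface, seconds_down)] for each down -> up pair.

    The year is an assumption; it only matters for date parsing.
    """
    down_since = {}
    outages = []
    for line in lines:
        match = LINK_RE.match(line.strip())
        if not match:
            continue
        stamp, iface, state = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")
        if state == "down":
            down_since[iface] = when
        elif iface in down_since:
            seconds = (when - down_since.pop(iface)).total_seconds()
            outages.append((iface, round(seconds, 3)))
    return outages


# Lines copied (unwrapped) from the switch log quoted above.
SAMPLE = [
    "Feb 21 09:37:03.382: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down",
    "Feb 21 09:37:03.541: %LINK-3-UPDOWN: Interface GigabitEthernet0/4, changed state to down",
    "Feb 21 09:37:07.283: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to up",
    "Feb 21 09:37:07.350: %LINK-3-UPDOWN: Interface GigabitEthernet0/4, changed state to up",
    "Feb 21 09:40:21.043: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down",
    "Feb 21 09:40:23.401: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to up",
]

for iface, seconds in link_outages(SAMPLE):
    print(iface, seconds)
```

Each reported flap is only a few seconds long; how that lines up with the node timestamps depends on the nodes' time zone and clock accuracy, which this thread does not state.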
> > _______________________________________________
> > discuss mailing list
> > [email protected]
> > http://lists.corosync.org/mailman/listinfo/discuss
> >
>
