In your post I didn't see any cluster configuration related to the bnx2x interface (enp5s0f0), only the IP address resource, which uses nic=eno1.
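One rough way to check this is to compare the interface names in the dmesg link messages with the nic= parameters in the crm configuration. The sketch below does that with sample lines copied from the quoted output (wrapped lines rejoined). The helper names link_events and managed_nics are made up for this illustration, and it assumes that nic= is the only place the configuration names an interface.

```python
import re

# Sample lines copied from the dmesg output quoted in this thread
# (lines wrapped by the mail client have been rejoined).
DMESG = """\
[34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34123.802580] drbd server: PingAck did not arrive in time.
[34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
[34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
[34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit
"""

# Parameter line of the clusterip primitive, copied from the quoted configuration.
CRM_PARAMS = "params ip=192.168.100.1 cidr_netmask=24 nic=eno1"

# Matches "[timestamp] driver pci-address iface: NIC Link is Up|Down".
LINK_RE = re.compile(r"^\[\s*(\d+\.\d+)\] \S+ \S+ (\S+): NIC Link is (Up|Down)")


def link_events(text):
    """Return (timestamp, interface, state) for every link up/down message."""
    events = []
    for line in text.splitlines():
        match = LINK_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2), match.group(3)))
    return events


def managed_nics(crm_text):
    """Return the interface names referenced by nic= parameters."""
    return set(re.findall(r"\bnic=(\S+)", crm_text))


events = link_events(DMESG)
flapping = {iface for _, iface, _ in events}
managed = managed_nics(CRM_PARAMS)
# Interfaces that change link state but are not named in the configuration.
unmanaged_flapping = flapping - managed

print("link events:", events)
print("flapping interfaces:", sorted(flapping))
print("interfaces named in the cluster config:", sorted(managed))
print("flapping but not cluster-managed:", sorted(unmanaged_flapping))
```

If the last set is not empty, the link flaps are probably not caused by a cluster resource restart, and the cabling, switch port or driver would be worth a look.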

On 18/10/16 10:05, Anne Nicolas wrote:
> 2016-10-18 9:56 GMT+02:00 Vlad <vo...@vovan.nl>:
>> Is something wrong with the network interface?
>>
>> [34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
>> [34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps
>> full duplex, Flow control: ON - receive & transmit
>> [34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
>> [34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps
>> full duplex, Flow control: ON - receive & transmit
> I don't think so. This interface is part of the cluster resource and
> up on master only. So it seems this is due to resource restart rather.
>
>>
>> On 14/10/16 17:54, Anne Nicolas wrote:
>>> Hi!
>>>
>>> I'm having trouble with a 2 nodes cluster used for DRBD / Apache / Samba
>>> and some other services.
>>>
>>> Whatever I do, it always goes to the following state:
>>>
>>> Last updated: Fri Oct 14 17:41:38 2016
>>> Last change: Thu Oct 13 10:42:29 2016 via cibadmin on bzvairsvr
>>> Stack: corosync
>>> Current DC: bzvairsvr (168430081) - partition with quorum
>>> Version: 1.1.8-9.mga5-394e906
>>> 2 Nodes configured, unknown expected votes
>>> 13 Resources configured.
>>>
>>>
>>> Online: [ bzvairsvr bzvairsvr2 ]
>>>
>>> Master/Slave Set: drbdservClone [drbdserv]
>>> Slaves: [ bzvairsvr bzvairsvr2 ]
>>> Clone Set: fencing [st-ssh]
>>> Started: [ bzvairsvr bzvairsvr2 ]
>>>
>>> When I reboot bzvairsvr2 this one goes primary again. But after a while
>>> becomes secondary also.
>>> I use a very basic fencing system based on ssh. It's not optimal but
>>> enough for the current tests.
>>>
>>> Here are information about the configuration:
>>>
>>> node 168430081: bzvairsvr
>>> node 168430082: bzvairsvr2
>>> primitive apache apache \
>>> params configfile="/etc/httpd/conf/httpd.conf" \
>>> op start interval=0 timeout=120s \
>>> op stop interval=0 timeout=120s
>>> primitive clusterip IPaddr2 \
>>> params ip=192.168.100.1 cidr_netmask=24 nic=eno1 \
>>> meta target-role=Started
>>> primitive clusterroute Route \
>>> params destination="0.0.0.0/0" gateway=192.168.100.254
>>> primitive drbdserv ocf:linbit:drbd \
>>> params drbd_resource=server \
>>> op monitor interval=30s role=Slave \
>>> op monitor interval=29s role=Master start-delay=30s
>>> primitive fsserv Filesystem \
>>> params device="/dev/drbd/by-res/server" directory="/Server"
>>> fstype=ext4 \
>>> op start interval=0 timeout=60s \
>>> op stop interval=0 timeout=60s \
>>> meta target-role=Started
>>> primitive libvirt-guests systemd:libvirt-guests
>>> primitive libvirtd systemd:libvirtd
>>> primitive mysql systemd:mysqld
>>> primitive named systemd:named
>>> primitive samba systemd:smb
>>> primitive st-ssh stonith:external/ssh \
>>> params hostlist="bzvairsvr bzvairsvr2"
>>> group iphd clusterip clusterroute \
>>> meta target-role=Started
>>> group services libvirtd libvirt-guests apache named mysql samba \
>>> meta target-role=Started
>>> ms drbdservClone drbdserv \
>>> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
>>> notify=true target-role=Started
>>> clone fencing st-ssh
>>> colocation fs_on_drbd inf: fsserv drbdservClone:Master
>>> colocation iphd_on_services inf: iphd services
>>> colocation services_on_fsserv inf: services fsserv
>>> order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
>>> order services_after_fsserv inf: fsserv services
>>> property cib-bootstrap-options: \
>>> dc-version=1.1.8-9.mga5-394e906 \
>>> cluster-infrastructure=corosync \
>>> no-quorum-policy=ignore \
>>> stonith-enabled=true \
>>>
>>> cluster logs are flooded by :
>>> Oct 14 17:42:28 [3445] bzvairsvr attrd: notice:
>>> attrd_trigger_update: Sending flush op to all hosts for:
>>> master-drbdserv (10000)
>>> Oct 14 17:42:28 [3445] bzvairsvr attrd: notice:
>>> attrd_perform_update: Sent update master-drbdserv=10000 failed:
>>> Transport endpoint is not connected
>>> Oct 14 17:42:28 [3445] bzvairsvr attrd: notice:
>>> attrd_perform_update: Sent update -107: master-drbdserv=10000
>>> Oct 14 17:42:28 [3445] bzvairsvr attrd: warning:
>>> attrd_cib_callback: Update master-drbdserv=10000 failed: Transport
>>> endpoint is not connected
>>> Oct 14 17:42:59 [3445] bzvairsvr attrd: notice:
>>> attrd_trigger_update: Sending flush op to all hosts for:
>>> master-drbdserv (10000)
>>> Oct 14 17:42:59 [3445] bzvairsvr attrd: notice:
>>> attrd_perform_update: Sent update master-drbdserv=10000 failed:
>>> Transport endpoint is not connected
>>> Oct 14 17:42:59 [3445] bzvairsvr attrd: notice:
>>> attrd_perform_update: Sent update -107: master-drbdserv=10000
>>> Oct 14 17:42:59 [3445] bzvairsvr attrd: warning:
>>> attrd_cib_callback: Update master-drbdserv=10000 failed: Transport
>>> endpoint is not connected
>>>
>>>
>>> And here is dmesg
>>>
>>> [34067.547147] block drbd0: peer( Secondary -> Primary )
>>> [34091.023206] block drbd0: peer( Primary -> Secondary )
>>> [34096.616319] drbd server: peer( Secondary -> Unknown ) conn( Connected
>>> -> TearDown ) pdsk( UpToDate -> DUnknown )
>>> [34096.616353] drbd server: asender terminated
>>> [34096.616358] drbd server: Terminating drbd_a_server
>>> [34096.682874] drbd server: Connection closed
>>> [34096.682894] drbd server: conn( TearDown -> Unconnected )
>>> [34096.682897] drbd server: receiver terminated
>>> [34096.682900] drbd server: Restarting receiver thread
>>> [34096.682902] drbd server: receiver (re)started
>>> [34096.682915] drbd server: conn( Unconnected -> WFConnection )
>>> [34103.311898] drbd server: Handshake successful: Agreed network
>>> protocol version 101
>>> [34103.311903] drbd server: Agreed to support TRIM on protocol level
>>> [34103.311997] drbd server: Peer authenticated using 20 bytes HMAC
>>> [34103.312046] drbd server: conn( WFConnection -> WFReportParams )
>>> [34103.312062] drbd server: Starting asender thread (from drbd_r_server
>>> [4344])
>>> [34103.380311] block drbd0: drbd_sync_handshake:
>>> [34103.380318] block drbd0: self
>>> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
>>> bits:0 flags:0
>>> [34103.380323] block drbd0: peer
>>> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
>>> bits:0 flags:0
>>> [34103.380327] block drbd0: uuid_compare()=0 by rule 40
>>> [34103.380335] block drbd0: peer( Unknown -> Secondary ) conn(
>>> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>>> [34114.046443] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
>>> [34123.802580] drbd server: PingAck did not arrive in time.
>>> [34123.802617] drbd server: peer( Secondary -> Unknown ) conn( Connected
>>> -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>>> [34123.802773] drbd server: asender terminated
>>> [34123.802777] drbd server: Terminating drbd_a_server
>>> [34123.932565] drbd server: Connection closed
>>> [34123.932585] drbd server: conn( NetworkFailure -> Unconnected )
>>> [34123.932588] drbd server: receiver terminated
>>> [34123.932590] drbd server: Restarting receiver thread
>>> [34123.932592] drbd server: receiver (re)started
>>> [34123.932605] drbd server: conn( Unconnected -> WFConnection )
>>> [34185.719207] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps
>>> full duplex, Flow control: ON - receive & transmit
>>> [34232.241599] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Down
>>> [34268.637861] bnx2x 0000:05:00.0 enp5s0f0: NIC Link is Up, 10000 Mbps
>>> full duplex, Flow control: ON - receive & transmit
>>> [34318.675122] drbd server: Handshake successful: Agreed network
>>> protocol version 101
>>> [34318.675128] drbd server: Agreed to support TRIM on protocol level
>>> [34318.675218] drbd server: Peer authenticated using 20 bytes HMAC
>>> [34318.675258] drbd server: conn( WFConnection -> WFReportParams )
>>> [34318.675276] drbd server: Starting asender thread (from drbd_r_server
>>> [4344])
>>> [34318.738909] block drbd0: drbd_sync_handshake:
>>> [34318.738916] block drbd0: self
>>> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
>>> bits:0 flags:0
>>> [34318.738921] block drbd0: peer
>>> 8B500BD87A5D76D4:0000000000000000:A1860E99AC8107A0:A1850E99AC8107A0
>>> bits:0 flags:0
>>> [34318.738924] block drbd0: uuid_compare()=0 by rule 40
>>> [34318.738933] block drbd0: peer( Unknown -> Secondary ) conn(
>>> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>>> [34328.812317] block drbd0: peer( Secondary -> Primary )
>>> [37316.065793] usb 3-11: USB disconnect, device number 3
>>> [52246.642265] block drbd0: peer( Primary -> Secondary )
>>>
>>> Any help would be appreciated
>>>
>>> Cheers
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org