On Fri, 2013-05-17 at 10:41 +0200, Florian Crouzat wrote:
> On 16/05/2013 21:45, christopher barry wrote:
> > Greetings,
> >
> > I've setup a new 2-node mysql cluster using
> > * drbd 8.3.1.3
> > * corosync 1.4.2
> > * pacemaker 117
> > on Debian Wheezy nodes.
> >
> > failover seems to be working fine for everything except the ips manually
> > configured on the interfaces.
>
> This sentence makes no sense to me.
> The cluster will not failover something that is not clusterized (a
> 'manually' configured IP...)
>
> What are you trying to achieve exactly ?
> Also, could you pastebin the output of "crm_mon -Arf1" I find it more
> easy to read.
>
> >
> > see config here:
> > http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g
> > +g09RcJvhHbgrY1JuN7D+gA4=
> >
> > If I bring down an interface, when the cluster restarts it, it only
> > starts it with the vip - the original ip and route have been removed.
>
> Makes sense if you added the 'original' IP manually...
> You should have non-VIP in /etc/sysconfig/network/ifcfg-*
> But then again, please precise what you are trying to achieve.
>
> >
> > not sure what to do to make sure the permanent ip and the routes get
> > restored. I'm not all that versed on the cluster commandline yet, and
> > I'm using LCMC for most of my usage.
> >

(@howard2.rjmetrics.com)-(14:00 / Sat May 18) [-][~]# crm_mon -Arf1
============
Last updated: Sat May 18 14:00:27 2013
Last change: Thu May 16 17:33:07 2013 via crm_attribute on howard3.rjmetrics.com
Stack: openais
Current DC: howard3.rjmetrics.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ howard3.rjmetrics.com howard2.rjmetrics.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ howard2.rjmetrics.com ]
     Slaves: [ howard3.rjmetrics.com ]
 Resource Group: g_mysql
     p_fs_mysql (ocf::heartbeat:Filesystem): Started howard2.rjmetrics.com
     ClusterPrivateIP (ocf::heartbeat:IPaddr2): Started howard2.rjmetrics.com
     ClusterPublicIP (ocf::heartbeat:IPaddr2): Started howard2.rjmetrics.com
     p_mysql (ocf::heartbeat:mysql): Started howard2.rjmetrics.com

Node Attributes:
* Node howard3.rjmetrics.com:
    + master-p_drbd_mysql:0 : 1000
* Node howard2.rjmetrics.com:
    + master-p_drbd_mysql:1 : 10000

Migration summary:
* Node howard3.rjmetrics.com:
   p_drbd_mysql:1: migration-threshold=1000000 fail-count=1
* Node howard2.rjmetrics.com:
   ClusterPublicIP: migration-threshold=1000000 fail-count=1

Failed actions:
    p_drbd_mysql:1_promote_0 (node=howard3.rjmetrics.com, call=29, rc=-2, status=Timed Out): unknown exec error
    ClusterPublicIP_monitor_30000 (node=howard2.rjmetrics.com, call=122, rc=7, status=complete): not running

howard2 and howard3 are the two clustered servers.

During testing, when I ifdown either eth0 or eth1, the cluster starts the
vip back up, but the other non-vip IPs and routes do not get started. I'm
running Debian, so these are configured in /etc/network/interfaces. Saying
'manually' configured was misleading on my part, sorry about that.

eth0 is the public interface, and eth1 is the private interface. eth2 and
eth3 are bonded as bond0, use jumbo frames, and are crossover cabled
between the nodes.
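
To illustrate that layout, a stripped-down /etc/network/interfaces would
look roughly like the sketch below. Every address, the netmasks, the MTU
value and the bonding mode are placeholders for illustration only, not
the real values from these nodes; only the interface names and their
roles match what I described above.

```
# Illustrative sketch only: addresses, netmasks and bond options are
# placeholders (documentation/private ranges), not the real settings.

# Public interface: permanent (non-VIP) address and default route
auto eth0
iface eth0 inet static
    address 192.0.2.12
    netmask 255.255.255.0
    gateway 192.0.2.1

# Private interface: permanent (non-VIP) address
auto eth1
iface eth1 inet static
    address 10.0.0.12
    netmask 255.255.255.0

# Crossover link between the nodes: eth2 + eth3 bonded, jumbo frames
auto bond0
iface bond0 inet static
    address 10.0.1.12
    netmask 255.255.255.0
    mtu 9000
    bond-slaves eth2 eth3
    bond-mode balance-rr
    bond-miimon 100
```

The permanent addresses and the default route on eth0 and eth1 are the
pieces that are missing after the cluster brings the vip back up.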

The test I was doing was to pull cables from eth0 and eth1, which hung the
cluster. My assumption is that I need to add more configuration elements
to manage the other IPs, and also set up some ping hosts that will
initiate failover when they become unreachable. What would help me, I
think, is an example config or pointers on how to add these elements.

On another note, the test made the drbd link disconnect, with both disks
now marked as standalone in the LCMC GUI. Right-clicking the disks or the
connection does not allow any action other than viewing the logs, which
say:

May 16 17:33:08 howard3 kernel: [781360.146362] block drbd0: Split-Brain detected but unresolved, dropping connection!
May 16 17:33:08 howard3 kernel: [781360.146451] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
May 16 17:33:08 howard3 kernel: [781360.149042] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
May 16 17:33:08 howard3 kernel: [781360.149051] block drbd0: conn( WFReportParams -> Disconnecting )
May 16 17:33:08 howard3 kernel: [781360.149060] block drbd0: error receiving ReportState, l: 4!
May 16 17:33:08 howard3 kernel: [781360.149154] block drbd0: asender terminated
May 16 17:33:08 howard3 kernel: [781360.149159] block drbd0: Terminating drbd0_asender
May 16 17:33:08 howard3 kernel: [781360.149609] block drbd0: Connection closed
May 16 17:33:08 howard3 kernel: [781360.149619] block drbd0: conn( Disconnecting -> StandAlone )
May 16 17:33:08 howard3 kernel: [781360.149811] block drbd0: receiver terminated
May 16 17:33:08 howard3 kernel: [781360.149815] block drbd0: Terminating drbd0_receiver

I'm really not sure how to proceed. Please let me know any additional
information you may need.

Thanks for your time Florian, it's much appreciated.
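
In case it helps frame the question about ping hosts, this is roughly
what I imagine the extra pieces would look like in crm shell syntax. It
is an untested sketch based on my reading so far, not something I have
running. The names p_ping, cl_ping and l_mysql_needs_ping, the host_list
address and all the timing values are made up for illustration; only
g_mysql is a real resource from my config.

```
# Untested sketch; names, address and timings are placeholders.

# Ping a reference host and publish the result as the node attribute
# "pingd" (the default attribute name of ocf:pacemaker:ping).
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.0.2.1" multiplier="1000" dampen="10s" \
    op monitor interval="15s" timeout="60s"

# Run the ping resource on both nodes.
clone cl_ping p_ping

# Keep the mysql group off any node that cannot reach the ping host.
location l_mysql_needs_ping g_mysql \
    rule -inf: not_defined pingd or pingd lte 0
```

I don't know whether the location rule belongs on g_mysql or on the drbd
master role instead, so corrections are welcome.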

Regards,
-C

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org