[Pacemaker] can someone post their 2-node mysql/drbd cluster config?

2013-05-22 Thread christopher barry
Hi all,

I'm trying to put together a 2 node mysql cluster using drbd as the db
backing store.

I have 4 interfaces in each node:
* 2 nics bonded and x-over cabled between each node for drbd data sync
and heartbeats
* 1 'public' nic, and
* 1 'private' nic
The public and private nics each have a non-vip address that should
always be running, and their routes should always be set.

I have 2 VIPs:
* 1 public VIP
* 1 private VIP

It all *basically* works, but I'm trying to make sure I have everything
configured in the best possible way.

I would really love to see how others are doing this same basic
configuration, and to hear lessons learned in the process of
making it all work.

Thanks,
-C


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] question about interface failover

2013-05-18 Thread christopher barry
On Fri, 2013-05-17 at 10:41 +0200, Florian Crouzat wrote:
 On 16/05/2013 21:45, christopher barry wrote:
  Greetings,
 
  I've setup a new 2-node mysql cluster using
  * drbd 8.3.1.3
  * corosync 1.4.2
  * pacemaker 1.1.7
  on Debian Wheezy nodes.
 
  failover seems to be working fine for everything except the ips manually
  configured on the interfaces.
 
 This sentence makes no sense to me.
 The cluster will not fail over something that is not managed by the
 cluster (a 'manually' configured IP...).
 
 What exactly are you trying to achieve?
 Also, could you pastebin the output of crm_mon -Arf1? I find it
 easier to read.
 
 
 
  see config here:
  http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g+g09RcJvhHbgrY1JuN7D+gA4=
 
  If I bring down an interface, when the cluster restarts it, it only
  starts it with the vip - the original ip and route have been removed.
 
 Makes sense if you added the 'original' IP manually...
 You should have the non-VIP address in /etc/sysconfig/network/ifcfg-*
 But then again, please specify what you are trying to achieve.
 
 
  not sure what to do to make sure the permanent ip and the routes get
  restored. I'm not all that versed on the cluster commandline yet, and
  I'm using LCMC for most of my usage.
 
 

(@howard2.rjmetrics.com)-(14:00 / Sat May 18)
[-][~]# crm_mon -Arf1

Last updated: Sat May 18 14:00:27 2013
Last change: Thu May 16 17:33:07 2013 via crm_attribute on howard3.rjmetrics.com
Stack: openais
Current DC: howard3.rjmetrics.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ howard3.rjmetrics.com howard2.rjmetrics.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ howard2.rjmetrics.com ]
     Slaves: [ howard3.rjmetrics.com ]
 Resource Group: g_mysql
     p_fs_mysql        (ocf::heartbeat:Filesystem):  Started howard2.rjmetrics.com
     ClusterPrivateIP  (ocf::heartbeat:IPaddr2):     Started howard2.rjmetrics.com
     ClusterPublicIP   (ocf::heartbeat:IPaddr2):     Started howard2.rjmetrics.com
     p_mysql           (ocf::heartbeat:mysql):       Started howard2.rjmetrics.com

Node Attributes:
* Node howard3.rjmetrics.com:
    + master-p_drbd_mysql:0 : 1000
* Node howard2.rjmetrics.com:
    + master-p_drbd_mysql:1 : 1

Migration summary:
* Node howard3.rjmetrics.com:
   p_drbd_mysql:1: migration-threshold=100 fail-count=1
* Node howard2.rjmetrics.com:
   ClusterPublicIP: migration-threshold=100 fail-count=1

Failed actions:
    p_drbd_mysql:1_promote_0 (node=howard3.rjmetrics.com, call=29, rc=-2, status=Timed Out): unknown exec error
    ClusterPublicIP_monitor_3 (node=howard2.rjmetrics.com, call=122, rc=7, status=complete): not running
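
The migration summary in output like this is usually the quickest place to see which resources have accumulated failures. As a rough sketch (not part of the original post), the shell functions below pull the non-zero fail counts out of crm_mon -1 -f style text and print a matching crm_resource cleanup command for each. Stripping the ":N" clone instance suffix before cleanup is an assumption about how the resource should be addressed, and nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Read crm_mon one-shot output on stdin and print "node resource fail-count"
# for every resource whose fail-count is not 0.
failcounts() {
    awk '
        # A line such as "* Node howard3.rjmetrics.com:" starts a per-node section.
        /^\* Node / { node = $3; sub(/:$/, "", node); next }
        # A line such as "p_drbd_mysql:1: migration-threshold=100 fail-count=1".
        /fail-count=/ {
            rsc = $1; sub(/:$/, "", rsc)
            for (i = 2; i <= NF; i++)
                if ($i ~ /^fail-count=/) {
                    split($i, kv, "=")
                    if (kv[2] != "0") print node, rsc, kv[2]
                }
        }
    '
}

# Turn the list above into cleanup commands (printed, not run).
# Assumption: clone instances like "p_drbd_mysql:1" are cleaned up
# via their base name "p_drbd_mysql".
suggest_cleanup() {
    failcounts | while read -r node rsc count; do
        printf 'crm_resource --cleanup --resource %s --node %s\n' "${rsc%%:*}" "$node"
    done
}
```

Typical use would be something like: crm_mon -1 -f | suggest_cleanup, then review the printed commands before running any of them.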


howard2 and howard3 are the two clustered servers.

During testing, when I ifdown either eth0 or eth1, the cluster starts
the vip back up, but the other non-vip IPs and routes do not get
started. I'm running Debian, so these are configured
in /etc/network/interfaces. Saying 'manually' configured was misleading
on my part, sorry about that.

eth0 is the public interface, and eth1 is the private interface. eth2
and eth3 are bonded as bond0, use jumbo frames, and are crossover cabled
between the nodes.

The test I was doing was to pull the cables from eth0 and eth1, which hung
the cluster. My assumption is that I need to add more configuration
elements to manage the other IPs, and also set up some ping hosts that
will initiate failover when they become unreachable. What would help me,
I think, is an example config or pointers on how to add these elements.
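
One common way to express "fail over when the ping hosts are unreachable" is a cloned ocf:pacemaker:ping resource plus a location rule on the attribute it maintains (named pingd by default). The sketch below only generates a crm shell fragment; it is a starting point under stated assumptions, not a tested configuration for this cluster. The resource and constraint names (p_ping, cl_ping, l_mysql_needs_ping), the timing values, and the example addresses are all hypothetical; g_mysql is the group name shown in the crm_mon output.

```shell
#!/bin/sh
# Emit a crm shell configuration fragment for a connectivity check.
# Arguments: the addresses to ping (for example the public and private gateways).
# Assumptions: p_ping, cl_ping and l_mysql_needs_ping are made-up names, and
# the multiplier/dampen/interval values are illustrative only.
ping_config() {
    hosts="$*"
    cat <<EOF
primitive p_ping ocf:pacemaker:ping \\
    params host_list="$hosts" multiplier="1000" dampen="5s" \\
    op monitor interval="10s" timeout="60s"
clone cl_ping p_ping
location l_mysql_needs_ping g_mysql \\
    rule -inf: not_defined pingd or pingd lte 0
EOF
}
```

For example, ping_config 192.0.2.1 192.0.2.2 > /tmp/ping.crm writes the fragment to a file that can be reviewed and then loaded with crm configure load update /tmp/ping.crm. The location rule keeps g_mysql off any node that cannot reach at least one of the listed hosts.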

On another note, the test made the drbd link disconnect, and both disks
are now marked as StandAlone in the LCMC GUI. Right-clicking the disks or
the connection does not allow any action other than viewing the logs, which
say:

May 16 17:33:08 howard3 kernel: [781360.146362] block drbd0: Split-Brain detected but unresolved, dropping connection!
May 16 17:33:08 howard3 kernel: [781360.146451] block drbd0: helper command: /sbin/drbdadm split-brain minor-0
May 16 17:33:08 howard3 kernel: [781360.149042] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
May 16 17:33:08 howard3 kernel: [781360.149051] block drbd0: conn( WFReportParams -> Disconnecting )
May 16 17:33:08 howard3 kernel: [781360.149060] block drbd0: error receiving ReportState, l: 4!
May 16 17:33:08 howard3 kernel: [781360.149154] block drbd0: asender terminated
May 16 17:33:08 howard3 kernel: [781360.149159] block drbd0: Terminating drbd0_asender
May 16 17:33:08 howard3 kernel: [781360.149609] block drbd0: Connection closed
May 16 17:33:08 howard3 kernel: [781360.149619] block drbd0: conn( Disconnecting -> StandAlone )
May 16 17:33:08 howard3 kernel: [781360.149811] block drbd0: receiver terminated
May 16 17:33:08 howard3 kernel: [781360.149815] block drbd0: Terminating drbd0_receiver
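
When both sides end up StandAlone after an unresolved split brain like the one logged above, DRBD waits for an administrator to choose which node's changes to throw away. A minimal sketch of the manual recovery steps in DRBD 8.3 syntax follows (8.4 moved the option to: drbdadm connect --discard-my-data). The function only prints the commands. The resource name is an assumption, since the log shows only the device (drbd0, minor-0), so substitute the name from your DRBD configuration, and choose the victim carefully because its changes since the split are lost.

```shell
#!/bin/sh
# Print the DRBD 8.3 commands for manual split-brain recovery.
# Argument: the DRBD resource name (hypothetical here; the original post
# does not show it).
split_brain_steps() {
    res="$1"
    cat <<EOF
# On the node whose changes will be DISCARDED (the split-brain victim):
drbdadm secondary $res
drbdadm -- --discard-my-data connect $res
# On the node whose data is kept (the survivor), if it is also StandAlone:
drbdadm connect $res
EOF
}
```

For example, split_brain_steps mysql prints the steps for a resource called mysql. With Pacemaker managing DRBD, it is probably wise to make sure the cluster is not trying to promote or mount the victim while these run.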

I'm really

[Pacemaker] question about interface failover

2013-05-16 Thread christopher barry
Greetings,

I've set up a new 2-node mysql cluster using
* drbd 8.3.1.3
* corosync 1.4.2
* pacemaker 1.1.7
on Debian Wheezy nodes.

Failover seems to be working fine for everything except the IPs manually
configured on the interfaces.

See the config here:
http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g+g09RcJvhHbgrY1JuN7D+gA4=

If I bring down an interface, when the cluster restarts it, it only
starts it with the VIP; the original IP and route have been removed.

I'm not sure what to do to make sure the permanent IP and the routes get
restored. I'm not all that versed in the cluster command line yet, and
I'm using LCMC for most things.

Thanks for your help,
-C




Re: [Pacemaker] mysql/drbd on wheezy active/passive setup issues

2013-03-14 Thread christopher barry
Hi,

I get this when I run ocf-tester:

ocf-tester -n p_mysql /usr/lib/ocf/resource.d/heartbeat/mysql
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/mysql...
/usr/sbin/ocf-tester: 214: /usr/sbin/ocf-tester: xmllint: not found
* rc=127: Your agent produces meta-data which does not conform to ra-api-1.dtd
* rc=1: Demoting a start resource should not fail
* rc=1: Promote failed
* rc=1: Demote failed
mysql[6876]: ERROR: ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)
mysql[6876]: ERROR: Failed to set read-only
Aborting tests

I'm not even sure I'm running that tester correctly.
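
Two things stand out in that run. Exit status 127 is the shell's "command not found" code, so the meta-data complaint most likely reflects the missing xmllint (packaged as libxml2-utils on Debian) rather than a broken agent. Also, ocf-tester only passes the agent the parameters it is given with -o, so without them the agent falls back to its built-in defaults. The sketch below checks for required tools and builds an ocf-tester command line with explicit parameters. The example parameter values are Debian's stock MySQL paths and are assumptions; use the values from your own CIB.

```shell
#!/bin/sh
# Path to the agent, as given in the post.
AGENT=/usr/lib/ocf/resource.d/heartbeat/mysql

# Print "missing: <tool>" for each named tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf 'missing: %s\n' "$tool"
    done
}

# Print an ocf-tester command line; each argument is a name=value agent
# parameter and becomes one "-o name=value" option.
ocf_tester_cmd() {
    printf 'ocf-tester -n p_mysql'
    for kv in "$@"; do
        printf ' -o %s' "$kv"
    done
    printf ' %s\n' "$AGENT"
}
```

For example, missing_tools xmllint ocf-tester reports anything that still needs installing, and ocf_tester_cmd socket=/var/run/mysqld/mysqld.sock pid=/var/run/mysqld/mysqld.pid prints a command to review and run. As far as I can tell, the promote and demote tests exercise the agent's replication (master/slave) mode, which a plain active/passive setup on DRBD does not use.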

Q: why does the agent get shipped if the metadata is incorrect? Or is
that just a warning?

Oddly, I can log in to mysql as root without issue, both with and without
a password, so that error appears to be a red herring.

I had a look at the resource script:
/usr/lib/ocf/resource.d/heartbeat/mysql

I noticed it falls back to defaults if nothing is set. I changed the
default socket location to match Debian's, removed the instance
attributes from my CIB so it would use the defaults, and I get the same
outcome.

How can I get actual error output so I can debug this? I'm sure the
issue is trivial, but I just can't see what the issue is! There has to
be a better way to troubleshoot/debug this kind of thing.

The info in the corosync.log is just not very helpful for me.

Thanks for any insight you can give.

Regards,
Christopher


On Wed, 2013-03-13 at 18:13 -0400, christopher barry wrote:
 Thanks, I'll have a go with that.
 
 -C
 
 On Wed, 2013-03-13 at 22:58 +0100, emmanuel segura wrote:
  Hello
  
  Use ocf-tester to debug your resource
  
  2013/3/13 christopher barry cba...@rjmetrics.com
  Greetings all,
  
  I'm almost there, and figure I just have something small out
  of place. Wondering if you can view my setup here:
  
  
  https://zerobin.permutation.net/?d8664af27a7de3be#Bh3fBAupeEw3RhBWOlvDomyPkoOzvD5ajTCk6+a1MW0=
  
  and let me know what you think.
  
  If you need more data to understand what could be wrong,
  please let me know.
  
  Thanks,
  Christopher
  
  
  
  -- 
  this is my life and I will live it for as long as God wills


Re: [Pacemaker] mysql/drbd on wheezy active/passive setup issues

2013-03-14 Thread christopher barry

OK, so I think it's all good now!

I ran crm_resource --resource foo --cleanup and that apparently jiggled
the handle enough to make it all work.



On Thu, 2013-03-14 at 12:21 -0400, christopher barry wrote:
 Hi,
 
 I get this when I run ocf-tester:
 
 ocf-tester -n p_mysql /usr/lib/ocf/resource.d/heartbeat/mysql
 Beginning tests for /usr/lib/ocf/resource.d/heartbeat/mysql...
 /usr/sbin/ocf-tester: 214: /usr/sbin/ocf-tester: xmllint: not found
 * rc=127: Your agent produces meta-data which does not conform to
 ra-api-1.dtd
 * rc=1: Demoting a start resource should not fail
 * rc=1: Promote failed
 * rc=1: Demote failed
 mysql[6876]: ERROR: ERROR 1045 (28000): Access denied for user
 'root'@'localhost' (using password: NO)
 mysql[6876]: ERROR: Failed to set read-only
 Aborting tests
 
 I'm not even sure I'm running that tester correctly.
 
 Q: why does the agent get shipped if the metadata is incorrect? Or is
 that just a warning?
 
 Oddly, I can login as root to mysql without issue, both with and without
 a password, so that error appears to be a red herring.
 
 I had a look at the resource script:
 /usr/lib/ocf/resource.d/heartbeat/mysql
 
 I noticed it had defaults if nothing was set. I changed the socket
 location in the default to match debian's default, removed the instance
 attributes from my CIB so it would use defaults, and I get the same
 outcome.
 
 How can I get actual error output so I can debug this? I'm sure the
 issue is trivial, but I just can't see what the issue is! There has to
 be a better way to troubleshoot/debug this kind of thing.
 
 The info in the corosync.log is just not very helpful for me.
 
 Thanks for any insight you can give.
 
 Regards,
 Christopher
 
 


Re: [Pacemaker] mysql/drbd on wheezy active/passive setup issues

2013-03-13 Thread christopher barry
Thanks, I'll have a go with that.

-C

On Wed, 2013-03-13 at 22:58 +0100, emmanuel segura wrote:
 Hello
 
 Use ocf-tester to debug your resource
 
 2013/3/13 christopher barry cba...@rjmetrics.com
 Greetings all,
 
 I'm almost there, and figure I just have something small out
 of place. Wondering if you can view my setup here:
 
 
 https://zerobin.permutation.net/?d8664af27a7de3be#Bh3fBAupeEw3RhBWOlvDomyPkoOzvD5ajTCk6+a1MW0=
 
 and let me know what you think.
 
 If you need more data to understand what could be wrong,
 please let me know.
 
 Thanks,
 Christopher
 
 
 