Re: [Pacemaker] Remote Access not Working
Hi,

this is looking better again: a remote cibadmin -Q now does the right thing, but a remote crm_mon is still _not_ working correctly.

Let's see, now that I should know where to look ... the function cib_recv_plaintext() in lib/common/remote.c looks a bit suspicious to me:

- The if (len == 0) check will never be true because len is initialised to 512 and then only grows.
- The assumption that a partial read (wrt. the buffer) signals no more data is IMO not valid.

With the following patch I can at least get a crm_mon -1rf to do the right thing:

diff -ur Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c Pacemaker-my/lib/common/remote.c
--- Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c 2009-11-19 21:12:53.0 +0100
+++ Pacemaker-my/lib/common/remote.c 2009-11-20 10:52:36.0 +0100
@@ -220,33 +220,29 @@
 char*
 cib_recv_plaintext(int sock)
 {
-    int last = 0;
     char* buf = NULL;
-    int chunk_size = 512;
-    int len = chunk_size;
+    ssize_t buf_size = 512;
+    ssize_t len = 0;
 
-    crm_malloc0(buf, chunk_size);
+    crm_malloc0(buf, buf_size);
 
     while(1) {
-        int rc = recv(sock, buf+last, chunk_size, 0);
+        ssize_t rc = recv(sock, buf+len, buf_size-len, 0);
         if (rc == 0) {
             if(len == 0) {
                 goto bail;
             }
             return buf;
 
-        } else if(rc > 0 && rc < chunk_size) {
-            return buf;
-
-        } else if(rc == chunk_size) {
-            last = len;
-            len += chunk_size;
-            crm_realloc(buf, len);
-            CRM_ASSERT(buf != NULL);
+        } else if(rc > 0) {
+            len += rc;
+            if (len == buf_size) {
+                crm_realloc(buf, buf_size += 512); /* Should do exponential growth for amortized constant time? */
+                CRM_ASSERT(buf != NULL);
+            }
         }
-
         if(rc < 0 && errno != EINTR) {
-            crm_perror(LOG_ERR,"Error receiving message: %d", rc);
+            crm_perror(LOG_ERR,"Error receiving message: %d", (int)rc);
             goto bail;
         }
     }

And that is as far as I can get with crm_mon, as it doesn't support continuous updates via remote access?

static int cib_remote_set_connection_dnotify(
    cib_t *cib, void (*dnotify)(gpointer user_data))
{
    return cib_NOTSUPPORTED;
}

Regards, Colin
___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
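The comment in the patch asks whether the buffer should grow exponentially. A minimal sketch of that variant follows. It is not Pacemaker code: recv_until_eof() is a hypothetical name, plain malloc/realloc stand in for the crm_* macros, and it assumes, as the patch does, that the peer closes (or shuts down) the connection once the message is complete.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not Pacemaker code): read from sock until the peer
 * closes the connection. Returns a NUL-terminated heap buffer that the
 * caller must free(), or NULL on error or if no data arrived. */
char *recv_until_eof(int sock, size_t *out_len)
{
    size_t cap = 512;   /* current allocation, including room for the NUL */
    size_t len = 0;     /* bytes received so far */
    char *buf = malloc(cap);

    if (buf == NULL) {
        return NULL;
    }

    for (;;) {
        /* Append after the data we already have; keep one byte for the NUL. */
        ssize_t rc = recv(sock, buf + len, cap - len - 1, 0);

        if (rc > 0) {
            len += (size_t)rc;
            if (len == cap - 1) {
                /* Buffer full: double it so the total copying stays linear. */
                char *tmp = realloc(buf, cap * 2);
                if (tmp == NULL) {
                    free(buf);
                    return NULL;
                }
                buf = tmp;
                cap *= 2;
            }
        } else if (rc == 0) {
            break;              /* orderly shutdown by the peer */
        } else if (errno != EINTR) {
            free(buf);          /* real error; EINTR simply retries */
            return NULL;
        }
    }

    if (len == 0) {
        free(buf);
        return NULL;
    }
    assert(len < cap);
    buf[len] = '\0';
    if (out_len != NULL) {
        *out_len = len;
    }
    return buf;
}
```

A partial read is never treated as end of message here; only a return value of 0 from recv() ends the loop. If the connection is kept open between messages, this loop would block, and some other framing (a length prefix or terminator) would be needed.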
Re: [Pacemaker] Remote Access not Working
On Fri, Nov 20, 2009 at 11:17 AM, Colin colin@gmail.com wrote:
> Hi,
>
> this is looking better again: a remote cibadmin -Q now does the right thing, but a remote crm_mon is still _not_ working correctly.
>
> Let's see, now that I should know where to look ... the function cib_recv_plaintext() in lib/common/remote.c looks a bit suspicious to me:
>
> - The if (len == 0) check will never be true because len is initialised to 512 and then only grows.
> - The assumption that a partial read (wrt. the buffer) signals no more data is IMO not valid.

It is if you didn't get a signal. But I agree the code needs a cleanup. I went with:
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/5acf9f2e9c9e

> With the following patch I can at least get a crm_mon -1rf to do the right thing:
>
> [...]
>
> And that is as far as I can get with crm_mon, as it doesn't support continuous updates via remote access?
>
> static int cib_remote_set_connection_dnotify(
>     cib_t *cib, void (*dnotify)(gpointer user_data))
> {
>     return cib_NOTSUPPORTED;
> }

No, that's something else. Remote notifications should work, I'll test that today.
Re: [Pacemaker] Resource capacity limit
Hi Andrew,

On 11/20/09 04:10, Andrew Beekhof wrote:
> Btw. You're still missing some test cases ;-)

Oh, right :-) I created some. Hope I created them in the correct way. Sorry for so many attachments...

Thanks,
Yan
--
y...@novell.com
Software Engineer
China Server Team, OPS Engineering
Novell, Inc.
Making IT Work As One™

digraph g {
"probe_complete host1" -> "probe_complete" [ style = bold]
"probe_complete host1" [ style=bold color=green fontcolor=black ]
"probe_complete host2" -> "probe_complete" [ style = bold]
"probe_complete host2" [ style=bold color=green fontcolor=black ]
"probe_complete" -> "rsc1_start_0 host2" [ style = bold]
"probe_complete" -> "rsc2_start_0 host1" [ style = bold]
"probe_complete" [ style=bold color=green fontcolor=orange ]
"rsc1_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc1_monitor_0 host1" [ style=bold color=green fontcolor=black ]
"rsc1_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc1_monitor_0 host2" [ style=bold color=green fontcolor=black ]
"rsc1_start_0 host2" [ style=bold color=green fontcolor=black ]
"rsc2_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc2_monitor_0 host1" [ style=bold color=green fontcolor=black ]
"rsc2_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc2_monitor_0 host2" [ style=bold color=green fontcolor=black ]
"rsc2_start_0 host1" [ style=bold color=green fontcolor=black ]
}

transition_graph cluster-delay=60s stonith-timeout=60s failed-stop-offset=INFINITY failed-start-offset=INFINITY batch-limit=30 transition_id=0 synapse id=0 action_set rsc_op id=4 operation=monitor operation_key=rsc1_monitor_0 on_node=host1 on_node_uuid=host1 primitive id=rsc1 long-id=rsc1 class=ocf provider=pacemaker type=Dummy/ attributes 
CRM_meta_op_target_rc=7 CRM_meta_timeout=2 crm_feature_set=3.0.1/ /rsc_op /action_set inputs/ /synapse synapse id=2 action_set rsc_op id=9 operation=start operation_key=rsc1_start_0 on_node=host2 on_node_uuid=host2 primitive id=rsc1 long-id=rsc1 class=ocf provider=pacemaker type=Dummy/ attributes CRM_meta_timeout=2 crm_feature_set=3.0.1/ /rsc_op /action_set inputs trigger pseudo_event id=2 operation=probe_complete operation_key=probe_complete/ /trigger /inputs /synapse synapse id=3 action_set rsc_op id=5 operation=monitor operation_key=rsc2_monitor_0 on_node=host1 on_node_uuid=host1 primitive id=rsc2 long-id=rsc2 class=ocf provider=pacemaker type=Dummy/ attributes CRM_meta_op_target_rc=7 CRM_meta_timeout=2 crm_feature_set=3.0.1/ /rsc_op /action_set inputs/ /synapse synapse id=4 action_set rsc_op id=8 operation=monitor operation_key=rsc2_monitor_0 on_node=host2 on_node_uuid=host2 primitive id=rsc2 long-id=rsc2 class=ocf provider=pacemaker type=Dummy/ attributes CRM_meta_op_target_rc=7 CRM_meta_timeout=2 crm_feature_set=3.0.1/ /rsc_op /action_set inputs/ /synapse synapse id=5 action_set rsc_op id=10 operation=start operation_key=rsc2_start_0 on_node=host1 on_node_uuid=host1 primitive id=rsc2 long-id=rsc2 class=ocf provider=pacemaker type=Dummy/ attributes CRM_meta_timeout=2 crm_feature_set=3.0.1/ /rsc_op /action_set inputs trigger pseudo_event id=2 operation=probe_complete operation_key=probe_complete/ /trigger /inputs /synapse synapse id=6 action_set pseudo_event id=2 operation=probe_complete operation_key=probe_complete attributes crm_feature_set=3.0.1/ /pseudo_event /action_set inputs trigger rsc_op id=3 operation=probe_complete operation_key=probe_complete on_node=host1 on_node_uuid=host1/ /trigger trigger rsc_op id=6 operation=probe_complete operation_key=probe_complete on_node=host2 on_node_uuid=host2/ /trigger /inputs /synapse synapse id=7 priority=100 action_set rsc_op id=3 operation=probe_complete operation_key=probe_complete on_node=host1 on_node_uuid=host1 
attributes CRM_meta_op_no_wait=true crm_feature_set=3.0.1/ /rsc_op /action_set inputs trigger rsc_op id=4 operation=monitor operation_key=rsc1_monitor_0 on_node=host1 on_node_uuid=host1/ /trigger trigger rsc_op id=5 operation=monitor operation_key=rsc2_monitor_0 on_node=host1 on_node_uuid=host1/ /trigger /inputs /synapse synapse id=8 priority=100 action_set rsc_op id=6 operation=probe_complete operation_key=probe_complete on_node=host2
Re: [Pacemaker] Remote Access not Working
On Fri, Nov 20, 2009 at 1:05 PM, Colin colin@gmail.com wrote:
> PS: I believe this CRM_ASSERT() in lib/common/remote.c can never trigger.

It's designed to detect if somehow we asked for an encrypted message when Pacemaker wasn't built with gnutls. It's a sanity check; it's not supposed to go off.

>     if(encrypted) {
> #ifdef HAVE_GNUTLS_GNUTLS_H
>         reply = cib_recv_tls(session);
> #else
>         CRM_ASSERT(encrypted == FALSE);
> #endif
>     } else {
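The pattern being discussed can be sketched in isolation as below. This is only an illustration: HAVE_TLS, recv_plain(), recv_tls() and recv_reply() are made-up names standing in for HAVE_GNUTLS_GNUTLS_H and the real receive functions, and the standard assert() stands in for CRM_ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the plaintext receive function. */
static const char *recv_plain(void) { return "plaintext reply"; }

#ifdef HAVE_TLS   /* assumed feature macro, analogous to HAVE_GNUTLS_GNUTLS_H */
/* Hypothetical stand-in for the TLS receive function. */
static const char *recv_tls(void) { return "tls reply"; }
#endif

const char *recv_reply(bool encrypted)
{
    if (encrypted) {
#ifdef HAVE_TLS
        return recv_tls();
#else
        /* Built without TLS: reaching this branch is a caller bug, so the
         * assertion is written to fail whenever it is evaluated. */
        assert(!encrypted);
        return NULL;
#endif
    }
    return recv_plain();
}
```

In a build with TLS support the assertion is compiled out entirely; in a build without it, the assertion is only evaluated when a caller wrongly requests an encrypted receive, which is why it reads as a check that "cannot" fire in correct code.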
[Pacemaker] resource dependency
Hi list,

I'm building a 4 node cluster where 2 nodes will export drbd devices via ietd iscsi target (storage nodes) and the other 2 nodes will run xen vms (app nodes) stored in lvm partitions accessed via the open-iscsi initiator, using multipath for failover.

While configuring the cluster resource ordering I came up against a situation I can't find a solution for. The xen vm resources depend on an iscsi initiator resource to run. I have two iscsi initiator resources, one for each storage node. How can I make the vm resources dependent on any one of the iscsi initiator resources?

I am thinking of creating a clone of the iscsi initiator resource and using rules to change the clone options so that I can have two clones per app node with different portal parameters. This way I could make the vm resources depend on this clone. Is this possible?

I'm using debian-lenny with the packages described at http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo

Excuse me for the bad English.

Best Regards,
Alexandre
Re: [Pacemaker] resource dependency
On Fri, Nov 20, 2009 at 02:42:29PM -0200, Alexandre Biancalana wrote:
> I'm building a 4 node cluster where 2 nodes will export drbd devices via ietd iscsi target (storage nodes) and the other 2 nodes will run xen vms (app nodes) stored in lvm partitions accessed via the open-iscsi initiator, using multipath for failover.
>
> While configuring the cluster resource ordering I came up against a situation I can't find a solution for. The xen vm resources depend on an iscsi initiator resource to run. I have two iscsi initiator resources, one for each storage node. How can I make the vm resources dependent on any one of the iscsi initiator resources?

Personally, I think you've got the wrong design. I'd prefer to loosely couple the storage and VM clusters, with the storage cluster exporting iSCSI initiators which the VM cluster then attaches to the VMs as required. Put the error handling for the case where the iSCSI initiator isn't available for a VM into the resource agent for the VM.

To me, this seems like a more robust solution. Tying everything up together feels like you're asking for trouble whenever any failover happens -- everything gets recalculated and the cluster spends the next several minutes jiggling resources around before everything settles back down again.

- Matt
Re: [Pacemaker] resource dependency
On Fri, Nov 20, 2009 at 2:53 PM, Matthew Palmer mpal...@hezmatt.org wrote:
> On Fri, Nov 20, 2009 at 02:42:29PM -0200, Alexandre Biancalana wrote:
> > I'm building a 4 node cluster where 2 nodes will export drbd devices via ietd iscsi target (storage nodes) and the other 2 nodes will run xen vms (app nodes) stored in lvm partitions accessed via the open-iscsi initiator, using multipath for failover.
> >
> > While configuring the cluster resource ordering I came up against a situation I can't find a solution for. The xen vm resources depend on an iscsi initiator resource to run. I have two iscsi initiator resources, one for each storage node. How can I make the vm resources dependent on any one of the iscsi initiator resources?
>
> Personally, I think you've got the wrong design. I'd prefer to loosely couple the storage and VM clusters, with the storage cluster exporting iSCSI initiators which the VM cluster then attaches to the VMs as required. Put the error handling for the case where the iSCSI initiator isn't available for a VM into the resource agent for the VM.
>
> To me, this seems like a more robust solution. Tying everything up together feels like you're asking for trouble whenever any failover happens -- everything gets recalculated and the cluster spends the next several minutes jiggling resources around before everything settles back down again.

Hi Matt, thank you for the reply.

OK. But if I go with your suggestion I end up with the same question. With the 2 node storage cluster exporting the block device via iSCSI, how can I make the VM resource in the VM cluster depend on *any* exported iSCSI target? The standard order configuration only allows a dependency on *one* resource.

The only way I see is to configure an IP resource on the storage cluster and use it as the portal for the iSCSI initiator resource of the VM cluster. I don't want to do it this way because I am thinking of using multipath, for quicker failover.

Alexandre
Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?
On Fri, Nov 20, 2009 at 12:42 AM, Andrew Beekhof and...@beekhof.net wrote:
> > I need a resource to do something on the event of a node joining or leaving. Is it possible to receive node join/leave events at the resource level?
>
> If it's a clone, then you can ask for notifications. Otherwise, you need to look in the cib or ask pacemaker for membership updates (i.e. you'd need to modify your daemon).

Thanks very much, I will try it using a clone. What do you mean by "your daemon"? Pacemaker?

Thanks
Re: [Pacemaker] NFS server shutdown issue
On Fri, Nov 20, 2009 at 7:12 PM, Judd Tracy j...@thetracys.net wrote:
> It is a weird issue. Redhat kills the nfs daemons with a kill -2 in their init script. But when pacemaker starts the nfs daemons the only way to kill the daemons is a kill -9. I don't have a clue what pacemaker is doing that would cause this behavior.

Pacemaker isn't doing anything, it just calls the scripts you tell it.

> As far as the mysql order constraint goes: since the file system is in the group mysql, won't it be brought up after the drbd master is promoted? And don't the members of the group get started in order? I am a newbie so please correct me if I am wrong.

Oh, I may have missed the group part.

> Judd
>
> On Fri, Nov 20, 2009 at 2:29 AM, Andrew Beekhof and...@beekhof.net wrote:
> > On Thu, Nov 19, 2009 at 10:24 PM, Judd Tracy j...@thetracys.net wrote:
> > > I am trying to set up a drbd/nfs server in pacemaker on RHEL5 and am experiencing some weird issues when the server is started using pacemaker. Pacemaker starts the daemons just fine, but when it tries to shut down it cannot. It calls /etc/init.d/nfs to shut down the daemons, but they do not respond.
> >
> > I guess the scripts have a problem then. Or perhaps you need some more ordering constraints so that the services talking to nfs are shut down first.
> >
> > > When I start the daemons mysql using the same script I am able to kill them using the init.d script. The logs show success trying to shut down the daemons even though they do not. I was wondering if anyone else has seen this issue?
> >
> > btw. Shouldn't
> >     order mysql_after_drbd_mysql inf: ms_drbd_mysql:promote mysql:start
> > be between mysql and the filesystem?
> >
> > > Judd
> > >
> > > Configuration:
> > >
> > > node filer1
> > > node filer2
> > > node mysql1
> > > node mysql2
> > > primitive drbd_mysql ocf:linbit:drbd \
> > >     params drbd_resource=mysql \
> > >     op monitor interval=15s
> > > primitive fs1_drbd ocf:linbit:drbd \
> > >     params drbd_resource=fs1
> > > primitive fs1_lvm ocf:heartbeat:LVM \
> > >     params volgrpname=data_vg
> > > primitive fs1_nfs ocf:heartbeat:nfsserver \
> > >     params nfs_init_script=/etc/init.d/nfs nfs_notify_cmd=/sbin/rpc.statd nfs_shared_infodir=/var/lib/nfs/ nfs_ip=fs1
> > > primitive fs1_nfs_recovery_fs ocf:heartbeat:Filesystem \
> > >     params device=/dev/data_vg/v4recovery directory=/var/lib/nfs/v4recovery/ fstype=ext3
> > > primitive mysql_daemon ocf:heartbeat:mysql \
> > >     params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid
> > > primitive mysql_fs ocf:heartbeat:Filesystem \
> > >     params device=/dev/drbd/by-res/mysql directory=/var/lib/mysql fstype=ext3
> > > primitive mysql_ip ocf:heartbeat:IPaddr2 \
> > >     params ip=mysql nic=eth0:0
> > > group fs1 fs1_lvm fs1_nfs_recovery_fs fs1_nfs \
> > >     params target_role=stopped
> > > group mysql mysql_fs mysql_ip mysql_daemon
> > > ms ms_drbd_mysql drbd_mysql \
> > >     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false
> > > ms ms_fs1_drbd fs1_drbd \
> > >     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false
> > > location drbd_mysql_on_mysql1 ms_drbd_mysql 10: mysql1
> > > location drbd_mysql_on_mysql2 ms_drbd_mysql 10: mysql2
> > > location fs1_drbd_on_filer1 ms_fs1_drbd 10: filer1
> > > location fs1_drbd_on_filer2 ms_fs1_drbd 10: filer2
> > > location fs1_on_filer1 fs1 10: filer1
> > > location fs1_on_filer2 fs1 10: filer2
> > > location mysql_on_mysql1 mysql 10: mysql1
> > > location mysql_on_mysql2 mysql 10: mysql2
> > > colocation fs1_on_fs1_drbd inf: fs1 ms_fs1_drbd:Master
> > > colocation mysql_on_drbd_mysql inf: mysql ms_drbd_mysql:Master
> > > order fs1_after_fs1_drbd inf: ms_fs1_drbd:promote fs1:start
> > > order mysql_after_drbd_mysql inf: ms_drbd_mysql:promote mysql:start
> > > property $id=cib-bootstrap-options \
> > >     dc-version=1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7 \
> > >     cluster-infrastructure=openais \
> > >     expected-quorum-votes=4 \
> > >     symmetric-cluster=false \
> > >     stonith-enabled=false \
> > >     last-lrm-refresh=1258664854
Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?
Will enabling traps do what you want?

From: hj lee [mailto:kerd...@gmail.com]
Sent: Friday, November 20, 2009 11:16 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?

On Fri, Nov 20, 2009 at 12:42 AM, Andrew Beekhof and...@beekhof.net wrote:
> > I need a resource to do something on the event of a node joining or leaving. Is it possible to receive node join/leave events at the resource level?
>
> If it's a clone, then you can ask for notifications. Otherwise, you need to look in the cib or ask pacemaker for membership updates (i.e. you'd need to modify your daemon).

Thanks very much, I will try it using a clone. What do you mean by "your daemon"? Pacemaker?

Thanks
Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?
On Fri, Nov 20, 2009 at 7:15 PM, hj lee kerd...@gmail.com wrote:
> On Fri, Nov 20, 2009 at 12:42 AM, Andrew Beekhof and...@beekhof.net wrote:
> > > I need a resource to do something on the event of a node joining or leaving. Is it possible to receive node join/leave events at the resource level?
> >
> > If it's a clone, then you can ask for notifications. Otherwise, you need to look in the cib or ask pacemaker for membership updates (i.e. you'd need to modify your daemon).
>
> Thanks very much, I will try it using a clone. What do you mean by "your daemon"? Pacemaker?

No, _yours_. The one you want to know about node up/down events.
Re: [Pacemaker] resource dependency
On Fri, Nov 20, 2009 at 5:42 PM, Alexandre Biancalana biancal...@gmail.com wrote:
> Hi list,
>
> I'm building a 4 node cluster where 2 nodes will export drbd devices via ietd iscsi target (storage nodes) and the other 2 nodes will run xen vms (app nodes) stored in lvm partitions accessed via the open-iscsi initiator, using multipath for failover.
>
> While configuring the cluster resource ordering I came up against a situation I can't find a solution for. The xen vm resources depend on an iscsi initiator resource to run. I have two iscsi initiator resources, one for each storage node. How can I make the vm resources dependent on any one of the iscsi initiator resources?

The cluster can't express this case yet. But it's on the to-do list.

> I am thinking of creating a clone of the iscsi initiator resource and using rules to change the clone options so that I can have two clones per app node with different portal parameters. This way I could make the vm resources depend on this clone. Is this possible?
>
> I'm using debian-lenny with the packages described at http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
>
> Excuse me for the bad English.
>
> Best Regards,
> Alexandre
Re: [Pacemaker] Remote Access not Working
On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof and...@beekhof.net wrote:
> Remote notifications should work, I'll test that today.

As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d they finally work for clear-text connections. Testing encrypted ones now.
Re: [Pacemaker] resource dependency
On Fri, Nov 20, 2009 at 4:35 PM, Andrew Beekhof and...@beekhof.net wrote:
> On Fri, Nov 20, 2009 at 5:42 PM, Alexandre Biancalana biancal...@gmail.com wrote:
> > Hi list,
> >
> > I'm building a 4 node cluster where 2 nodes will export drbd devices via ietd iscsi target (storage nodes) and the other 2 nodes will run xen vms (app nodes) stored in lvm partitions accessed via the open-iscsi initiator, using multipath for failover.
> >
> > While configuring the cluster resource ordering I came up against a situation I can't find a solution for. The xen vm resources depend on an iscsi initiator resource to run. I have two iscsi initiator resources, one for each storage node. How can I make the vm resources dependent on any one of the iscsi initiator resources?
>
> The cluster can't express this case yet. But it's on the to-do list.

Thank you for the answer, Andrew, and congratulations on this great piece of software.

Best Regards,
Alexandre
Re: [Pacemaker] **** SPAM **** Re: pacemaker-1.0.6 + corosync 1.1.2 crashing
Nik,

Any chance you have a backtrace of the core files? That might be helpful in pinpointing the issue. To do this run

    gdb binaryname corefilename
    (gdb) bt

Regards
-steve

On Thu, 2009-11-19 at 17:50 +0100, Nikola Ciprich wrote:
> Hi Andrew,
> sorry to bother again, do You have some idea what else might be wrong? Does it make sense to CC openais or cluster maillist? Is there some other debugging You would recommend?
> with best regards
> nik
>
> On Wed, Nov 18, 2009 at 03:26:28PM +0100, Nikola Ciprich wrote:
> > I've packaged those myself, all are based on clean sources without any additional patches.
Re: [Pacemaker] Remote Access not Working
On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof and...@beekhof.net wrote:
> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof and...@beekhof.net wrote:
> > Remote notifications should work, I'll test that today.
>
> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d they finally work for clear-text connections. Testing encrypted ones now.

And now TLS as of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/83f81a1219f1
[Pacemaker] virtual IP
Hi guys,

I'm trying to bring up my virtual IP on the eth1 interface, but when I run crm_mon I get an "invalid parameter" error.

That part of the config is:

<primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <operations>
    <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
  </operations>
  <instance_attributes id="ia-vip">
    <nvpair id="vip-addr" name="ip" value="172.30.0.117"/>
    <nvpair id="vip-intf" name="nic" value="eth1"/>
    <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
  </instance_attributes>
</primitive>

My eth1 IP address is 172.30.0.145. eth1's broadcast address is 172.30.255.255.

Please let me know if there is anything I have missed.

Sincerely
Shravan
Re: [Pacemaker] NFS server shutdown issue
On Fri, Nov 20, 2009 at 1:29 PM, Andrew Beekhof and...@beekhof.net wrote:
> On Fri, Nov 20, 2009 at 7:12 PM, Judd Tracy j...@thetracys.net wrote:
> > It is a weird issue. Redhat kills the nfs daemons with a kill -2 in their init script. But when pacemaker starts the nfs daemons the only way to kill the daemons is a kill -9. I don't have a clue what pacemaker is doing that would cause this behavior.
>
> Pacemaker isn't doing anything, it just calls the scripts you tell it.

I know, I have stepped through the script and have not found anything out of the ordinary. But I still have the problem that when pacemaker starts the nfs daemons they cannot be killed without using -9. So something is going on that I cannot figure out. I was just hoping that someone else has seen this issue before and figured out what was going on.

> > As far as the mysql order constraint goes: since the file system is in the group mysql, won't it be brought up after the drbd master is promoted? And don't the members of the group get started in order? I am a newbie so please correct me if I am wrong.
>
> Oh, I may have missed the group part.
>
> [...]
[Pacemaker] basic configuration question
Hi All,

In a simple two node cluster, I load the enclosed xml file. I expect that this is the simplest syntax to specify two resources, eyes and clock, where eyes and clock run on the same node. Actual behavior is that eyes and clock run on opposite nodes. Is my xml file wrong? Also, shouldn't the sequential="true" in the colocation's resource_set tag control the startup sequence of eyes and clock?

Thanks in advance,
-Frank

<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="0" admin_epoch="1" epoch="8" num_updates="1" cib-last-written="Fri Nov 20 07:46:43 2009" dc-uuid="ubuntu_2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="option-1" name="symmetric-cluster" value="true"/>
        <nvpair id="option-2" name="no-quorum-policy" value="ignore"/>
        <nvpair id="option-3" name="stonith-enabled" value="false"/>
        <nvpair id="option-4" name="cluster-delay" value="5s"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="10s"/>
        <nvpair id="cib-bootstrap-options-election-timeout" name="election-timeout" value="10s"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="ubuntu_2" uname="ubuntu_2" type="normal"/>
      <node id="ubuntu_1" uname="ubuntu_1" type="normal"/>
    </nodes>
    <resources>
      <primitive id="eyes" class="ocf" type="eyestest" provider="bbnd">
        <operations>
          <op id="eyescheck" name="monitor" interval="10s" requires="nothing" on-fail="restart"/>
        </operations>
      </primitive>
      <primitive id="clock" class="ocf" type="clocktest" provider="bbnd">
        <operations>
          <op id="clockcheck" name="monitor" interval="10s" requires="nothing" on-fail="restart"/>
        </operations>
      </primitive>
    </resources>
    <constraints>
      <rsc_colocation id="coloc-1">
        <resource_set id="coloc-1-set-1" sequential="true">
          <resource_ref id="eyes"/>
          <resource_ref id="clock"/>
        </resource_set>
      </rsc_colocation>
    </constraints>
  </configuration>
</cib>
[Pacemaker] Fwd: virtual IP
This is my exact output:

Last updated: Fri Nov 20 18:20:51 2009
Stack: openais
Current DC: node1.itactics.com - partition with quorum
Version: 1.0.5-9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601
2 Nodes configured, 2 expected votes
4 Resources configured.

Online: [ node1.itactics.com node2.itactics.com ]

Master/Slave Set: ms-drbd
    Masters: [ node1.itactics.com ]
    Slaves: [ node2.itactics.com ]
node1.itactics.com-stonith (stonith:external/safe/ipmi): Started node2.itactics.com
node2.itactics.com-stonith (stonith:external/safe/ipmi): Started node1.itactics.com
Resource Group: svcs_grp
    fs0 (ocf::heartbeat:Filesystem): Started node1.itactics.com
    safe_svcs (ocf::itactics:safe): Started node1.itactics.com
    vip (ocf::heartbeat:IPaddr2): Stopped

Failed actions:
    vip_monitor_0 (node=node1.itactics.com, call=7, rc=2, status=complete): invalid parameter
    vip_monitor_0 (node=node2.itactics.com, call=7, rc=2, status=complete): invalid parameter

The config I tried this time was:

<primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <operations>
    <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
  </operations>
  <instance_attributes id="ia-vip">
    <nvpair id="vip-addr" name="ip" value="172.30.0.17" />
    <nvpair id="vip-intf" name="nic" value="eth1"/>
    <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
    <nvpair id="vip-cidr_netmask" name="cidr_netmask" value="16"/>
  </instance_attributes>
</primitive>

Can somebody help me find the problem here?
Thanks,
Shravan

---------- Forwarded message ----------
From: Shravan Mishra <shravan.mis...@gmail.com>
Date: Fri, Nov 20, 2009 at 3:30 PM
Subject: virtual IP
To: pacemaker@oss.clusterlabs.org

Hi guys,

I'm trying to bring up my virtual IP on the eth1 interface, but when I run crm_mon I get an "invalid parameter" error.

That part of the config is:

<primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <operations>
    <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
  </operations>
  <instance_attributes id="ia-vip">
    <nvpair id="vip-addr" name="ip" value="172.30.0.117"/>
    <nvpair id="vip-intf" name="nic" value="eth1"/>
    <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
  </instance_attributes>
</primitive>

My eth1 IP address is 172.30.0.145.
eth1's broadcast address is 172.30.255.255.

Please let me know if there is anything I have missed.

Sincerely,
Shravan
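A quick sanity check on the addresses quoted above, as a rough sketch only. It assumes eth1 uses a /16 prefix (this matches the cidr_netmask=16 used in the later config and is implied by the stated broadcast address); check_vip is a hypothetical helper, not part of Pacemaker or the IPaddr2 agent.

```python
import ipaddress


def check_vip(vip, iface_ip, prefix_len, broadcast):
    """Return (vip_in_subnet, broadcast_matches) for the interface's network.

    prefix_len is an assumption supplied by the caller; the original mail
    only gives the interface address and its broadcast address.
    """
    # Derive the network the interface sits in, e.g. 172.30.0.0/16.
    network = ipaddress.ip_interface(f"{iface_ip}/{prefix_len}").network
    vip_ok = ipaddress.ip_address(vip) in network
    bcast_ok = network.broadcast_address == ipaddress.ip_address(broadcast)
    return vip_ok, bcast_ok


if __name__ == "__main__":
    # Values taken from the forwarded message; /16 is assumed.
    print(check_vip("172.30.0.117", "172.30.0.145", 16, "172.30.255.255"))
```

If both values come back true, the ip and broadcast parameters are at least consistent with eth1's network, which suggests looking elsewhere for the cause of the "invalid parameter" result.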
Re: [Pacemaker] Fwd: virtual IP
On 11/21/2009 at 06:25 AM, Shravan Mishra <shravan.mis...@gmail.com> wrote:
> This is my exact output:
>
> Last updated: Fri Nov 20 18:20:51 2009
> Stack: openais
> Current DC: node1.itactics.com - partition with quorum
> Version: 1.0.5-9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
>
> Online: [ node1.itactics.com node2.itactics.com ]
>
> Master/Slave Set: ms-drbd
>     Masters: [ node1.itactics.com ]
>     Slaves: [ node2.itactics.com ]
> node1.itactics.com-stonith (stonith:external/safe/ipmi): Started node2.itactics.com
> node2.itactics.com-stonith (stonith:external/safe/ipmi): Started node1.itactics.com
> Resource Group: svcs_grp
>     fs0 (ocf::heartbeat:Filesystem): Started node1.itactics.com
>     safe_svcs (ocf::itactics:safe): Started node1.itactics.com
>     vip (ocf::heartbeat:IPaddr2): Stopped
>
> Failed actions:
>     vip_monitor_0 (node=node1.itactics.com, call=7, rc=2, status=complete): invalid parameter
>     vip_monitor_0 (node=node2.itactics.com, call=7, rc=2, status=complete): invalid parameter
>
> The config this time I tried was
>
> <primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
>   <operations>
>     <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
>   </operations>
>   <instance_attributes id="ia-vip">
>     <nvpair id="vip-addr" name="ip" value="172.30.0.17" />
>     <nvpair id="vip-intf" name="nic" value="eth1"/>
>     <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
>     <nvpair id="vip-cidr_netmask" name="cidr_netmask" value="16"/>
>   </instance_attributes>
> </primitive>
>
> Can somebody help me what's the problem here.

You're probably suffering from https://bugzilla.novell.com/show_bug.cgi?id=553753, which is fixed by http://hg.linux-ha.org/agents/rev/5d341d5dc96a

Try explicitly adding the parameter clusterip_hash=sourceip-sourceport to the IP address. This will add something like the following to the instance_attributes:

    <nvpair id="vip-clusterip_hash" name="clusterip_hash" value="sourceip-sourceport"/>

Regards,
Tim

--
Tim Serong <tser...@novell.com>
Senior Clustering Engineer, Novell Inc.
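To illustrate where the extra nvpair ends up, here is a minimal sketch that appends it to the quoted primitive using Python's standard xml.etree.ElementTree. It only shows the resulting structure; on a live cluster the change would be made through the usual CIB tools rather than by editing XML by hand, and add_clusterip_hash is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The primitive as quoted in the thread (whitespace normalised).
PRIMITIVE = """
<primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <operations>
    <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
  </operations>
  <instance_attributes id="ia-vip">
    <nvpair id="vip-addr" name="ip" value="172.30.0.17"/>
    <nvpair id="vip-intf" name="nic" value="eth1"/>
    <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
    <nvpair id="vip-cidr_netmask" name="cidr_netmask" value="16"/>
  </instance_attributes>
</primitive>
"""


def add_clusterip_hash(primitive_xml, value="sourceip-sourceport"):
    """Return the primitive XML with a clusterip_hash nvpair appended."""
    root = ET.fromstring(primitive_xml)
    attrs = root.find("instance_attributes")
    # Same id/name/value as in the suggested nvpair above.
    ET.SubElement(attrs, "nvpair", {
        "id": "vip-clusterip_hash",
        "name": "clusterip_hash",
        "value": value,
    })
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_clusterip_hash(PRIMITIVE))
```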
[Pacemaker] Demo SystemHealth RA
Hi,

During a seminar we developed a resource agent that measures CPU load and writes the corresponding state into the #health-cpu attribute. The agent measures the percentage of CPU idle time. If it drops below red_limit, the attribute is set to red; yellow is handled accordingly.

Please review my agent and consider including it in the distribution.

Greetings,

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: mi...@multinet.de
web: www.multinet.de
Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens
---
PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42

Attachment: HealthCPU
Description: application/shellscript
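The attached script itself is not reproduced in the archive, so the following is only a rough sketch of the logic described above, not the HealthCPU agent. It assumes Linux-style "cpu ..." lines from /proc/stat as input; red_limit is the name used in the mail, while yellow_limit, both default thresholds, and the function names are assumptions made for illustration.

```python
def idle_percent(before, after):
    """Idle share (0-100) between two 'cpu ...' lines read from /proc/stat."""
    # Use at most the first eight counters (user .. steal); guest time is
    # already included in user time, so it is left out of the total.
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    total = sum(a) - sum(b)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle = a[3] - b[3]  # fourth counter is idle time
    return 100.0 * idle / total


def health_state(idle_pct, red_limit=10.0, yellow_limit=50.0):
    """Map an idle percentage to a node health value.

    The default limits are illustrative assumptions, not the agent's values.
    """
    if idle_pct < red_limit:
        return "red"
    if idle_pct < yellow_limit:
        return "yellow"
    return "green"


if __name__ == "__main__":
    first = "cpu 100 0 100 800 0 0 0 0"
    second = "cpu 190 0 190 820 0 0 0 0"
    pct = idle_percent(first, second)
    print(pct, health_state(pct))
```

A real agent would take the two samples a fixed interval apart in its monitor action and then store the resulting state in the #health-cpu node attribute.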