Re: [Pacemaker] Remote Access not Working

2009-11-20 Thread Colin
Hi,

this is looking better again: A remote cibadmin -Q is now doing the
right thing, however a remote crm_mon is still _not_ working
correctly.

Let's see, now that I should know where to look ... the function
cib_recv_plaintext() in lib/common/remote.c looks a bit suspicious to
me:

- The if (len == 0) check will never be true because len is
initialised to 512 and then only grows.
- The assumption that a partial read (wrt. the buffer) signals no more
data is IMO not valid.

With the following patch I can at least get a crm_mon -1rf to do the
right thing:

diff -ur Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c Pacemaker-my/lib/common/remote.c
--- Pacemaker-1-0-f7a8250d23fc/lib/common/remote.c      2009-11-19 21:12:53.0 +0100
+++ Pacemaker-my/lib/common/remote.c    2009-11-20 10:52:36.0 +0100
@@ -220,33 +220,29 @@
 char*
 cib_recv_plaintext(int sock)
 {
-       int last = 0;
        char* buf = NULL;
-       int chunk_size = 512;
-       int len = chunk_size;
+       ssize_t buf_size = 512;
+       ssize_t len = 0;
 
-       crm_malloc0(buf, chunk_size);
+       crm_malloc0(buf, buf_size);
 
        while(1) {
-               int rc = recv(sock, buf+last, chunk_size, 0);
+               ssize_t rc = recv(sock, buf+len, buf_size-len, 0);
                if (rc == 0) {
                        if(len == 0) {
                                goto bail;
                        }
                        return buf;
 
-               } else if(rc > 0 && rc < chunk_size) {
-                       return buf;
-
-               } else if(rc == chunk_size) {
-                       last = len;
-                       len += chunk_size;
-                       crm_realloc(buf, len);
-                       CRM_ASSERT(buf != NULL);
+               } else if(rc > 0) {
+                 len += rc;
+                 if (len == buf_size) {
+                   crm_realloc(buf, buf_size += 512);  /* Should do exponential growth for amortized constant time? */
+                   CRM_ASSERT(buf != NULL);
+                 }
                }
-
                if(rc < 0 && errno != EINTR) {
-                       crm_perror(LOG_ERR,"Error receiving message: %d", rc);
+                 crm_perror(LOG_ERR,"Error receiving message: %d", (int)rc);
                        goto bail;
                }
        }
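
For comparison, here is a minimal standalone sketch of the kind of read loop the patch is aiming for, with the exponential growth mentioned in the comment. It is not Pacemaker's code: recv_until_eof and its out_len parameter are hypothetical names, plain malloc/realloc stand in for crm_malloc0/crm_realloc, and it assumes the peer closes the connection (or shuts down its write side) once the message is complete. A connection that stays open would need some message framing instead.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Pacemaker): read from sock until the peer
 * closes the connection. Returns a NUL-terminated heap buffer that the caller
 * must free, or NULL on error or if no data arrived at all. */
char *recv_until_eof(int sock, size_t *out_len)
{
    size_t cap = 512;   /* current buffer capacity */
    size_t len = 0;     /* bytes received so far */
    char *buf = malloc(cap);

    if (buf == NULL) {
        return NULL;
    }

    for (;;) {
        /* Always keep one byte free for the terminating NUL. */
        ssize_t rc = recv(sock, buf + len, cap - len - 1, 0);

        if (rc > 0) {
            len += (size_t)rc;
            if (len == cap - 1) {
                /* Buffer full: double it, so the total copying cost stays
                 * proportional to the message size. */
                char *tmp = realloc(buf, cap * 2);

                if (tmp == NULL) {
                    free(buf);
                    return NULL;
                }
                buf = tmp;
                cap *= 2;
            }

        } else if (rc == 0) {
            /* Orderly shutdown by the peer: the message is complete. */
            if (len == 0) {
                free(buf);
                return NULL;
            }
            buf[len] = '\0';
            if (out_len != NULL) {
                *out_len = len;
            }
            return buf;

        } else if (errno != EINTR) {
            /* Real error; EINTR just means "try again". */
            free(buf);
            return NULL;
        }
    }
}
```

A short read is treated like any other read here: only recv() returning 0 ends the loop, which is the behaviour the second bullet above argues for.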

And that is as far as I can get with crm_mon, as it doesn't support
continuous updates via remote access?

static int cib_remote_set_connection_dnotify(
    cib_t *cib, void (*dnotify)(gpointer user_data))
{
    return cib_NOTSUPPORTED;
}


Regards, Colin

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Remote Access not Working

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 11:17 AM, Colin colin@gmail.com wrote:
> Hi,
>
> this is looking better again: A remote cibadmin -Q is now doing the
> right thing, however a remote crm_mon is still _not_ working
> correctly.
>
> Let's see, now that I should know where to look ... the function
> cib_recv_plaintext() in lib/common/remote.c looks a bit suspicious to
> me:
>
> - The if (len == 0) check will never be true because len is
> initialised to 512 and then only grows.
> - The assumption that a partial read (wrt. the buffer) signals no more
> data is IMO not valid.

It is if you didn't get a signal.
But I agree the code needs a cleanup.

I went with: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/5acf9f2e9c9e

> With the following patch I can at least get a crm_mon -1rf to do the
> right thing:

> And that is as far as I can get with crm_mon, as it doesn't supports
> continuous update via remote access?
>
> static int cib_remote_set_connection_dnotify(
>     cib_t *cib, void (*dnotify)(gpointer user_data))
> {
>     return cib_NOTSUPPORTED;
> }

No, that's something else.
Remote notifications should work, I'll test that today.



Re: [Pacemaker] Resource capacity limit

2009-11-20 Thread Yan Gao
Hi Andrew,

On 11/20/09 04:10, Andrew Beekhof wrote:
> Btw. You're still missing some test cases ;-)
Oh, right :-) I created some. I hope I created them the correct way.
Sorry for so many attachments...

Thanks,
  Yan
-- 
y...@novell.com
Software Engineer
China Server Team, OPS Engineering

Novell, Inc.
Making IT Work As One™
digraph "g" {
"probe_complete host1" -> "probe_complete" [ style = bold]
"probe_complete host1" [ style=bold color="green" fontcolor="black"  ]
"probe_complete host2" -> "probe_complete" [ style = bold]
"probe_complete host2" [ style=bold color="green" fontcolor="black"  ]
"probe_complete" -> "rsc1_start_0 host2" [ style = bold]
"probe_complete" -> "rsc2_start_0 host1" [ style = bold]
"probe_complete" [ style=bold color="green" fontcolor="orange"  ]
"rsc1_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc1_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc1_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc1_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc1_start_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc2_monitor_0 host1" -> "probe_complete host1" [ style = bold]
"rsc2_monitor_0 host1" [ style=bold color="green" fontcolor="black"  ]
"rsc2_monitor_0 host2" -> "probe_complete host2" [ style = bold]
"rsc2_monitor_0 host2" [ style=bold color="green" fontcolor="black"  ]
"rsc2_start_0 host1" [ style=bold color="green" fontcolor="black"  ]
}
<transition_graph cluster-delay="60s" stonith-timeout="60s" failed-stop-offset="INFINITY" failed-start-offset="INFINITY" batch-limit="30" transition_id="0">
  <synapse id="0">
    <action_set>
      <rsc_op id="4" operation="monitor" operation_key="rsc1_monitor_0" on_node="host1" on_node_uuid="host1">
        <primitive id="rsc1" long-id="rsc1" class="ocf" provider="pacemaker" type="Dummy"/>
        <attributes CRM_meta_op_target_rc="7" CRM_meta_timeout="2" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs/>
  </synapse>
  <synapse id="1">
    <action_set>
      <rsc_op id="7" operation="monitor" operation_key="rsc1_monitor_0" on_node="host2" on_node_uuid="host2">
        <primitive id="rsc1" long-id="rsc1" class="ocf" provider="pacemaker" type="Dummy"/>
        <attributes CRM_meta_op_target_rc="7" CRM_meta_timeout="2" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs/>
  </synapse>
  <synapse id="2">
    <action_set>
      <rsc_op id="9" operation="start" operation_key="rsc1_start_0" on_node="host2" on_node_uuid="host2">
        <primitive id="rsc1" long-id="rsc1" class="ocf" provider="pacemaker" type="Dummy"/>
        <attributes CRM_meta_timeout="2" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs>
      <trigger>
        <pseudo_event id="2" operation="probe_complete" operation_key="probe_complete"/>
      </trigger>
    </inputs>
  </synapse>
  <synapse id="3">
    <action_set>
      <rsc_op id="5" operation="monitor" operation_key="rsc2_monitor_0" on_node="host1" on_node_uuid="host1">
        <primitive id="rsc2" long-id="rsc2" class="ocf" provider="pacemaker" type="Dummy"/>
        <attributes CRM_meta_op_target_rc="7" CRM_meta_timeout="2" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs/>
  </synapse>
  <synapse id="4">
    <action_set>
      <rsc_op id="8" operation="monitor" operation_key="rsc2_monitor_0" on_node="host2" on_node_uuid="host2">
        <primitive id="rsc2" long-id="rsc2" class="ocf" provider="pacemaker" type="Dummy"/>
        <attributes CRM_meta_op_target_rc="7" CRM_meta_timeout="2" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs/>
  </synapse>
  <synapse id="5">
    <action_set>
      <rsc_op id="10" operation="start" operation_key="rsc2_start_0" on_node="host1" on_node_uuid="host1">
        <primitive id="rsc2" long-id="rsc2" class="ocf" provider="pacemaker" type="Dummy"/>
        <attributes CRM_meta_timeout="2" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs>
      <trigger>
        <pseudo_event id="2" operation="probe_complete" operation_key="probe_complete"/>
      </trigger>
    </inputs>
  </synapse>
  <synapse id="6">
    <action_set>
      <pseudo_event id="2" operation="probe_complete" operation_key="probe_complete">
        <attributes crm_feature_set="3.0.1"/>
      </pseudo_event>
    </action_set>
    <inputs>
      <trigger>
        <rsc_op id="3" operation="probe_complete" operation_key="probe_complete" on_node="host1" on_node_uuid="host1"/>
      </trigger>
      <trigger>
        <rsc_op id="6" operation="probe_complete" operation_key="probe_complete" on_node="host2" on_node_uuid="host2"/>
      </trigger>
    </inputs>
  </synapse>
  <synapse id="7" priority="100">
    <action_set>
      <rsc_op id="3" operation="probe_complete" operation_key="probe_complete" on_node="host1" on_node_uuid="host1">
        <attributes CRM_meta_op_no_wait="true" crm_feature_set="3.0.1"/>
      </rsc_op>
    </action_set>
    <inputs>
      <trigger>
        <rsc_op id="4" operation="monitor" operation_key="rsc1_monitor_0" on_node="host1" on_node_uuid="host1"/>
      </trigger>
      <trigger>
        <rsc_op id="5" operation="monitor" operation_key="rsc2_monitor_0" on_node="host1" on_node_uuid="host1"/>
      </trigger>
    </inputs>
  </synapse>
  <synapse id="8" priority="100">
    <action_set>
      <rsc_op id="6" operation="probe_complete" operation_key="probe_complete" on_node="host2"


Re: [Pacemaker] Remote Access not Working

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 1:05 PM, Colin colin@gmail.com wrote:
> PS: I believe this CRM_ASSERT() in lib/common/remote.c can never trigger.

It's designed to detect if somehow we asked for an encrypted message
when Pacemaker wasn't built with gnutls.
It's a sanity check; it's not supposed to go off.


>    if(encrypted) {
> #ifdef HAVE_GNUTLS_GNUTLS_H
>        reply = cib_recv_tls(session);
> #else
>        CRM_ASSERT(encrypted == FALSE);
> #endif
>    } else {



[Pacemaker] resource dependency

2009-11-20 Thread Alexandre Biancalana
Hi list,

I'm building a 4-node cluster where 2 nodes will export DRBD devices
via an ietd iSCSI target (storage nodes) and the other 2 nodes will run Xen
VMs (app nodes) stored on LVM partitions accessed via the open-iscsi
initiator, using multipath for failover.

While configuring the ordering of the cluster resources I ran into a situation
for which I can't find a solution. The Xen VM resources depend on an iSCSI
initiator resource to run. I have two iSCSI initiator resources, one
for each storage node. How can I make the VM resources depend on
any one of the iSCSI initiator resources?

I am thinking of creating a clone of the iSCSI initiator resource and using
rules to change the clone options so that I can have two clones per
app node with different portal parameters. That way I could make the VM
resources depend on this clone. Is this possible?

 I'm using debian-lenny with the packages described at
http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo

Excuse me for the bad english.

Best Regards,

Alexandre



Re: [Pacemaker] resource dependency

2009-11-20 Thread Matthew Palmer
On Fri, Nov 20, 2009 at 02:42:29PM -0200, Alexandre Biancalana wrote:
> I'm building a 4 node cluster where 2 nodes will export drbd devices
> via ietd iscsi target (storage nodes) and other 2 nodes will run xen
> vm (app nodes) stored in lvm partition accessed via open-iscsi
> initiator, using multipath to failover.
>
> Configuring the cluster resources order I came up with a situation
> that I don't find a solution. The xen vm resources depends of iscsi
> initiator resource to run, I have two iscsi initiator resources, one
> for each storage node, how can I make the vm resources dependent on
> any iscsi initiator resources ?

Personally, I think you've got the wrong design.  I'd prefer to loosely
couple the storage and VM clusters, with the storage cluster exporting iSCSI
targets which the VM cluster then attaches to the VMs as required.  Put
the error handling for the case where the iSCSI target isn't available
for a VM into the resource agent for the VM.  To me, this seems like a more
robust solution.  Tying everything up together feels like you're asking for
trouble whenever any failover happens: everything gets recalculated and the
cluster spends the next several minutes jiggling resources around before
everything settles back down again.

- Matt



Re: [Pacemaker] resource dependency

2009-11-20 Thread Alexandre Biancalana
On Fri, Nov 20, 2009 at 2:53 PM, Matthew Palmer mpal...@hezmatt.org wrote:
> On Fri, Nov 20, 2009 at 02:42:29PM -0200, Alexandre Biancalana wrote:
>> I'm building a 4 node cluster where 2 nodes will export drbd devices
>> via ietd iscsi target (storage nodes) and other 2 nodes will run xen
>> vm (app nodes) stored in lvm partition accessed via open-iscsi
>> initiator, using multipath to failover.
>>
>> Configuring the cluster resources order I came up with a situation
>> that I don't find a solution. The xen vm resources depends of iscsi
>> initiator resource to run, I have two iscsi initiator resources, one
>> for each storage node, how can I make the vm resources dependent on
>> any iscsi initiator resources ?
>
> Personally, I think you've got the wrong design.  I'd prefer to loosely
> couple the storage and VM clusters, with the storage cluster exporting iSCSI
> initiators which the VM cluster then attaches to the VMs as required.  Put
> the error handling for the case where the iSCSI initiator isn't available
> for a VM into the resource agent for the VM.  To me, this seems like a more
> robust solution.  Tying everything up together feels like you're asking for
> trouble whenever any failover happens -- everything gets recalculated and the
> cluster spends the next several minutes jiggling resources around before
> everything settles back down again.

Hi Matt, thank you for the reply.

OK. But if I go with your suggestion I end up with the same question.

With the 2-node storage cluster exporting the block device via
iSCSI, how can I make the VM resource on the VM cluster depend on
*any* exported iSCSI target? The standard order configuration only
allows a dependency on *one* resource.

The only way I see is to configure an IP resource on the storage cluster and
use it as the portal for the iSCSI initiator resource on the VM cluster. I don't
want to do it this way because I plan to use multipath, for a quicker
failover.

Alexandre



Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?

2009-11-20 Thread hj lee
On Fri, Nov 20, 2009 at 12:42 AM, Andrew Beekhof and...@beekhof.net wrote:

 
>> I need a resource to do something on event of node join/left. Is it possible
>> to receive node join/left event at resource level?
>
> If its a clone, then you can ask for notifications.
> Otherwise, you need to look in the cib or ask pacemaker for membership
> updates (ie. you'd need to modify your daemon)

Thank you very much, I will try it using a clone. What do you mean by "your
daemon"? Pacemaker?

Thanks


Re: [Pacemaker] NFS server shutdown issue

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 7:12 PM, Judd Tracy j...@thetracys.net wrote:
> It is a weird issue.  Redhat kills the nfs daemons with a kill -2 in their
> init script.  But when pacemaker starts the nfs daemons the only way to kill
> the daemons is a kill -9.  I don't have a clue what pacemaker is doing that
> would cause this behavior.

Pacemaker isn't doing anything, it just calls the scripts you tell it.


> As far as the mysql order constraint since the file system is in the group
> mysql won't it be brought after the drbd master is promoted?  And don't the
> members of the group get started in order?  I am a newbie so please correct
> me if I am wrong.

Oh, I may have missed the group part.


> Judd
>
> On Fri, Nov 20, 2009 at 2:29 AM, Andrew Beekhof and...@beekhof.net wrote:
>
>> On Thu, Nov 19, 2009 at 10:24 PM, Judd Tracy j...@thetracys.net wrote:
>>> I am trying to setup a drbd/nfs server in pacemaker on RHEL5 and am
>>> experiencing some weird issues when the server is started using pacemaker.
>>> Pacemaker starts the daemons just fine, but when it tries to shutdown it
>>> cannot.  It calls /etc/init.d/nfs to shutdown the daemons, but they do not
>>> respond.
>>
>> I guess the scripts have a problem then.
>> Or perhaps you need some more ordering constraints so that the
>> services talking to nfs are shut down first.
>>
>>> When I start the daemons mysql using the same script I am able to
>>> kill them using the init.d script.  The logs show success trying to shutdown
>>> the daemons even though they do not.  I was wondering if anyone else has
>>> seen this issue?
>>
>> btw. Shouldn't
>>   order mysql_after_drbd_mysql inf: ms_drbd_mysql:promote mysql:start
>> be between mysql and the filesystem?
>>
>>> Judd
>>>
>>> Configuration:
>>>
>>> node filer1
>>> node filer2
>>> node mysql1
>>> node mysql2
>>> primitive drbd_mysql ocf:linbit:drbd \
>>>     params drbd_resource=mysql \
>>>     op monitor interval=15s
>>> primitive fs1_drbd ocf:linbit:drbd \
>>>     params drbd_resource=fs1
>>> primitive fs1_lvm ocf:heartbeat:LVM \
>>>     params volgrpname=data_vg
>>> primitive fs1_nfs ocf:heartbeat:nfsserver \
>>>     params nfs_init_script=/etc/init.d/nfs nfs_notify_cmd=/sbin/rpc.statd nfs_shared_infodir=/var/lib/nfs/ nfs_ip=fs1
>>> primitive fs1_nfs_recovery_fs ocf:heartbeat:Filesystem \
>>>     params device=/dev/data_vg/v4recovery directory=/var/lib/nfs/v4recovery/ fstype=ext3
>>> primitive mysql_daemon ocf:heartbeat:mysql \
>>>     params binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid
>>> primitive mysql_fs ocf:heartbeat:Filesystem \
>>>     params device=/dev/drbd/by-res/mysql directory=/var/lib/mysql fstype=ext3
>>> primitive mysql_ip ocf:heartbeat:IPaddr2 \
>>>     params ip=mysql nic=eth0:0
>>> group fs1 fs1_lvm fs1_nfs_recovery_fs fs1_nfs \
>>>     params target_role=stopped
>>> group mysql mysql_fs mysql_ip mysql_daemon
>>> ms ms_drbd_mysql drbd_mysql \
>>>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false
>>> ms ms_fs1_drbd fs1_drbd \
>>>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false
>>> location drbd_mysql_on_mysql1 ms_drbd_mysql 10: mysql1
>>> location drbd_mysql_on_mysql2 ms_drbd_mysql 10: mysql2
>>> location fs1_drbd_on_filer1 ms_fs1_drbd 10: filer1
>>> location fs1_drbd_on_filer2 ms_fs1_drbd 10: filer2
>>> location fs1_on_filer1 fs1 10: filer1
>>> location fs1_on_filer2 fs1 10: filer2
>>> location mysql_on_mysql1 mysql 10: mysql1
>>> location mysql_on_mysql2 mysql 10: mysql2
>>> colocation fs1_on_fs1_drbd inf: fs1 ms_fs1_drbd:Master
>>> colocation mysql_on_drbd_mysql inf: mysql ms_drbd_mysql:Master
>>> order fs1_after_fs1_drbd inf: ms_fs1_drbd:promote fs1:start
>>> order mysql_after_drbd_mysql inf: ms_drbd_mysql:promote mysql:start
>>> property $id=cib-bootstrap-options \
>>>     dc-version=1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7 \
>>>     cluster-infrastructure=openais \
>>>     expected-quorum-votes=4 \
>>>     symmetric-cluster=false \
>>>     stonith-enabled=false \
>>>     last-lrm-refresh=1258664854
 


Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?

2009-11-20 Thread Lundgren, Andrew
Will enabling traps do what you want?

From: hj lee [mailto:kerd...@gmail.com]
Sent: Friday, November 20, 2009 11:16 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive 
an event of node join or left?


On Fri, Nov 20, 2009 at 12:42 AM, Andrew Beekhof and...@beekhof.net wrote:

 I need a resource to do something on event of node join/left. Is it possible
 to receive node join/left event at resource level?
If its a clone, then you can ask for notifications.
Otherwise, you need to look in the cib or ask pacemaker for membership
updates (ie. you'd need to modify your daemon)
Thank very much, I will try it by using clone. What you mean your daemon? 
Pacemaker?

Thanks


Re: [Pacemaker] {Pacemaker] Is there a way for a resource to receive an event of node join or left?

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 7:15 PM, hj lee kerd...@gmail.com wrote:


> On Fri, Nov 20, 2009 at 12:42 AM, Andrew Beekhof and...@beekhof.net wrote:
>
>>> I need a resource to do something on event of node join/left. Is it possible
>>> to receive node join/left event at resource level?
>>
>> If its a clone, then you can ask for notifications.
>> Otherwise, you need to look in the cib or ask pacemaker for membership
>> updates (ie. you'd need to modify your daemon)
>
> Thank very much, I will try it by using clone. What you mean your daemon?
> Pacemaker?

No, _yours_. The one you want to know about node up/down events.



Re: [Pacemaker] resource dependency

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 5:42 PM, Alexandre Biancalana
biancal...@gmail.com wrote:
> Hi list,
>
> I'm building a 4 node cluster where 2 nodes will export drbd devices
> via ietd iscsi target (storage nodes) and other 2 nodes will run xen
> vm (app nodes) stored in lvm partition accessed via open-iscsi
> initiator, using multipath to failover.
>
> Configuring the cluster resources order I came up with a situation
> that I don't find a solution. The xen vm resources depends of iscsi
> initiator resource to run, I have two iscsi initiator resources, one
> for each storage node, how can I make the vm resources dependent on
> any iscsi initiator resources ?

The cluster can't express this case yet.
But it's on the to-do list.


> I think in create a clone of the iscsi initiator resource, use rules
> to change the clone options in a way that I can have two clones per
> app node with different portal parameter. This way I could make the vm
> resouces dependency on this clone. Is this possible ?
>
> I'm using debian-lenny with the packages described at
> http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
>
> Excuse me for the bad english.
>
> Best Regards,
>
> Alexandre



Re: [Pacemaker] Remote Access not Working

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof and...@beekhof.net wrote:
> Remote notifications should work, I'll test that today.

As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
they finally work for clear-text connections.
Testing encrypted ones now.



Re: [Pacemaker] resource dependency

2009-11-20 Thread Alexandre Biancalana
On Fri, Nov 20, 2009 at 4:35 PM, Andrew Beekhof and...@beekhof.net wrote:
> On Fri, Nov 20, 2009 at 5:42 PM, Alexandre Biancalana
> biancal...@gmail.com wrote:
>> Hi list,
>>
>> I'm building a 4 node cluster where 2 nodes will export drbd devices
>> via ietd iscsi target (storage nodes) and other 2 nodes will run xen
>> vm (app nodes) stored in lvm partition accessed via open-iscsi
>> initiator, using multipath to failover.
>>
>> Configuring the cluster resources order I came up with a situation
>> that I don't find a solution. The xen vm resources depends of iscsi
>> initiator resource to run, I have two iscsi initiator resources, one
>> for each storage node, how can I make the vm resources dependent on
>> any iscsi initiator resources ?
>
> The cluster can't express this case yet.
> But its on the to-doo list.

Thank you for the answer, Andrew, and congratulations on this great
piece of software.

Best Regards,
Alexandre



Re: [Pacemaker] pacemaker-1.0.6 + corosync 1.1.2 crashing

2009-11-20 Thread Steven Dake
Nik,

Any chance you have a backtrace of the core files?  That might be
helpful in pinpointing the issue.

To do this, run
gdb binaryname corefilename
(gdb) bt

Regards
-steve

On Thu, 2009-11-19 at 17:50 +0100, Nikola Ciprich wrote:
> Hi Andrew,
> sorry to bother again, do You have some idea what else might be wrong?
> Does it make sense to CC openais or cluster maillist?
> Is there some other debugging You would recommend?
> with best regards
> nik
>
> On Wed, Nov 18, 2009 at 03:26:28PM +0100, Nikola Ciprich wrote:
>> I've packaged those myself, all are based on clean sources without any
>> additional patches.




Re: [Pacemaker] Remote Access not Working

2009-11-20 Thread Andrew Beekhof
On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof and...@beekhof.net wrote:
> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof and...@beekhof.net wrote:
>> Remote notifications should work, I'll test that today.
>
> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
> they finally work for clear-text connections.
> Testing encrypted ones now.


And now TLS as of
http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/83f81a1219f1



[Pacemaker] virtual IP

2009-11-20 Thread Shravan Mishra
Hi guys,

I'm trying to bring up my virtual IP on the eth1 interface, but when I run
crm_mon I get an "invalid parameter" error.

That part of the config is:

<primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <operations>
    <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
  </operations>
  <instance_attributes id="ia-vip">
    <nvpair id="vip-addr" name="ip" value="172.30.0.117"/>
    <nvpair id="vip-intf" name="nic" value="eth1"/>
    <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
  </instance_attributes>
</primitive>


My eth1 IP address is 172.30.0.145.
eth1's broadcast address is 172.30.255.255.

Please let me know if there is anything I have missed.

Sincerely
Shravan



Re: [Pacemaker] NFS server shutdown issue

2009-11-20 Thread Judd Tracy
On Fri, Nov 20, 2009 at 1:29 PM, Andrew Beekhof and...@beekhof.net wrote:

> On Fri, Nov 20, 2009 at 7:12 PM, Judd Tracy j...@thetracys.net wrote:
>> It is a weird issue.  Redhat kills the nfs daemons with a kill -2 in their
>> init script.  But when pacemaker starts the nfs daemons the only way to kill
>> the daemons is a kill -9.  I don't have a clue what pacemaker is doing that
>> would cause this behavior.
>
> Pacemaker isn't doing anything, it just calls the scripts you tell it.


I know, I have stepped through the script and have not found anything out of
the ordinary.  But I still have the problem that when pacemaker starts the
nfs daemons they cannot be killed without using -9.  So something is going
on that I cannot figure out.  I was just hoping that someone else has seen
this issue before and figured out what was going on.



 
>> As far as the mysql order constraint since the file system is in the group
>> mysql won't it be brought after the drbd master is promoted?  And don't the
>> members of the group get started in order?  I am a newbie so please correct
>> me if I am wrong.
>
> Oh, I may have missed the group part.

 
  Judd
 
  On Fri, Nov 20, 2009 at 2:29 AM, Andrew Beekhof and...@beekhof.net
 wrote:
 
  On Thu, Nov 19, 2009 at 10:24 PM, Judd Tracy j...@thetracys.net
 wrote:
   I am trying to setup a drbd/nfs server in pacemaker on RHEL5 and am
   experiencing some wierd issues the the server is started using
   pacemaker.
   Pacemaker starts the daemons just fine, but when it tries to shutdown
 it
   cannot.  It calls /etc/init.d/nfs to shutdown the daemons, but they do
   not
   respond.
 
  I guess the scripts have a problem then.
  Or perhaps you need some more ordering constraints so that the
  services talking to nfs are shut down first.
 
   When I start the daemons mysql using the same script I am able to
   kill them using the init.d script.  The logs show success trying to
   shutdown
   the daemons even though they do not.  I was wondering if anyone else
 has
   seen this issue?
 
  btw. Shouldn't
order mysql_after_drbd_mysql inf: ms_drbd_mysql:promote mysql:start
  be between mysql and the filesystem?
 
  
  
   Judd
  
   Configuration:
  
   node filer1
   node filer2
   node mysql1
   node mysql2
   primitive drbd_mysql ocf:linbit:drbd \
   params drbd_resource=mysql \
   op monitor interval=15s
   primitive fs1_drbd ocf:linbit:drbd \
   params drbd_resource=fs1
   primitive fs1_lvm ocf:heartbeat:LVM \
   params volgrpname=data_vg
   primitive fs1_nfs ocf:heartbeat:nfsserver \
   params nfs_init_script=/etc/init.d/nfs
   nfs_notify_cmd=/sbin/rpc.statd nfs_shared_infodir=/var/lib/nfs/
   nfs_ip=fs1
   primitive fs1_nfs_recovery_fs ocf:heartbeat:Filesystem \
   params device=/dev/data_vg/v4recovery
   directory=/var/lib/nfs/v4recovery/ fstype=ext3
   primitive mysql_daemon ocf:heartbeat:mysql \
   params binary=/usr/bin/mysqld_safe
   pid=/var/run/mysqld/mysqld.pid
   primitive mysql_fs ocf:heartbeat:Filesystem \
   params device=/dev/drbd/by-res/mysql
   directory=/var/lib/mysql
   fstype=ext3
   primitive mysql_ip ocf:heartbeat:IPaddr2 \
   params ip=mysql nic=eth0:0
   group fs1 fs1_lvm fs1_nfs_recovery_fs fs1_nfs \
   params target_role=stopped
   group mysql mysql_fs mysql_ip mysql_daemon
   ms ms_drbd_mysql drbd_mysql \
   meta master-max=1 master-node-max=1 clone-max=2
   clone-node-max=1 notify=true globally-unique=false
   ms ms_fs1_drbd fs1_drbd \
   meta master-max=1 master-node-max=1 clone-max=2
   clone-node-max=1 notify=true globally-unique=false
   location drbd_mysql_on_mysql1 ms_drbd_mysql 10: mysql1
   location drbd_mysql_on_mysql2 ms_drbd_mysql 10: mysql2
   location fs1_drbd_on_filer1 ms_fs1_drbd 10: filer1
   location fs1_drbd_on_filer2 ms_fs1_drbd 10: filer2
   location fs1_on_filer1 fs1 10: filer1
   location fs1_on_filer2 fs1 10: filer2
   location mysql_on_mysql1 mysql 10: mysql1
   location mysql_on_mysql2 mysql 10: mysql2
   colocation fs1_on_fs1_drbd inf: fs1 ms_fs1_drbd:Master
   colocation mysql_on_drbd_mysql inf: mysql ms_drbd_mysql:Master
   order fs1_after_fs1_drbd inf: ms_fs1_drbd:promote fs1:start
   order mysql_after_drbd_mysql inf: ms_drbd_mysql:promote mysql:start
   property $id=cib-bootstrap-options \
   dc-version=1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7 \
   cluster-infrastructure=openais \
   expected-quorum-votes=4 \
   symmetric-cluster=false \
   stonith-enabled=false \
   last-lrm-refresh=1258664854
  
   ___
   Pacemaker mailing list
   Pacemaker@oss.clusterlabs.org
   http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  
 
 
 
  

[Pacemaker] basic configuration question

2009-11-20 Thread Frank DiMeo
Hi All,

In a simple two-node cluster, I load the enclosed XML file.  I expect that this
is the simplest syntax to specify two resources, eyes and clock, where eyes
and clock run on the same node.

Actual behavior is that eyes and clock run on opposite nodes.

Is my XML file wrong?

Also, shouldn't sequential=true in the colocation resource set control the
startup sequence of eyes and clock?

Thanks in advance,

-Frank

<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="0" admin_epoch="1" epoch="8" num_updates="1" cib-last-written="Fri Nov 20 07:46:43 2009" dc-uuid="ubuntu_2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="option-1" name="symmetric-cluster" value="true"/>
        <nvpair id="option-2" name="no-quorum-policy" value="ignore"/>
        <nvpair id="option-3" name="stonith-enabled" value="false"/>
        <nvpair id="option-4" name="cluster-delay" value="5s"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="10s"/>
        <nvpair id="cib-bootstrap-options-election-timeout" name="election-timeout" value="10s"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="ubuntu_2" uname="ubuntu_2" type="normal"/>
      <node id="ubuntu_1" uname="ubuntu_1" type="normal"/>
    </nodes>
    <resources>
      <primitive id="eyes" class="ocf" type="eyestest" provider="bbnd">
        <operations>
          <op id="eyescheck" name="monitor" interval="10s" requires="nothing" on-fail="restart"/>
        </operations>
      </primitive>
      <primitive id="clock" class="ocf" type="clocktest" provider="bbnd">
        <operations>
          <op id="clockcheck" name="monitor" interval="10s" requires="nothing" on-fail="restart"/>
        </operations>
      </primitive>
    </resources>
    <constraints>
      <rsc_colocation id="coloc-1">
        <resource_set id="coloc-1-set-1" sequential="true">
          <resource_ref id="eyes"/>
          <resource_ref id="clock"/>
        </resource_set>
      </rsc_colocation>
    </constraints>
  </configuration>
</cib>


[Pacemaker] Fwd: virtual IP

2009-11-20 Thread Shravan Mishra
This is my exact output:


Last updated: Fri Nov 20 18:20:51 2009
Stack: openais
Current DC: node1.itactics.com - partition with quorum
Version: 1.0.5-9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601
2 Nodes configured, 2 expected votes
4 Resources configured.


Online: [ node1.itactics.com node2.itactics.com ]

Master/Slave Set: ms-drbd
Masters: [ node1.itactics.com ]
Slaves: [ node2.itactics.com ]
node1.itactics.com-stonith  (stonith:external/safe/ipmi):
Started node2.itactics.com
node2.itactics.com-stonith  (stonith:external/safe/ipmi):
Started node1.itactics.com
Resource Group: svcs_grp
fs0 (ocf::heartbeat:Filesystem):Started node1.itactics.com
safe_svcs   (ocf::itactics:safe):   Started node1.itactics.com
vip (ocf::heartbeat:IPaddr2):   Stopped

Failed actions:
vip_monitor_0 (node=node1.itactics.com, call=7, rc=2,
status=complete): invalid parameter
vip_monitor_0 (node=node2.itactics.com, call=7, rc=2,
status=complete): invalid parameter


The config I tried this time was:


<primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
  <operations>
    <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
  </operations>
  <instance_attributes id="ia-vip">
    <nvpair id="vip-addr" name="ip" value="172.30.0.17"/>
    <nvpair id="vip-intf" name="nic" value="eth1"/>
    <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
    <nvpair id="vip-cidr_netmask" name="cidr_netmask" value="16"/>
  </instance_attributes>
</primitive>



Can somebody help me figure out what the problem is here?

Thanks
Shravan





-- Forwarded message --
From: Shravan Mishra shravan.mis...@gmail.com
Date: Fri, Nov 20, 2009 at 3:30 PM
Subject: virtual IP
To: pacemaker@oss.clusterlabs.org


Hi guys,

I'm trying to bring up my virtual IP on the eth1 interface, but when I run
crm_mon I get an "invalid parameter" error.

That part of the config is:

    <primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
      <operations>
        <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
      </operations>
      <instance_attributes id="ia-vip">
        <nvpair id="vip-addr" name="ip" value="172.30.0.117"/>
        <nvpair id="vip-intf" name="nic" value="eth1"/>
        <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
      </instance_attributes>
    </primitive>


My eth1 IP address is 172.30.0.145.
eth1's broadcast address is 172.30.255.255.

Please let me know if there is anything I have missed.

Sincerely
Shravan



Re: [Pacemaker] Fwd: virtual IP

2009-11-20 Thread Tim Serong
On 11/21/2009 at 06:25 AM, Shravan Mishra shravan.mis...@gmail.com wrote: 
 This is my exact output: 
  
  
 Last updated: Fri Nov 20 18:20:51 2009 
 Stack: openais 
 Current DC: node1.itactics.com - partition with quorum 
 Version: 1.0.5-9e9faaab40f3f97e3c0d623e4a4c47ed83fa1601 
 2 Nodes configured, 2 expected votes 
 4 Resources configured. 
  
  
 Online: [ node1.itactics.com node2.itactics.com ] 
  
 Master/Slave Set: ms-drbd 
 Masters: [ node1.itactics.com ] 
 Slaves: [ node2.itactics.com ] 
 node1.itactics.com-stonith  (stonith:external/safe/ipmi): 
 Started node2.itactics.com 
 node2.itactics.com-stonith  (stonith:external/safe/ipmi): 
 Started node1.itactics.com 
 Resource Group: svcs_grp 
 fs0 (ocf::heartbeat:Filesystem):Started node1.itactics.com 
 safe_svcs   (ocf::itactics:safe):   Started node1.itactics.com 
 vip (ocf::heartbeat:IPaddr2):   Stopped 
  
 Failed actions: 
 vip_monitor_0 (node=node1.itactics.com, call=7, rc=2, 
 status=complete): invalid parameter 
 vip_monitor_0 (node=node2.itactics.com, call=7, rc=2, 
 status=complete): invalid parameter 
  
  
 The config this time I tried was 
  
  
 <primitive id="vip" class="ocf" type="IPaddr2" provider="heartbeat">
   <operations>
     <op id="op-vip-1" name="monitor" timeout="30s" interval="10s"/>
   </operations>
   <instance_attributes id="ia-vip">
     <nvpair id="vip-addr" name="ip" value="172.30.0.17"/>
     <nvpair id="vip-intf" name="nic" value="eth1"/>
     <nvpair id="vip-bcast" name="broadcast" value="172.30.255.255"/>
     <nvpair id="vip-cidr_netmask" name="cidr_netmask" value="16"/>
   </instance_attributes>
 </primitive>
  
  
  
 Can somebody help me figure out what the problem is here?

You're probably suffering from 
https://bugzilla.novell.com/show_bug.cgi?id=553753 which is fixed by 
http://hg.linux-ha.org/agents/rev/5d341d5dc96a

Try explicitly adding the parameter clusterip_hash=sourceip-sourceport to the
IPaddr2 resource.  This will add something like the following to the
instance_attributes:

  <nvpair id="vip-clusterip_hash" name="clusterip_hash" value="sourceip-sourceport"/>

Regards,

Tim


-- 
Tim Serong tser...@novell.com
Senior Clustering Engineer, Novell Inc.






[Pacemaker] Demo SystemHealth RA

2009-11-20 Thread Michael Schwartzkopff
Hi,

During a seminar we developed a resource agent that measures CPU load and
writes the corresponding state into the #health-cpu attribute.

The agent measures the percentage of CPU idle time. If it drops below red_limit,
the attribute is set to red. Yellow is handled accordingly.
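
The threshold logic described above can be sketched roughly as follows. This
is only an illustration and not the attached agent: the function name
health_colour, the parameter name yellow_limit and the "green" fallback are
assumptions, and the idle percentage is taken as a whole number.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the attached HealthCPU agent.
# Maps a CPU idle percentage (integer) to a health colour.
#   $1 = measured idle percentage
#   $2 = red_limit    (idle below this value -> red)
#   $3 = yellow_limit (assumed name; idle below this value -> yellow)
health_colour() {
    idle=$1
    red_limit=$2
    yellow_limit=$3
    if [ "$idle" -lt "$red_limit" ]; then
        echo red
    elif [ "$idle" -lt "$yellow_limit" ]; then
        echo yellow
    else
        # Assumed fallback when neither limit is crossed.
        echo green
    fi
}
```

A real agent would then write the resulting value into the #health-cpu node
attribute on each monitor run.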

Please review my agent and consider including it in the distribution.

Greetings,

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Sitz der Gesellschaft: 85630 Grasbrunn
Registergericht: Amtsgericht München HRB 114375
Geschäftsführer: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42


HealthCPU
Description: application/shellscript