[Pacemaker] Installation problems

2010-03-07 Thread Erich Weiler

Hi Y'all,

I'm having some issues getting things running on a stock CentOS 5.4 
install, and I was hoping someone could point me in the right direction...


Through the epel and clusterlabs repos that are referenced in the wiki, 
I installed:


corosync-1.2.0-1.el5
openais-1.1.0-1.el5
pacemaker-1.0.7-4.el5
(and all dependencies, via yum)

and it all installed fine, according to yum.  I installed 
/etc/corosync/corosync.conf as follows:


-
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
   user:   root
   group:  root
}

totem {
   version: 2

   # How long before declaring a token lost (ms)
   token:  5000

   # How many token retransmits before forming a new configuration
   token_retransmits_before_loss_const: 20

   # How long to wait for join messages in the membership protocol (ms)
   join:   1000

   # How long to wait for consensus to be achieved before starting
   # a new round of membership configuration (ms)
   consensus:  7500

   # Turn off the virtual synchrony filter
   vsftype: none

   # Number of messages that may be sent by one processor on
   # receipt of the token
   max_messages:   20

   # Disable encryption
   secauth: off

   # How many threads to use for encryption/decryption
   threads: 0

   # Limit generated nodeids to 31-bits (positive signed integers)
   clear_node_high_bit: yes

   # Optionally assign a fixed node id (integer)
   # nodeid: 1234
   interface {
       ringnumber:  0
       bindnetaddr: 10.1.0.255
       mcastaddr:   226.94.1.90
       mcastport:   4000
   }
}

logging {
   fileline: off
   to_stderr: yes
   to_logfile: yes
   to_syslog: yes
   logfile: /var/log/corosync.log
   debug: off
   timestamp: on
   logger_subsys {
       subsys: AMF
       debug: off
   }
}

amf {
   mode: disabled
}

service {
   # Load the Pacemaker Cluster Resource Manager
   name: pacemaker
   ver:  0
}
-

Then I tried:

# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync):   [  OK  ]

but then when I run crm_mon, it hangs here:

"Attempting connection to the cluster"

and nothing happens.  A 'ps' shows corosync in a weird state:

[r...@server ~]# ps -afe | grep coro
root     12942     1  0 08:20 ?        00:00:00 corosync
root     12947 12942  0 08:20 ?        00:00:00 [corosync] <defunct>
root     12955 12858  0 08:20 pts/0    00:00:00 grep coro

I also tried starting corosync via '/etc/init.d/openais start' after 
changing the line in the /etc/init.d/openais script:


export 
COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"


and it seems to start, but crm_mon still can't connect and I still get 
"Attempting connection to the cluster" and corosync is in a defunct 
state.  Has anyone else had this problem?  Are the rpms from 
epel/clusterlabs not jiving with each other in some way perhaps?
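
For anyone else chasing this symptom: with "ver: 0", corosync's pacemaker
service should fork the usual Pacemaker daemons itself. A quick way to see
whether they actually came up, and to pull any errors out of the log, is
something along these lines (daemon names as in a stock Pacemaker 1.0
install):

   # list corosync and the Pacemaker children it should have spawned
   ps axf | egrep 'corosync|cib|crmd|lrmd|pengine|attrd|stonithd' | grep -v egrep

   # pull anything logged at ERROR level out of the corosync log
   grep -i error /var/log/corosync.log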


Here is a clip from /var/log/corosync.log:

Mar 07 08:20:04 corosync [MAIN  ] Corosync Cluster Engine ('1.2.0'): 
started and ready to provide service.

Mar 07 08:20:04 corosync [MAIN  ] Corosync built-in features: nss rdma
Mar 07 08:20:04 corosync [MAIN  ] Successfully read main configuration 
file '/etc/corosync/corosync.conf'.

Mar 07 08:20:04 corosync [TOTEM ] Initializing transport (UDP/IP).
Mar 07 08:20:04 corosync [TOTEM ] Initializing transmit/receive 
security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 07 08:20:04 corosync [MAIN  ] Compatibility mode set to whitetank. 
Using V1 and V2 of the synchronization engine.
Mar 07 08:20:04 corosync [TOTEM ] The network interface [10.1.1.84] is 
now up.

Mar 07 08:20:04 corosync [pcmk  ] info: process_ais_conf: Reading configure
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_init: Local handle: 
5650605097994944514 for logging
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_next: Processing 
additional logging options...
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Found 'off' for 
option: debug
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'off' for option: to_file
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'daemon' for option: syslog_facility
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_init: Local handle: 
273040974342371 for service
Mar 07 08:20:04 corosync [pcmk  ] info: config_find_next: Processing 
additional service options...
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'pcmk' for option: clustername
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'no' for option: use_logd
Mar 07 08:20:04 corosync [pcmk  ] info: get_config_opt: Defaulting to 
'no' for option: use_mgmtd

Mar 07 08:20:04 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Mar 07 08:20:04 corosync [pcmk  ] Logging: Initialized pcmk_startup
Mar 07 08:20:04 corosync [pcmk  ] ERROR: pcmk_startup: Cluster user
hacluster does not exist

Re: [Pacemaker] Installation problems

2010-03-07 Thread Erich Weiler
Mar 07 08:20:04 corosync [pcmk  ] ERROR: pcmk_startup: Cluster user 
hacluster does not exist


Heh, I answered my own question...  I was overwriting the passwd file 
and nuking the hacluster user, which was causing problems...  I got it 
sorted.  Thanks for reading, in any case...  ;)
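
For reference, a quick way to confirm whether the cluster account survived,
and to put it back if it did not, is something like the following. The group,
home directory and shell shown are only illustrative; reinstalling the cluster
packages (one of them creates the account in its scriptlets) is the safer fix:

   # check that the cluster user and group still exist
   getent passwd hacluster
   getent group haclient

   # if not, recreate them (values here are examples, not what the RPM uses)
   groupadd -r haclient
   useradd -r -g haclient -d /var/lib/heartbeat -s /sbin/nologin \
           -c "cluster user" hacluster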




Re: [Pacemaker] Pacemaker Digest, Vol 28, Issue 16

2010-03-07 Thread 适兕
Hi,

Based on this log:
Mar 07 08:20:04 corosync [pcmk  ] ERROR: pcmk_startup: Cluster user
hacluster does not exist

Are you sure the user "hacluster" already exists on your system? And does
/var/run/corosync/crm/ have the right ownership and permissions?
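
A quick check for both conditions might look like this; the runtime and state
directory paths differ between builds, so adjust them to whatever your
packages actually created:

   # does the cluster account exist?
   id hacluster

   # are Pacemaker's state/run directories owned by it?
   ls -ld /var/run/crm /var/lib/heartbeat/crm /var/lib/pengine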







-- 
Independence of thought, freedom of spirit.
   -- Chen Yinke (陈寅恪)


[Pacemaker] [PATCH]The change of the output level of the log.(for stonithd)

2010-03-07 Thread renayama19661014
Hi,

We examined the stonithd log with a configuration in which the stonith
operation takes a long time.

When STONITH is carried out with such a long operation period configured,
the following errors are written to the log:

Jan 29 14:00:53 cgl60 stonithd: [7524]: ERROR: has_this_callid: scenario value 
error.
Jan 29 14:00:53 cgl60 stonithd: [7524]: ERROR: has_this_callid: scenario value 
error.

These messages seem to be logged when STONITH completes while an operation is
still left in stonithd's execution table (executing_queue).

We do not think this indicates a problem with the actual behaviour.

However, many operators keep a close watch on ERROR-level messages.
Could you change this message from an error to a warning, so that operators
are not alarmed?

Best Regards,
Hideo Yamauchi.


problem995.patch
Description: 3462153649-problem995.patch


[Pacemaker] Fencing with iDrac 6 Enterprise

2010-03-07 Thread Martin Aspeli

Hi,

We have a two-node cluster of Dell servers. They have an iDRAC 6 
Enterprise each. The cluster is also backed up by a UPS with a diesel 
generator.


I realise on-board devices like the DRAC are not ideal for fencing, but 
it's probably the best we're going to be able to do. However, I've read 
in some manuals that DRAC 6 is troublesome, and that the drac STONITH 
agent in Pacemaker only deals with version 5.


Is this still current? Can anyone point me to any documentation or 
examples of configuring iDRAC 6 Enterprise for STONITH, if indeed it's 
possible?


Thanks!
Martin
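
For what it's worth, the iDRAC 6 also speaks IPMI over LAN, so one way around
the DRAC-specific agents is the generic external/ipmi stonith plugin. A rough
sketch for the crm shell follows; host names, addresses and credentials are
placeholders, and IPMI-over-LAN has to be enabled on the iDRAC first:

   crm configure primitive st-node1 stonith:external/ipmi \
           params hostname=node1 ipaddr=10.0.0.101 userid=fenceuser \
                  passwd=secret interface=lanplus \
           op monitor interval=60s
   # keep the fencing device for node1 away from node1 itself
   crm configure location st-node1-placement st-node1 -inf: node1

A second primitive and location constraint, pointed at the other node's
iDRAC, would cover the other direction.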




Re: [Pacemaker] Utilization support for crm_attribute and crm_resource

2010-03-07 Thread Yan Gao
Hi Andrew,
>>>On 3/5/2010 at 10:45 PM, Andrew Beekhof  wrote: 
> On Thu, Mar 4, 2010 at 5:53 PM, Yan Gao  wrote: 
> > Hi Andrew, 
> > You were reading the earliest mail in the thread. Strange... 
>  
> /me blames gmail :-) 
:-)

>  
> > So I'm starting a new thread. 
> > 
> > What I said in the latest mail: 
> > 
> > I added utilization support for crm_attribute and crm_resource. 
> > Attached the patch. Please let me know if you have any comments or 
> > suggestions on that. 
>  
> It looks pretty straight forward. How well has it been tested to 
> verify it doesn't break existing use-cases? 
I've tried all the cases that I could imagine including the existing ones, and 
it worked.

> Have you tried setting a normal parameter with the same name as a 
> utilization field for example? 
Yes, and it worked properly as well.

Regards,
  Yan


Yan Gao 
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.





[Pacemaker] Failover with multiple services on one node

2010-03-07 Thread Martin Aspeli

Hi,

This question was sort of implied in my thread last week, but I'm going 
to re-ask it properly, to reduce my own confusion if nothing else.


We have two servers, master and slave. In the cluster, we have:

 - A shared IP address (192.168.245.10)
 - HAProxy (active on master, may fail over to slave)
 - Postgres (active on master, may fail over to slave)
 - memcached (active on master, may fail over to slave)
 - DRBD and OCFS2
 - Zope (8 instances on each server, all active)

HAProxy, memcached and Postgres are all standard clustered resources. By 
default, they'll be active on master, but may fail over to slave.


Zope is the exception. Here, we have 8 processes on each machine, all of 
which are "active", i.e. part of the load balancing performed by 
HAProxy. They may go up or down, but HAProxy will handle that without 
too much problem. They're not managed by the cluster (at least that's 
the plan).


Each Zope instance is configured with a database connection string for 
Postgres (e.g. postgres://192.168.245.10:5432) and a similar connection 
string for memcached (e.g. 192.168.245.10:11211).


My question is this: Do we need to group all the clustered resources 
(the IP address, HAProxy, Postgres, memcached) so that if any one of 
them fails, they all fail over to slave?


If we don't do this, how can we manage the connection strings in Zope? 
Since Zope needs a domain name or IP address as part of the connection 
string, it'd be no good if, e.g. memcached failed over to slave, but the 
IP address stayed with master, because Zope would still be looking for 
it on master.


What is the normal way to handle this? Do people have one floating IP 
address per service? Use groups to consider all those services together 
at all times? Use some kind of hosts file trickery? Rely on the 
application to handle e.g. a primary and a fallback connection string 
(which may be tricky)?


Martin
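
To make the "group everything" option concrete, under "crm configure" it
could look roughly like the sketch below. Resource names are made up, and
haproxy and memcached are wrapped as LSB resources on the assumption that
matching init scripts exist:

   primitive ip_shared ocf:heartbeat:IPaddr2 \
           params ip=192.168.245.10 cidr_netmask=24 op monitor interval=30s
   primitive pgsql ocf:heartbeat:pgsql op monitor interval=30s
   primitive haproxy lsb:haproxy op monitor interval=30s
   primitive memcached lsb:memcached op monitor interval=30s
   # a group is ordered and colocated: its members start in this order,
   # stay on the same node, and fail over together
   group g_stack ip_shared pgsql memcached haproxy

The price of the group is exactly what the question implies: a failure that
forces any one member to move takes all of them to the other node.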




Re: [Pacemaker] Failover with multiple services on one node

2010-03-07 Thread Matthew Palmer
On Mon, Mar 08, 2010 at 01:34:01PM +0800, Martin Aspeli wrote:
> This question was sort of implied in my thread last week, but I'm going  
> to re-ask it properly, to reduce my own confusion if nothing else.
>
> We have two servers, master and slave. In the cluster, we have:

[bunchteen services, some HA, some not, one service IP]

> Each Zope instance is configured with a database connection string for  
> Postgres (e.g. postgres://192.168.245.10:5432) and a similar connection  
> string for memcached (e.g. 192.168.245.10:11211).
>
> My question is this: Do we need to group all the clustered resources  
> (the IP address, HAProxy, Postgres, memcached) so that if any one of  
> them fails, they all fail over to slave?
>
> If we don't do this, how can we manage the connection strings in Zope?  
> Since Zope needs a domain name or IP address as part of the connection  
> string, it'd be no good if, e.g. memcached failed over to slave, but the  
> IP address stayed with master, because Zope would still be looking for  
> it on master.
>
> What is the normal way to handle this? Do people have one floating IP  
> address per service?

This is how I prefer to do it.  RFC1918 IP addresses are cheap, IPv6
addresses quintuply so.  Having everything tied to one address causes some
mayhem when something falls over (failover is quick, but it ain't
instantaneous), so it's far better to give everything its own address and let
them drift about independently.  Also, it makes load-spreading (run PgSQL on
one machine, memcached on the other, for instance) much easier.

> Use groups to consider all those services together  
> at all times?

I wouldn't recommend it, given that there's no logical reason that they all
have to be together.

> Use some kind of hosts file trickery?

You *know* you're doing something wrong when hacking the hosts file is the
answer.

- Matt
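
In crm shell terms, the one-address-per-service pattern Matt describes boils
down to pairing each service with its own IPaddr2 resource, for example
(names and addresses are illustrative):

   primitive ip_pgsql ocf:heartbeat:IPaddr2 \
           params ip=192.168.245.11 cidr_netmask=24 op monitor interval=30s
   primitive pgsql ocf:heartbeat:pgsql op monitor interval=30s
   # the database follows its own address, independently of the other services
   colocation pgsql_with_ip inf: pgsql ip_pgsql
   order pgsql_after_ip inf: ip_pgsql pgsql

Zope's connection string would then point at 192.168.245.11 rather than at a
single shared address, and the same pattern repeats for memcached and HAProxy.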



Re: [Pacemaker] Failover with multiple services on one node

2010-03-07 Thread Martin Aspeli

Matthew Palmer wrote:
> On Mon, Mar 08, 2010 at 01:34:01PM +0800, Martin Aspeli wrote:
>> This question was sort of implied in my thread last week, but I'm going
>> to re-ask it properly, to reduce my own confusion if nothing else.
>>
>> We have two servers, master and slave. In the cluster, we have:
>
> [bunchteen services, some HA, some not, one service IP]
>
>> Each Zope instance is configured with a database connection string for
>> Postgres (e.g. postgres://192.168.245.10:5432) and a similar connection
>> string for memcached (e.g. 192.168.245.10:11211).
>>
>> My question is this: Do we need to group all the clustered resources
>> (the IP address, HAProxy, Postgres, memcached) so that if any one of
>> them fails, they all fail over to slave?
>>
>> If we don't do this, how can we manage the connection strings in Zope?
>> Since Zope needs a domain name or IP address as part of the connection
>> string, it'd be no good if, e.g. memcached failed over to slave, but the
>> IP address stayed with master, because Zope would still be looking for
>> it on master.
>>
>> What is the normal way to handle this? Do people have one floating IP
>> address per service?
>
> This is how I prefer to do it.  RFC1918 IP addresses are cheap, IPv6
> addresses quintuply so.  Having everything tied to one address causes some
> mayhem when something falls over (failover is quick, but it ain't
> instantaneous), so it's far better to give everything its own address and
> let them drift about independently.  Also, it makes load-spreading (run
> PgSQL on one machine, memcached on the other, for instance) much easier.


Pardon my Linux ignorance, but does that mean we need one NIC per service as 
well, or can we bind multiple (floating) IPs to each interface?


Martin

--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book
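
For reference: no extra NICs are needed. The IPaddr2 agent adds each floating
address as an additional address on whatever interface you point it at, so
any number of them can share one NIC. A sketch with placeholder addresses and
interface name:

   primitive ip_pgsql ocf:heartbeat:IPaddr2 \
           params ip=192.168.245.11 cidr_netmask=24 nic=eth0
   primitive ip_memcached ocf:heartbeat:IPaddr2 \
           params ip=192.168.245.12 cidr_netmask=24 nic=eth0

   # on whichever node currently holds them, both addresses show up on eth0
   ip addr show dev eth0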

