Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-03 Thread Andrew Beekhof
On Wed, Aug 4, 2010 at 3:05 AM, Michael Fung  wrote:
> Thanks to all who helped give hints.
>
>
> I switched to Debian Squeeze.
>
> I don't want to spend time to study RHCS of RHEL 5 if Pacemaker/Corosync
> is the future. Life is short.

F-13 or the RHEL-6 betas also have all the bits you need (including pacemaker).

>
>
> Rgds,
> Michael
>
>
> On 2010/8/3 03:29 PM, Brett Delle Grazie wrote:
>> Hi Mike,
>>
>> In RHEL 5.x and CentOS 5.x you must use CMAN and the RedHat Cluster
>> Suite (RHCS) if you are going to use clustered LVM.
>>
>> This is because clvmd currently uses the CMAN interface to the cluster.
>> In later versions, RedHat is moving towards Corosync / OpenAIS /
>> (Pacemaker | RgManager) solution but this will take a long time.
>>
>> Christine Caulfield (from RedHat) wrote an excellent document describing
>> the change process here:
>> http://people.redhat.com/ccaulfie/docs/Whither%20cman.pdf
>>
>> I guess your options are:
>> (a) Switch to RHCS based cluster, at least for those nodes with
>> clustered LVM requirements (and GFS, GFS2 etc)
>> (b) Switch to RHEL 6.x Beta
>> (c) Try recompiling RHEL 6.x Beta packages - no guarantees here but it
>> should be possible, maybe.
>> (d) Try compiling current source of lvm2-cluster packages from Fedora or
>> Rawhide as they can use current versions of OpenAIS. The RHEL 5.x
>> versions of lvm2-cluster are fixed at using CMAN interface, not OpenAIS
>> (e) Switch to Debian based distro - Lenny is production ready and has
>> CLVM / Pacemaker / Corosync in backports ;)
>> (f) Something someone else on the list with more experience comes up
>> with :)
>>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] [Problem]A compilation error of Pacemaker1.1.

2010-08-03 Thread renayama19661014
Hi,

I compiled Pacemaker 1.1, but the following error occurred.

[r...@srv01 Pacemaker-1-1-5ce5b34cf3ab]# export PREFIX=/usr; export LCRSODIR=$PREFIX/libexec/lcrso; export CLUSTER_USER=hacluster; export CLUSTER_GROUP=haclient

[r...@srv01 Pacemaker-1-1-5ce5b34cf3ab]# ./autogen.sh && ./configure --prefix=$PREFIX --localstatedir=/var --with-lcrso-dir=$LCRSODIR

[r...@srv01 Pacemaker-1-1-5ce5b34cf3ab]# make install

s -Werror -fPIC -MT utils.lo -MD -MP -MF .deps/utils.Tpo -c utils.c  -fPIC -DPIC -o .libs/utils.o
cc1: warnings being treated as errors
utils.c:65: warning: 'common' defined but not used
gmake[2]: *** [utils.lo] Error 1
gmake[2]: Leaving directory `/opt/Pacemaker-1-1-5ce5b34cf3ab/lib/common'
gmake[1]: *** [install-recursive] Error 1
gmake[1]: Leaving directory `/opt/Pacemaker-1-1-5ce5b34cf3ab/lib'
make: *** [install-recursive] Error 1



My environment is as follows.

 * RHEL5.5(x64)
 * corosync 1.2.7
 * Pacemaker-1-1-5ce5b34cf3ab.tar
 * Cluster-Resource-Agents-bfcc4e050a07.tar
 * Reusable-Cluster-Components-8286b46c91e3.tar


Is there a problem in my compilation procedure?

Best Regards,
Hideo Yamauchi.




[Pacemaker] [PATCH]A redundant if sentence.

2010-08-03 Thread renayama19661014
Hi, 

This patch removes a redundant if condition in pengine.

void
unpack_operation(
    action_t *action, xmlNode *xml_obj, pe_working_set_t* data_set)
{
(snip)
    if(safe_str_eq(class, "stonith")) {
        action->needs = rsc_req_nothing;
        value = "nothing (fencing op)";

    } else if(value == NULL && safe_str_neq(action->task, CRMD_ACTION_START)) {
(snip)
    } else if(data_set->no_quorum_policy == no_quorum_ignore
        || safe_str_eq(class, "stonith")) {    /* <--- redundant condition */
        action->needs = rsc_req_nothing;
        value = "nothing (default)";

    } else if(data_set->no_quorum_policy == no_quorum_freeze
          && is_set(data_set->flags, pe_flag_stonith_enabled)) {
(snip)
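
A minimal sketch of why the second "stonith" test is redundant, assuming the branch order shown in the excerpt. The enum, str_eq() and the two classify functions are hypothetical stand-ins, not Pacemaker code, and the branches marked (snip) are left out. Because the first branch already handles every "stonith" resource, control can only reach the later branch when the class is something else, so the extra comparison never changes the outcome.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result labels; they mirror the two "nothing" cases in the excerpt. */
enum needs { NEEDS_NOTHING_FENCING, NEEDS_NOTHING_DEFAULT, NEEDS_OTHER };

/* NULL-safe string comparison, standing in for safe_str_eq(). */
static int str_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Shape of the original chain: "stonith" is tested twice. */
enum needs classify_original(const char *rsc_class, int quorum_ignore)
{
    if (str_eq(rsc_class, "stonith")) {
        return NEEDS_NOTHING_FENCING;
    } else if (quorum_ignore || str_eq(rsc_class, "stonith")) {
        return NEEDS_NOTHING_DEFAULT;
    }
    return NEEDS_OTHER;
}

/* Shape after the patch: the second "stonith" test is dropped. */
enum needs classify_patched(const char *rsc_class, int quorum_ignore)
{
    if (str_eq(rsc_class, "stonith")) {
        return NEEDS_NOTHING_FENCING;
    } else if (quorum_ignore) {
        return NEEDS_NOTHING_DEFAULT;
    }
    return NEEDS_OTHER;
}

/* Self-check: both chains agree for every combination of inputs tried here. */
void check_equivalence(void)
{
    const char *classes[] = { "stonith", "ocf", NULL };
    for (int i = 0; i < 3; i++) {
        for (int q = 0; q <= 1; q++) {
            assert(classify_original(classes[i], q) == classify_patched(classes[i], q));
        }
    }
}
```

Calling check_equivalence() from a test program exercises both versions with a stonith class, a non-stonith class and a missing class, with and without the quorum-ignore policy.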

Best Regards,
Hideo Yamauchi.


redundant_if_sentence.patch
Description: 205491641-redundant_if_sentence.patch


Re: [Pacemaker] [PATCH]The changing of the log level of pengine process.

2010-08-03 Thread renayama19661014
Hi Andrew,

Thank you for your comment.

It is difficult for me to explain this in English.

This patch addresses a rather unusual request from one of our users.

When a STONITH resource is found running in duplicate, on the node being 
fenced and on the node carrying out the fencing, our user wants the log for 
that case to be output at the warning level.


Last updated: Thu Jul 29 09:58:36 2010
Stack: openais
Current DC: srv01 - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
2 Resources configured.


Online: [ srv01 srv02 ]

 Resource Group: group-1
 prmDummy1  (ocf::heartbeat:Dummy): Started srv01
 stonith0   (stonith:external/ssh): Started srv01

For example, with the resource placement shown above, the STONITH resource 
moves to srv02 when srv01 is fenced.

Our user wanted to change the log level used in this case.

However, I understand that this is a very special request.

If possible, I would like to be able to enable this special behavior with a 
Pacemaker startup option.

Best Regards,
Hideo Yamauchi.


--- Andrew Beekhof  wrote:

> 2010/7/29  :
> > Hi All,
> >
> > Our user has a request about the level of the log output by pengine.
> >
> > When STONITH is carried out, the user wants pengine to log at the warning 
> > level if the only resource that is active more than once is a STONITH 
> > resource.
> >
> > This is because more than one STONITH resource may be started while 
> > STONITH is being carried out, and the importance of that problem is 
> > different from a normal resource being started more than once.
> 
> I'm having trouble understanding the purpose of this patch...
> 
> If the only resource on a node to be fenced is a stonith resource, and
> that resource is also running on another node, then unset
> "was_processing_error"... is that right?
> 
> Why do that?
> 
> >
> > I wrote a patch that manipulates the was_processing_error flag to meet 
> > our user's request.
> >
> > This patch may be rather special.
> > I do not think that all users need this patch.
> >
> > Please let me know your opinion of this patch.
> >
> > Best Regards,
> > Hideo Yamauchi.
> >


Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-03 Thread Michael Fung
Thanks to all who helped give hints.


I switched to Debian Squeeze.

I don't want to spend time to study RHCS of RHEL 5 if Pacemaker/Corosync
is the future. Life is short.


Rgds,
Michael


On 2010/8/3 03:29 PM, Brett Delle Grazie wrote:
> Hi Mike,
> 
> In RHEL 5.x and CentOS 5.x you must use CMAN and the RedHat Cluster
> Suite (RHCS) if you are going to use clustered LVM.
> 
> This is because clvmd currently uses the CMAN interface to the cluster.
> In later versions, RedHat is moving towards Corosync / OpenAIS /
> (Pacemaker | RgManager) solution but this will take a long time.
> 
> Christine Caulfield (from RedHat) wrote an excellent document describing
> the change process here:
> http://people.redhat.com/ccaulfie/docs/Whither%20cman.pdf
> 
> I guess your options are:
> (a) Switch to RHCS based cluster, at least for those nodes with
> clustered LVM requirements (and GFS, GFS2 etc)
> (b) Switch to RHEL 6.x Beta
> (c) Try recompiling RHEL 6.x Beta packages - no guarantees here but it
> should be possible, maybe.
> (d) Try compiling current source of lvm2-cluster packages from Fedora or
> Rawhide as they can use current versions of OpenAIS. The RHEL 5.x
> versions of lvm2-cluster are fixed at using CMAN interface, not OpenAIS
> (e) Switch to Debian based distro - Lenny is production ready and has
> CLVM / Pacemaker / Corosync in backports ;)
> (f) Something someone else on the list with more experience comes up
> with :)
> 



Re: [Pacemaker] [Problem]The problem of the combination of Pacemaker and corosync1.2.7.

2010-08-03 Thread renayama19661014
Hi Andrew,

Thank you for your comment.

> No need to wait, the current tip of Pacemaker 1.1 is perfectly stable
> (and included for RHEL6.0).
> Almost all the testing has been done for 1.1.3, I've just been busy
> helping out with some other projects at Red Hat and haven't had time
> to do the actual release.
> 
> To make use of CPG-based communication, remove the "service" section
> for pacemaker from corosync.conf and instead run:
>     service pacemaker start
> after starting corosync.
> 
> Once the 1.1.3 packages are out, this will be the official advice for
> anyone experiencing startup/shutdown issues when using Pacemaker with
> Corosync.
> Calling fork() in a multi-threaded environment (corosync) is just far
> too problematic.

All right.

All of my problems may be resolved by 1.1.3.
I will try Pacemaker 1.1.3 from now on.

When is the Pacemaker 1.1 release expected?

Best Regards,
Hideo Yamauchi.



--- Andrew Beekhof  wrote:

> On Mon, Aug 2, 2010 at 3:17 AM,   wrote:
> > Hi,
> >
> > I checked the behavior of Pacemaker combined with corosync 1.2.7.
> >
> > The combination is as follows.
> >
> >  * corosync 1.2.7
> >  * Pacemaker-1-0-74392a28b7f3.tar
> >  * Cluster-Resource-Agents-bfcc4e050a07.tar
> >  * Reusable-Cluster-Components-8286b46c91e3.tar
> >
> >
> > I checked the following on two nodes, on a virtual machine (RHEL5.5 x86) 
> > and on a real machine (RHEL5.5 x64).
> > No resources were configured.
> >
> > 1) That a node does not hang when only corosync is started (and when it 
> > is stopped).
> > 2) That a node does not hang when Pacemaker and corosync are started 
> > together (and when they are stopped).
> >
> > I ran each check only 20 times in each environment (x86 and x64).
> >
> > Unfortunately the following problems occurred.
> >  * This time, no problem occurred when starting only corosync (or when 
> > stopping it).
> >
> > Problem 1) On the virtual machine, the machine sometimes hangs at 
> > startup. As with the earlier problem, CPU usage is nearly 100%.
> >
> > Problem 2) In some cases the cluster could not be formed after startup.
> >
> > Problem 3) In some cases the cib process and the attrd process fail to 
> > start.
> > Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: AIS connection failed
> > Jul 30 14:25:46 x3650g cib: [26256]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Jul 30 14:25:46 x3650g cib: [26256]: ERROR: ais_dispatch: AIS connection failed
> > Jul 30 14:25:46 x3650g attrd: [26258]: CRIT: attrd_ais_destroy: Lost connection to OpenAIS service!
> > Jul 30 14:25:46 x3650g cib: [26256]: ERROR: cib_ais_destroy: AIS connection terminated
> > Jul 30 14:25:46 x3650g attrd: [26258]: info: main: Exiting...
> > Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
> > Jul 30 14:25:46 x3650g stonithd: [26255]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Success (0)
> > Jul 30 14:25:46 x3650g stonithd: [26255]: ERROR: ais_dispatch
> >
> > Can this problem be resolved with Pacemaker 1.0 and corosync 1.2.7?
> >
> > I know that work has begun on replacing the communication layer with CPG 
> > in the new Pacemaker.
> > If we want to use Pacemaker together with corosync, should we wait until 
> > the CPG work is finished?
> > (Should we wait for the Pacemaker 1.1 series?)
> 
> No need to wait, the current tip of Pacemaker 1.1 is perfectly stable
> (and included for RHEL6.0).
> Almost all the testing has been done for 1.1.3, I've just been busy
> helping out with some other projects at Red Hat and haven't had time
> to do the actual release.
> 
> To make use of CPG-based communication, remove the "service" section
> for pacemaker from corosync.conf and instead run:
>     service pacemaker start
> after starting corosync.
> 
> Once the 1.1.3 packages are out, this will be the official advice for
> anyone experiencing startup/shutdown issues when using Pacemaker with
> Corosync.
> Calling fork() in a multi-threaded environment (corosync) is just far
> too problematic.
> 
> >
> > Because the log is large, I will follow up after registering this problem 
> > in Bugzilla.
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> >
> >

Re: [Pacemaker] [Problem]The problem of the combination of Pacemaker and corosync1.2.7.

2010-08-03 Thread renayama19661014
Hi Vladislav,

Thank you for your comment.

> This is probably connected to
> http://marc.info/?l=openais&m=127977785007234&w=2
> 
> Steven promised to look at that issue after his vacation.

I will wait for Steven's fix.

Meanwhile, I will use Pacemaker 1.1, as Andrew recommended.

Best Regards,
Hideo Yamauchi.


--- Vladislav Bogdanov  wrote:

> 02.08.2010 04:17, renayama19661...@ybb.ne.jp wrote:
> 
> ...
> 
> > Problem 3) In some cases the cib process and the attrd process fail to 
> > start.
> > 
> > Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: AIS connection failed
> > Jul 30 14:25:46 x3650g cib: [26256]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable (11)
> > Jul 30 14:25:46 x3650g cib: [26256]: ERROR: ais_dispatch: AIS connection failed
> > Jul 30 14:25:46 x3650g attrd: [26258]: CRIT: attrd_ais_destroy: Lost connection to OpenAIS service!
> > Jul 30 14:25:46 x3650g cib: [26256]: ERROR: cib_ais_destroy: AIS connection terminated
> > Jul 30 14:25:46 x3650g attrd: [26258]: info: main: Exiting...
> > Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: attrd_cib_connection_destroy: Connection to the CIB terminated...
> > Jul 30 14:25:46 x3650g stonithd: [26255]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Success (0)
> > Jul 30 14:25:46 x3650g stonithd: [26255]: ERROR: ais_dispatch
> > 
> > Can this problem be resolved with Pacemaker 1.0 and corosync 1.2.7?
> > 
> > I know that work has begun on replacing the communication layer with CPG 
> > in the new Pacemaker.
> > If we want to use Pacemaker together with corosync, should we wait until 
> > the CPG work is finished?
> > (Should we wait for the Pacemaker 1.1 series?)
> > 
> > Because the log is large, I will follow up after registering this problem 
> > in Bugzilla.
> >
> 
> 
> This is probably connected to
> http://marc.info/?l=openais&m=127977785007234&w=2
> 
> Steven promised to look at that issue after his vacation.
> 
> 
> Best,
> Vladislav
> 


Re: [Pacemaker] what's the deal with 1.0.9 init_ais_connection?

2010-08-03 Thread Andrew Beekhof
On Tue, Aug 3, 2010 at 7:18 PM, Alan Jones  wrote:
> I'm trying to track down a cib seg fault in init_ais_connection() for pacemaker
> 1.0.9.

Don't. Use 1.0.9.1 which already has the patch below.

> The 1.0.8 version of this function is pretty straightforward, calling one of
> the comm stack's connect functions depending on the config.
> In 1.0.9, however, it appears to be a recursive call that never ends.
> There is also an init_ais_connection_once() below that appears to be the
> intended function to call within this function.
> Is it safe for me to make this change?
> Alan
> ---
> ajo...@ajones-dl:~/hasrc/Pacemaker-1-0-Pacemaker-1.0.9/lib/common$ diff -c ais.c.org ais.c
> *** ais.c.org    2010-06-23 03:25:30.0 -0700
> --- ais.c    2010-08-03 10:20:38.320875334 -0700
> ***************
> *** 582,588 ****
>   {
>   int retries = 0;
>   while(retries++ < 30) {
> !     int rc = init_ais_connection(dispatch, destroy, our_uuid, our_uname, nodeid);
>       switch(rc) {
>           case CS_OK:
>           return TRUE;
> --- 582,588 ----
>   {
>   int retries = 0;
>   while(retries++ < 30) {
> !     int rc = init_ais_connection_once(dispatch, destroy, our_uuid, our_uname, nodeid);
>       switch(rc) {
>           case CS_OK:
>           return TRUE;
>
>


[Pacemaker] what's the deal with 1.0.9 init_ais_connection?

2010-08-03 Thread Alan Jones
I'm trying to track down a cib seg fault in init_ais_connection() for pacemaker
1.0.9.
The 1.0.8 version of this function is pretty straightforward, calling one of
the comm stack's connect functions depending on the config.
In 1.0.9, however, it appears to be a recursive call that never ends.
There is also an init_ais_connection_once() below that appears to be the
intended function to call within this function.
Is it safe for me to make this change?
Alan
---
ajo...@ajones-dl:~/hasrc/Pacemaker-1-0-Pacemaker-1.0.9/lib/common$ diff -c ais.c.org ais.c
*** ais.c.org    2010-06-23 03:25:30.0 -0700
--- ais.c    2010-08-03 10:20:38.320875334 -0700
***************
*** 582,588 ****
  {
  int retries = 0;
  while(retries++ < 30) {
!     int rc = init_ais_connection(dispatch, destroy, our_uuid, our_uname, nodeid);
      switch(rc) {
          case CS_OK:
          return TRUE;
--- 582,588 ----
  {
  int retries = 0;
  while(retries++ < 30) {
!     int rc = init_ais_connection_once(dispatch, destroy, our_uuid, our_uname, nodeid);
      switch(rc) {
          case CS_OK:
          return TRUE;
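
A minimal sketch of the retry pattern the diff restores, using hypothetical names (connect_once(), connect_with_retries() and the CONNECT_* codes) rather than the Pacemaker functions. The wrapper loops over a single-attempt helper. If the wrapper called itself instead, each call would start a fresh loop whose first step is another call, so the recursion would never reach the switch.

```c
#include <assert.h>
#include <stddef.h>

#define CONNECT_OK        0
#define CONNECT_TRY_AGAIN 1
#define CONNECT_FAILED    2

/* Hypothetical single-attempt connect: reports "try again" twice, then succeeds.
 * The counter is passed in so the sketch has no hidden state. */
static int connect_once(int *attempts)
{
    (*attempts)++;
    return (*attempts < 3) ? CONNECT_TRY_AGAIN : CONNECT_OK;
}

/* Retry wrapper: calls the single-attempt helper, never itself.
 * Returns 1 on success, 0 if the attempts run out or a hard failure occurs. */
int connect_with_retries(int max_retries, int *attempts)
{
    int retries = 0;

    assert(attempts != NULL);
    while (retries++ < max_retries) {
        int rc = connect_once(attempts);
        switch (rc) {
            case CONNECT_OK:
                return 1;
            case CONNECT_TRY_AGAIN:
                break;      /* leaves the switch; the while loop tries again */
            default:
                return 0;   /* hard failure: stop retrying */
        }
    }
    return 0;
}
```

With a limit of 30 the helper is called three times before the wrapper reports success; with a limit of 2 the wrapper gives up after two attempts.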


Re: [Pacemaker] Two cloned VM, only one of the both shows online when starting corosync/pacemaker

2010-08-03 Thread Guillaume Chanaud

 On Mon, Jul 26, 2010 at 7:08 PM, Guillaume Chanaud wrote:

  On 26/07/2010 14:38, Andrew Beekhof wrote:

On Fri, Jul 16, 2010 at 5:44 PM, Guillaume Chanaud wrote:
Hello,

[snip]


 # Optionally assign a fixed node id (integer)
 nodeid: 30283707487

In addition to changing the node's ip address, did you change this too?


Yes, as I said in the mail, I changed the nodeid and bindnetaddr for each
node. Even non-fixed values don't work.

Looks like corosync crashed.  Possibly related to the duplicate nodeid?
Was there a core file in /var/run/corosync?
what does "ulimit -c" say?

There is a /var/run/corosync.pid (with the correct pid) which stays there
even after the corosync crash.
For ulimit:
[r...@www01 run]# ulimit -c
0

These values are the same on the second node running fine (and if i stop
this second node, the first node will run fine, but not the second...)

Yes, but that doesn't help us find out why the first one is crashing.
Please run "ulimit -c unlimited" before starting corosync so that we
can get a core file and stack trace.

Hello,
Sorry for the delay; July is not the best month to get things 
working fast.


Here is the core dump file (55MB) :
http://www.connecting-nature.com/corosync/core
corosync version is 1.2.3

thanks for your help
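
For a process that is not started from a shell where "ulimit -c unlimited" was run, the same limit can be raised from inside the program. A minimal sketch using the POSIX getrlimit()/setrlimit() calls follows. The helper names are hypothetical and the code is only an illustration, not part of corosync. It raises the soft core-size limit only as far as the hard limit, which an unprivileged process is allowed to do.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return -1;
    }
    lim.rlim_cur = lim.rlim_max;   /* soft limit may go up to, not past, the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Returns 1 when the soft limit is 0 (the state "ulimit -c" printed as 0 above),
 * 0 when core files are allowed, -1 if the limit cannot be read. */
int core_dumps_disabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return -1;
    }
    return lim.rlim_cur == 0;
}

/* Convenience wrapper for callers that treat failure as a programming error. */
void require_core_dumps(void)
{
    int rc = enable_core_dumps();
    assert(rc == 0);
    (void)rc;   /* keeps the variable "used" when assertions are compiled out */
}
```

If the hard limit itself is 0, the call still succeeds but core files stay disabled; core_dumps_disabled() can be used to detect that case.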




Re: [Pacemaker] Cluster Amnesia problem in Pacemaker

2010-08-03 Thread chajo

Thanks for clarification








[Pacemaker] testing resources (was: Crazy idea #1)

2010-08-03 Thread Dejan Muhamedagic
Hi,

On Mon, Aug 02, 2010 at 11:07:46PM +0200, Lars Marowsky-Bree wrote:
> On 2010-08-02T19:05:53, Dejan Muhamedagic  wrote:
> 
> > Testing/starting/etc resources is easy, but the shell doesn't
> > know about dependencies.
> 
> I think this might not even be needed for the first step. Clearly, a
> mode to run a specific single resource would be required.
> 
> I do think "ra test" is the right level for this; but maybe the "ra"
> command hierarchy should be available from configure too?

It is, because it is useful to get the RA documentation while
defining resources.

> ra test   

The reason to have it running from the configure level is that
that is the place where resources are defined: resource names and
their parameters. The ra level has no such infrastructure.

> If node is not given, run locally. If  is not given, run
> ocf-tester.

The shell doesn't have capability to run operations on other
nodes.

> (And really, I mean run ocf-tester - I don't want to duplicate the test
> case logic in more than one place.)
> 
> In the first step, the user would be required to ensure all dependencies
> are up and available, or brought online manually before.
> 
> > We can introduce a new sublevel at configure to allow fiddling with
> > resource operations at will, but it would be better to have, say,
> > ptest provide information about the order of resource operations.
> 
> I can see the value here too - but this gets complex really quickly, if
> the ordering/locations are not local only, or if clones get involved.

It is true that it would be difficult to cover all the cases.

> (We end up single stepping through the transition graph, basically. That
> may be quite worthwhile to implement, but seems to be a quite different
> scope, and may best be implemented via a "debugger" interface to the
> crmd, so that the shell can interactively trace + modify the transition?
> Word up for complexity! ;-)
> 
> > order. For instance, after creating new resources, the user would
> > just say "test" before commit and the shell would run the
> > following:
> > 
> > test A (ocf-tester)
> > start A
> > test B
> > start B
> > test C
> > stop B
> > stop A
> 
> I _think_ we are fine if we allow the "ra test" command to operate on
> groups too.
> 
> But if we try to implement the "test" mode so that it resolves
> dependencies, we will end up creating an expectation that it always
> works, and that the shell figures out where to run stuff etc. I'd
> rather have something simple that we can guarantee always works, than
> something complex that will lead to many bug reports ;-)

Not sure, but I suspect that the implementation of testing groups
wouldn't be much different from gathering dependencies from some
external source. At any rate, we may start with something less
ambitious.
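
The test order quoted above (test A, start A, test B, start B, test C, stop B, stop A) can be sketched for a simple ordered chain. This is a minimal illustration with hypothetical names (build_test_plan() and its semicolon-separated output), not crm shell code. It only produces the sequence of steps and ignores clones, locations and anything that is not a linear dependency chain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "<verb> <name>" to out, separated from earlier steps by ';'.
 * Returns 0 on success, -1 if the buffer is too small. */
static int append_step(char *out, size_t out_len, size_t *used,
                       const char *verb, const char *name)
{
    int n = snprintf(out + *used, out_len - *used, "%s%s %s",
                     *used ? ";" : "", verb, name);
    if (n < 0 || (size_t)n >= out_len - *used) {
        return -1;
    }
    *used += (size_t)n;
    return 0;
}

/* Build the plan for resources listed in dependency order.
 * Each resource is tested; all but the last are then started so the next one
 * can be tested on top of them. Started resources are stopped in reverse order. */
int build_test_plan(const char *const names[], int count, char *out, size_t out_len)
{
    size_t used = 0;

    assert(out != NULL);
    if (out_len == 0) {
        return -1;
    }
    out[0] = '\0';

    for (int i = 0; i < count; i++) {
        if (append_step(out, out_len, &used, "test", names[i]) != 0) {
            return -1;
        }
        if (i < count - 1 && append_step(out, out_len, &used, "start", names[i]) != 0) {
            return -1;
        }
    }
    for (int i = count - 2; i >= 0; i--) {
        if (append_step(out, out_len, &used, "stop", names[i]) != 0) {
            return -1;
        }
    }
    return 0;
}
```

For the chain A, B, C the function writes "test A;start A;test B;start B;test C;stop B;stop A", which matches the quoted sequence.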

Cheers,

Dejan

P.S. Moving the discussion to the user list and adjusting the
subject.

> Regards,
> Lars
> 
> -- 
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
> 
> ___
> Pcmk-devel mailing list
> pcmk-de...@oss.clusterlabs.org
> http://oss-2.clusterlabs.org/mailman/listinfo/pcmk-devel



Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-03 Thread Martijn Sprengers
Hi,

The clvm resource agent is upcoming in Fedora 13 as part of an updated
heartbeat/pacemaker package. FC 13 has already moved to
Corosync/OpenAIS/Pacemaker/DRBD, and FC 13 should feel familiar to you
coming from CentOS.

Regards,
Martijn Sprengers



Disclaimer:
This message is intended only for the addressees. No rights can be derived
from this message.

-Original message-
From: Brett Delle Grazie [mailto:brett.dellegra...@intact-is.com] 
Sent: Tuesday, 3 August 2010 9:30
To: m...@3open.org; pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] error installing CentOS clvm after using
clusterlabs repository

Hi Mike,

In RHEL 5.x and CentOS 5.x you must use CMAN and the RedHat Cluster
Suite (RHCS) if you are going to use clustered LVM.

This is because clvmd currently uses the CMAN interface to the cluster.
In later versions, RedHat is moving towards Corosync / OpenAIS /
(Pacemaker | RgManager) solution but this will take a long time.

Christine Caulfield (from RedHat) wrote an excellent document describing
the change process here:
http://people.redhat.com/ccaulfie/docs/Whither%20cman.pdf

I guess your options are:
(a) Switch to RHCS based cluster, at least for those nodes with
clustered LVM requirements (and GFS, GFS2 etc)
(b) Switch to RHEL 6.x Beta
(c) Try recompiling RHEL 6.x Beta packages - no guarantees here but it
should be possible, maybe.
(d) Try compiling current source of lvm2-cluster packages from Fedora or
Rawhide as they can use current versions of OpenAIS. The RHEL 5.x
versions of lvm2-cluster are fixed at using CMAN interface, not OpenAIS
(e) Switch to Debian based distro - Lenny is production ready and has
CLVM / Pacemaker / Corosync in backports ;)
(f) Something someone else on the list with more experience comes up
with :)

Good luck, please let us know how you get on.

Best Regards,

Brett


On Tue, 2010-08-03 at 10:29 +0800, Michael Fung wrote:
> Hi all,
> 
> 
> I am using the following repository to install pacemaker and corosync:
> 
>   [clusterlabs]
>   name=High Availability/Clustering server technologies (epel-5)
>   baseurl=http://www.clusterlabs.org/rpm/epel-5
>   ...
> 
> The cluster is working well.
> 
> Later, I wanted to use clvm, that is, the lvm2-cluster package. yum gets it
> from the CentOS repository, but the dependencies are broken. It seems
> the openais package from clusterlabs is different from the CentOS one.
> 
> I skipped the dependencies and force installed related library files.
> Finally I got clvmd to run but it complains:
> 
>   Starting clvmd: clvmd could not connect to cluster manager
>   Consult syslog for more information
> 
> 
> Any ideas please?
> 
> 
> Rgds,
> Michael
> 
> 
> 

-- 
Best Regards,

Brett Delle Grazie

__
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
__



Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-03 Thread Vladislav Bogdanov
03.08.2010 10:29, Brett Delle Grazie wrote:

...

> (c) Try recompiling RHEL 6.x Beta packages - no guarantees here but it
> should be possible, maybe.

To use OCFS2, GFS2 or CLVM with corosync one needs support for userspace
cluster stack in DLM, which is missing from EL5 kernel, so this would
not help. Backporting that feature doesn't seem possible.

Best,
Vladislav




Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-03 Thread Brett Delle Grazie
Hi Mike,

In RHEL 5.x and CentOS 5.x you must use CMAN and the RedHat Cluster
Suite (RHCS) if you are going to use clustered LVM.

This is because clvmd currently uses the CMAN interface to the cluster.
In later versions, RedHat is moving towards Corosync / OpenAIS /
(Pacemaker | RgManager) solution but this will take a long time.

Christine Caulfield (from RedHat) wrote an excellent document describing
the change process here:
http://people.redhat.com/ccaulfie/docs/Whither%20cman.pdf

I guess your options are:
(a) Switch to RHCS based cluster, at least for those nodes with
clustered LVM requirements (and GFS, GFS2 etc)
(b) Switch to RHEL 6.x Beta
(c) Try recompiling RHEL 6.x Beta packages - no guarantees here but it
should be possible, maybe.
(d) Try compiling current source of lvm2-cluster packages from Fedora or
Rawhide as they can use current versions of OpenAIS. The RHEL 5.x
versions of lvm2-cluster are fixed at using CMAN interface, not OpenAIS
(e) Switch to Debian based distro - Lenny is production ready and has
CLVM / Pacemaker / Corosync in backports ;)
(f) Something someone else on the list with more experience comes up
with :)

Good luck, please let us know how you get on.

Best Regards,

Brett


On Tue, 2010-08-03 at 10:29 +0800, Michael Fung wrote:
> Hi all,
> 
> 
> I am using the following repository to install pacemaker and corosync:
> 
>   [clusterlabs]
>   name=High Availability/Clustering server technologies (epel-5)
>   baseurl=http://www.clusterlabs.org/rpm/epel-5
>   ...
> 
> The cluster is working well.
> 
> Later, I wanted to use clvm, that is, the lvm2-cluster package. yum gets it
> from the CentOS repository, but the dependencies are broken. It seems
> the openais package from clusterlabs is different from the CentOS one.
> 
> I skipped the dependencies and force installed related library files.
> Finally I got clvmd to run but it complains:
> 
>   Starting clvmd: clvmd could not connect to cluster manager
>   Consult syslog for more information
> 
> 
> Any ideas please?
> 
> 
> Rgds,
> Michael
> 
> 
> 

-- 
Best Regards,

Brett Delle Grazie
