Re: [Pacemaker] Remote Access not Working

2009-11-27 Thread Colin
On Mon, Nov 23, 2009 at 9:59 AM, Colin  wrote:
> On Fri, Nov 20, 2009 at 8:05 PM, Andrew Beekhof  wrote:
>> On Fri, Nov 20, 2009 at 12:36 PM, Andrew Beekhof  wrote:
>>> Remote notifications should work, I'll test that today.
>>
>> As of http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/a6d70b1b479d
>> they finally work for clear-text connections.
>
> Downloading ... Compiling ... Testing ... Success!
>
> (Although there's still the following message from crm_mon:
> "Notification setup failed, won't be able to reconnect after failure",
> it does seem to hang on and update itself correctly when the CIB
> changes...)

On my other test cluster, with 32-bit systems, the notification does
not work, i.e. crm_mon gives me the correct status and then never
updates.

Colin

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Pacemaker 1.0.x, Debian, and upgrades from Heartbeat 2.1.x

2009-11-27 Thread Dejan Muhamedagic
Hi,

On Thu, Nov 26, 2009 at 08:16:26PM +0100, Andrew Beekhof wrote:
> On Thu, Nov 26, 2009 at 2:28 PM, Florian Haas  wrote:
> > Andrew and everyone,
> >
> > apologies upfront if this is turning into a rant. This has been somewhat
> > bothering me for a while.
> >
> > A bit of backdrop.
> >
> > - The docs
> > (http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/apes03s02.html)
> > have claimed for a while that Pacemaker 1.0.x is compatible with
> > Heartbeat 2.1.3 (aka Pacemaker 0.6). Thus it ought to be safe to expect
> > to be able to do rolling upgrades from 2.1.3/2.1.4 to 1.0.x.
> 
> Yes, I anticipated it too when I was releasing 1.0.0.
> In fairness, the wiki has been correct since April, when I encountered
> the issue.
> 
> [snip]
> 
> > Because, quoting from the documentation, rolling upgrades are "currently
> > broken between Pacemaker 0.6.x and 1.0.x. If there is sufficient demand,
> > the work to repair 0.6 -> 1.0 compatibility will be carried out."
> >
> > I firmly believe there is sufficient demand. I therefore ask that this
> > breakage be fixed. Perhaps other Debian users can second that request of
> > mine.
> 
> What I don't understand, if such demand exists, is why I'm not hearing
> more about it.
> Since 1.0.0 came out over a year ago, I've had exactly 4 people
> complain about the problem (and only half of those had actually
> performed an upgrade and encountered the problem).
> 
> I even explicitly pointed out the problem and asked for people's
> feedback as to whether it was important.
> To date that thread has zero replies in 7 months.
> 
> The Occam's-razor explanation would seem to be that cluster admins
> simply don't do rolling upgrades between major versions.

> Perhaps you can convince lmb to fix it, I think he had thoughts of
> using that capability.
> But hey, if hordes of people suddenly turn up saying they simply must
> have rolling upgrades to 1.0 I will of course work on it myself.

Perhaps they simply never tried to upgrade. What Florian was
saying is that once people start upgrading to the new Debian
release, they will run into problems here. So, hordes may turn
up, but it may already be too late then.

BTW, can't we modify hb2openais and use that? It should have most
bits in.

Thanks,

Dejan



Re: [Pacemaker] cLVM in Pacemaker

2009-11-27 Thread Dejan Muhamedagic
Hi,

On Thu, Nov 26, 2009 at 02:42:54PM +0100, Sander van Vugt wrote:
> Hi,
> 
> I'm trying to set up cLVM resources as described here:
> http://www.novell.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/book_sleha.html
> Following this procedure in different environments, I always
> end up with the same: a dlm that comes up and a cLVM that
> doesn't. I don't know what I'm missing and the logs don't tell
> me what goes wrong.

If a resource doesn't start, there really must be a message in the
logs. If there isn't one, that's already a bug. Could you please
take another look?

Thanks,

Dejan

> I know I'm not very specific about what
> exactly goes wrong, but the problem is that I can't find any
> more information on why it does go wrong. Did anyone manage to
> get this working? If so, I would greatly appreciate some hints
> about this. 
> Thanks,
> Sander


[Pacemaker] Node crash when 'ifdown eth0'

2009-11-27 Thread Oscar Remírez de Ganuza Satrústegui

Good morning,

We are testing a cluster configuration on RHEL5 (x86_64) with pacemaker 
1.0.5 and openais (0.80.5).

It is a two-node, active-passive cluster with the following resources:
a MySQL service resource and an NFS filesystem resource (shared storage
in a SAN).


In our tests, when we bring down the network interface (ifdown eth0), 
the openais service (aisexec process) and other processes (stonithd, 
cib, attrd and crmd) crash, and just some processes are still running:

[r...@herculespre ~]# ps -fea |grep "ais\|heartbeat"
root      2343  2335  0 Nov26 pts/0    00:00:18 /usr/lib64/heartbeat/lrmd
102       2345  2335  0 Nov26 pts/0    00:00:01 /usr/lib64/heartbeat/pengine
root     30347  2287  0 11:15 pts/0    00:00:00 grep ais\|heartbeat

We have to restart the openais service in order to bring the node back
into the cluster.
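
For reference, the process check above can be sketched in Python. This is
only an illustration: the daemon list is taken from the names mentioned in
this message, and the function names are made up for the example.

```python
# Daemon names mentioned in this message; the list is illustrative,
# not an exhaustive list of what the cluster stack runs.
EXPECTED_DAEMONS = ["aisexec", "stonithd", "cib", "attrd", "crmd", "lrmd", "pengine"]


def running_daemons(ps_output):
    """Return the set of expected daemon names found in `ps -fea`-style output."""
    found = set()
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        name = executable.rsplit("/", 1)[-1]
        if name in EXPECTED_DAEMONS:
            found.add(name)
    return found


def missing_daemons(ps_output):
    """Return the expected daemons that do not appear in the ps output."""
    present = running_daemons(ps_output)
    return [name for name in EXPECTED_DAEMONS if name not in present]
```

Given the three lines of ps output shown above, missing_daemons() reports
aisexec, stonithd, cib, attrd and crmd, which matches the list of crashed
processes.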

This does not happen if we just unplug the Ethernet cable (through VMware).

Is this a known bug?
(I didn't want to spam the list with the full log, but if it is needed I 
can post it.)


I wanted to upgrade the packages in order to check if this has been 
resolved in the new versions (pacemaker 1.0.6 and openais 1.1.0), but I 
couldn't find the new packages for RHEL5 on 
http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_5/x86_64/

Does anybody know if they are coming soon or where we can get them?

Thank you very much for your work on this software!
And for your help!

Regards,

---
Oscar Remírez de Ganuza
Servicios Informáticos
Universidad de Navarra
Ed. de Derecho, Campus Universitario
31080 Pamplona (Navarra), Spain
tfno: +34 948 425600 Ext. 3130
http://www.unav.es/SI







[Pacemaker] Debian packages, OCFS2, high CPU load

2009-11-27 Thread Stefan Förster
Hello world,

out of curiosity, I tried to set up OCFS2 and a
Pacemaker/Corosync/OpenAIS cluster stack on Debian/lenny.

I copy/pasted the cluster configuration from Michael Schwartzkopff's
HOWTO (which works flawlessly on Fedora Core 11).

With Debian, apart from some minor glitches (path to controld.pcmk,
old udev, old kernel) everything went well, but as soon as I commit
the configuration containing the O2CB resources, both nodes become
unresponsive, cluster communication fails and corosync (which was
started as "aisexec") is at about 100% CPU.

I'm a little bit stuck here, with no idea where to begin
debugging.

Kernel is 2.6.30-bpo, the packages are those from madkiss' repository.

I'd really appreciate any hints on where to start debugging or which
information I have to provide to this mailing list.


Cheers
Stefan



Re: [Pacemaker] Node crash when 'ifdown eth0'

2009-11-27 Thread Mark Horton
I'm using pacemaker 1.0.6 and corosync 1.1.2 (not using openais) with
centos 5.4.  The packages are from here:
http://www.clusterlabs.org/rpm/epel-5/

Mark

On Fri, Nov 27, 2009 at 9:01 AM, Oscar Remírez de Ganuza Satrústegui wrote:
> Good morning,
>
> We are testing a cluster configuration on RHEL5 (x86_64) with pacemaker
> 1.0.5 and openais (0.80.5).
> Two node cluster, active-passive, with the following resources:
> Mysql service resource and a NFS filesystem resource (shared storage in a
> SAN).
>
> In our tests, when we bring down the network interface (ifdown eth0), the
> openais service (aisexec process) and other processes (stonithd, cib, attrd
> and crmd) crash, and just some processes are still running:
> [r...@herculespre ~]# ps -fea |grep "ais\|heartbeat"
> root      2343  2335  0 Nov26 pts/0    00:00:18 /usr/lib64/heartbeat/lrmd
> 102       2345  2335  0 Nov26 pts/0    00:00:01 /usr/lib64/heartbeat/pengine
> root     30347  2287  0 11:15 pts/0    00:00:00 grep ais\|heartbeat
>
> We have to start again the openais service in order to bring up the node
> into the cluster.
> That is not happening if we just unplug the ethernet wire (through vmware).
>
> Is this a known bug?
> (I didn't want to spam the list with the full log, but if it is needed i can
> post it)
>
> I wanted to upgrade the packages in order to check if this has been
> resolved in the new versions (pacemaker 1.0.6 and openais 1.1.0), but I
> couldn't find the new packages for RHEL5 on
> http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_5/x86_64/
> Does anybody know if they are coming soon or where we can get them?
>
> Thank you very much for your work on this software!
> And for your help!
>
> Regards,
>
> ---
> Oscar Remírez de Ganuza
> Servicios Informáticos
> Universidad de Navarra
> Ed. de Derecho, Campus Universitario
> 31080 Pamplona (Navarra), Spain
> tfno: +34 948 425600 Ext. 3130
> http://www.unav.es/SI


Re: [Pacemaker] Node crash when 'ifdown eth0'

2009-11-27 Thread Dejan Muhamedagic
Hi,

On Fri, Nov 27, 2009 at 12:01:17PM +0100, Oscar Remírez de Ganuza Satrústegui wrote:
> Good morning,
> 
> We are testing a cluster configuration on RHEL5 (x86_64) with
> pacemaker 1.0.5 and openais (0.80.5).
> Two node cluster, active-passive, with the following resources:
> Mysql service resource and a NFS filesystem resource (shared storage
> in a SAN).
> 
> In our tests, when we bring down the network interface (ifdown
> eth0), the openais service (aisexec process) and other processes

Yes, openais gets nervous if the network interface disappears. I
think you'll find a core dump in /var/lib/openais. At any rate,
better make sure that the interface stays up. And don't use dhcp
but static addresses.
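
A minimal sketch for checking that directory, in Python. It assumes core
files have names beginning with "core", which depends on the system's
core pattern; the function name is hypothetical.

```python
import os


def find_core_files(directory="/var/lib/openais"):
    """Return paths of regular files named core*, newest first.

    Assumption: core dumps are written with a name starting with 'core'.
    Returns an empty list if the directory does not exist.
    """
    if not os.path.isdir(directory):
        return []
    candidates = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith("core")
    ]
    files = [path for path in candidates if os.path.isfile(path)]
    # Sort by modification time so the most recent crash comes first.
    return sorted(files, key=os.path.getmtime, reverse=True)
```

An empty result does not prove that nothing crashed; core dumps may be
disabled or written elsewhere on a given system.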

> (stonithd, cib, attrd and crmd) crash, and just some processes are
> still running:
> [r...@herculespre ~]# ps -fea |grep "ais\|heartbeat"
> root      2343  2335  0 Nov26 pts/0    00:00:18 /usr/lib64/heartbeat/lrmd
> 102       2345  2335  0 Nov26 pts/0    00:00:01 /usr/lib64/heartbeat/pengine

Processes which are not talking to aisexec.

Thanks,

Dejan

> root     30347  2287  0 11:15 pts/0    00:00:00 grep ais\|heartbeat
> 
> We have to start again the openais service in order to bring up the
> node into the cluster.
> That is not happening if we just unplug the ethernet wire (through vmware).
> 
> Is this a known bug?
> (I didn't want to spam the list with the full log, but if it is
> needed i can post it)
> 
> I wanted to upgrade the packages in order to check if this has
> been resolved in the new versions (pacemaker 1.0.6 and openais
> 1.1.0), but I couldn't find the new packages for RHEL5 on 
> http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_5/x86_64/
> Does anybody know if they are coming soon or where we can get them?
> 
> Thank you very much for your work on this software!
> And for your help!
> 
> Regards,
> 
> ---
> Oscar Remírez de Ganuza
> Servicios Informáticos
> Universidad de Navarra
> Ed. de Derecho, Campus Universitario
> 31080 Pamplona (Navarra), Spain
> tfno: +34 948 425600 Ext. 3130
> http://www.unav.es/SI


Re: [Pacemaker] Debian packages, OCFS2, high CPU load

2009-11-27 Thread Dejan Muhamedagic
Hi,

On Fri, Nov 27, 2009 at 01:05:41PM +0100, Stefan Förster wrote:
> Hello world,
> 
> out of curiosity, I tried to set up OCFS2 and a
> Pacemaker/Corosync/OpenAIS cluster stack on Debian/lenny.
> 
> I copy/pasted the cluster configuration from Michael Schwartzkopff's
> HOWTO (which works flawlessly on Fedora Core 11).
> 
> With Debian, apart from some minor glitches (path to controld.pcmk,
> old udev, old kernel) everything went well, but as soon as I commit
> the configuration containing the O2CB resources, both nodes become
> unresponsive, cluster communication fails and corosync (which was
> started as "aisexec") is at about 100% CPU.

corosync runs as corosync. aisexec is from the older openais
(0.8x).

Otherwise, perhaps you found a bug. See if it's reproducible
without o2cb.

Thanks,

Dejan

> I'm a little bit stuck here, without any idea where I could try to
> begin with any debugging efforts.
> 
> Kernel is 2.6.30-bpo, the packages are those from madkiss' repository.
> 
> I'd really appreciate any hints on where to start debugging or which
> information I have to provide to this mailing list.
> 
> 
> Cheers
> Stefan


Re: [Pacemaker] Debian packages, OCFS2, high CPU load

2009-11-27 Thread Stefan Förster
* Dejan Muhamedagic :
> On Fri, Nov 27, 2009 at 01:05:41PM +0100, Stefan Förster wrote:
>> With Debian, apart from some minor glitches (path to controld.pcmk,
>> old udev, old kernel) everything went well, but as soon as I commit
>> the configuration containing the O2CB resources, both nodes become
>> unresponsive, cluster communication fails and corosync (which was
>> started as "aisexec") is at about 100% CPU.
> 
> corosync runs as corosync. aisexec is from the older openais
> (0.8x).

With the Debian packages from http://people.debian.org/~madkiss/ha/,
openais contains "/usr/sbin/aisexec", which is a shell script calling:

export COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"
corosync "$@"
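
For illustration, roughly the same wrapper written in Python. The variable
name and value are copied from the script above; the helper names are made
up for this sketch.

```python
import os

# Value copied verbatim from the aisexec wrapper script quoted above.
OPENAIS_IFACE = "openaisserviceenableexperimental:corosync_parser"


def corosync_environment(base_env=None):
    """Return a copy of the environment with the openais config interface set."""
    env = dict(os.environ if base_env is None else base_env)
    env["COROSYNC_DEFAULT_CONFIG_IFACE"] = OPENAIS_IFACE
    return env


def corosync_command(args):
    """Build the argument vector: corosync followed by the caller's arguments."""
    return ["corosync", *args]
```

Passing corosync_command(sys.argv[1:]) and env=corosync_environment() to
subprocess.run() would start corosync as a child process with the variable
set, as the shell script does.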

The Debian openais package also contains /usr/lib/lcrso/service_ckpt.lcrso,
which isn't loaded without the above environment setting. Among
others, the package contains:

/usr/lib/lcrso/service_msg.lcrso
/usr/lib/lcrso/service_lck.lcrso
/usr/lib/lcrso/service_clm.lcrso
/usr/lib/lcrso/service_evt.lcrso
/usr/lib/lcrso/openaisserviceenable.lcrso
/usr/lib/lcrso/service_ckpt.lcrso
/usr/lib/lcrso/service_amf.lcrso
/usr/lib/lcrso/service_tmr.lcrso

> Otherwise, perhaps you found a bug. See if it's reproducible
> without o2cb.

I'm unsure how to do this. Perhaps simply using another service
which relies on CKPT would trigger that bug?


Ciao
Stefan
-- 
Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9
Bedauerlicher Weise kann man den Sturz aus großer Höhe nicht trainieren.



Re: [Pacemaker] Node crash when 'ifdown eth0'

2009-11-27 Thread Steven Dake
On Fri, 2009-11-27 at 11:32 -0200, Mark Horton wrote:
> I'm using pacemaker 1.0.6 and corosync 1.1.2 (not using openais) with
> centos 5.4.  The packages are from here:
> http://www.clusterlabs.org/rpm/epel-5/
> 
> Mark
> 
> On Fri, Nov 27, 2009 at 9:01 AM, Oscar Remírez de Ganuza Satrústegui wrote:
> > Good morning,
> >
> > We are testing a cluster configuration on RHEL5 (x86_64) with pacemaker
> > 1.0.5 and openais (0.80.5).
> > Two node cluster, active-passive, with the following resources:
> > Mysql service resource and a NFS filesystem resource (shared storage in a
> > SAN).
> >
> > In our tests, when we bring down the network interface (ifdown eth0), the

What is the use case for ifdown eth0 (ie what are you trying to verify)?

I recommend using the latest pacemaker and corosync as well if you're
doing a new deployment.

Regards
-steve

