Re: [Pacemaker] Cluster type is: corosync

2011-07-27 Thread Proskurin Kirill

27.07.2011 6:41, Andrew Beekhof wrote:


Ok. And did you add the pacemaker configuration options to corosync's
config file?



I attached our corosync.conf. It is the same on all nodes except for the IP address.


You missed a step from:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/s-configure-corosync.html


Which one?
In a previous conversation Steven Dake said that I can use the exact IP
address if I wish (and I do, because some nodes may have more than one IP
address on the same network).

And I do run pacemakerd after corosync starts.

I can't say for sure, but it seems I fixed it by setting compatibility:
none. After this it started to report Cluster type is: 'openais'.



Pacemaker is blank now - no configuration at all.

Online nodes:
[root@mysender1 ~]# crm configure show
node mysender1.example.com
node mysender2.example.com
node mysender3.example.com
node mysender4.example.com
node mysender5.example.com
node mysender6.example.com
node mysender7.example.com
property $id=cib-bootstrap-options \
dc-version=1.1.5-3-01e86afaaa6d4a8c4836f68df80ababd6ca3902f \
cluster-infrastructure=openais \
expected-quorum-votes=6


Offline nodes (Cluster type is: corosync):
[root@mysender2 ~]# crm configure show
[root@mysender2 ~]#





pacemaker-1.1.5
corosync-1.4.0
cluster-glue-1.0.6
openais-1.1.2

All nodes have same rpms.


On Fri, Jul 22, 2011 at 7:47 PM, Proskurin Kirill
k.prosku...@corp.example.com  wrote:


Hello again!

I hope I'm not flooding the list too much, but I have another problem.

I installed the same RPMs of corosync, openais, pacemaker and cluster-glue
on all nodes. I checked it twice.

But when I start some of them, they can't connect to the cluster and stay
offline. In the logs I see that they see the other nodes and connectivity
is OK.
But I found a difference:

Online nodes in cluster have:
[root@mysender39 ~]# grep 'Cluster type is' /var/log/corosync.log
Jul 22 20:38:58 mysender39.example.com stonith-ng: [3499]: info: get_cluster_type: Cluster type is: 'openais'.
Jul 22 20:38:58 mysender39.example.com attrd: [3502]: info: get_cluster_type: Cluster type is: 'openais'.
Jul 22 20:38:58 mysender39.example.com cib: [3500]: info: get_cluster_type: Cluster type is: 'openais'.
Jul 22 20:38:59 mysender39.example.com crmd: [3504]: info: get_cluster_type: Cluster type is: 'openais'.

Offline have:
[root@mysender2 ~]# grep 'Cluster type is' /var/log/corosync.log
Jul 22 13:39:17 mysender2.example.com stonith-ng: [9028]: info: get_cluster_type: Cluster type is: 'corosync'.
Jul 22 13:39:17 mysender2.example.com attrd: [9031]: info: get_cluster_type: Cluster type is: 'corosync'.
Jul 22 13:39:17 mysender2.example.com cib: [9029]: info: get_cluster_type: Cluster type is: 'corosync'.
Jul 22 13:39:18 mysender2.example.com crmd: [9033]: info: get_cluster_type: Cluster type is: 'corosync'.

What's wrong and how can I fix it?
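
For anyone who wants to run the same comparison across many nodes, here is a
minimal sketch in Python. It only assumes the log line layout shown in the
grep output above; the function names (cluster_types, mismatched_hosts) and
the SAMPLE text are made up for illustration and are not part of Pacemaker.
It reports which hosts have a daemon that detected a cluster type other than
the expected one.

```python
import re
from collections import defaultdict

# Matches "<host> <daemon>: [<pid>]: info: get_cluster_type: Cluster type is: '<type>'".
# \s+ is used between fields so that mail-wrapped log lines still match.
PATTERN = re.compile(
    r"(?P<host>\S+)\s+(?P<daemon>[\w-]+):\s+\[\d+\]:\s+info:\s+"
    r"get_cluster_type:\s+Cluster type is:\s+'(?P<type>\w+)'"
)


def cluster_types(log_text):
    """Return {host: {daemon: cluster_type}} for every match in log_text."""
    found = defaultdict(dict)
    for m in PATTERN.finditer(log_text):
        found[m.group("host")][m.group("daemon")] = m.group("type")
    return dict(found)


def mismatched_hosts(types_by_host, expected):
    """Return the sorted hosts where any daemon reports a type != expected."""
    return sorted(
        host
        for host, daemons in types_by_host.items()
        if any(t != expected for t in daemons.values())
    )


# Illustrative input only: two entries copied from the output above.
SAMPLE = """\
Jul 22 20:38:58 mysender39.example.com stonith-ng: [3499]: info:
get_cluster_type: Cluster type is: 'openais'.
Jul 22 13:39:17 mysender2.example.com cib: [9029]: info:
get_cluster_type:
Cluster type is: 'corosync'.
"""

if __name__ == "__main__":
    types = cluster_types(SAMPLE)
    print(mismatched_hosts(types, expected="openais"))
```

In practice the text would come from /var/log/corosync.log on each node
rather than from SAMPLE.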



--
Best regards,
Proskurin Kirill

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Upgrading from 1.0 to 1.1

2011-07-27 Thread Proskurin Kirill

27.07.2011 5:56, Andrew Beekhof wrote:

On Tue, Jul 19, 2011 at 5:40 PM, Proskurin Kirill
k.prosku...@corp.mail.ru  wrote:

On 07/19/2011 03:22 AM, Andrew Beekhof wrote:


On Fri, Jul 15, 2011 at 10:33 PM, Proskurin Kirill
k.prosku...@corp.mail.ru  wrote:


Hello all.

I found that I am using corosync with pacemaker ver: 0 with pacemaker
1.1.5 installed, i.e. without starting pacemakerd.

Sounds wrong. :-)
So I tried to upgrade.
I shut down one node and changed 0 to 1 in service.d/pcmk.
Then I started corosync and then pacemakerd via the init script.

But this node stays online and on the cluster's DC I see:
cib: [18392]: WARN: cib_peer_callback: Discarding cib_sync_one message (255) from mysender10.example.com: not in our membership


That's odd.  The only thing you changed was ver: 0 to ver: 1?


Yes, only this. To make it clearer: I have 4 nodes with ver: 0, tried to
add one with ver: 1, and got this.

Well, I shut down all nodes, changed them all to 1 and started them up, and
all was OK. Not a really good way to upgrade, but I don't have time.


Do you still have the logs for the failure case?
I'd really like to see them.


No, I don't. But some time ago I got the same error in the vice-versa
situation, when I tried to add a node with ver: 0 to a cluster where all
nodes are ver: 1.


Anyway, my cluster is down now, so I can do some tests. I will send the logs
to the mailing list if I reproduce this situation again.


--
Best regards,
Proskurin Kirill



Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-07-27 Thread Florian Haas
On 2011-07-27 03:46, Andrew Beekhof wrote:
 On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof and...@beekhof.net wrote:
 Hmm.  Interesting. I will investigate.
 
 This is an unfortunate side-effect of my history compression patch.
 
 Since we only store the last successful and last failed operation, we
 don't have the md5 of the start operation around to check when a
 resource's definition is changed.
 
 Solutions appear to be either:
 a) give up the space savings and revert the history compression patch
 b) always restart a resource if a non-matching md5 is detected - even
 if the operation was a recurring monitor
 
 I'd favor b) along with dropping the per-operation parameters.
 The only valid use-case I've heard for those is setting OCF_LEVEL or
 depth or whatever it was called - and I think we're in basic agreement
 that we need a better solution for that anyway.

We are, and you know my opinion that OCF_CHECK_LEVEL is hideous
(although lmb, for one, seems to disagree). But dropping it now does
clearly count as a regression and I'd really hate to see that happen unless

a) there is a replacement method for tuning the thoroughness of checking
the resource state during monitor, _and_
b) there is an automated or semi-automated (cibadmin --upgrade?) means
of transitioning off OCF_CHECK_LEVEL and replacing it with its successor
feature.

Cheers,
Florian





[Pacemaker] Postgresql and Fedora with DRBD and Pacemaker

2011-07-27 Thread Benjamin Knoth
Hi all,
I have a Fedora instance which communicates with PostgreSQL.
I would like to have a high-availability solution for this scenario.

My first idea was to use DRBD on two VMs to synchronize both partitions
and to use Pacemaker to control DRBD, the IP, Fedora and PostgreSQL.
This part is done and works.

But now I think it's not enough to get an HA solution.

The problem is that Fedora first writes the data to the filesystem,
then it writes the link and the indexes to the DB.

For example, if the DRBD device used by Fedora crashes, it is possible
that the operation could not be finished. So the data is written and the
link is written to the DB, but not the index of this object.

What should I do? If Fedora's DRBD crashes, Pacemaker will detect it, shut
down the services in order and start them on the other VM. But at this
point the operation is incomplete. The data is written to the filesystem
and the link too, but not the index. At this moment I have an inconsistent
system.

The normal way in Fedora to resolve this problem is to reindex and
recache. But we need 4 days to recache and reindex all items, and
a downtime of 4 days is not possible.

My idea to solve this problem is to delete the last file written to the
filesystem by Fedora at time x and to use WAL in PostgreSQL to restore
the database to time x. After that I should have a consistent system again.

But is my idea realistic, or do I need pgpool-II, Slony, or only DRBD with
Pacemaker?

Does someone have experience with a scenario like this?

Best regards

-- 
Benjamin Knoth






[Pacemaker] crm resource restart does not work on the DC node with crmd-transition-delay=2s

2011-07-27 Thread NAKAHIRA Kazutomo
Hi, all

I configured crmd-transition-delay=2s to address the following problem.

 http://www.gossamer-threads.com/lists/linuxha/pacemaker/68504
 http://developerbugs.linux-foundation.org/show_bug.cgi?id=2528

After that, the crm resource restart command was no longer able to
restart any resources on the DC node.
# crm resource restart works fine on a non-DC node.
# Please see the attached hb_report, generated in a simple environment.

How can I use the crm resource restart command on the DC node
with crmd-transition-delay=2s?

I confirmed that I can avoid this problem with the following procedure
 1. crm resource stop rsc-ID
 2. wait crmd-transition-delay (2) seconds
 3. crm resource start rsc-ID
but this behavior (restart does not work on the DC node)
may confuse users.
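
The three-step workaround above can be wrapped in a small helper. This is
only a sketch in Python: the function name restart_with_delay and its run
and sleep parameters are hypothetical, and the only crm commands it issues
are the stop and start calls listed in the procedure. The delay is assumed
to be passed in by the caller to match the configured crmd-transition-delay.

```python
import subprocess
import time


def restart_with_delay(rsc_id, delay_s=2.0,
                       run=subprocess.check_call, sleep=time.sleep):
    """Stop rsc_id, wait out the transition delay, then start it again.

    run and sleep are injectable so the sequence can be dry-run
    without a cluster; by default they call crm and really sleep.
    """
    run(["crm", "resource", "stop", rsc_id])    # step 1
    sleep(delay_s)                              # step 2
    run(["crm", "resource", "start", rsc_id])   # step 3


# Example (on a cluster node, as root):
#   restart_with_delay("rsc-ID", delay_s=2)
```

Because subprocess.check_call raises on a non-zero exit status, a failed
stop prevents the start from being attempted.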

Best regards,

-- 
NAKAHIRA Kazutomo
Infrastructure Software Technology Unit
NTT Open Source Software Center


hb_report-Wed-27-Jul-2011.tar.bz2
Description: Binary data


Re: [Pacemaker] Problem with colocation

2011-07-27 Thread Yingliang Yang
Hi,
I'm glad to help you.

2011/7/26 Taneli Lepp wrote:
 Hello,

 On 25.7.11 13:28, Yingliang Yang wrote:
 <constraints>
 <rsc_colocation id="Sphinx_with_IP" rsc="Sphinx" score-attribute="INF"
 with-rsc="Sphinx_IP"/>
 </constraints>
 There is a problem in your config.
 The score-attribute should be score and its value should be INFINITY.

 Thanks, you were correct. The Cluster from Scratch manual uses inf
 shorthand all the time, so I thought it would work.

 Should this kind of error pass the schema check anyways?


score-attribute=INF can pass the schema check, score=INF can't.
score-attribute is unused in colocation.
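
Since the schema check does not catch this, a quick scan of the constraints
section can. The following is a minimal sketch in Python, not a Pacemaker
tool: the function name suspicious_colocations is made up, and the second
constraint in SAMPLE (id "fixed") is a hypothetical corrected version added
for contrast. It flags every rsc_colocation that sets score-attribute but
no score.

```python
import xml.etree.ElementTree as ET


def suspicious_colocations(cib_xml):
    """Return ids of rsc_colocation elements with score-attribute but no score."""
    root = ET.fromstring(cib_xml)
    return [
        el.get("id")
        for el in root.iter("rsc_colocation")
        if "score-attribute" in el.attrib and "score" not in el.attrib
    ]


# First constraint is the one from this thread; the second is hypothetical.
SAMPLE = """
<constraints>
  <rsc_colocation id="Sphinx_with_IP" rsc="Sphinx"
                  score-attribute="INF" with-rsc="Sphinx_IP"/>
  <rsc_colocation id="fixed" rsc="Sphinx"
                  score="INFINITY" with-rsc="Sphinx_IP"/>
</constraints>
"""

if __name__ == "__main__":
    print(suspicious_colocations(SAMPLE))
```

The input could be the XML printed by cibadmin or any saved copy of the
constraints section.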


[Pacemaker] Announcing Pacemaker Cloud 0.4.1 - Available now for download!

2011-07-27 Thread Steven Dake
Angus and I announced a project to apply high availability best known
practice to the field of cloud computing in late March 2011.  We reuse
the policy engine of Pacemaker.  Our first tarball is available today
containing a functional prototype demonstrating these best known practices.

Today the software supports a deployable/assembly model.  Assemblies
represent a virtual machine and deployables represent a collection of
virtual machines.  Resources within a virtual machine can be monitored
for failure and recovered.  Assemblies and deployables are also
monitored for failure and recovered.

Currently the significant limitation of the software is that it
operates on a single node.  As a result it is not suitable for deployment
today.  We plan to address this in the future by integrating with other
cloud infrastructure systems such as Aeolus (developer ml on CC list).

The software will be available in Fedora 16 for all who run Fedora to
evaluate.  Your feedback is greatly appreciated.  To provide feedback,
join the mailing list:

http://oss.clusterlabs.org/mailman/listinfo/pcmk-cloud/

If you have interest in developing for cloud environments around the
topic of high availability, please feel free to download our git repo
and submit patches.  We also are interested in user feedback!

To get the software, check out:

http://pacemaker-cloud.org/
