Re: [Pacemaker] Best stonith-action, reset or poweroff

2010-04-14 Thread Matthew Palmer
On Wed, Apr 14, 2010 at 09:11:43AM +0200, Ivan Coronado wrote:
> I was wondering which is the better stonith-action. If I set reboot but
> the node doesn't restart (damaged motherboard or no power, for example),
> resources never migrate. So it would be better to set stonith-action
> to poweroff, right?

I prefer poweroff, for the simple reason that if something's exploded to the
point that it needed STONITH, I'd much prefer to run without redundancy until
I can take a look at the situation and make sure there aren't going to be any
more problems.  Having a machine come back up, then explode again, repeatedly,
is far more disruptive to service.
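
(For reference, this is a cluster-wide property; with the crm shell it should
be a one-liner along these lines:

  crm configure property stonith-action=poweroff

The default is "reboot"; "poweroff" leaves a shot node down until someone
intervenes.)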

- Matt

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf


Re: [Pacemaker] DRBD 2 node cluster and STONITH configuration help required.

2010-03-24 Thread Matthew Palmer
On Wed, Mar 24, 2010 at 07:59:26PM +, Mario Giammarco wrote:
> Andrew Beekhof  writes:
> > Have you seen:
> > http://www.clusterlabs.org/doc/crm_fencing.html
> > > I have been led to believe that STONITH will help prevent split brain
> > > situations, but the LINBIT instructions do not provide any guidance on
> > > how to configure STONITH in the pacemaker cluster.
> 
> Probably the 10 million dollar question is: does drbd really need stonith?

The $3.50 question is: is your data important?  If it is, you need STONITH. 
If it isn't, then you don't need DRBD either.

- Matt

-- 
If you are a trauma surgeon and someone dies on your table, [...] everyone
would know you "did your best".  When someone does something truly stupid
with their system and it dies and you can't resuscitate it, you must be
incompetent or an idiot.  -- Julian Macassey, in the Monastery


Re: [Pacemaker] Building an active/passive dhcp server

2010-03-19 Thread Matthew Palmer
On Fri, Mar 19, 2010 at 10:47:59PM +0100, Emmanuel Lesouef wrote:
> I'm trying to make an active/passive dhcp server.

[...]

> The problem is that when node1 comes online again, there's a difference
> in the dhcp lease file.
> 
> I think that using rsync to synchronize the lease file is not the best
> solution and that a clustered file system is the best solution.

Yes, rsyncing your leases file around isn't going to be a win.  However, a
clustered filesystem is a really bad idea, as the complexity is far more
than you need.  Instead, a small DRBD (http://www.drbd.org/) volume with a
regular filesystem such as ext3 will work Just Fine And Dandy.
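
(For illustration only -- resource names, device, mount point and the init
script name are invented and will vary by distro -- the usual shape of that in
crm syntax is a single-primary DRBD device with a plain filesystem and the
daemon grouped on top of it:

  primitive drbd_dhcp ocf:linbit:drbd \
          params drbd_resource="dhcp" op monitor interval="15s"
  ms ms_drbd_dhcp drbd_dhcp \
          meta master-max="1" clone-max="2" notify="true"
  primitive fs_dhcp ocf:heartbeat:Filesystem \
          params device="/dev/drbd0" directory="/var/lib/dhcp3" fstype="ext3"
  primitive ip_dhcp ocf:heartbeat:IPaddr2 params ip="192.168.1.10"
  primitive dhcpd lsb:dhcp3-server
  group grp_dhcp fs_dhcp ip_dhcp dhcpd
  colocation dhcp_on_drbd inf: grp_dhcp ms_drbd_dhcp:Master
  order dhcp_after_drbd inf: ms_drbd_dhcp:promote grp_dhcp:start

The lease file then only ever exists on whichever node currently holds the
DRBD Master role, so there's nothing to rsync.)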

- Matt


Re: [Pacemaker] node states

2010-03-17 Thread Matthew Palmer
On Wed, Mar 17, 2010 at 07:16:16AM -0500, Schaefer, Diane E wrote:
>   We were wondering what the node state of UNCLEAN, with the three
>   variations of online, offline and pending returned in crm_mon means.  We
>   had the heartbeat service off on one of our nodes and the other node
>   reported UNCLEAN (online).  We seem to get it when the nodes are not
>   communicating.  Thanks for any clarification.

Unclean (online) means that the STONITH resource for that node had some
failures, and so the cluster isn't confident that when it comes time to
shoot that node (if required), it'll actually work.

Unclean (offline) means that communication has been lost, but STONITH failed
(or hasn't run yet -- should only be for a fraction of a second) and so the
"deadness" of the presumed-dead node isn't assured.

> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
> MATERIAL and is thus for use only by the intended recipient. If you
> received this in error, please contact the sender and delete the e-mail
> and its attachments from all computers.

ONOES!  A grue!

- Matt


Re: [Pacemaker] DRBD Recovery Policies

2010-03-12 Thread Matthew Palmer
On Fri, Mar 12, 2010 at 09:48:57AM -, darren.mans...@opengi.co.uk wrote:
> /proc/drbd on the slave said Secondary/Primary UpToDate/Inconsistent
> while it was syncing data back - so it was able to mount the
> inconsistent data on the primary node and access the files that hadn't
> yet sync'd over?! I mounted a 4GB ISO that shouldn't have been able to
> be there yet and was able to access data inside it..
> 
> Is my understanding of DRBD limited and it's actually able to provide
> access to not fully sync'd files over the network link or something?

Yes, it can.  The primary knows what blocks it is inconsistent for, and if
an IO request comes in for one of those it asks for it from the other node. 
A wonderful bit of magic.

> If so - wow.

Indeed, that was my reaction too when I realised what it was doing.

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-12 Thread Matthew Palmer
On Thu, Mar 11, 2010 at 05:26:19PM +0800, Martin Aspeli wrote:
> Matthew Palmer wrote:
>> On Thu, Mar 11, 2010 at 03:34:50PM +0800, Martin Aspeli wrote:
>>> I was wondering, though, if fencing at the DRBD level would get around
>>> the possible problem with a full power outage taking the fencing device
>>> down.
>>>
>>> In my poor understanding of things, it'd work like this:
>>>
>>>   - Pacemaker runs on master and slave
>>>   - Master loses all power
>>>   - Pacemaker on slave notices something is wrong, and prepares to start
>>> up postgres on slave, which will now also be the one writing to the DRBD
>>> disk
>>>   - Before it can do that, it wants to fence off DRBD
>>>   - It does that by saying to the local DRBD, "even if the other node
>>> tries to send you stuff, ignore it". This would avoid the risk of data
>>> corruption on slave. Before master could come back up, it'd need to wipe
>>> its local partition and re-sync from slave (which is now the new
>>> primary).
>>
>> The old master shouldn't need to "wipe" anything, as it should have no data
>> that the new master didn't have at the time of the power failure.
>
> I was just thinking that if the failure was, e.g., the connection  
> between master and the rest of the cluster, postgres on the old master  
> could stay up and merrily keep writing to the filesystem on the DRBD.

That can't happen, because the cluster manager should fence the "failed"
node before it starts mounting on the other node.

> In the case of power failure, that wouldn't happen, of course. But in  
> case of total power failure, the fencing device (an IPMI device, Dell  
> DRAC) would be inaccessible too, so the cluster would not fail postgres  
> over.

That's why you want a real STONITH device if you need true reliability.

>> In the case you suggest, where the whole of node "A" disappears, you may
>> well have a fencing problem: because node "B" can't positively confirm that
>> "A" is, in fact, dead (because the DRAC went away too), it may refuse to
>> confirm the fencing operation (this is why using DRAC/IPMI as a STONITH
>> device isn't such a win).
>
> From what I'm reading, the only fencing device that's truly good is a  
> UPS that can cut power to an individual device. Unfortunately, we don't  
> have such a device and can't get one. We do have a UPS with a backup  
> generator, and dual PSUs, so total power outage is unlikely. But someone  
> could also just pull the (two) cables out of the UPS and pacemaker would  
> be none the wiser.

Managed power rails are also pretty good STONITH devices.

> What I don't get is, if this happens, why can't slave just say, "I'm  
> going to assume master is gone and take over postgres, and I'm not going  
> to let anyone else write anything to my disk". In my mind, this is  
> similar to having a shared SAN and having the fencing operation be "node  
> master is no longer allowed to mount or write to the SAN disk, even if  
> it tries".

You can't do that because it is the very definition of "split brain" --
without positive confirmation that the other node is, actually, dead, both
nodes can think the other one is dead and that it is the only living node. 
The shared SAN is a completely different situation, because you have a
*single* device that is capable of deciding who can use it, whereas there is
no single device with DRBD (which has the benefit of having no single point
of failure).

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-11 Thread Matthew Palmer
On Thu, Mar 11, 2010 at 03:34:50PM +0800, Martin Aspeli wrote:
> I was wondering, though, if fencing at the DRBD level would get around  
> the possible problem with a full power outage taking the fencing device  
> down.
>
> In my poor understanding of things, it'd work like this:
>
>  - Pacemaker runs on master and slave
>  - Master loses all power
>  - Pacemaker on slave notices something is wrong, and prepares to start  
> up postgres on slave, which will now also be the one writing to the DRBD  
> disk
>  - Before it can do that, it wants to fence off DRBD
>  - It does that by saying to the local DRBD, "even if the other node  
> tries to send you stuff, ignore it". This would avoid the risk of data  
> corruption on slave. Before master could come back up, it'd need to wipe
> its local partition and re-sync from slave (which is now the new 
> primary).

The old master shouldn't need to "wipe" anything, as it should have no data
that the new master didn't have at the time of the power failure.

The piece of the puzzle I think you're missing is that DRBD will never be
ready for service on a node unless one of the following conditions is true:

* Both nodes have talked to each other and agreed that they're ready to
  exchange data (either because of a clean start on both sides, because
  you've manually prodded a rebooted node into operation again, or because a
  split-brain handler dealt with any issues); or

* A failed node has been successfully fenced and the cluster manager has
  notified DRBD of this fact.
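
(The usual glue for that second case -- using the handler scripts bundled with
DRBD 8.3, paths illustrative -- is DRBD's resource-level fencing policy in
drbd.conf:

  resource r0 {
    disk {
      fencing resource-and-stonith;
    }
    handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    # ... rest of the resource definition ...
  }

With that in place DRBD blocks IO and talks to the cluster before it will
carry on without its peer.)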

In the case you suggest, where the whole of node "A" disappears, you may
well have a fencing problem: because node "B" can't positively confirm that
"A" is, in fact, dead (because the DRAC went away too), it may refuse to
confirm the fencing operation (this is why using DRAC/IPMI as a STONITH
device isn't such a win).  On the other hand, the DRAC STONITH handler may
assume that if it can't talk to a DRAC unit, the machine is fenced (I
don't know which way it goes, I haven't looked).

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Matthew Palmer
On Wed, Mar 10, 2010 at 11:10:31PM +0800, Martin Aspeli wrote:
> Dejan Muhamedagic wrote:
>> ocfs2 introduces an extra level of complexity. You don't want
>> that unless really necessary.
>
> How would that complexity manifest?

Have you noticed the number of extra daemons and kernel bits that have to be
running in order for OCFS2 to work?  Do you know what each of those parts
does, what its likely failure modes are, where it logs to, how to diagnose
which bit has gone wrong, exactly what has happened, and how to recover?

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Matthew Palmer
[Up-front disclaimer: I'm not a fan of cluster filesystems, having had large
chunks of my little remaining sanity shredded by GFS.  So what I say is
likely tinged with lingering loathing, although I do *try* to stay factual]

On Wed, Mar 10, 2010 at 09:01:01PM +0800, Martin Aspeli wrote:
> Matthew Palmer wrote:
>> On Wed, Mar 10, 2010 at 02:32:05PM +0800, Martin Aspeli wrote:
>>> Florian Haas wrote:
>>>> On 03/09/2010 06:07 AM, Martin Aspeli wrote:
>>>>> Hi folks,
>>>>>
>>>>> Let's say we have a two-node cluster with DRBD and OCFS2, with a database
>>>>> server that's supposed to be active on one node at a time, using the
>>>>> OCFS2 partition for its data store.
>>>> *cringe* Which database is this?
>>> Postgres.
>>>
>>> Why are you cringing? From my reading, I had gathered this was a pretty
>>> common setup to support failover of Postgres without the luxury of a
>>> SAN. Are you saying it's a bad idea?
>>
>> PgSQL on top of DRBD is OK.  PgSQL on top of OCFS2 is a disaster waiting to
>> gnaw your leg off.
>
> Hah. I'm glad someone told me. ;-)
>
> Why is this?

Well, for a start you've got the problem that you could end up accidentally
running two copies of PostgreSQL on two separate machines against the same
chunk of data.  I don't know for sure, but my suspicion is that it's not
built to detect that particular case, and you'd quite possibly end up with
nasty database corruption.

Then there are the specific problems with IO on cluster filesystems.  Whilst
they do a reasonable job of what they do (shared access to filesystem data),
the nature of the task is such that you can never expect the same performance
from them as from a local filesystem.  Every IO operation effectively
has to be OK'd by the other machine(s) in the cluster, which is guaranteed
to slow things down.  This isn't a problem for regular file accesses --
they're rarely all that time critical -- but the sheer volume of IO
operations issued from a database (even a read-mostly DB) is going to be a
bit of a sticking point.

>>> Also note that this database will see relatively few write transactions
>>> compared to read transactions, if that makes a difference.
>>
>> Cluster filesystems suck at high IO request rates, regardless of whether
>> they're reads or writes.
>
> Gotcha - so it's mainly a performance issue?

For a rarely-used database, the performance isn't going to bite you --
although it's a ready-made and hard-to-work-around scaling bottleneck (to
which I am violently allergic), and should be avoided if you want to be
confident of any sort of reasonable performance into the future.  The thing
that would keep me up at night more would be the risk of two PgSQL instances
coming up on separate machines and lunching my data.  Data corruption gives
me the willies even more than my data centre burning down.

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Matthew Palmer
On Thu, Mar 11, 2010 at 08:30:29AM +0800, Martin Aspeli wrote:
> Martin Aspeli wrote:
>> Hi folks,
>>
>> Let's say we have a two-node cluster with DRBD and OCFS2, with a database
>> server that's supposed to be active on one node at a time, using the
>> OCFS2 partition for its data store.
>>
>> If we detect a failure on the active node and fail the database over to
>> the other node, we need to fence off the shared storage in case the
>> active node is still writing to it.
>>
>> Can this be done in such a way that the local DRBD/OCFS2 refuses to
>> accept writes from the now-presumed-dead node? I guess this would be
>> similar to putting an access rule on a SAN to block off the previously
>> active node from attempting to read or write any data.
>>
>> Is this feasible?
>
> We went off on a side-track, I think, but I'd still like to know the  
> answer: Can one "fence" at the DRBD level?
>
> From the thread, it sounds like we'll not use OCFS2 for the Postgres  
> data store, but would still use DRBD, e.g. with ext4 or whatever. The  
> fencing problem would then be equally, if not more, acute.
>
> It's basically between doing something at the DRBD level, if that's  
> feasible, or using the DRAC IPMI device on our server to shoot it.

Enable STONITH in Pacemaker and configure appropriate STONITH devices.  I
don't have a config in front of me to give you exact magic beans, but it's
pretty well documented.
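
(Purely to show the shape of it -- plugin choice and parameters depend on your
hardware; "stonith -t external/ipmi -n" will list what a given plugin wants --
something like:

  primitive st-node1 stonith:external/ipmi \
          params hostname="node1" ipaddr="10.0.0.101" userid="stonith" \
                 passwd="secret" interface="lan"
  primitive st-node2 stonith:external/ipmi \
          params hostname="node2" ipaddr="10.0.0.102" userid="stonith" \
                 passwd="secret" interface="lan"
  location st-node1-placement st-node1 -inf: node1
  location st-node2-placement st-node2 -inf: node2
  property stonith-enabled="true"

i.e. one STONITH resource per node to be shot, kept off the node it shoots,
with STONITH turned on globally.)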

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Matthew Palmer
On Wed, Mar 10, 2010 at 11:26:41AM -, darren.mans...@opengi.co.uk wrote:
> 
> On Wed, Mar 10, 2010 at 02:32:05PM +0800, Martin Aspeli wrote:
> > Florian Haas wrote:
> >> On 03/09/2010 06:07 AM, Martin Aspeli wrote:
> >>> Hi folks,
> >>>
> >>> Let's say we have a two-node cluster with DRBD and OCFS2, with a
> >>> database server that's supposed to be active on one node at a time,
> >>> using the OCFS2 partition for its data store.
> >> *cringe* Which database is this?
> >
> > Postgres.
> >
> > Why are you cringing? From my reading, I had gathered this was a pretty
> > common setup to support failover of Postgres without the luxury of a
> > SAN. Are you saying it's a bad idea?
> 
> PgSQL on top of DRBD is OK.  PgSQL on top of OCFS2 is a disaster waiting
> to gnaw your leg off.
> 
> 
> --
> 
> Please forgive my ignorance, I seem to have missed the specifics about
> using OCFS2 on DRBD dual-primary but what are the main issues? How can
> you use PgSQL on dual-primary without OCFS2?

You don't.  Switch to single primary and let your cluster manager take care
of DRBD demote/promote and filesystem mounting.
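
(Concretely -- snippet only, names invented: dual-primary is something you
have to ask DRBD for, so drop it and give Pacemaker a single Master to manage:

  # drbd.conf: remove "allow-two-primaries;" from the net section

  # crm: one Master, cluster-managed
  ms ms_drbd_pg drbd_pg \
          meta master-max="1" clone-max="2" notify="true"

with the usual Filesystem and database resources colocated with, and ordered
after, ms_drbd_pg:Master.)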

- Matt


Re: [Pacemaker] DRBD and fencing

2010-03-10 Thread Matthew Palmer
On Wed, Mar 10, 2010 at 02:32:05PM +0800, Martin Aspeli wrote:
> Florian Haas wrote:
>> On 03/09/2010 06:07 AM, Martin Aspeli wrote:
>>> Hi folks,
>>>
>>> Let's say we have a two-node cluster with DRBD and OCFS2, with a database
>>> server that's supposed to be active on one node at a time, using the
>>> OCFS2 partition for its data store.
>> *cringe* Which database is this?
>
> Postgres.
>
> Why are you cringing? From my reading, I had gathered this was a pretty
> common setup to support failover of Postgres without the luxury of a
> SAN. Are you saying it's a bad idea?

PgSQL on top of DRBD is OK.  PgSQL on top of OCFS2 is a disaster waiting to
gnaw your leg off.

> Mmm, you're not:  
> http://fghaas.wordpress.com/2007/06/26/when-not-to-use-drbd :-)
>
> Or is it OCFS2 you're objecting to? We're using this because there are a  
> few shared files ("blobs" in our CMS) that get written by processes on  
> both nodes. This is very infrequent, though.

Split them -- put PostgreSQL on a regular filesystem and mount it before
starting the database, and run a separate dual-primary DRBD/OCFS2 volume just
for your blobs.

> Also note that this database will see relatively few write transactions  
> compared to read transactions, if that makes a difference.

Cluster filesystems suck at high IO request rates, regardless of whether
they're reads or writes.

- Matt


Re: [Pacemaker] Failover with multiple services on one node

2010-03-08 Thread Matthew Palmer
On Mon, Mar 08, 2010 at 03:21:32PM +0800, Martin Aspeli wrote:
> Matthew Palmer wrote:
>>> What is the normal way to handle this? Do people have one floating IP
>>> address per service?
>>
>> This is how I prefer to do it.  RFC1918 IP addresses are cheap, IPv6 addresses
>> quintuply so.  Having everything tied to one address causes some mayhem when
>> something falls over (failover is quick, but it ain't instantaneous), so
>> it's far better to give everything its own address and let them drift about
>> independently.  Also, it makes load-spreading (run PgSQL on one machine,
>> memcached on the other, for instance) much easier.
>
> Pardon my Linux ignorance, but does that mean we need one NIC per service as
> well, or can we bind multiple (floating) IPs to each interface?

Yes, you can bind many IP addresses to a single NIC.  The IP address
management RAs pretty much depend on this behaviour.  What (modern) OS are
you familiar with that doesn't support this?
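
For the record, this is just ordinary secondary-address support in the kernel;
outside the cluster you can see the same thing with iproute2, which is more or
less what the IPaddr2 agent drives for you (addresses below are made up):

  ip addr add 192.168.245.10/24 dev eth0
  ip addr add 192.168.245.11/24 dev eth0
  ip addr show dev eth0    # both addresses listed against the one NIC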

- Matt


Re: [Pacemaker] Failover with multiple services on one node

2010-03-07 Thread Matthew Palmer
On Mon, Mar 08, 2010 at 01:34:01PM +0800, Martin Aspeli wrote:
> This question was sort of implied in my thread last week, but I'm going  
> to re-ask it properly, to reduce my own confusion if nothing else.
>
> We have two servers, master and slave. In the cluster, we have:

[bunchteen services, some HA, some not, one service IP]

> Each Zope instance is configured with a database connection string for  
> Postgres (e.g. postgres://192.168.245.10:5432) and a similar connection  
> string for memcached (e.g. 192.168.245.10:11211).
>
> My question is this: Do we need to group all the clustered resources  
> (the IP address, HAProxy, Postgres, memcached) so that if any one of  
> them fails, they all fail over to slave?
>
> If we don't do this, how can we manage the connection strings in Zope?  
> Since Zope needs a domain name or IP address as part of the connection  
> string, it'd be no good if, e.g. memcached failed over to slave, but the  
> IP address stayed with master, because Zope would still be looking for  
> it on master.
>
> What is the normal way to handle this? Do people have one floating IP  
> address per service?

This is how I prefer to do it.  RFC1918 IP addresses are cheap, IPv6 addresses
quintuply so.  Having everything tied to one address causes some mayhem when
something falls over (failover is quick, but it ain't instantaneous), so
it's far better to give everything its own address and let them drift about
independently.  Also, it makes load-spreading (run PgSQL on one machine,
memcached on the other, for instance) much easier.
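
Sketched out (resource names invented, and assuming pgsql and memcached
primitives already exist), that just means one IPaddr2 per service plus a
colocation tying each service to its own IP:

  primitive ip_pgsql ocf:heartbeat:IPaddr2 params ip="192.168.245.10"
  primitive ip_memcached ocf:heartbeat:IPaddr2 params ip="192.168.245.11"
  colocation pgsql_with_ip inf: pgsql ip_pgsql
  colocation memcached_with_ip inf: memcached ip_memcached

Nothing forces the two pairs onto the same node, so they can fail over (or be
spread out) independently.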

> Use groups to consider all those services together  
> at all times?

I wouldn't recommend it, given that there's no logical reason that they all
have to be together.

> Use some kind of hosts file trickery?

You *know* you're doing something wrong when hacking the hosts file is the
answer.

- Matt


Re: [Pacemaker] ra apache and password for ssl-key

2010-03-02 Thread Matthew Palmer
On Tue, Mar 02, 2010 at 03:47:56PM +0100, Testuser  SST wrote:
> I'm running a 2-node apache cluster and all works fine, but is there a
> way to start apache and supply the password needed to start up
> the ssl-engine (there is one ssl-cert with a password and one without on
> this server)?

This is unrelated to Pacemaker, but there is an Apache config option to run
a script that prints the passphrase for an SSL key.  See the mod_ssl docs
for the details.  Or, alternatively, just take the damn passphrase off, since
if an attacker can read the key, they can also run the script that prints
the passphrase.
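
(The directive in question is SSLPassPhraseDialog; roughly, with an
illustrative script path:

  SSLPassPhraseDialog exec:/usr/local/sbin/print-ssl-passphrase

where the script simply echoes the passphrase on stdout -- and is, per the
above, exactly as sensitive as an unencrypted key would be.)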

- Matt


Re: [Pacemaker] anyone doing Xen HA with Pacemaker?

2010-02-11 Thread Matthew Palmer
On Thu, Feb 11, 2010 at 10:52:34AM +0100, Sander van Vugt wrote:
> I'm working on different Xen HA projects, but sometimes get the idea
> that I'm the only one on the planet doing such projects. Is there anyone
> on the list involved in Xen HA projects? I would appreciate having the
> opportunity to exchange some thoughts now and then. 

We're doing it at $DAYJOB.  Can't say it's been a rousing success.

- Matt


Re: [Pacemaker] Seeking advice on debian 2 node active/passive webserver

2009-12-30 Thread Matthew Palmer
On Wed, Dec 30, 2009 at 10:54:59AM +0100, f...@fredleroy.com wrote:
> Many thanks for your help !
> 
> just one question about your mysql ip.
> Do you use a dedicated ip for mysql ? Why not just refer to localhost ?

We have a strong policy of "one service, one IP", on the basis that sooner
or later we're going to want to split services onto separate machines, and
just being able to "float" the service IP somewhere else alongside the
service itself is a lot less hassle than synchronising DNS changes with
switchovers.

- Matt


Re: [Pacemaker] Seeking advice on debian 2 node active/passive webserver

2009-12-30 Thread Matthew Palmer
On Wed, Dec 30, 2009 at 10:04:43AM +0100, f...@fredleroy.com wrote:
> Hi all,
> 
> I'm a real newbie to pacemaker and after quite a bit of reading, I believe my
> setup would be the following :
> - 2 node cluster active/passive
> - using debian lenny, 1 nic per node, hard raid1 on each node
> - plan to use the corosync/pacemaker package
> - each node will host drbd (protocol C), ip, apache and mysql services
> - drbd will be used for apache and mysql conf and data files
> - will group ip, apache and mysql
> - will use ms for drbd
> - drbd will be using internal meta data
> - group of services will be colocated on drbd master
> - order group after drbd master.
> 
> Is this a workable setup ?

Yes.  We do very similar setups for several customers.

> I'm a bit worried about performance issues regarding disk latency,
> especially with one nic (lan is gigabit).

Uhm... disk latency with one... NIC?  Do you mean that your write IO latency
might not be so hot with a single NIC?  That is a possibility; I'd be
putting another Gb NIC in the machines and connect them with a crossover
cable.

- Matt


Re: [Pacemaker] resource dependency

2009-11-20 Thread Matthew Palmer
On Fri, Nov 20, 2009 at 03:14:16PM -0200, Alexandre Biancalana wrote:
> On Fri, Nov 20, 2009 at 2:53 PM, Matthew Palmer  wrote:
> > On Fri, Nov 20, 2009 at 02:42:29PM -0200, Alexandre Biancalana wrote:
> >>  I'm building a 4 node cluster where 2 nodes will export drbd devices
> >> via ietd iscsi target (storage nodes) and other 2 nodes will run xen
> >> vm (app nodes) stored in lvm partition accessed via open-iscsi
> >> initiator, using multipath to failover.
> >>
> >>  Configuring the cluster resource ordering I came up with a situation
> >> that I can't find a solution for. The xen vm resources depend on an iscsi
> >> initiator resource to run; I have two iscsi initiator resources, one
> >> for each storage node. How can I make the vm resources dependent on
> >> any of the iscsi initiator resources?
> >
> > Personally, I think you've got the wrong design.  I'd prefer to loosely
> > couple the storage and VM clusters, with the storage cluster exporting iSCSI
> > targets which the VM cluster then attaches to the VMs as required.  Put
> > the error handling for the case where the iSCSI initiator isn't available
> > for a VM into the resource agent for the VM.  To me, this seems like a more
> > robust solution.  Tying everything up together feels like you're asking for
> > trouble whenever any failover happens -- everything gets recalculated and
> > the cluster spends the next several minutes jiggling resources around
> > before everything settles back down again.
> 
> Hi Matt, thank you for the reply.
> 
> OK. But if I go with your suggestion I end up with the same question.

No, you don't.  Whatever you're doing, it's not what I'm suggesting.

> Having the 2 node storage cluster exporting the block device via
> iSCSI, how can I make the VM resource at the VM cluster depend on
> *any* iSCSI target exported? The standard order configuration just
> allows dependency on *one* resource.

You.  Don't.  Specify.  A.  Resource.  Dependency.  You make your VM RA
handle iSCSI failure internally.

- Matt


Re: [Pacemaker] resource dependency

2009-11-20 Thread Matthew Palmer
On Fri, Nov 20, 2009 at 02:42:29PM -0200, Alexandre Biancalana wrote:
>  I'm building a 4 node cluster where 2 nodes will export drbd devices
> via ietd iscsi target (storage nodes) and other 2 nodes will run xen
> vm (app nodes) stored in lvm partition accessed via open-iscsi
> initiator, using multipath to failover.
> 
>  Configuring the cluster resource ordering I came up with a situation
> that I can't find a solution for. The xen vm resources depend on an iscsi
> initiator resource to run; I have two iscsi initiator resources, one
> for each storage node. How can I make the vm resources dependent on
> any of the iscsi initiator resources?

Personally, I think you've got the wrong design.  I'd prefer to loosely
couple the storage and VM clusters, with the storage cluster exporting iSCSI
targets which the VM cluster then attaches to the VMs as required.  Put
the error handling for the case where the iSCSI initiator isn't available
for a VM into the resource agent for the VM.  To me, this seems like a more
robust solution.  Tying everything up together feels like you're asking for
trouble whenever any failover happens -- everything gets recalculated and the
cluster spends the next several minutes jiggling resources around before
everything settles back down again.
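
As a very rough sketch of "handle it in the RA" -- paths and names are
hypothetical, error handling trimmed, and it assumes the agent sources the
usual ocf-shellfuncs -- the start action of a custom VM agent could simply
wait for the multipathed LUN to show up before starting the domain:

  vm_start() {
      local dev=/dev/mapper/vm1-disk   # multipathed iSCSI LUN (made up)
      local tries=30
      while [ ! -b "$dev" ]; do        # wait for the block device to appear
          tries=$((tries - 1))
          [ "$tries" -le 0 ] && return $OCF_ERR_GENERIC
          sleep 2
      done
      xm create /etc/xen/vm1.cfg || return $OCF_ERR_GENERIC
      return $OCF_SUCCESS
  }

If neither storage node is reachable the start simply fails and the cluster
handles it like any other failed start.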

- Matt


Re: [Pacemaker] mysql active-passive cluster with shared storage in a SAN

2009-10-26 Thread Matthew Palmer
On Mon, Oct 26, 2009 at 11:46:28AM +0100, Iñaki Sánchez wrote:
> I want to set up a two node mysql cluster, active-passive with shared  
> storage in a SAN.
> I want only one node at a time to have mysqld running and mysql data  
> filesystem mounted. In case of takeover, the second node would mount the  
> filesystem and start the mysql daemon.
>
> Two questions:
> -Do I need a clustered filesystem for the data?

No.

> -Do I need STONITH?

Yes.

> Could you point me to some useful link for my setup? (I'm a newbie)

http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo is the closest I can find,
just replace DRBD management with SAN management.
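
(Shape-wise -- names invented, and assuming the LUN is already visible on both
nodes -- the SAN version collapses to a filesystem on the shared LUN plus the
daemon and an IP, only ever active on one node, with STONITH enabled:

  primitive fs_mysql ocf:heartbeat:Filesystem \
          params device="/dev/mapper/mysql_lun" directory="/var/lib/mysql" \
                 fstype="ext3"
  primitive mysqld ocf:heartbeat:mysql
  primitive ip_mysql ocf:heartbeat:IPaddr2 params ip="10.0.0.20"
  group grp_mysql fs_mysql ip_mysql mysqld
  property stonith-enabled="true"

The STONITH device itself still has to be configured to suit your hardware.)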

- Matt


Re: [Pacemaker] Installation woes (w/Debian packages)

2009-10-16 Thread Matthew Palmer
On Fri, Oct 16, 2009 at 10:54:18AM +0200, Raoul Bhatia [IPAX] wrote:
> On 10/16/2009 09:59 AM, Matthew Palmer wrote:
> > If this were a single-machine service, I'd completely agree with you. 
> > Unfortunately, a cluster service like pacemaker needs to have absolutely
> > consistent configuration across all the nodes in the cluster, and having it
> > read off a file on disk would make that *amazingly* difficult and 
> > dangerous. 
> > I remember the fun and games I had dealing with cman (or whatever it was
> > that went with that) and its "read an XML config file and update everyone"
> > model.  I'll take "crm configure edit" over that any day, TYVM.
> 
> to my knowledge, if no cib.xml file exists, pacemaker creates an empty
> one with epoch="0" (or similar, to my experience at least < 100 ;) )
> 
> i've done the following steps numerous times:
> 1. stop pacemaker on all nodes
> 2. erase all cib.xml related files
> 3. drop a new cib.xml into the correct directory on one node
> 4. set the correct permissions
> 5. startup all nodes
> 6. witness the new configuration unfold

And boy won't *that* work well on a live cluster.

Furrfu,
- Matt


Re: [Pacemaker] Installation woes (w/Debian packages)

2009-10-16 Thread Matthew Palmer
On Fri, Oct 16, 2009 at 10:50:44AM +0200, Raoul Bhatia [IPAX] wrote:
> On 10/16/2009 09:59 AM, Matthew Palmer wrote:
> >> (1 min later: http://wiki.github.com/camptocamp/puppet-pacemaker has
> >> no downloads, and no documentation; is it even remotely stable/ready
> >> for use?)
> > 
> > No idea, we wrote our own pacemaker management manifests.  I've found
> > publicly-available Puppet manifests to be uniformly poor quality,
> > undocumented, site-specific crap.
> 
> matthew, is there a possibility that you either share your puppet
> configuration/manifests

That's not my call.  They're not my "property" (they're owned by my
employer), and I didn't write the vast majority of them myself anyway.

> and/or contribute to the puppet-pacemaker project?

Having just taken a look at the puppet-pacemaker project out of sheer
perverse curiosity, I can now say with authority that that's never going to
happen.  E-mail me privately if you'd like a frank list of what I think is
wrong -- or just look for stuff by me in the puppet list archives on the
subject of global variables, amongst other things.

- Matt


Re: [Pacemaker] Installation woes (w/Debian packages)

2009-10-16 Thread Matthew Palmer
On Fri, Oct 16, 2009 at 09:23:41AM +0200, Colin wrote:
> On Thu, Oct 15, 2009 at 10:51 AM, Matthew Palmer  wrote:
> > On Thu, Oct 15, 2009 at 10:07:56AM +0200, Colin wrote:
> >> Another question regarding how to activate a pacemaker config: Is
> >> there any way to activate the config before the cluster starts up?
> >>
> >> (Scenario is that the installation of the cluster nodes is fully
> >> automatic. It seems a bit awkward how to configure pacemaker if I
> >> can't just write out a config file during install: I need to somehow
> >> make sure that on first system boot a script that activates my config
> >> is executed, but not too early because it takes a minute or so until
> >> cibadmin(1) and friends actually work...)
> >
> > I believe you can drop a cib.xml into place before the cluster first starts
> > and it'll pick up and run with that.  I'm not a fan of that method, though,
> > as it has all the same problems as imaging machines (no easy means of
> > updating running configs the same way as you update "initial" configs, and
> > so on).  We're configuring pacemaker using Puppet, just describing the
> > primitives, groups, constraints and so on in the manifest and having Puppet
> > do all the heavy lifting if required.
> 
> Thanks for the note, plus: I've never heard of Puppet, but will check it out.
> 
> (1 min later: http://wiki.github.com/camptocamp/puppet-pacemaker has
> no downloads, and no documentation; is it even remotely stable/ready
> for use?)

No idea, we wrote our own pacemaker management manifests.  I've found
publicly-available Puppet manifests to be uniformly poor quality,
undocumented, site-specific crap.

> > Why do you need to have the config set up completely before starting
> > the cluster, though?
> 
> Let's just say I like my programs/daemons to start up with the correct
> configuration, because I've already been burnt: Some time back there
> was a similar problem with a different application where the default
> that it started up with just didn't work correctly; it's always easier
> when a program/daemon just reads a config file, and monitors it for
> changes (or re-reads it on HUP), these application-specific ways of
> feeding a config into an already running program are particular
> annoying because every program uses a different method for it.

If this were a single-machine service, I'd completely agree with you. 
Unfortunately, a cluster service like pacemaker needs to have absolutely
consistent configuration across all the nodes in the cluster, and having it
read off a file on disk would make that *amazingly* difficult and dangerous. 
I remember the fun and games I had dealing with cman (or whatever it was
that went with that) and its "read an XML config file and update everyone"
model.  I'll take "crm configure edit" over that any day, TYVM.

- Matt


Re: [Pacemaker] Installation woes (w/Debian packages)

2009-10-15 Thread Matthew Palmer
On Thu, Oct 15, 2009 at 10:07:56AM +0200, Colin wrote:
> On Sun, Oct 11, 2009 at 9:13 PM, Andrew Beekhof  wrote:
> > On Fri, Oct 9, 2009 at 3:12 PM, Colin  wrote:
> >> The config explained document is excellent -- once everything is up
> >> and running to arrive at "its level".
> >
> > Agreed.  I've started working on some howtos to fill the gap, but it
> > will take time :-)
> 
> Another question regarding how to activate a pacemaker config: Is
> there any way to activate the config before the cluster starts up?
> 
> (Scenario is that the installation of the cluster nodes is fully
> automatic. It seems a bit awkward how to configure pacemaker if I
> can't just write out a config file during install: I need to somehow
> make sure that on first system boot a script that activates my config
> is executed, but not too early because it takes a minute or so until
> cibadmin(1) and friends actually work...)

I believe you can drop a cib.xml into place before the cluster first starts
and it'll pick up and run with that.  I'm not a fan of that method, though,
as it has all the same problems as imaging machines (no easy means of
updating running configs the same way as you update "initial" configs, and
so on).  We're configuring pacemaker using Puppet, just describing the
primitives, groups, constraints and so on in the manifest and having Puppet
do all the heavy lifting if required.
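
(Not the manifests referred to above -- just to give the flavour, and with an
invented file path, a bare-bones approach is to ship a crm config file and
load it whenever it changes; real manifests do rather more checking:

  file { '/etc/ha.d/cluster.crm':
    source => 'puppet:///modules/pacemaker/cluster.crm',
  }

  exec { 'crm-load':
    command     => '/usr/sbin/crm configure load update /etc/ha.d/cluster.crm',
    refreshonly => true,
    subscribe   => File['/etc/ha.d/cluster.crm'],
  }

"load update" merges the file into the running CIB rather than replacing it
wholesale.)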

Why do you need to have the config set up completely before starting
the cluster, though?

- Matt


Re: [Pacemaker] Low cost stonith device

2009-10-05 Thread Matthew Palmer
On Mon, Oct 05, 2009 at 02:39:19PM +0200, Florian Haas wrote:
> And whether or not these node names are fully-qualified is
> actually not up to the user, but depends on the distro used. That was my
> point. :)

On the contrary, all my (Debian) pacemaker nodes have their FQDN as the node
name.  The distro may provide a default, but it is entirely within the power
of the administrator to change that default.

- Matt


Re: [Pacemaker] Stickiness, scoring, and the startled herd

2009-09-28 Thread Matthew Palmer
On Mon, Sep 28, 2009 at 09:36:48AM +0200, Johan Verrept wrote:
> On Sun, 2009-09-27 at 16:32 +1000, Matthew Palmer wrote:
> > On a related topic, is there any way to find out what the cluster's scores
> > for all resources are, and how it came to calculate those scores?  The logs
> > are full of trivia, but they lack this sort of really useful detail.  I
> > assume there'd be some sort of tool I could run against a CIB XML file and
> > see what's going on, but I can't seem to find anything.
> 
> ptest -LsV

Oh, *SHINY*.
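
(For the archives: -L reads the live CIB, -s prints the allocation scores, -V
adds verbosity.  You should also be able to point it at a saved copy, along
the lines of

  cibadmin -Q > /tmp/cib.xml
  ptest -x /tmp/cib.xml -s

which is handy for picking a configuration apart offline.)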

Thanks,
- Matt


[Pacemaker] Stickiness, scoring, and the startled herd

2009-09-26 Thread Matthew Palmer
Hi all,

I've got a cluster of three Xen dom0s (running VMs managed by pacemaker with
DRBD in the dom0s for the VM disks) that I'm trying to get working in a
stable fashion, but I'm having a hard time avoiding what I've dubbed the
"startled herd" problem.

Basically, once the allocation of VMs is in a stable state, the whole
cluster sits there quite happily and the VMs run nicely.  However, the
moment *anything* about the cluster changes (adding a new VM, a new
constraint on a VM -- practically *anything*) then all of the VMs start
stopping and starting themselves, and everything just gets badly out of hand
for a few minutes.  Obviously, this isn't particularly highly available, and
I really need to stop it.

It seemed like stickiness was the solution to my problem -- crank the
stickiness up high enough, and things have to stay put.  However, even with
a stickiness of 10, the damn things just won't stay put.
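
(Aside, purely for reference and without claiming it's the culprit here:
stickiness is a resource meta attribute, and can also be set as a cluster-wide
default if your crm shell supports rsc_defaults, e.g.

  crm configure rsc_defaults resource-stickiness=200
  # or per resource:  ... meta resource-stickiness="200"

The resulting scores show up in the ptest output mentioned elsewhere in the
thread.)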

As an example: One of the servers (xen1) got rebooted, and everything moved
around.  So, I cranked up the stickiness to 99 in an attempt to keep
everything where it was -- and when xen1 came back, everything stayed where
it was (WIN!).  But then I inserted the location rule for one of the VMs to
move to xen1, and it didn't move.  OK, fair enough, I've effectively made
everything infinitely sticky -- so I dropped the stickiness on everything
back to 10, and *bam* next thing I know I've got 3 VMs on xen1, an extra
2 VMs on xen2, and now xen3 is completely empty.

How the hell did that happen?  What am I doing wrong?

On a related topic, is there any way to find out what the cluster's scores
for all resources are, and how it came to calculate those scores?  The logs
are full of trivia, but they lack this sort of really useful detail.  I
assume there'd be some sort of tool I could run against a CIB XML file and
see what's going on, but I can't seem to find anything.

System info: pacemaker 1.0.4, Heartbeat 2.99.2+sles11r1, running on Debian
Lenny.

Any help greatly appreciated -- whether it be docs (I've read Configuration
Explained and Colocation Explained, but there doesn't seem to be much else
out there), extra diagnostic commands I can run to examine the system state,
or just a simple "you need to set fooble to 17".

For reference, here's the output of crm configure show as it currently
stands (I'd provide log output, except there's so much of it I have no idea
what to chop -- I can't even recognise where one thing ends and the next
begins):

node $id="046fdbe2-40a8-41a8-bfd9-a62504fe7954" xen3
node $id="6267c66f-5824-4da6-b9f7-3ddfff35aab3" xen2
node $id="80177967-82ec-415b-b6ff-ec3a9de315c7" xen1
primitive vm1_disk ocf:linbit:drbd \
op monitor interval="10s" \
params drbd_resource="vm1_disk" resource-stickiness="10"
primitive vm1_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
op stop interval="0" timeout="300s" \
params xmfile="/etc/xen/vm1.cfg" resource-stickiness="10"
primitive vm2_disk ocf:linbit:drbd \
op monitor interval="10s" \
params drbd_resource="vm2_disk" resource-stickiness="10"
primitive vm2_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
op stop interval="0" timeout="300s" \
params xmfile="/etc/xen/vm2.cfg" resource-stickiness="10"
primitive vm3_disk ocf:linbit:drbd \
op monitor interval="10s" \
params drbd_resource="vm3_disk" resource-stickiness="10"
primitive vm3_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
op stop interval="0" timeout="300s" \
params xmfile="/etc/xen/vm3.cfg" resource-stickiness="10"
primitive vm4_disk ocf:linbit:drbd \
op monitor interval="10s" \
params drbd_resource="vm4_disk" resource-stickiness="10"
primitive vm4_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
op stop interval="0" timeout="300s" \
params xmfile="/etc/xen/vm4.cfg" resource-stickiness="10"
primitive vm5_disk ocf:linbit:drbd \
op monitor interval="10s" \
params drbd_resource="vm5_disk" resource-stickiness="10"
primitive vm5_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
op stop interval="0" timeout="300s" \
params xmfile="/etc/xen/vm5.cfg" resource-stickiness="10"
primitive vm6_disk ocf:linbit:drbd \
op monitor interval="10s" \
params drbd_resource="vm6_disk" resource-stickiness="10"
primitive vm6_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
op stop interval="0" timeout="300s" \
params xmfile="/etc/xen/vm6.cfg" resource-stickiness="10"
primitive vm7_disk ocf:linbit:drbd \
op monitor interval="10s" \
params resource-stickiness="10" drbd_resource="vm7_disk"
primitive vm7_vm ocf:heartbeat:Xen \
op monitor interval="10s" \
params resource-stickiness="10" xmfile="/etc/xen/vm7.cfg"
primitive vm8_disk ocf:linbit:drbd \
op monitor interval=

Re: [Pacemaker] A function demand of the new environment.

2009-09-21 Thread Matthew Palmer
On Mon, Sep 21, 2009 at 09:47:56PM +0900, renayama19661...@ybb.ne.jp wrote:
> I understand that I can achieve this well enough with the method that you showed.
> 
> However, we wanted to do the respawn under the cluster software if possible.

Why, though?  It's not the right solution to the problem.

- Matt
