On 12/01/2012 08:53 AM, David Coulson wrote:
> 
> On 12/1/12 8:48 AM, Hermes Flying wrote:
>> Great help! Please allow me to trouble you with one last question.
>>
>> If I get this, when I use fencing and the corosync fails then linux-2
>> will attempt to crash linux-1 and take over. At this point though
>> linux-1 won't try to do anything right? Since it knows it is the
>> primary, I mean.
> 
> linux-1 will be powered off or crashed, so i think that speaks for itself.

Pacemaker does not inherently have a concept of Primary/Secondary. It
can be configured to operate in such a mode, or it can be active/active,
n-1, n-n, etc. It all comes down to how each given resource is
configured. You can have multiple resources in multiple configurations
on the same instance. For example, you may have a DRBD resource set to
run on both nodes (active/active), and a VM that runs on only one node
at a time (active/passive) that uses the DRBD storage.

So, given this, there is no inherent mechanism to say "always fence node
X".

So imagine you have two nodes and they lose communication. Both will
think the other has failed and both will try to fence their peer. The
classic analogy is "an old west shoot-out": the fastest node lives, the
slower node gets powered off.

You can help ensure one node has a better chance of winning by setting a
delay on it. So in the above scenario, if you had set a 5 second delay
against node 1, then it would get a 5 second head start in fencing node
2 (node 2 will wait the set time before trying to fence node 1). In this
way, you can help ensure one node lives in such a case.

Sometimes a delay is needed in any case because with some fence methods,
like IPMI-based devices, it's technically possible that both nodes will
get their fence call out before dying, leaving both nodes powered off.
Setting a delay will protect against this.
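
As a rough illustration of the two paragraphs above, here is a toy model
of the fence race. It is only a sketch under my own assumptions, not how
Pacemaker or any fence agent is implemented: the function name
fence_race, the node names, and the fixed "latency" (time between a
fence call going out and the target actually losing power) are all
hypothetical.

```python
def fence_race(wait: dict[str, float], latency: float) -> set[str]:
    """Toy model of a two-node fence race (illustrative only).

    wait    -- for each node, seconds it waits before fencing its peer
    latency -- assumed seconds between a fence call being sent and the
               target actually being powered off
    Returns the set of nodes still alive after the race.
    """
    # Exactly two nodes are expected; order them by when they fire.
    (first, t_first), (second, t_second) = sorted(
        wait.items(), key=lambda item: item[1]
    )
    survivors = {first, second}

    # The earlier call always lands, at t_first + latency.
    survivors.discard(second)

    # If the slower node got its own call out before it lost power,
    # that call still lands and the faster node dies as well.
    if t_second < t_first + latency:
        survivors.discard(first)

    return survivors


# No delay: both calls go out at once and both nodes end up off.
print(fence_race({"node1": 0.0, "node2": 0.0}, latency=2.0))
# 5 second delay against node 1 (node 2 waits 5s): node 1 wins.
print(fence_race({"node1": 0.0, "node2": 5.0}, latency=2.0))
```

In this model the delay only helps if it is longer than the time a fence
call takes to complete, which is why a few seconds is the usual choice.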

>> Then you say:"Any resource previously running on linux-1 will be
>> started on linux-2."
>> Now at this point: By resource you mean only pacemaker and its related
>> modules, right? Because I want  Tomcat to be up and running and
>> receiving requests in Linux-2 as well, which will be forwarded by load
>> balancer of linux-1. Is this correct?
> 
> I mean 'resources managed by pacemaker'. So if your VIP was running on
> linux-1, and it fails, and linux-2 fences it, the only place the VIP can
> run is linux-2. linux-1 is totally down.
>>
>> Also in your setup of 2 NICs or 2 switches I assume that the idea is
>> that the probability of split-brain due to network failure is very low
>> right? Because I have read that it is not possible to avoid
>> split-brain without adding a third node. But I may be misunderstanding
>> this
> A third node will eliminate split brain by definition, as quorum will
> only be obtained if a minimum of two nodes are available.
> 
> If you have a diverse network configuration and good change management,
> you're probably not going to experience a split brain unless you have a
> substantial environment failure that will probably impact your clients'
> ability to access anything. Since you are not running shared storage,
> you're not going to experience data loss which is typically the biggest
> concern with split brain.

A couple of notes:

Only mode=1 bonding (active/passive) works reliably. No other mode is
supported (and I've tested all and found all other modes would fail).

Quorum comes into play with 3+ nodes, but it does not eliminate the need
for fencing. Imagine that a node was in the middle of a write to shared
storage and then hung totally. The other two nodes declare it as failed,
don't fence it, and proceed to clean up the shared storage to return it
to a consistent state. Time passes and new data gets written to the same
area of shared storage. Then the hung node thaws, has no idea time has
passed, and just finishes the write thinking its locks are still valid
and that it still has quorum. Voila! Corrupt storage, despite quorum.
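
To make the sequence above concrete, here is a small sketch. It assumes
a simple majority quorum rule (more than half of the nodes visible) and
models the shared storage as a single string; the names has_quorum and
replay, and the strings written, are made up for illustration and are
not taken from any cluster software.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Simple majority rule: strictly more than half the nodes."""
    return visible_nodes > total_nodes // 2


def replay(fence_hung_node: bool) -> str:
    """Replay the hung-writer timeline; return the final block contents."""
    # node3 hangs in the middle of a write; the rest is still pending.
    block = "half-written by node3"
    pending_write = "stale tail of node3's write"

    # The two survivors can see each other, so they have quorum.
    assert has_quorum(visible_nodes=2, total_nodes=3)

    if fence_hung_node:
        # Power-cycling node3 throws away its in-flight write.
        pending_write = None

    # Survivors recover the storage and new data is written.
    block = "new data from node1"

    # node3 thaws with no idea that time has passed.
    if pending_write is not None:
        block = pending_write  # overwrites the new data

    return block


print(has_quorum(1, 2))               # two-node split: nobody has quorum
print(replay(fence_hung_node=False))  # quorum alone: stale write wins
print(replay(fence_hung_node=True))   # quorum plus fencing: data is safe
```

The survivors had quorum in both runs; only the run that fenced the hung
node ends with the new data intact.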

Clustering is all about protecting against failures. Hand-waving a
failure as unlikely is not a good practice in HA clustering. If it's
possible, plan for it. :)

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?