On 12/01/2012 08:53 AM, David Coulson wrote:
>
> On 12/1/12 8:48 AM, Hermes Flying wrote:
>> Great help! Please allow me to trouble you with one last question.
>>
>> If I get this, when I use fencing and the corosync fails then linux-2
>> will attempt to crash linux-1 and take over. At this point though
>> linux-1 won't try to do anything right? Since it knows it is the
>> primary, I mean.
>
> linux-1 will be powered off or crashed, so i think that speaks for itself.

Pacemaker does not inherently have a concept of Primary/Secondary. It can be configured to operate in such a mode, or it can be active/active, n-1, n-n, etc. It all comes down to how each given resource is configured. You can have multiple resources in multiple configurations on the same instance. For example, you may have a DRBD resource set to run on both nodes (active/active), and a VM that runs on only one node at a time (active/passive) that uses the DRBD storage.

So, given this, there is no inherent mechanism to say "always fence node X". Imagine you have two nodes and they lose communication: both will think the other has failed, and both will try to fence their peer. The classic analogy is "an old west shoot-out"; the fastest node lives, the slower node gets powered off.

You can help ensure one node has a better chance of winning by setting a delay on it. So in the above config, if you had set a 5 second delay against node 1, then it would get a 5 second head start in fencing node 2 (node 2 will wait the set time before trying to fence node 1). In this way, you can help ensure one node lives in such a case.

Sometimes a delay is needed in any case because with some fence methods, like IPMI-based devices, it's technically possible that both nodes will get their fence call out before dying, leaving both nodes powered off. Setting a delay will protect against this.

>> Then you say:"Any resource previously running on linux-1 will be
>> started on linux-2."
>> Now at this point: By resource you mean only pacemaker and its related
>> modules, right? Because I want Tomcat to be up and running and
>> receiving requests in Linux-2 as well, which will be forwarded by load
>> balancer of linux-1. Is this correct?
>
> I mean 'resources managed by pacemaker'. So if you VIP was running on
> linux-1, and it fails, and linux-2 fences it, the only place the VIP can
> run is linux-2. linux-1 is totally down.
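
As a rough illustration of the fence delay discussed above, here is a toy Python model of the "shoot-out". It is only a sketch under simplifying assumptions, not how Pacemaker implements fencing: the function name `shootout`, the node names, and the single `fence_latency` value (how long a fence call takes to actually power the target off) are all made up for the example.

```python
def shootout(delay_against, fence_latency):
    """Toy model of two nodes fencing each other after losing contact.

    delay_against[n] is how long the peer waits before issuing its fence
    call against node n; fence_latency is how long any call takes to
    actually power the target off. Returns the set of surviving nodes.
    """
    first, second = delay_against
    peer = {first: second, second: first}
    # A call against n goes out at delay_against[n] and lands fence_latency later.
    lands = {n: delay_against[n] + fence_latency for n in delay_against}
    survivors = set(delay_against)
    for target in delay_against:
        attacker = peer[target]
        # The attacker can only issue its call if it is still powered on
        # at that moment, i.e. the call against it has not landed yet.
        if delay_against[target] < lands[attacker]:
            survivors.discard(target)
    return survivors


# No delay: both calls go out before either lands, so both nodes power off.
print(shootout({"node1": 0, "node2": 0}, fence_latency=2))
# 5 second delay against node1: node2 is off before it may fence node1.
print(shootout({"node1": 5, "node2": 0}, fence_latency=2))
```

In this model the first call leaves both nodes dead, which mirrors the IPMI case above, while the second leaves only node1 running.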
>>
>> Also in your setup of 2 NICs or 2 switches I assume that the idea is
>> that the probability of split-brain due to network failure is very low
>> right? Because I have read that it is not possible to avoid
>> split-brain without adding a third node. But I may be misunderstanding
>> this
> A third node will eliminate split brain by definition, as quorum will
> only be obtained if a minimum of two nodes are available.
>
> If you have a diverse network configuration and good change management,
> you're probably not going to experience a split brain unless you have a
> substantial environment failure that will probably impact your client
> ability to access anything. Since you are not running shared storage,
> you're not going to experience data loss which is typically the biggest
> concern with split brain.

A couple of notes:

Only mode=1 bonding (active/passive) works reliably. No other mode is supported (and I've tested all and found all other modes would fail).

Quorum comes into play with 3+ nodes, but it does not eliminate the need for fencing. Imagine that a node was in the middle of a write to shared storage and then hung totally. The other two nodes declare it as failed, don't fence it, and proceed to clean up the shared storage to return it to a consistent state. Time passes and new data gets written to the same area of shared storage. Then the hung node thaws, has no idea time has passed, and just finishes the write thinking its locks are still valid and that it still has quorum. Voila! Corrupt storage, despite quorum.

Clustering is all about protecting against failures. Hand-waving a failure as unlikely is not a good practice in HA clustering. If it's possible, plan for it. :)

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
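
For reference, the quorum arithmetic mentioned above can be sketched in a few lines of Python. This assumes a plain majority rule with one vote per node; the function name `has_quorum` is hypothetical and real cluster stacks may weight votes differently.

```python
def has_quorum(nodes_visible, nodes_total):
    """Simple majority rule, one vote per node (an assumption of this sketch)."""
    return nodes_visible > nodes_total // 2


# Three-node cluster split 2/1: only the two-node side keeps quorum.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
# Two-node cluster split 1/1: neither side has a majority.
print(has_quorum(1, 2))  # False
```

Note that this only says which side may keep running. It does nothing to stop a hung node from finishing a stale write later, which is why fencing is still needed.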
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems