Christopher,

If I ignore pacemaker's existence, and just run corosync, corosync
disagrees about node membership in the situation presented in the first
email. While it's true that stonith just happens to quickly correct the
situation after it occurs, it still smells like a bug in the case where
corosync is used in isolation. Corosync is, after all, a membership and
total ordering protocol, and the nodes in the cluster are unable to
agree on membership.

The Totem protocol specifies a ring_id in the token passed in a ring.
Since all of the 3 nodes but one have formed a new ring with a new id
how is it that the single node can survive in a ring with no other
members passing a token with the old ring_id?
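
To make the ring_id question concrete, here is a minimal sketch, not
corosync's actual code. It assumes a ring id is a (representative,
sequence number) pair and that a receiver simply ignores any token whose
ring_id differs from its own. The names RingId, Token and Node, the node
names, and the sequence numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingId:
    representative: str  # assumption: the node that formed the ring
    seq: int             # assumption: increases each time a new ring forms


@dataclass
class Token:
    ring_id: RingId


class Node:
    def __init__(self, name, ring_id):
        self.name = name
        self.ring_id = ring_id

    def accepts(self, token):
        # Assumption: a token from any other ring is ignored.
        return token.ring_id == self.ring_id


old_ring = RingId("node1", 4)
new_ring = RingId("node1", 8)

node1 = Node("node1", new_ring)
node2 = Node("node2", new_ring)
node3 = Node("node3", old_ring)  # the node still on the old ring

stale_token = Token(old_ring)
results = {n.name: n.accepts(stale_token) for n in (node1, node2, node3)}
print(results)
```

Under these assumptions the two nodes on the new ring drop the stale
token, while the node still on the old ring keeps accepting it.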

Are there network failure situations that can fool the Totem membership
protocol or is this an implementation problem? I don't see how it could
not be one or the other, and it's bad either way.

The main problem (as you noted in the original mail) is really about
blocking only one direction (the input one). This is called a Byzantine
failure, and it's something corosync is unable to handle. Totem was
simply never designed to solve Byzantine failures.

Regards,
  Honza

On Thu, Mar 17, 2016, at 02:08 PM, Digimer wrote:
On 17/03/16 01:57 PM, vija ar wrote:
root file system is fine ...

but fencing is not a necessity, a cluster should function without it .. i
see the issue with corosync which has all been .. an inherent way of not
working neatly or smoothly ..

Absolutely wrong.

If you have a service that can run on both/all nodes at the same time
without coordination, you don't need a cluster, just run your services
everywhere.

If that's not the case, then you need fencing so that the (surviving)
node(s) can be sure that they know where services are and are not
running.
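
The rule above can be sketched as a small decision function. This is a
toy model of the reasoning, not Pacemaker's implementation; PeerState
and may_recover_services are hypothetical names introduced only for
illustration.

```python
from enum import Enum


class PeerState(Enum):
    ONLINE = "online"
    CLEANLY_STOPPED = "cleanly stopped"
    UNKNOWN = "unknown"  # lost contact: hung, crashed, or just partitioned


def may_recover_services(peer_state, fenced):
    """Return True if a surviving node may safely take over the peer's services."""
    if peer_state is PeerState.ONLINE:
        return False  # the peer is reachable and still owns its services
    if peer_state is PeerState.CLEANLY_STOPPED:
        return True   # the peer reported that its services are stopped
    # State unknown: only a confirmed fence tells us the services are not running.
    return fenced


for fenced in (False, True):
    print(fenced, may_recover_services(PeerState.UNKNOWN, fenced))
```

The point of the sketch is the last branch: a node whose state is
unknown cannot be assumed dead, so recovery waits for a confirmed fence.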

for e.g. take an issue where the live node is hung in a db cluster .. now
from the db perspective transactions are not happening and that is fine as
the node is having some issue .. now there is no need to fence this hung
node but just to switch over to the passive one .. but that doesn't happen
and fencing takes place either by reboot or shutdown .. which further makes
the DB dirty or, far worse than that, puts it in a non-recoverable state,
which wouldn't have happened if a normal switch to the other node as in a
cluster had happened ...

i see fencing is not a solution, it's only required to forcefully take
control, which is not always the case

On Thu, Mar 17, 2016 at 12:49 PM, Ulrich Windl
<ulrich.wi...@rz.uni-regensburg.de> wrote:

     >>> Christopher Harvey <c...@eml.cc> wrote on 16.03.2016 at 21:04
     in message
     <1458158684.122207.551267810.11f73...@webmail.messagingengine.com>:
     [...]
     >> > Would stonith solve this problem, or does this look like a bug?
     >>
     >> It should, that is its job.
     >
     > is there some log I can enable that would say
     > "ERROR: hey, I would use stonith here, but you have it disabled! your
     > warranty is void past this point! do not pass go, do not file a bug"?

     What should the kernel say during boot if the user has not defined a
     root file system?

     Maybe the "stonith-enabled=false" setting should be called either
     "data-corruption-mode=true" or "hang-forever-on-error=true" ;-)

     Regards,
     Ulrich



     _______________________________________________
     Users mailing list: Users@clusterlabs.org <mailto:Users@clusterlabs.org>
     http://clusterlabs.org/mailman/listinfo/users

     Project Home: http://www.clusterlabs.org
     Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
     Bugs: http://bugs.clusterlabs.org





--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
