Re: [ClusterLabs] Corosync lost quorum but DLM still gives locks

2017-11-02 Thread Jean-Marc Saffroy
Replying to myself:

On Wed, 11 Oct 2017, Jean-Marc Saffroy wrote:

> I am caught by surprise with this behaviour of DLM:
> - I have 5 nodes (test VMs)
> - 3 of them have 1 vote for the corosync quorum (they are "voters")
> - 2 of them have 0 vote ("non-voters")
> 
> So the corosync quorum is 2.
> 
> On the non-voters, I run DLM and an application that uses it. DLM 
> fencing is disabled.
> 
> Now, if I stop corosync on 2 of the voters:
> - as expected, corosync says "Activity blocked"
> - but to my surprise, DLM seems happy to give more locks
> 
> Shouldn't DLM block lock requests in this situation?

Apparently DLM does not care about changes in quorum until there are 
changes in membership of the process groups it is part of. In my test, the 
"voters" do not run DLM, and therefore (I suppose?) DLM does not react to 
their absence.

DLM does block lock requests when quorum is lost AND THEN there is a 
change in membership for the DLM participants, because quorum is required 
for lockspace operations.
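The observed behaviour can be summarised with a toy model (purely illustrative Python, not dlm_controld code; the class and event names are made up): lock requests keep succeeding after quorum loss, and only stall once a membership change forces a recovery cycle, which requires quorum.

```python
# Toy model (NOT dlm_controld code) of the behaviour described above:
# a quorum change alone is only recorded; it is acted upon the next time
# lockspace membership changes.

class ToyLockspace:
    def __init__(self, members):
        self.members = set(members)
        self.quorate = True   # last known quorum state
        self.blocked = False  # whether lock requests are blocked

    def on_quorum_change(self, quorate):
        # Quorum changes are noted but do not block anything by themselves.
        self.quorate = quorate

    def on_membership_change(self, members):
        # A membership change triggers recovery, which requires quorum.
        self.members = set(members)
        self.blocked = not self.quorate

    def request_lock(self):
        return not self.blocked

ls = ToyLockspace({"vm4", "vm5"})
ls.on_quorum_change(False)        # the voters leave; quorum is lost
assert ls.request_lock()          # locks are still granted (the surprise)
ls.on_membership_change({"vm4"})  # now a DLM member leaves as well
assert not ls.request_lock()      # recovery stalls without quorum
```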

Does that make sense?


Cheers,
JM

-- 
saff...@gmail.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Corosync lost quorum but DLM still gives locks

2017-10-11 Thread Jean-Marc Saffroy
Hi,

I am caught by surprise with this behaviour of DLM:
- I have 5 nodes (test VMs)
- 3 of them have 1 vote for the corosync quorum (they are "voters")
- 2 of them have 0 vote ("non-voters")

So the corosync quorum is 2.

On the non-voters, I run DLM and an application that uses it. DLM 
fencing is disabled.
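For reference, that setup would look roughly like the fragments below. This is a sketch only: the addresses and node IDs are taken from the command outputs further down, everything else is assumed.

```
# corosync.conf (fragment) -- sketch of the vote setup described above;
# only 2 of the 5 nodes are shown
quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: 172.16.2.33
        nodeid: 3
        quorum_votes: 1    # a "voter"
    }
    node {
        ring0_addr: 172.16.3.33
        nodeid: 4
        quorum_votes: 0    # a "non-voter", runs DLM
    }
}

# /etc/dlm/dlm.conf -- fencing disabled, as stated above
enable_fencing=0
```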

Now, if I stop corosync on 2 of the voters:
- as expected, corosync says "Activity blocked"
- but to my surprise, DLM seems happy to give more locks

Shouldn't DLM block lock requests in this situation?


Cheers,
JM

-- 

[root@vm4 ~]# corosync-quorumtool 
Quorum information
--
Date: Wed Oct 11 20:29:52 2017
Quorum provider:  corosync_votequorum
Nodes:3
Node ID:  5
Ring ID:  3/24
Quorate:  No

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  1
Quorum:   2 Activity blocked
Flags:

Membership information
--
Nodeid  Votes Name
 3  1 172.16.2.33
 4  0 172.16.3.33
 5  0 172.16.4.33 (local)

[root@vm4 ~]# dlm_tool status
cluster nodeid 5 quorate 0 ring seq 24 24
daemon now 6908 fence_pid 0 
node 4 M add 4912 rem 0 fail 0 fence 0 at 0 0
node 5 M add 4912 rem 0 fail 0 fence 0 at 0 0

[root@vm4 ~]# corosync-cpgtool 
Group Name PID Node ID
dlm:ls:XYZ\x00
   971   4 (172.16.3.33)
 10095   5 (172.16.4.33)
dlm:controld\x00
   971   4 (172.16.3.33)
 10095   5 (172.16.4.33)

[root@vm4 ~]# cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 

[root@vm4 ~]# rpm -q corosync dlm
corosync-2.4.0-9.el7_4.2.x86_64
dlm-4.0.7-1.el7.x86_64

-- 
saff...@gmail.com



Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-04 Thread Jean-Marc Saffroy
On Wed, 4 Oct 2017, Jan Friesse wrote:

> > Could you clarify the formula for me? I don't see how "- 2" and "650" 
> > map to this configuration.
> 
> Since Corosync 2.3.4, when a nodelist is used, totem.token is used only as 
> a basis for calculating the real token timeout. You can check the corosync.conf 
> man page for more information and the formula.

A-ha! I was looking for that in the corosync.conf man page shipped with 
Ubuntu 14, which of course ships corosync 2.3.3. Silly me!

So with the right man page, that's indeed spelled out under 
"token_coefficient". Thanks!

> > And I suppose that on our bigger system (20+5 servers) we need to 
> > greatly increase the consensus timeout.
> 
> The consensus timeout reflects the token value: if it is not defined in the 
> config file, it's computed as token * 1.2. This is not reflected in the man 
> page and needs to be fixed.

Actually the man page I see for 2.4.2 does mention this :) so I guess we 
should simply comment out our setting for "consensus".
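For the record, the formulas discussed in this thread can be sketched as follows (Python; this assumes the default token_coefficient of 650 ms, and on a live cluster the authoritative value is the runtime.config.totem.token key rather than this arithmetic):

```python
# Effective token timeout for corosync >= 2.3.4 when a nodelist is used:
# token + (number_of_nodes - 2) * token_coefficient (default 650 ms).
def effective_token(base_token_ms, n_nodes, token_coefficient_ms=650):
    return base_token_ms + max(0, n_nodes - 2) * token_coefficient_ms

# Default consensus timeout when not set explicitly: 1.2 * token.
def default_consensus(token_ms):
    return int(token_ms * 1.2)

tok = effective_token(3000, 5)      # the 5-node test cluster
print(tok)                          # 4950, as computed by Jan
print(default_consensus(tok))       # 5940, hence the "~6000" suggestion
print(effective_token(3000, 25))    # 17950 for the 20+5-server system
```

This also shows why leaving "consensus" unset is attractive on the bigger system: the default tracks the computed token value automatically.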


Cheers,
JM

-- 
saff...@gmail.com



Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-03 Thread Jean-Marc Saffroy
Hi Jan,

On Tue, 3 Oct 2017, Jan Friesse wrote:

> > I hope this makes sense! :)
> 
> I would still have some questions :) but that is really not related to 
> the problem you have.

Questions are welcome! I am new to this stack, so there is certainly room 
for learning and for improvement.

> My personal favorite is the consensus timeout, because you've set (and I 
> must say correctly, according to the doc) the consensus timeout to 3600 (= 1.2 * 
> token). The problem is that the resulting token timeout is not 3000: with 5 
> nodes it is actually 3000 (base token) + (no_nodes - 2) * 650 ms = 4950 
> (as you can check by observing the runtime.config.totem.token key). So it 
> may make sense to set the consensus timeout to ~6000.

Could you clarify the formula for me? I don't see how "- 2" and "650" map 
to this configuration.

And I suppose that on our bigger system (20+5 servers) we need to greatly 
increase the consensus timeout.

Overall, tuning the timeouts seems to be Black Magic. ;) I liked the idea 
suggested in an old thread of a spreadsheet (or even just plain formulas) 
exposing the relations between the various knobs.

One thing I wonder is: would it make sense to annotate the state machine 
diagram in the Totem paper (page 15 of 
http://www.cs.jhu.edu/~yairamir/tocs.ps.gz) with those tunables? Assuming 
the paper still reflects the behavior of the current code.

> This doesn't change the fact that "bug" is reproducible even with 
> "correct" consensus, so I will continue working on this issue.

Great! Thanks for taking the time to investigate.


Cheers,
JM

-- 
saff...@gmail.com



Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-10-02 Thread Jean-Marc Saffroy
On Mon, 2 Oct 2017, Jan Friesse wrote:

> > We had one problem on a real deployment of DLM+corosync (5 voters and 20
> > non-voters, with dlm on those 20, for a specific application that uses
> 
> What do you mean by voters and non-voters? There are 25 nodes in total and 
> each of them is running corosync?

Yes, there are 25 servers running corosync:

- 5 are configured to have one vote for quorum, on these servers corosync 
serves no other purpose

- 20 have zero vote for quorum, and these servers also run DLM and the 
application that uses DLM

The intent with this configuration is:

- to avoid split brain in case of network partition: application servers 
must be in the same partition as the quorum majority (so, 3 of the 5 
"voters") to carry on their operations

- to allow independent failure of any number of application servers

I hope this makes sense! :)

> > libdlm). On a reboot of one server running just corosync (which thus did
> > NOT run dlm), a large number of other servers got briefly evicted from the
> 
> This is kind of weird. AFAIK DLM joins a CPG group and uses CPG
> membership. So if DLM was not running on the node, the other nodes joined to
> the DLM CPG group should not even notice its leaving.

Indeed, but we saw "Process pause detected" on all servers, and corosync 
temporarily formed an operational cluster excluding most of the 
"non-voters" (those with zero quorum vote). Then most servers joined back, 
but then DLM complained about the "stateful merge".

> What do you mean by zero vote? Do you mean a DLM vote or the corosync 
> number of votes (related to quorum)?

I mean the vote in the corosync quorum; I'm not aware of anything like 
that in DLM (unless you count the per-server weight used when one manually 
defines which servers master locks in a lockspace, but we don't use that).

> I've tried to reproduce the problem and I was not successful with 3 
> nodes cluster using more or less default config (not changing 
> join/consensus/...). I'll try 5 nodes possibly with totem values and see 
> if problem appears.

I tried again today. First, with just 3 servers (VMs) using the same 
config I sent earlier (which declares 3 nodes with 1 vote and 2 nodes with 
0 votes), I could no longer reproduce it either. Then I spawned 2 more VMs 
and had them join the existing 3-node cluster (the ones I added were the 2 
servers with 0 votes), and then I saw the "Process pause ..." log line. 
Now I have stopped those last 2 servers, and although I am back to just 3, 
I keep seeing that log line.

If you're still curious and if that's useful, I can try to reproduce on a 
set of VMs where I could give you full ssh access.


Thanks!

Cheers,
JM

-- 
saff...@gmail.com



Re: [ClusterLabs] Is "Process pause detected" triggered too easily?

2017-09-27 Thread Jean-Marc Saffroy
On Wed, 27 Sep 2017, Jan Friesse wrote:

> I don't think scheduling is the problem. If the scheduler were at fault, 
> another message ("Corosync main process was not scheduled for ...") would 
> kick in. This looks more like something is blocked in totemsrp.

Ah, interesting!

> > Also, it looks like the side effect is that corosync drops important
> > messages (I think "join" messages?), and I fear that this can lead to
> 
> You mean membership join messages? Because there are a lot of them (327) 
> in the log you've sent.

Yes. In my test setup I didn't see any issue where we lost membership join 
messages, but the reason why I am looking into this is this:

We had one problem on a real deployment of DLM+corosync (5 voters and 20 
non-voters, with dlm on those 20, for a specific application that uses 
libdlm). On a reboot of one server running just corosync (which thus did 
NOT run dlm), a large number of other servers got briefly evicted from the 
corosync ring; and when rejoining, dlm complained about a "stateful merge" 
which forces a reboot. Note, dlm fencing is disabled.

In that system, it was "legal" for corosync to kick out these servers 
(they had zero votes), but it was highly unexpected (they were running 
fine) and the impact was high (a reboot).

We did see "Process pause detected" in the logs on that system when the 
incident happened, which is why I think it could be a clue.

> I'll definitively try to reproduce this bug and let you know. I don't 
> think any message get lost, but it's better to be on a safe side.

Thanks!


Cheers,
JM

-- 
saff...@gmail.com
