Re: May violate the ZAB agreement -- version 3.6.1

li xun Fri, 28 Aug 2020 22:03:32 -0700

Hi hanm

Thanks

This is the issue in JIRA: https://issues.apache.org/jira/browse/ZOOKEEPER-3911

———————————————————————————————————————————————————————

Below are my thoughts

Before a server becomes the established leader, the followers need to
synchronize their data with it. With a large data set this is very slow, and
the service is temporarily unavailable in the meantime. Could the leader
instead communicate with the followers before synchronization starts and
calculate zxid_n, the largest zxid among its own proposals that has already
reached a quorum [reference 1]? The leader could then serve clients (for
example a webapp) immediately, but expose only data with zxid <= zxid_n, which
would reduce the time that zk is inaccessible. For the followers there may be
two options:
1) Since a follower has not yet synchronized its data, it temporarily rejects
external (e.g. webapp) access. Even if the follower has a lot of data to
synchronize, this does not affect the service zk provides externally.
Disadvantages: all access load is concentrated on the leader, so during that
period the cluster is not really distributed and is prone to a single point of
failure.
2) The follower also serves clients immediately. However, because it has not
yet synchronized with the leader, a follower that has just restarted cannot be
sure of zxid_x, the largest zxid it holds that has reached a quorum. It may
need one additional round of queries to confirm whether zxid_x has reached a
quorum (or each zxid could carry a separate flag indicating whether it has
reached a quorum). The follower then serves only data with zxid <= zxid_x.
Disadvantages: complex implementation and increased communication volume.
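
A minimal sketch of how the zxid_n calculation described above might look,
assuming the leader has collected the last logged zxid of every voting member
and that each member's log is a prefix of the leader's history (a follower
with divergent transactions would first have to truncate). The class and
method names are hypothetical and not part of the ZooKeeper code base; a zxid
is treated as a plain 64-bit long.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of ZooKeeper: largest zxid logged by a majority. */
final class QuorumZxidSketch {

    /**
     * @param lastLoggedZxids last zxid logged by each voting member, leader included
     * @return the largest zxid that at least a majority of members have logged
     */
    static long maxQuorumZxid(long[] lastLoggedZxids) {
        if (lastLoggedZxids.length == 0) {
            throw new IllegalArgumentException("need at least one voting member");
        }
        long[] sorted = lastLoggedZxids.clone();
        Arrays.sort(sorted); // ascending order
        int quorumSize = sorted.length / 2 + 1; // strict majority
        // The quorumSize-th largest value is reached or exceeded by quorumSize members.
        return sorted[sorted.length - quorumSize];
    }

    public static void main(String[] args) {
        // Leader has logged up to 0x100000007, two followers only up to 0x100000005.
        long zxidN = maxQuorumZxid(new long[] {0x100000007L, 0x100000005L, 0x100000005L});
        System.out.println(Long.toHexString(zxidN)); // prints 100000005
    }
}
```

In this example the leader would expose only data with zxid <= 0x100000005
until the followers have caught up.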

Reference 1: Leslie Lamport, "Paxos Made Simple", 01 Nov 2001
"
2.3 Learning a Chosen Value
To learn that a value has been chosen, a learner must find out that a proposal
has been accepted by a majority of acceptors. The obvious algorithm is to 
have each acceptor, whenever it accepts a proposal, respond to all learners, 
sending them the proposal. This allows learners to find out about a chosen 
value as soon as possible, but it requires each acceptor to respond to each 
learner—a number of responses equal to the product of the number of acceptors 
and the number of learners.
The assumption of non-Byzantine failures makes it easy for one learner to find 
out from another learner that a value has been accepted. We can have the 
acceptors respond with their acceptances to a distinguished learner, which in 
turn informs the other learners when a value has been chosen. This approach 
requires an extra round for all the learners to discover the chosen value. It 
is also less reliable, since the distinguished learner could fail. But it 
requires a number of responses equal only to the sum of the number of acceptors 
and the number of learners.
More generally, the acceptors could respond with their acceptances to some set 
of distinguished learners, each of which can then inform all the learners when 
a value has been chosen. Using a larger set of distinguished learners provides
greater reliability at the cost of greater communication complexity.
Because of message loss, a value could be chosen with no learner ever finding 
out. The learner could ask the acceptors what proposals they have accepted, but 
failure of an acceptor could make it impossible to know whether or not a 
majority had accepted a particular proposal. In that case, learners will find 
out what value is chosen only when a new proposal is chosen. If a learner needs 
to know whether a value has been chosen, it can have a proposer issue a 
proposal, using the algorithm described above.
"
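
To make the message counts in the quoted passage concrete, here is a small
illustration that follows the passage's own way of counting (product versus
sum). The class and method names are hypothetical, and the cluster sizes are
made-up example values.

```java
/** Hypothetical illustration of the response counts described in the quoted passage. */
final class LearnerMessageCountSketch {

    /** Every acceptor responds to every learner. */
    static long directToAllLearners(int acceptors, int learners) {
        return (long) acceptors * learners;
    }

    /** Acceptors respond to one distinguished learner, which informs the other learners. */
    static long viaDistinguishedLearner(int acceptors, int learners) {
        return (long) acceptors + learners;
    }

    public static void main(String[] args) {
        System.out.println(directToAllLearners(5, 3));     // prints 15
        System.out.println(viaDistinguishedLearner(5, 3)); // prints 8
    }
}
```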

Best,
li xun 

> On Aug 29, 2020, at 10:59, Michael Han <h...@apache.org> wrote:
> 
> Hi Xun,
> 
> I think this is a bug; your test case looks sound to me. Do you mind
> creating a JIRA for this issue?
> 
> Followers should not ACK NEWLEADER without ACKing every transaction from
> the DIFF sync. To ACK every transaction, a follower either persists the
> transaction in its log or takes a snapshot before sending the ACK of
> NEWLEADER (which is what we did before ZOOKEEPER-2678, where the snapshot
> optimization was introduced).
> 
> A potential fix I have in mind is to make sure to persist all DIFF sync
> proposals from the leader (similar to what we already do for proposals
> coming between NEWLEADER and UPTODATE). By doing so, when the leader
> receives the NEWLEADER ACK from a quorum, it is guaranteed that every
> transaction the leader DIFF synced to the followers is quorum committed.
> Thus there will be no inconsistent views moving forward. Alternatively, we
> can take a snapshot before ACKing NEWLEADER, but that would be a big
> performance hit for big data trees.
> 
> I am also interested to hear what others think about this.
> 
> On Fri, Aug 28, 2020 at 12:20 AM li xun <274952...@qq.com> wrote:
> 
>> There is an example at the link below. Does it make clear what I mean?
>> 
>> 
>> 
>> https://drive.google.com/file/d/1jy3kkVQTDYGb4iV1RaPMBbEWLZZltTQG/view?usp=sharing
>> 
>> Since version 3.4, when synchronization completes, the quorum of followers
>> and the leader do not immediately flush the data to files, so for a moment
>> the data is not persisted. During that window the zk server can already
>> serve external clients (for example a webapp), and if a failure occurs at
>> that point, phantom reads may occur.
>> 
>> 
>>> On Aug 28, 2020, at 14:51, Justin Ling Mao <maoling199210...@sina.com> wrote:
>>> 
>>> @李珣 The situation you describe may reflect a misunderstanding of how
>>> the consensus protocol works:
>>> ---> Since the data of the follower when the follower uses the DIFF
>>> method to synchronize with the leader is still in the memory, it has not
>>> had time to persist
>>> 1. The write path is: write to the transaction log (WAL) first, then,
>>> after consensus is reached, apply to memory, not the other way around.
>>> ---> but at this time, the latest zxid_n of the leader has not been
>>> supported by the quorum of the follower. At this time, if a client
>>> connects to the leader and sees zxid_n,
>>> 2. If a write has not been supported by the quorum, it is not safe to
>>> apply it to the state machine, and the client is not able to see this
>>> write.
>>> I guess that your question may be: how does the system handle the
>>> uncommitted logs when the leader changes?
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Ted Dunning <ted.dunn...@gmail.com>
>>> To: dev@zookeeper.apache.org
>>> Subject: Re: May violate the ZAB agreement -- version 3.6.1
>>> Date: 2020-08-28 01:25
>>> 
>>> How is it that participant A would have a later zxid than the leader?
>>> In particular, it seems to me that it should be impossible to have these
>>> two facts be true:
>>> 1) a transaction has been committed with zxid = z_0. This implies that a
>>> quorum of the cluster has accepted this transaction and it has been
>>> committed.
>>> 2) a new leader election nominates a leader with latest zxid < z_0.
>>> My reasoning is that any new leader election has to involve a quorum and
>>> at least a sufficient number of that quorum must have accepted zxid >= z_0
>>> and therefore would refuse to be part of the quorum (this is a
>>> contradiction).
>>> Thus, no leader could be elected with zxid < z_0 if fact (1) is true.
>>> What you are describing seems to require both of these facts.
>>> Perhaps I am missing something about your suggested scenario. Could you
>>> describe what you are thinking in more detail?
>>> On Thu, Aug 27, 2020 at 2:08 AM 李珣 <274952...@qq.com> wrote:
>>>> version 3.6.1
>>>> org.apache.zookeeper.server.quorum.Learner.java line:605
>>>> Suppose the following situation:
>>>> zxid_n is the largest zxid of Participant A (which has just recovered
>>>> from downtime). zxid_n has not been acknowledged by a quorum. Assume
>>>> Participant A is elected leader and a follower uses DIFF to synchronize
>>>> data with it. After the leader sends UPTODATE, the leader can already
>>>> serve external clients, but at this time the latest zxid_n of the leader
>>>> is still not backed by a quorum of followers. Suppose a client then
>>>> connects to the leader and sees zxid_n, and after that both the leader
>>>> and the follower go down. For some reason the leader cannot be
>>>> restarted, while the follower starts normally, so a new leader can only
>>>> be elected from the followers. Because the data the follower received
>>>> through the DIFF sync was still in memory and had not yet been
>>>> persisted, the newly elected leader does not have the data of zxid_n.
>>>> But zxid_n has already been seen by the client on the old leader, so
>>>> there will be inconsistencies in the data view.
>>>> Is the above situation possible?
>> 
>>
