[ https://issues.apache.org/jira/browse/ZOOKEEPER-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859210#comment-15859210 ]
Kfir Lev-Ari edited comment on ZOOKEEPER-2684 at 2/9/17 8:41 AM:
-----------------------------------------------------------------
[~nerdyyatrice], can you please describe the scenario in which the same request
is processed in the queue twice?
As I see it, if a request r is received from a local client, then r is added to
the queue (note that r was already sent to the leader prior to that point).
Once a commit arrives from the leader, r is processed, and r won't return to
the queue, regardless of a possible client disconnection (AFAIK, the connection
is only needed at the end of the line, when some kind of result is returned).
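A minimal sketch of the lifecycle described above, assuming a simplified model in which each session keeps a FIFO of local writes that were already forwarded to the leader. Request and PendingWrites are hypothetical names for this illustration, not ZooKeeper's CommitProcessor; the point is only that the commit dequeues the waiting request once and nothing puts it back, whatever happens to the client connection.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical request: only the fields this sketch needs. */
final class Request {
    final long sessionId;
    final int cxid;

    Request(long sessionId, int cxid) {
        this.sessionId = sessionId;
        this.cxid = cxid;
    }
}

/**
 * Hypothetical model (not ZooKeeper's CommitProcessor): a local write is queued per
 * session after it has been sent to the leader, and the matching commit removes it
 * from the queue exactly once.
 */
final class PendingWrites {
    private final Map<Long, Deque<Request>> bySession = new HashMap<>();
    final List<Request> processed = new ArrayList<>();

    /** Called after r has been forwarded to the leader. */
    void addLocal(Request r) {
        bySession.computeIfAbsent(r.sessionId, id -> new ArrayDeque<>()).addLast(r);
    }

    /** Called when the leader's commit for a request of this session arrives. */
    void commit(Request committed) {
        Deque<Request> queue = bySession.get(committed.sessionId);
        Request toProcess = committed; // no local waiter: process the commit as remote
        if (queue != null && !queue.isEmpty()) {
            toProcess = queue.pollFirst(); // dequeued here and never re-added
            if (queue.isEmpty()) {
                bySession.remove(committed.sessionId);
            }
        }
        processed.add(toProcess);
    }

    /** Number of local writes of this session still waiting for a commit. */
    int pending(long sessionId) {
        Deque<Request> queue = bySession.get(sessionId);
        return queue == null ? 0 : queue.size();
    }
}
```

After addLocal(r) and the matching commit, pending(...) is back to 0 and r appears once in processed; a disconnection in between does not touch the queue in this model.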
Now, let's say the client gets disconnected at some point during the time frame
above, while r is being processed, and reconnects to some server (the same
server or a different one).
If a commit arrives at a different server, r will be processed as if it belongs
to a remote client, i.e., we will only perform the update, without using the
connection. I'm not sure that after a disconnection ZK is required to inform the
client's new session of its past actions (but I guess it can also be fixed if
needed).
If a commit arrives and r is in the queue waiting for it, then it is processed
as if it belongs to a locally connected client, but eventually the connection
handle will show that the connection has ended (if I remember the code
correctly), so there is nothing to report, and ZK continues as usual.
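The two cases above can be sketched as a final processing step that always applies the update and only replies when the request is local and its connection is still open. Connection, Write and FinalStep are hypothetical names for this illustration, not ZooKeeper's FinalRequestProcessor or ServerCnxn API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client connection; only tracks whether it is still open. */
final class Connection {
    private boolean open = true;
    final List<String> replies = new ArrayList<>();

    void close() { open = false; }
    boolean isOpen() { return open; }
    void send(String reply) { replies.add(reply); }
}

/** Hypothetical committed write; conn is null when the request came from a remote client. */
final class Write {
    final String path;
    final Connection conn;

    Write(String path, Connection conn) {
        this.path = path;
        this.conn = conn;
    }
}

/** Sketch of the last step: the update is always applied, the reply is optional. */
final class FinalStep {
    final List<String> appliedPaths = new ArrayList<>();

    void process(Write w) {
        appliedPaths.add(w.path); // state change happens for local and remote requests alike
        if (w.conn == null) {
            return; // remote client: another server owns the connection
        }
        if (!w.conn.isOpen()) {
            return; // local client already disconnected: nothing to report
        }
        w.conn.send("OK " + w.path);
    }
}
```

Processing a write for an open connection, a closed one, and a remote one applies all three updates, but only the open connection receives a reply.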
Note that if a client writes something with a lower cxid than r, the commit
processor doesn't track such behavior, i.e., it is possible that the next
head after r will have a lower cxid than r. We only care about the order of
commits that we receive from the leader, and that order can't be changed,
because it is based on the network protocol order of messages (i.e., if r was
already sent to the leader, then clearly r is committed prior to any new
message of the same client).
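The error quoted in the issue description below reports a commit whose cxid does not match the one expected for that session. A hypothetical sketch of such a head-of-queue check, assuming the committed cxid is compared against the oldest local request still queued for the session; HeadCheck and its message format are modeled on those log lines, not taken from ZooKeeper's source.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical per-session check: a commit for a session must match the oldest
 * local request still queued for that session. Not ZooKeeper's actual code.
 */
final class HeadCheck {
    /** Returns null if the commit matches (and consumes) the queue head, else an error message. */
    static String check(long sessionId, long committedCxid, Deque<Long> queuedCxids) {
        Long head = queuedCxids.peekFirst();
        if (head == null) {
            return null; // no local request waiting: treat the commit as a remote one
        }
        if (head != committedCxid) { // unboxed numeric comparison
            return "Got cxid 0x" + Long.toHexString(committedCxid)
                    + " expected 0x" + Long.toHexString(head)
                    + " for client session id " + Long.toHexString(sessionId);
        }
        queuedCxids.pollFirst(); // matched: the head is consumed exactly once
        return null;
    }
}
```

With the values from the first log line (session 1009079ba470055, committed 0x119fa, queued head 0x11fc5), check returns the mismatch message and leaves the head in place. In all four reported lines the committed cxid is lower than the expected one.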
Bottom line, it seems like r is processed only once per processor. What am I
missing?
> Fix a crashing bug in the mixed workloads commit processor
> ----------------------------------------------------------
>
> Key: ZOOKEEPER-2684
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2684
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.0
> Environment: with pretty heavy load on a real cluster
> Reporter: Ryan Zhang
> Assignee: Ryan Zhang
> Priority: Blocker
> Attachments: ZOOKEEPER-2684.patch
>
>
> We deployed our build with ZOOKEEPER-2024 and it quickly started to crash
> with the following errors:
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:24:42,305 - ERROR
> [CommitProcessor:2]
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
> – Got cxid 0x119fa expected 0x11fc5 for client session id 1009079ba470055
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:32:04,746 - ERROR
> [CommitProcessor:2]
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
> – Got cxid 0x698 expected 0x928 for client session id 4002eeb3fd0009d
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:34:46,648 - ERROR
> [CommitProcessor:2]
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
> – Got cxid 0x8904 expected 0x8f34 for client session id 51b8905c90251
> atla-buh-05-sr1.prod.twttr.net: 2017-01-18 22:43:46,834 - ERROR
> [CommitProcessor:2]
> -org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:268)
> – Got cxid 0x3a8d expected 0x3ebc for client session id 2051af11af900cc
> Clearly something is not right in the new commit processor's per-session
> queue implementation.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)