[
https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849096#comment-15849096
]
Ryan Zhang edited comment on ZOOKEEPER-2024 at 2/1/17 11:22 PM:
----------------------------------------------------------------
We imported this commit but quickly hit a bug: the commit processor crashed
because the commit it received was not the one it was waiting for.
smf1-chn-23-sr1.prod.twitter.com: 2017-01-25 02:14:30,147 - ERROR [CommitProcessor:4] - Got cxid 0x2c2 expected 0x3a0 for client session id 53b398c5800b6
smf1-chn-23-sr1.prod.twitter.com: 2017-01-25 02:14:30,147 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:31:16,011 - ERROR [CommitProcessor:4] - Got cxid 0x35db2 expected 0x3624c for client session id 1006589aec2087e
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:31:16,011 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:50:19,204 - ERROR [CommitProcessor:4] - Got cxid 0x0 expected 0x6126 for client session id 4003f40105a00e7
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:50:19,205 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:53:03,738 - ERROR [CommitProcessor:4] - Got cxid 0x2fe94 expected 0x2fe96 for client session id 563a1644b008b
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:53:03,738 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 23:06:05,795 - ERROR [CommitProcessor:4] - Got cxid 0x9c98c expected 0x9cb5f for client session id 10045820b228ba2
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 23:06:05,795 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
I am adding more logging on the leader to debug this, but I wonder whether
anyone else has seen it too?
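For anyone comparing similar failures, here is a minimal sketch of a helper that pulls the received cxid, the expected cxid, and the session id out of a "Got cxid ... expected ..." entry. The class name `CxidMismatch` and the method `parse` are hypothetical and are not part of ZooKeeper. The pattern is taken only from the message text quoted above, and it tolerates a line break before the session id.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not ZooKeeper code: parses the mismatch message quoted above.
public class CxidMismatch {
    // Message format assumed from the log entries in this comment.
    private static final Pattern MISMATCH = Pattern.compile(
        "Got cxid 0x([0-9a-fA-F]+) expected 0x([0-9a-fA-F]+) for client session id\\s+([0-9a-fA-F]+)");

    /** Returns {gotCxid, expectedCxid, sessionId}, or null if the entry does not match. */
    public static long[] parse(String entry) {
        Matcher m = MISMATCH.matcher(entry);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1), 16),
            Long.parseLong(m.group(2), 16),
            Long.parseUnsignedLong(m.group(3), 16)
        };
    }

    public static void main(String[] args) {
        String entry = "[CommitProcessor:4] - Got cxid 0x2fe94 expected 0x2fe96 "
            + "for client session id 563a1644b008b";
        long[] r = parse(entry);
        // Print the session and how far apart the two cxids are.
        System.out.printf("session 0x%x: cxid gap %d%n", r[2], r[1] - r[0]);
    }
}
```

The gap between the two cxids varies widely across the entries above (2 in the 22:53:03 entry, 0x6126 in the 22:50:19 entry), which may help when correlating with the additional leader-side logs.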
> Major throughput improvement with mixed workloads
> -------------------------------------------------
>
> Key: ZOOKEEPER-2024
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
> Project: ZooKeeper
> Issue Type: Improvement
> Components: quorum, server
> Reporter: Kfir Lev-Ari
> Assignee: Kfir Lev-Ari
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it
> stalls local processing of all sessions until it receives a commit of that
> request from the leader.
> In mixed workloads, this severely hampers performance because it does not
> allow read-only sessions to proceed at a faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote
> committed write requests are starved.
> This occurs due to a bug fix
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing
> of local read requests before handling any committed write. The problem is
> only manifested under high local read load.
> Our solution solves these two problems. It improves throughput in mixed
> workloads (in our tests, by up to 8x), and reduces latency, especially higher
> percentiles (i.e., slowest requests).
> The main idea is to separate sessions that inherently need to stall in order
> to enforce order semantics, from ones that do not need to stall. To this end,
> we add data structures for buffering and managing pending requests of stalled
> sessions; these requests are moved out of the critical path to these data
> structures, allowing continued processing of unaffected sessions.
> Please see the docs:
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit
> processor algorithm.
> 2) The attached patch implements our solution, and a collection of related
> unit tests (https://reviews.apache.org/r/25160)
> 3) https://goo.gl/W0xDUP - performance results.
> (See https://issues.apache.org/jira/browse/ZOOKEEPER-2023 for the
> corresponding new system test that produced these performance measurements)
>
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609
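The main idea in the description above (stall only the sessions that have an uncommitted write, and buffer their requests off the critical path) can be illustrated with a minimal sketch. This is not the code from the attached patch: the class `PerSessionStallSketch`, its `Request` type, and the methods `submit` and `commit` are hypothetical, and the sketch ignores remote commits, threading, and the actual request pipeline.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and structure are assumptions, not the patch.
public class PerSessionStallSketch {
    public static final class Request {
        final long sessionId;
        final int cxid;
        final boolean isWrite;

        Request(long sessionId, int cxid, boolean isWrite) {
            this.sessionId = sessionId;
            this.cxid = cxid;
            this.isWrite = isWrite;
        }
    }

    // Stalled sessions: each has a pending write at the head of its queue.
    private final Map<Long, Deque<Request>> pending = new HashMap<>();
    // Requests passed on for processing, in order.
    private final List<Request> processed = new ArrayList<>();

    public void submit(long sessionId, int cxid, boolean isWrite) {
        Request r = new Request(sessionId, cxid, isWrite);
        Deque<Request> q = pending.get(sessionId);
        if (q != null) {
            q.addLast(r);              // session already stalled: buffer in order
        } else if (isWrite) {
            q = new ArrayDeque<>();
            q.addLast(r);
            pending.put(sessionId, q); // only this session stalls until the commit
        } else {
            processed.add(r);          // read from an unaffected session proceeds
        }
    }

    public void commit(long sessionId, int cxid) {
        Deque<Request> q = pending.get(sessionId);
        if (q == null || q.peekFirst().cxid != cxid) {
            throw new IllegalStateException(
                "commit 0x" + Integer.toHexString(cxid) + " is not the one this session is waiting for");
        }
        processed.add(q.pollFirst());
        // Release buffered reads up to the next write, which must wait for its own commit.
        while (!q.isEmpty() && !q.peekFirst().isWrite) {
            processed.add(q.pollFirst());
        }
        if (q.isEmpty()) {
            pending.remove(sessionId);
        }
    }

    public int processedCount() {
        return processed.size();
    }

    public boolean isStalled(long sessionId) {
        return pending.containsKey(sessionId);
    }
}
```

In this sketch a write on session 1 leaves reads on session 2 unaffected, while later requests from session 1 wait behind the write until its commit arrives. A commit whose cxid does not match the head of the session's queue is rejected.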