[ https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849096#comment-15849096 ]
Ryan Zhang commented on ZOOKEEPER-2024:
---------------------------------------

We imported this commit but soon hit a bug: the commit processor crashed because the committed request was not the one it was waiting for.

smf1-chn-23-sr1.prod.twitter.com: 2017-01-25 02:14:30,147 - ERROR [CommitProcessor:4] - Got cxid 0x2c2 expected 0x3a0 for client session id 53b398c5800b6
smf1-chn-23-sr1.prod.twitter.com: 2017-01-25 02:14:30,147 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:31:16,011 - ERROR [CommitProcessor:4] - Got cxid 0x35db2 expected 0x3624c for client session id 1006589aec2087e
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:31:16,011 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:50:19,204 - ERROR [CommitProcessor:4] - Got cxid 0x0 expected 0x6126 for client session id 4003f40105a00e7
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:50:19,205 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:53:03,738 - ERROR [CommitProcessor:4] - Got cxid 0x2fe94 expected 0x2fe96 for client session id 563a1644b008b
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 22:53:03,738 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 23:06:05,795 - ERROR [CommitProcessor:4] - Got cxid 0x9c98c expected 0x9cb5f for client session id 10045820b228ba2
smf1-chn-23-sr1.prod.twitter.com: 2017-02-01 23:06:05,795 - ERROR [CommitProcessor:4] - Severe unrecoverable error, from thread : CommitProcessor:4

Just wondering: has anyone else seen this?
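The "Got cxid X expected Y" errors appear to come from a per-session ordering check: for a session connected to this server, the leader's commits are expected to arrive in the same order as that session's write requests, identified by cxid. The following is a minimal, hypothetical Java sketch of such a check, not the actual CommitProcessor code; the class `CxidOrderCheck` and its methods are invented for illustration, and the exception message only mirrors the shape of the log lines above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not ZooKeeper's CommitProcessor; all names are invented.
public class CxidOrderCheck {

    // Local write requests still waiting for a commit, per session, in submission order.
    private final Map<Long, Deque<Integer>> pendingCxids = new HashMap<>();

    // Record a local write request (by cxid) before it is forwarded to the leader.
    public void submitLocalWrite(long sessionId, int cxid) {
        pendingCxids.computeIfAbsent(sessionId, k -> new ArrayDeque<>()).addLast(cxid);
    }

    // Match a commit from the leader against the oldest pending write of the same session.
    // Returns false if nothing is pending for the session (e.g. a session served elsewhere),
    // true on an in-order match, and throws if the committed cxid is out of order.
    public boolean onCommit(long sessionId, int committedCxid) {
        Deque<Integer> queue = pendingCxids.get(sessionId);
        if (queue == null || queue.isEmpty()) {
            return false;
        }
        int expected = queue.peekFirst();
        if (committedCxid != expected) {
            // Same shape as the logged error; the logs show the server treating it as fatal.
            throw new IllegalStateException(String.format(
                    "Got cxid 0x%x expected 0x%x for client session id %x",
                    committedCxid, expected, sessionId));
        }
        queue.pollFirst();
        if (queue.isEmpty()) {
            pendingCxids.remove(sessionId);
        }
        return true;
    }
}
```

Keeping one FIFO per session is what makes the check cheap: only the head of the queue ever needs to be compared, so any reordering of commits within a session surfaces immediately.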
> Major throughput improvement with mixed workloads
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-2024
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum, server
>            Reporter: Kfir Lev-Ari
>            Assignee: Kfir Lev-Ari
>             Fix For: 3.6.0
>
>         Attachments: ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch,
> ZOOKEEPER-2024.patch, ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it
> stalls local processing of all sessions until it receives a commit of that
> request from the leader.
> In mixed workloads, this severely hampers performance, as it does not allow
> read-only sessions to proceed faster than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote
> committed write requests are starved.
> This occurs due to a bug fix
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing
> of local read requests before handling any committed write. The problem only
> manifests under high local read load.
> Our solution solves both problems. It improves throughput in mixed
> workloads (in our tests, by up to 8x) and reduces latency, especially at
> higher percentiles (i.e., the slowest requests).
> The main idea is to separate sessions that inherently need to stall in order
> to enforce order semantics from those that do not. To this end, we add data
> structures for buffering and managing the pending requests of stalled
> sessions; these requests are moved out of the critical path into these data
> structures, allowing continued processing of unaffected sessions.
> Please see the docs:
> 1) https://goo.gl/m1cINJ - includes a detailed description of the new commit
> processor algorithm.
> 2) The attached patch implements our solution and includes a collection of
> related unit tests (https://reviews.apache.org/r/25160).
> 3) https://goo.gl/W0xDUP - performance results.
> (See https://issues.apache.org/jira/browse/ZOOKEEPER-2023 for the
> corresponding new system test that produced these performance measurements.)
>
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
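The main idea in the quoted description (stall only the sessions that have an outstanding local write, buffer their later requests, and let every other session keep going) can be modeled in a few lines. This is a simplified, single-threaded Java sketch under our own assumptions, not the patch's algorithm (the linked design doc describes that); `StallingModel`, `Req` and `MAX_INCOMING_PER_ROUND` are hypothetical names. The per-round bound on incoming requests is just one illustrative way to keep queued commits from being starved by reads. Records require Java 16 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of per-session stalling; not the CommitProcessor code.
public class StallingModel {

    // A client request: owning session, cxid, and whether it is a write.
    public record Req(long session, int cxid, boolean write) {}

    // Hypothetical cap on incoming requests handled per round, so commits are not starved.
    private static final int MAX_INCOMING_PER_ROUND = 100;

    private final Deque<Req> incoming = new ArrayDeque<>();
    private final Deque<Req> committed = new ArrayDeque<>();
    // Per stalled session: the write awaiting commit at the head, then buffered requests.
    private final Map<Long, Deque<Req>> stalled = new HashMap<>();
    // Order in which requests were handed on for application.
    public final List<Req> applied = new ArrayList<>();

    public void submit(Req r) { incoming.addLast(r); }

    public void commit(Req r) { committed.addLast(r); }

    // One round: a bounded batch of incoming requests, then every queued commit.
    public void runOnce() {
        int handled = 0;
        while (!incoming.isEmpty() && handled < MAX_INCOMING_PER_ROUND) {
            Req r = incoming.pollFirst();
            handled++;
            Deque<Req> q = stalled.get(r.session());
            if (q != null) {
                q.addLast(r);                  // session already stalled: buffer in order
            } else if (r.write()) {
                Deque<Req> nq = new ArrayDeque<>();
                nq.addLast(r);                 // stall only this session until r commits
                stalled.put(r.session(), nq);
            } else {
                applied.add(r);                // read from an unaffected session: serve now
            }
        }
        while (!committed.isEmpty()) {
            Req c = committed.pollFirst();
            Deque<Req> q = stalled.get(c.session());
            if (q == null) {
                applied.add(c);                // commit for a session served elsewhere
                continue;
            }
            if (!q.peekFirst().equals(c)) {
                throw new IllegalStateException("out-of-order commit for session " + c.session());
            }
            applied.add(q.pollFirst());
            release(c.session(), q);
        }
    }

    // Serve buffered reads until the next write, which keeps the session stalled.
    private void release(long session, Deque<Req> q) {
        while (!q.isEmpty() && !q.peekFirst().write()) {
            applied.add(q.pollFirst());
        }
        if (q.isEmpty()) {
            stalled.remove(session);
        }
    }
}
```

With session 1 issuing a write followed by a read and session 2 issuing only reads, session 2's reads are applied in the first round, while session 1's read waits until its write is committed. This is the behavior the quoted "Stalling" item says the old processor lacked.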