[ https://issues.apache.org/jira/browse/ZOOKEEPER-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114409#comment-14114409 ]
Hongchao Deng commented on ZOOKEEPER-2024:
------------------------------------------

[~kfirlevari], thanks for sharing this novel idea. I have some questions. Can you help clarify?

{quote}
The main idea is to separate sessions that inherently need to stall in order to enforce order semantics, from ones that do not need to stall. To this end, we add data structures for buffering and managing pending requests of stalled sessions; these requests are moved out of the critical path to these data structures, allowing continued processing of unaffected sessions.
{quote}

There seem to be two kinds of separation:
1. sessions unaffected by each other, and
2. requests within one session unaffected by each other.
Right?

Another question: why does it need to buffer "pending requests of stalled sessions"? More specifically, why on the server rather than the client side? I assumed you meant the second type of separation.

{quote}
This occurs due to a bug fix (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing of local read requests before handling any committed write. In order to avoid starvation, our solution prioritizes committed write requests over reads, and enforces fairness among read requests of sessions.
{quote}

This sounds like a bug. We shouldn't give any preference to either reads or writes. A user should configure this on the client side to handle either a write-heavy or a read-heavy workload. Right?

> Major throughput improvement with mixed workloads
> -------------------------------------------------
>
>                 Key: ZOOKEEPER-2024
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum, server
>            Reporter: Kfir Lev-Ari
>         Attachments: ZOOKEEPER-2024.patch
>
>
> The patch is applied to the commit processor, and solves two problems:
> 1. Stalling - once the commit processor encounters a local write request, it
> stalls local processing of all sessions until it receives a commit of that
> request from the leader.
> In mixed workloads, this severely hampers performance as it does not allow
> read-only sessions to proceed at faster speed than read-write ones.
> 2. Starvation - as long as there are read requests to process, older remote
> committed write requests are starved.
> This occurs due to a bug fix
> (https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing
> of local read requests before handling any committed write. The problem is
> only manifested under high local read load.
> Our solution solves these two problems. It improves throughput in mixed
> workloads (in our tests, by up to 8x), and reduces latency, especially higher
> percentiles (i.e., slowest requests).
> The main idea is to separate sessions that inherently need to stall in order
> to enforce order semantics, from ones that do not need to stall. To this end,
> we add data structures for buffering and managing pending requests of stalled
> sessions; these requests are moved out of the critical path to these data
> structures, allowing continued processing of unaffected sessions.
> In order to avoid starvation, our solution prioritizes committed write
> requests over reads, and enforces fairness among read requests of sessions.
> Please see the docs:
> 1)
> https://docs.google.com/document/d/1oXJiSt9VqL35hCYQRmFuC63ETd0F_g6uApzocgkFe3Y/edit?usp=sharing
> - includes a detailed description of the new commit processor algorithm.
> 2) The attached patch implements our solution, and a collection of related
> unit tests
> 3)
> https://docs.google.com/spreadsheets/d/1vmdfsq4WLr92BQO-CGcualE0KhAtjIu3bCaVwYajLo8/edit?usp=sharing
> - shows performance results of running system tests on the patched ZK using
> the patched system test from
> https://issues.apache.org/jira/browse/ZOOKEEPER-2023.
> See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609
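
For illustration only, the scheduling idea in the quoted description (buffer the requests of a stalled session, keep serving other sessions, prefer committed writes over reads, and take reads from sessions in turn) could be sketched roughly as below. This is a hypothetical, single-threaded toy and not the code in ZOOKEEPER-2024.patch: the class `CommitSchedulerSketch`, its `Request` type, and the methods `submit`, `commit`, `step`, and `demo` are all invented names, and forwarding a write to the leader is simulated by recording its name in a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the idea in the description; not the actual patch. */
final class CommitSchedulerSketch {
    /** A read or write issued by one session (assumed shape, for illustration). */
    static final class Request {
        final long sessionId;
        final boolean isWrite;
        final String name;
        Request(long sessionId, boolean isWrite, String name) {
            this.sessionId = sessionId;
            this.isWrite = isWrite;
            this.name = name;
        }
    }

    // Per-session FIFO buffers; a stalled session keeps its requests here.
    private final Map<Long, Deque<Request>> pending = new HashMap<>();
    // Round-robin order of sessions that have work and are not stalled.
    private final Deque<Long> runnable = new ArrayDeque<>();
    // Sessions whose head request is a write still waiting for its commit.
    private final Set<Long> stalled = new HashSet<>();
    // Committed writes (local or remote), in commit order.
    private final Deque<Request> committed = new ArrayDeque<>();
    final List<String> forwarded = new ArrayList<>(); // stand-in for "sent to leader"
    final List<String> applied = new ArrayList<>();   // order in which requests ran

    /** Buffers a local request in its session's queue. */
    void submit(Request r) {
        Deque<Request> q = pending.computeIfAbsent(r.sessionId, k -> new ArrayDeque<>());
        boolean wasIdle = q.isEmpty();
        q.addLast(r);
        if (wasIdle) runnable.addLast(r.sessionId); // an idle session is never stalled
    }

    /** Records a commit; for a local write, pass the same Request object. */
    void commit(Request w) { committed.addLast(w); }

    /** Processes at most one request; returns false when nothing can run. */
    boolean step() {
        Request c = committed.pollFirst();          // committed writes go first
        if (c != null) {
            applied.add(c.name);
            Deque<Request> q = pending.get(c.sessionId);
            if (stalled.contains(c.sessionId) && q != null && q.peekFirst() == c) {
                q.pollFirst();                      // the write the session waited on
                stalled.remove(c.sessionId);
                if (!q.isEmpty()) runnable.addLast(c.sessionId);
            }
            return true;
        }
        Long id;
        while ((id = runnable.pollFirst()) != null) {
            Deque<Request> q = pending.get(id);
            Request head = q.peekFirst();
            if (head.isWrite) {                     // stall only this session
                stalled.add(id);
                forwarded.add(head.name);
                continue;
            }
            q.pollFirst();
            applied.add(head.name);
            if (!q.isEmpty()) runnable.addLast(id); // back of the line: fairness
            return true;
        }
        return false;
    }

    /** Small scenario: session 1 stalls on w1 while sessions 2 and 3 keep reading. */
    static List<String> demo() {
        CommitSchedulerSketch s = new CommitSchedulerSketch();
        Request w1 = new Request(1, true, "w1");
        s.submit(w1);
        s.submit(new Request(1, false, "r1b"));
        s.submit(new Request(2, false, "r2a"));
        s.submit(new Request(2, false, "r2b"));
        s.submit(new Request(3, false, "r3a"));
        s.step();                                   // forwards w1, runs r2a
        s.step();                                   // runs r3a
        s.commit(w1);                               // commit of the local write
        s.commit(new Request(99, true, "wX"));      // commit of a remote write
        while (s.step()) { /* drain remaining work */ }
        return s.applied;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy run the requests are applied in the order r2a, r3a, w1, wX, r2b, r1b: session 1 waits on w1 without holding up sessions 2 and 3, both committed writes run before the remaining reads, and r1b runs only after w1 is committed. Whether that preference for committed writes is the right policy is exactly the question raised in the comment above; the sketch only shows what the description appears to propose.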