Kfir Lev-Ari created ZOOKEEPER-2024:
---------------------------------------
Summary: Major throughput improvement with mixed workloads
Key: ZOOKEEPER-2024
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2024
Project: ZooKeeper
Issue Type: Improvement
Components: quorum, server
Reporter: Kfir Lev-Ari
Attachments: ZOOKEEPER-2024.patch
The patch modifies the commit processor and solves two problems:
1. Stalling - once the commit processor encounters a local write request, it
stalls local processing of all sessions until it receives a commit of that
request from the leader.
In mixed workloads, this severely hampers performance, because read-only
sessions cannot proceed any faster than read-write ones.
2. Starvation - as long as there are read requests to process, older remote
committed write requests are starved.
This is a side effect of an earlier bug fix
(https://issues.apache.org/jira/browse/ZOOKEEPER-1505) that forces processing
of local read requests before handling any committed write. The problem
manifests only under high local read load.
Our solution addresses both problems. It improves throughput in mixed
workloads (in our tests, by up to 8x) and reduces latency, especially at the
higher percentiles (i.e., for the slowest requests).
The main idea is to separate sessions that inherently need to stall in order to
enforce order semantics, from ones that do not need to stall. To this end, we
add data structures for buffering and managing pending requests of stalled
sessions; these requests are moved out of the critical path to these data
structures, allowing continued processing of unaffected sessions.
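
To illustrate the buffering idea described above, here is a minimal sketch in
Java. It is not the code from the attached patch: the class name
StalledSessionBuffer, the Req type, and the submit/commit methods are all
hypothetical, and it assumes commits for a session arrive in the order its
writes were submitted. A local write stalls only its own session; reads from
other sessions keep flowing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch; these are not the classes used in the patch. */
public class StalledSessionBuffer {
    /** Minimal stand-in for a client request. */
    public static final class Req {
        final long sessionId;
        final boolean isWrite;
        final String label;

        public Req(long sessionId, boolean isWrite, String label) {
            this.sessionId = sessionId;
            this.isWrite = isWrite;
            this.label = label;
        }
    }

    // Requests parked behind an uncommitted local write, keyed by session.
    private final Map<Long, Queue<Req>> pending = new HashMap<>();
    // Labels of requests that were allowed to proceed, in processing order.
    private final List<String> processed = new ArrayList<>();

    /** Accepts a local request and either processes or buffers it. */
    public void submit(Req r) {
        Queue<Req> q = pending.get(r.sessionId);
        if (q != null) {
            // Session is already stalled: queue behind its write to keep order.
            q.add(r);
        } else if (r.isWrite) {
            // First outstanding write of this session: stall this session only.
            q = new ArrayDeque<>();
            q.add(r);
            pending.put(r.sessionId, q);
        } else {
            // Read from an unaffected session: process immediately.
            processed.add(r.label);
        }
    }

    /** Called when the leader commits the oldest pending write of a session. */
    public void commit(long sessionId) {
        Queue<Req> q = pending.get(sessionId);
        if (q == null) {
            return; // nothing buffered for this session
        }
        processed.add(q.remove().label); // the committed write itself
        // Release the reads that followed it; stop at the next write,
        // which must wait for its own commit.
        while (!q.isEmpty() && !q.peek().isWrite) {
            processed.add(q.remove().label);
        }
        if (q.isEmpty()) {
            pending.remove(sessionId);
        }
    }

    public List<String> processed() {
        return processed;
    }
}
```

In this sketch, a write from session 1 followed by a read from session 1 and a
read from session 2 lets the session 2 read through at once, while the session
1 read waits for the commit. The algorithm actually used by the patch is
described in the design document linked below.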
In order to avoid starvation, our solution prioritizes committed write requests
over reads, and enforces fairness among read requests of sessions.
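
The scheduling policy described above can be sketched roughly as follows. This
is an illustration only, not the patch's implementation: FairCommitScheduler
and its methods are hypothetical names, requests are reduced to strings, and
per-session ordering between a session's own reads and writes is left out.
Committed writes are always served first, and pending reads are served
round-robin across sessions so no single session monopolizes the processor.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch; these are not the classes used in the patch. */
public class FairCommitScheduler {
    // Committed writes waiting to be applied, oldest first.
    private final Queue<String> committedWrites = new ArrayDeque<>();
    // Pending reads per session; insertion order gives the round-robin order.
    private final LinkedHashMap<Long, Queue<String>> reads = new LinkedHashMap<>();

    public void addCommittedWrite(String write) {
        committedWrites.add(write);
    }

    public void addRead(long sessionId, String read) {
        reads.computeIfAbsent(sessionId, k -> new ArrayDeque<>()).add(read);
    }

    /** Returns the next request to process, or null if nothing is pending. */
    public String next() {
        // Committed writes take priority so they cannot be starved by reads.
        if (!committedWrites.isEmpty()) {
            return committedWrites.remove();
        }
        Iterator<Map.Entry<Long, Queue<String>>> it = reads.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        // Take one read from the session at the front of the rotation.
        Map.Entry<Long, Queue<String>> entry = it.next();
        it.remove();
        Queue<String> sessionReads = entry.getValue();
        String read = sessionReads.remove();
        if (!sessionReads.isEmpty()) {
            // Move the session to the back so other sessions get a turn.
            reads.put(entry.getKey(), sessionReads);
        }
        return read;
    }
}
```

With two reads queued for session 1, one read for session 2, and one committed
write, successive calls to next() return the write first and then alternate
between the two sessions.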
Please see the following:
1) https://docs.google.com/document/d/1oXJiSt9VqL35hCYQRmFuC63ETd0F_g6uApzocgkFe3Y/edit?usp=sharing
- a detailed description of the new commit processor algorithm.
2) The attached patch, which implements our solution and includes a collection
of related unit tests.
3) https://docs.google.com/spreadsheets/d/1vmdfsq4WLr92BQO-CGcualE0KhAtjIu3bCaVwYajLo8/edit?usp=sharing
- performance results from running system tests on the patched ZK, using the
patched system test from
https://issues.apache.org/jira/browse/ZOOKEEPER-2023.
See also https://issues.apache.org/jira/browse/ZOOKEEPER-1609