Vitaliy Bondarenko created ZOOKEEPER-4562:
---------------------------------------------
Summary: Zookeeper as a platform (multi-tenant setup): Throughput
quotas for each tenant
Key: ZOOKEEPER-4562
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4562
Project: ZooKeeper
Issue Type: New Feature
Components: server
Reporter: Vitaliy Bondarenko
Folks, I have something of an RFC here for the feature described as '*Zookeeper
as a platform (multi-tenant setup): Throughput quotas for each tenant*'. I am
sure the problem described below has bothered other engineers too, so it might
already be solved, at least to some extent.
I would appreciate your comments on this.
*Problem*
In a multi-tenant ZooKeeper, it is impossible to separate throughput between
tenants. As a result, a noisy tenant can grab most of the throughput and affect
other tenants by increasing their latency and, in severe cases, by impairing
their ability to read/write data.
*Use-cases*
As a ZooKeeper platform engineer I want to be able to separate throughput and
other resource usage between tenants in a multi-tenant ZooKeeper environment.
*Example*
Let's consider a multi-tenant platform setup in which we have:
* tenant_1 having chroot /tenant1_data
* tenant_2 having chroot /tenant2_data
Tenants have recursive ACLs configured so that tenant_1 clients do not have any
access to the chroot of tenant_2 and vice versa. Effectively, each of them can
see only its own ZooKeeper data.
So far so good.
Now, as a ZooKeeper platform engineer, I want to be able to limit the resource
usage of tenants 1 and 2. Let's assume that tenant_1 needs 90% of disk
usage/throughput and tenant_2 only 10%.
# I can use quotas on the chroot folders to limit the disk usage of every
tenant. Which is great!
# I can use throttling, but it throttles all connections at once. So, in my
understanding, connection throttling for tenant_1 will affect tenant_2 as
well.
# I want to limit the throughput for tenant_1 to at most 90% of read/write
throughput.
There are multiple options for how exactly item 3 should work. I imagine
something like this: count the number of bytes each tenant wants to write
(let's limit it to writes first) in a running fashion. If tenant_1 accumulates
more than 90% of the write traffic during a certain time period, we should
throttle it to allow tenant_2 to use its 10%.
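To make the running-count idea concrete, here is a minimal sketch in Java. Everything in it is hypothetical and not part of ZooKeeper: the class name `WriteShareThrottle`, the fixed (rather than sliding) time window, and the `contentionBytes` threshold below which shares are not enforced at all are assumptions made for illustration. Time is passed in explicitly so the behaviour is easy to reason about.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: relative write-share throttling per tenant chroot. Not a ZooKeeper API. */
final class WriteShareThrottle {
    private final long windowNanos;      // length of the accounting window
    private final long contentionBytes;  // assumed threshold: shares are enforced only above this total
    private final Map<String, Double> maxShare = new HashMap<>();
    private final Map<String, Long> written = new HashMap<>();
    private long windowStartNanos;
    private long totalWritten;

    WriteShareThrottle(long windowNanos, long contentionBytes, long nowNanos) {
        this.windowNanos = windowNanos;
        this.contentionBytes = contentionBytes;
        this.windowStartNanos = nowNanos;
    }

    /** Sets the maximum fraction (0.0 to 1.0) of write traffic a tenant may take. */
    synchronized void setMaxShare(String chroot, double share) {
        maxShare.put(chroot, share);
    }

    /** Returns true if the write is admitted, false if the tenant should be throttled. */
    synchronized boolean tryWrite(String chroot, long bytes, long nowNanos) {
        // Start a fresh window once the current one has expired.
        if (nowNanos - windowStartNanos >= windowNanos) {
            written.clear();
            totalWritten = 0;
            windowStartNanos = nowNanos;
        }
        long mine = written.getOrDefault(chroot, 0L) + bytes;
        long total = totalWritten + bytes;
        // Tenants without a configured share are never throttled.
        double share = maxShare.getOrDefault(chroot, 1.0);
        // Throttle only when the window is busy and this tenant would exceed its share.
        if (total > contentionBytes && mine > share * total) {
            return false;
        }
        written.put(chroot, mine);
        totalWritten = total;
        return true;
    }
}
```

With shares of 0.9 for /tenant1_data and 0.1 for /tenant2_data, tenant_1 is rejected once the window is busy and its share of the accounted bytes would exceed 90%, and it is admitted again when the window rolls over. A real implementation would also have to decide whether rejected requests are delayed, queued, or failed back to the client.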
Another, possibly simpler, option is to configure absolute quotas per tenant:
basically, the number of bytes per second each tenant is allowed to write or
read. The downside of this method is a high probability of unused capacity.
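An absolute bytes-per-second quota is commonly implemented as a token bucket, so a rough Java sketch of that approach follows. The class `TenantByteQuota`, its method names, and the `burstBytes` parameter are hypothetical and do not exist in ZooKeeper. The sketch also assumes that a request can be attributed to a tenant by its chroot path.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: absolute per-tenant byte quota as a token bucket. Not a ZooKeeper API. */
final class TenantByteQuota {
    private static final class Bucket {
        final long bytesPerSecond;  // steady-state refill rate
        final long burstBytes;      // bucket capacity, i.e. the largest allowed burst
        double tokens;              // bytes currently available
        long lastRefillNanos;

        Bucket(long bytesPerSecond, long burstBytes, long nowNanos) {
            this.bytesPerSecond = bytesPerSecond;
            this.burstBytes = burstBytes;
            this.tokens = burstBytes;  // start with a full bucket
            this.lastRefillNanos = nowNanos;
        }
    }

    private final Map<String, Bucket> buckets = new HashMap<>();

    /** Configures (or replaces) the quota for the tenant identified by its chroot. */
    synchronized void setQuota(String chroot, long bytesPerSecond, long burstBytes, long nowNanos) {
        buckets.put(chroot, new Bucket(bytesPerSecond, burstBytes, nowNanos));
    }

    /** Returns true if the request fits the tenant's quota, false if it should be throttled. */
    synchronized boolean tryAcquire(String chroot, long requestBytes, long nowNanos) {
        Bucket b = buckets.get(chroot);
        if (b == null) {
            return true;  // no quota configured for this tenant
        }
        // Refill in proportion to the elapsed time, capped at the burst size.
        double elapsedSeconds = (nowNanos - b.lastRefillNanos) / 1e9;
        b.tokens = Math.min(b.burstBytes, b.tokens + elapsedSeconds * b.bytesPerSecond);
        b.lastRefillNanos = nowNanos;
        if (b.tokens >= requestBytes) {
            b.tokens -= requestBytes;
            return true;
        }
        return false;
    }
}
```

Because each bucket refills independently, capacity that one tenant leaves idle is not handed to another tenant, which is exactly the unused-capacity drawback mentioned above.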
*Open questions*
What is the current best practice for such a multi-tenant setup? Can I achieve
what is described above, at least to some extent, with throttling/quotas?
Let's kick off the discussion about this feature request!