[ https://issues.apache.org/jira/browse/ZOOKEEPER-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000695#comment-17000695 ]
Fangmin Lv commented on ZOOKEEPER-3619:
---------------------------------------

Thanks [~randgalt], we'll add you as a reviewer when it's ready. We can probably add a new client implementation of the semaphore in Curator based on this.

> Implement server side semaphore API to improve the efficiency and throughput of coordination
> ---------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-3619
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3619
> Project: ZooKeeper
> Issue Type: New Feature
> Components: server
> Affects Versions: 3.6.0
> Reporter: Fangmin Lv
> Assignee: Fangmin Lv
> Priority: Major
> Fix For: 3.7.0
>
>
> The design principle of the ZK API is to be simple, flexible and general, so that it can serve different scenarios: coordination, member health tracking, meta store, etc. But this general design has a cost: it leads to heavy and inefficient client code for recipes like the distributed lock and semaphore.
>
> Currently, the general client-side semaphore implementation (without waiting) works as follows:
> # client A creates a sequential, ephemeral node N-1
> # client B creates a sequential, ephemeral node N-2
> # clients A and B query all children and check whether they hold the lock node with the smallest sequential id
> # since client A has the smaller sequential id, it is the semaphore owner (assuming the semaphore value is 1)
> # client B deletes its node, closes the session, and probably tries again later from step 2
>
> Every contender issues 4 writes (create session, create lock, delete lock, close session) and 1 read (get children), which is heavy and does not scale well.
>
> We actually hit this issue internally with one heavy semaphore use case, and we had to create dozens of ensembles to support its traffic.
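The following minimal Java sketch makes the per-contender cost of the numbered recipe concrete. It is a hypothetical in-memory model: the class and method names are illustrative and are not ZooKeeper or Curator APIs. It does not connect to an ensemble. It only replays the five steps and counts the writes and reads they issue. In a real client, step 2 would be a `create` with `CreateMode.EPHEMERAL_SEQUENTIAL` and step 3 a `getChildren` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model of the client-side semaphore recipe; it does not
// talk to a real ZooKeeper ensemble and only counts the operations each step issues.
class ClientSideSemaphoreModel {

    // Simulated lock parent znode: sequential id -> owning client.
    private final TreeMap<Long, String> lockChildren = new TreeMap<>();
    private long nextSequence = 0;
    int writes = 0;
    int reads = 0;

    // Steps 1-2: open a session, then create a sequential, ephemeral lock node.
    long createLockNode(String client) {
        writes++;                                  // create session
        long seq = nextSequence++;
        lockChildren.put(seq, client);
        writes++;                                  // create lock node
        return seq;
    }

    // Steps 3-4: read all children; the node holds a permit if fewer than
    // `permits` nodes have a smaller sequential id.
    boolean holdsPermit(long seq, int permits) {
        reads++;                                   // get children
        return lockChildren.headMap(seq).size() < permits;
    }

    // Step 5 (and the owner's eventual release): delete the node, close the session.
    void deleteAndClose(long seq) {
        lockChildren.remove(seq);
        writes++;                                  // delete lock node
        writes++;                                  // close session
    }

    // Runs one round of contention and returns the clients that won a permit.
    List<String> runRound(List<String> clients, int permits) {
        List<Long> seqs = new ArrayList<>();
        for (String client : clients) {
            seqs.add(createLockNode(client));
        }
        // Every contender checks before anyone backs off, as in steps 3-4.
        List<String> winners = new ArrayList<>();
        List<Long> winnerSeqs = new ArrayList<>();
        List<Long> loserSeqs = new ArrayList<>();
        for (int i = 0; i < clients.size(); i++) {
            if (holdsPermit(seqs.get(i), permits)) {
                winners.add(clients.get(i));
                winnerSeqs.add(seqs.get(i));
            } else {
                loserSeqs.add(seqs.get(i));
            }
        }
        for (long seq : loserSeqs) {
            deleteAndClose(seq);                   // losers back off and retry later
        }
        for (long seq : winnerSeqs) {
            deleteAndClose(seq);                   // winners pay the same cost on release
        }
        return winners;
    }

    public static void main(String[] args) {
        ClientSideSemaphoreModel model = new ClientSideSemaphoreModel();
        List<String> winners = model.runRound(List.of("A", "B"), 1);
        System.out.println("winners=" + winners);
        System.out.println("writes=" + model.writes + " reads=" + model.reads);
    }
}
```

With two contenders and a semaphore value of 1, `main` prints `winners=[A]` and `writes=8 reads=2`. That is 4 writes and 1 read per contender, and the losing contender pays the same as the winner. This wasted work is what moving the decision to the leader is meant to cut.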
>
> To make the semaphore recipe more efficient, we can move the semaphore implementation to the server side. There, the leader has all the context about who will win the semaphore/lock at txn preparation time, so it can short-circuit and fail the losing contenders directly, without proposing and committing their create/delete lock transactions.
>
> To implement this, we need to add a new semaphore API, which is intended to replace the client-side lock, leader election (semaphore value 1), and general semaphore use cases.
>
> We recently started designing and implementing it. It will be based on another big improvement, skipping the proposal of requests with error transactions, which we have almost finished and will soon upstream in ZOOKEEPER-3594.
>
> Meanwhile, we'd like to hear some early feedback from the community about this feature.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)