Hi all,
I opened a Jira issue to contribute code to ZooKeeper and was asked to follow 
the procedure for new issues, so I am first sending this email to describe my 
two issues.

First one:
Jira:  https://issues.apache.org/jira/browse/ZOOKEEPER-3167
Purpose:  Add an API to get the total number of recursive sub nodes of a node
Description:

1. In production environments, a node often has a large number of recursive 
sub nodes, and we need to count all of them. For example, we want to count 
all the sub nodes of nodeA.

[inline image not included: example tree rooted at nodeA]

2. Currently, we can only use the getChildren API, which returns a 
List<String> of the first-level sub nodes (that is, we only get the nodeB 
list directly). To count the recursive sub nodes, we have to call it for every 
sub node in turn, which takes a lot of time.

3. On the ZooKeeper server side, nodes are stored in a HashMap<String, 
DataNode> whose keys are the node paths. We can iterate over this map to get 
the total number of sub nodes at all levels under one node.
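As a rough illustration of point 3, the sketch below counts the descendants of 
a path by scanning the keys of a path-keyed map. It is not the code from the 
Jira patch: the class SubtreeCount, the method countDescendants and the plain 
map used in place of the server's node map are all hypothetical names chosen 
for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: count all descendants of a node by scanning a
// path-keyed map, assumed here to resemble the server-side node map.
class SubtreeCount {

    static int countDescendants(Map<String, ?> nodes, String path) {
        // Descendants of "/a" are the keys that start with "/a/".
        // For the root "/", every other key is a descendant.
        String prefix = path.equals("/") ? "/" : path + "/";
        int count = 0;
        for (String key : nodes.keySet()) {
            if (key.length() > prefix.length() && key.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, byte[]> nodes = new ConcurrentHashMap<>();
        String[] paths = {
            "/nodeA", "/nodeA/nodeB1", "/nodeA/nodeB2",
            "/nodeA/nodeB1/nodeC1", "/nodeAB", "/other"
        };
        for (String p : paths) {
            nodes.put(p, new byte[0]);
        }
        // "/nodeAB" is a sibling, not a descendant, so it is not counted.
        System.out.println(countDescendants(nodes, "/nodeA")); // prints 3
    }
}
```

The scan is linear in the total number of nodes on the server, but it is a 
single in-memory pass instead of one getChildren round trip per sub node.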


Second one:
Jira:  https://issues.apache.org/jira/browse/ZOOKEEPER-3168
Purpose:  Reduce session revalidation time after a zxid rollover
Description:

1. A ZooKeeper cluster sometimes serves a large number of client connections, 
occasionally more than 10,000. When the zxid rolls over, the clients 
reconnect and revalidate their sessions.

2. In the ZooKeeper design, when a follower receives a session revalidation 
request, it forwards the request to the leader, which is responsible for 
session revalidation.

[inline image not included]

When LearnerZooKeeperServer receives a reconnection, it sends a revalidation 
request to LeaderZooKeeperServer, so LeaderZooKeeperServer comes under heavy 
pressure.

3. The leader has to handle a large number of requests in a short time. 
Statistics I collected with a tool show that some clients wait more than 
20 s, which is too long for some clients, such as ResourceManager.

4. My proposal: when a zxid rollover happens, the leader records the exact 
time. When the re-election finishes, all servers receive the rollover time. 
When clients then reconnect and revalidate their sessions, every server can 
make the decision itself. This reduces the pressure on the cluster, and all 
clients wait less time.
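The proposal does not spell out how a server would judge a session from the 
rollover time, so the following is only one possible reading, written as a 
sketch. It assumes that each server knows the timeouts of the sessions it was 
tracking, and that a session reconnecting within its timeout after the 
recorded rollover time can be accepted locally. Everything else falls back to 
the leader. The class RolloverSessionCheck and all of its methods are 
hypothetical and are not part of ZooKeeper.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a local revalidation check based on the
// rollover time distributed by the leader. Not ZooKeeper code.
class RolloverSessionCheck {
    // Assumption: session id -> session timeout in milliseconds,
    // for sessions this server was tracking before the rollover.
    private final Map<Long, Integer> sessionTimeoutsMs = new ConcurrentHashMap<>();

    // Rollover time recorded by the leader; -1 means "not known yet".
    private volatile long rolloverTimeMs = -1;

    void trackSession(long sessionId, int timeoutMs) {
        sessionTimeoutsMs.put(sessionId, timeoutMs);
    }

    void setRolloverTime(long timeMs) {
        rolloverTimeMs = timeMs;
    }

    // Returns true if this server can accept the session itself.
    // Returns false if the request should still go to the leader.
    boolean canValidateLocally(long sessionId, long nowMs) {
        Integer timeoutMs = sessionTimeoutsMs.get(sessionId);
        if (timeoutMs == null || rolloverTimeMs < 0) {
            return false; // unknown session or no rollover time yet
        }
        long elapsedMs = nowMs - rolloverTimeMs;
        // Assumed rule: accept only reconnects inside the session timeout.
        return elapsedMs >= 0 && elapsedMs < timeoutMs;
    }
}
```

With a check of this kind, only sessions that a server cannot decide on would 
still be forwarded to the leader.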


These are my two issues. Please help review whether the proposed solutions 
are correct. Thank you very much.
田毅群
Technology Product Center, Cloud Platform
iQIYI
QIYI.com, Inc.
Address: 6F, iQIYI Innovation Building, 365 Linhong Road, Changning District, Shanghai
Postal code: 201103
Mobile: +86 157 2140 1256
Email: [email protected]
