[jira] [Created] (ZOOKEEPER-3864) Reject create/renew/close global session in RO mode

2020-06-15 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3864:


 Summary: Reject create/renew/close global session in RO mode
 Key: ZOOKEEPER-3864
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3864
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.2


These Ops are not read operations. They will modify the state, 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3863) Do not track global sessions in ReadOnlyZooKeeperServer

2020-06-15 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3863:


 Summary: Do not track global sessions in ReadOnlyZooKeeperServer
 Key: ZOOKEEPER-3863
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3863
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.6.2
Reporter: Jie Huang


ReadOnlyZooKeeperServer uses the default SessionTrackerImpl, which tracks and
expires global sessions. Global sessions should be tracked and expired only by
the leader. This diff changes the code to use LearnerSessionTracker, which
tracks and expires only local sessions.





[jira] [Created] (ZOOKEEPER-3859) Add a couple request processor metrics

2020-06-11 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3859:


 Summary: Add a couple request processor metrics
 Key: ZOOKEEPER-3859
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3859
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2


These metrics, together with existing request processor metrics, help identify 
the bottleneck in the pipeline:

PROPOSAL_PROCESS_TIME

LEARNER_REQUEST_PROCESSOR_QUEUE_SIZE





[jira] [Created] (ZOOKEEPER-3858) Add metrics to track server unavailable time

2020-06-10 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3858:


 Summary: Add metrics to track server unavailable time
 Key: ZOOKEEPER-3858
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3858
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2


These metrics track the time when a ZooKeeper server is up and running but not 
serving client traffic because it is not part of a quorum. They don't track the 
hardware down time or ZooKeeper process down time.  

UNAVAILABLE_TIME: time between LOOKING and BROADCAST

LEADER_UNAVAILABLE_TIME: time between LOOKING and BROADCAST on the leader
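
As a rough illustration of how such a duration could be measured, here is a minimal sketch. The class `UnavailableTimeTracker`, its callbacks, and the injected clock are hypothetical names chosen for this example and are not taken from the ZooKeeper code base.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not a ZooKeeper class: measures how long a server
// spends between entering LOOKING and reaching BROADCAST.
class UnavailableTimeTracker {
    private final LongSupplier clockMillis; // injected so the logic is testable
    private long lookingSince = -1;         // -1 means "not currently LOOKING"
    private long lastUnavailableMillis = 0;

    UnavailableTimeTracker(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Called when the peer enters LOOKING; repeated calls keep the first timestamp.
    synchronized void onLooking() {
        if (lookingSince < 0) {
            lookingSince = clockMillis.getAsLong();
        }
    }

    // Called when the peer reaches BROADCAST and can serve traffic again.
    synchronized void onBroadcast() {
        if (lookingSince >= 0) {
            lastUnavailableMillis = clockMillis.getAsLong() - lookingSince;
            lookingSince = -1;
        }
    }

    synchronized long lastUnavailableMillis() {
        return lastUnavailableMillis;
    }
}
```

Because the clock only runs inside the server process, a tracker like this cannot observe hardware or process down time, which matches the scope described above.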

 

 





[jira] [Created] (ZOOKEEPER-3856) Add a couple metrics to track inflight diff syncs and snap syncs

2020-06-07 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3856:


 Summary: Add a couple metrics to track inflight diff syncs and 
snap syncs
 Key: ZOOKEEPER-3856
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3856
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2








[jira] [Created] (ZOOKEEPER-3847) Add a couple metrics to help track Netty memory usage

2020-05-22 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3847:


 Summary: Add a couple metrics to help track Netty memory usage
 Key: ZOOKEEPER-3847
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3847
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2


Adding these metrics:
 * RESPONSE_BYTES: size of responses (in bytes) being sent to a client
 * WATCH_BYTES: size of watch events (in bytes) being sent to a client

 





[jira] [Created] (ZOOKEEPER-3846) Add a couple TLS related metrics

2020-05-22 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3846:


 Summary: Add a couple TLS related metrics
 Key: ZOOKEEPER-3846
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3846
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2


Adding these metrics:
 * UNSUCCESSFUL_HANDSHAKE: number of unsuccessful TLS handshakes 
 * INSECURE_ADMIN: number of insecure connections to admin port





[jira] [Created] (ZOOKEEPER-3845) Add metric JVM_PAUSE_TIME

2020-05-21 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3845:


 Summary: Add metric JVM_PAUSE_TIME
 Key: ZOOKEEPER-3845
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3845
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2


This metric reports how long the JVM stalls, which helps diagnose unexpectedly
high latency caused by things like GC.
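
One common way to estimate JVM stalls is to sleep for a fixed interval and measure how much longer than requested the sleep actually took. The sketch below illustrates that general technique only; `PauseEstimator` and its methods are hypothetical names, and this is not necessarily how the metric is computed in the patch.

```java
// Hypothetical sketch of the "sleep and measure the overshoot" technique.
class PauseEstimator {
    private final long sleepMillis;

    PauseEstimator(long sleepMillis) {
        this.sleepMillis = sleepMillis;
    }

    // Extra time beyond the requested sleep, in milliseconds; never negative.
    long extraMillis(long startNanos, long endNanos) {
        long elapsedMillis = (endNanos - startNanos) / 1_000_000L;
        return Math.max(0L, elapsedMillis - sleepMillis);
    }

    // One measurement cycle: sleep, then report the overshoot as the pause estimate.
    long measureOnce() throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMillis);
        return extraMillis(start, System.nanoTime());
    }
}
```

A monitoring thread could call `measureOnce()` in a loop and feed each non-zero result into the metric.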





[jira] [Created] (ZOOKEEPER-3844) Add useful metrics for ZK servers

2020-05-21 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3844:


 Summary: Add useful metrics for ZK servers
 Key: ZOOKEEPER-3844
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3844
 Project: ZooKeeper
  Issue Type: Improvement
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.2


In ZOOKEEPER-3245, we upstreamed metrics that we use to monitor and debug
ZooKeeper. We have introduced more metrics since then, which will be upstreamed
in this JIRA.





[jira] [Created] (ZOOKEEPER-3816) Improve the lagging detection between the leader and learners

2020-05-03 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3816:


 Summary: Improve the lagging detection between the leader and 
learners 
 Key: ZOOKEEPER-3816
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3816
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Jie Huang
Assignee: Jie Huang
 Fix For: 3.6.2


Currently, we have SyncLimitCheck on the leader to detect a lagging learner by
tracking the time it takes for a proposal to be acknowledged. If the leader
doesn't receive the ack for a proposal from a learner within the syncLimit, it
disconnects the learner.

The purpose of the SyncLimitCheck is to prevent sessions connected to a slow 
learner from being expired.  By disconnecting the slow learner, it gives the 
clients a chance to re-connect to another server before session expiration. 

However, there are two cases in which sessions can still expire with the
current SyncLimitCheck implementation.

One case is that the ack reaches the leader on time but a ping response 
including the session table is delayed. The lagging detection is based on the 
proposal/ack time yet the sessions are updated when the ping response is 
received. If the ping response is delayed longer than the ack, the sessions 
could expire without lagging being detected. It makes more sense to detect 
lagging based on ping/ping response time. 

Another case is that the leader detects lagging and closes the connection to
the slow learner, but the learner doesn't know that it is being disconnected
because of a long socket closing time or a lost RST signal. So the learner
doesn't disconnect its clients, who lose their chance to re-connect to another
server before session expiration. The learner, like the leader, also needs a
means to detect communication issues at a higher-than-socket layer.

So we need a lagging detector that is based on ping/ping response and works in
both directions between the leader and the learners.
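
The idea of detecting lag from ping round trips can be sketched as follows. `PingLagDetector` and its methods are hypothetical and only illustrate the bookkeeping; the sketch assumes responses arrive in the order the pings were sent.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: flags a peer as lagging when the oldest unanswered
// ping has been outstanding for longer than a configured limit.
class PingLagDetector {
    private final long limitMillis;
    private final Deque<Long> outstandingPingTimes = new ArrayDeque<>();

    PingLagDetector(long limitMillis) {
        this.limitMillis = limitMillis;
    }

    // Record the send time of a ping.
    synchronized void onPingSent(long nowMillis) {
        outstandingPingTimes.addLast(nowMillis);
    }

    // A response acknowledges the oldest outstanding ping (in-order assumption).
    synchronized void onPingResponse() {
        outstandingPingTimes.pollFirst();
    }

    // True if some ping has waited longer than the limit for its response.
    synchronized boolean isLagging(long nowMillis) {
        Long oldest = outstandingPingTimes.peekFirst();
        return oldest != null && nowMillis - oldest > limitMillis;
    }
}
```

On the learner side, a mirrored check could compare the current time against the arrival time of the last ping from the leader, so that both ends notice a stalled connection without relying on the socket layer.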

 





[jira] [Created] (ZOOKEEPER-3774) Close quorum socket asynchronously on the leader to avoid ping being blocked by long socket closing time

2020-03-28 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3774:


 Summary: Close quorum socket asynchronously on the leader to avoid 
ping being blocked by long socket closing time
 Key: ZOOKEEPER-3774
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3774
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Reporter: Jie Huang
 Fix For: 3.7.0


In ZOOKEEPER-3574 we close the quorum sockets on followers asynchronously when 
a leader is partitioned away so the shutdown process will not be stalled by 
long socket closing time and the followers can quickly establish a new quorum 
to serve client requests.

We've found that the long socket closing time can cause trouble on the leader 
too when a follower is partitioned away if the partition is detected by 
PingLaggingDetector. When the ping thread detects partition, it tries to 
disconnect the follower. If the socket closing time is long, the ping thread 
will be blocked and no ping is sent to any follower--even the ones still 
connected to the leader--since the ping thread is responsible for sending pings 
to all followers. When followers don't receive pings, they don't send ping
responses. When the leader doesn't receive ping responses, the sessions expire.

To prevent good sessions from expiring, we need to close the socket 
asynchronously on the leader too.

 





[jira] [Created] (ZOOKEEPER-3683) Discard requests that are delayed longer than a configured threshold

2020-01-08 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3683:


 Summary: Discard requests that are delayed longer than a 
configured threshold
 Key: ZOOKEEPER-3683
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3683
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


The RequestThrottler ensures that no more requests than the system can handle
are fed into the request processor pipeline. In the meantime, the throttler
queues all incoming requests and there is nothing to instruct the clients to
slow down.

This new feature will mark all requests that wait in the RequestThrottler
longer than the specified throttledOpWaitTime as throttled. Such requests will
not see any processing other than being fed down the pipeline, which preserves
the order of all requests.

The FinalProcessor will issue an error response (new error code: ZTHROTTLEDOP) 
for these undigested requests. The intent is for the clients to not retry them 
immediately.

Also, the fact that throttled requests are unprocessed will speed up the work
of the entire pipeline. Throttled requests are not communicated between servers
and only travel through the server they belong to.
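
The marking step and the final error response can be illustrated with a small sketch. `SketchRequest` and `ThrottleSketch` are hypothetical types made up for this example; only the names throttledOpWaitTime and ZTHROTTLEDOP come from the description above.

```java
// Hypothetical request type: records when it was queued and whether it was
// marked as throttled.
class SketchRequest {
    final long enqueuedAtMillis;
    boolean throttled;

    SketchRequest(long enqueuedAtMillis) {
        this.enqueuedAtMillis = enqueuedAtMillis;
    }
}

// Hypothetical sketch of the two places where throttling is visible.
class ThrottleSketch {
    private final long throttledOpWaitTimeMillis;

    ThrottleSketch(long throttledOpWaitTimeMillis) {
        this.throttledOpWaitTimeMillis = throttledOpWaitTimeMillis;
    }

    // Throttler stage: mark the request if it waited too long. The request is
    // still passed down the pipeline so that ordering is preserved.
    void markIfStale(SketchRequest request, long nowMillis) {
        if (nowMillis - request.enqueuedAtMillis > throttledOpWaitTimeMillis) {
            request.throttled = true;
        }
    }

    // Final stage: throttled requests get an error instead of a normal result.
    String finalResponse(SketchRequest request) {
        return request.throttled ? "ZTHROTTLEDOP" : "OK";
    }
}
```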





[jira] [Created] (ZOOKEEPER-3682) Stop initializing new SSL connection if ZK server is shutting down

2020-01-08 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3682:


 Summary: Stop initializing new SSL connection if ZK server is 
shutting down
 Key: ZOOKEEPER-3682
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3682
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


ZK keeps accepting new connections while it's being shut down, then immediately
closes them when it finds out that the ZK server is not running. It was not a
big deal before SSL was enabled since creating TCP connections is relatively
cheap.
 
With SSL being widely enabled, creating SSL connections involves a handshake
that takes non-trivial CPU time, which is wasted since the connections are
closed right after.
 
This JIRA is going to stop initiating the TLS handshake if the zkServer is not
serving, to save resources.





[jira] [Created] (ZOOKEEPER-3575) Moving sending packets in Learner to a separate thread

2019-10-12 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3575:


 Summary: Moving sending packets in Learner to a separate thread
 Key: ZOOKEEPER-3575
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3575
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Affects Versions: 3.6.0
Reporter: Jie Huang


After changing to close the socket asynchronously, the shutdown process can 
proceed while the socket is being closed. However, the shutdown process could
still stall if a thread being shut down is writing to the socket. For example,
the SyncRequestProcessor flushes all ACK packets in queue when shutdown is 
called, which calls Learner.writePacket(), which will not return (with an IO 
exception) until the socket finishes closing. So it's still delayed by the 
socket closing time. 

To get around the delay, we move Learner.writePacket() to a separate thread. 
The tricky part is to handle the IO exception thrown by Learner.writePacket(). 
Currently, the IO exception is caught by different callers in different ways. 
For example, if an IO exception is caught during revalidateSession, the session
is closed and removed. In other cases, like in FollowerRequestProcessor and
SendAckRequestProcessor, the quorum socket is closed when the IO exception is
caught. After moving it to a thread, the callers won't be able to catch and
handle the exception. We need to handle it within the sending function. We 
reason that if an IO exception is thrown on the quorum socket of a follower, it 
only makes sense to shut down the server. So we make the sending thread a 
ZooKeeperCriticalThread.
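
The shape of such a sending thread can be sketched as below. `PacketSender`, `PacketWriter`, and the error callback are hypothetical names for this illustration; in the design described above the thread would be a ZooKeeperCriticalThread, which this self-contained sketch replaces with a plain callback.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: callers only enqueue packets; a single sender loop
// writes them and handles IO errors itself, since callers can no longer
// catch the exception.
class PacketSender implements Runnable {
    // Stand-in for the blocking socket write.
    interface PacketWriter {
        void write(String packet) throws IOException;
    }

    // Sentinel compared by identity to stop the loop.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final PacketWriter writer;
    private final Consumer<IOException> onFatalError;

    PacketSender(PacketWriter writer, Consumer<IOException> onFatalError) {
        this.writer = writer;
        this.onFatalError = onFatalError;
    }

    // Never blocks on the socket, so a thread that is shutting down is not stalled.
    void queuePacket(String packet) {
        queue.add(packet);
    }

    void stop() {
        queue.add(STOP);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String packet = queue.take();
                if (packet == STOP) {
                    return;
                }
                writer.write(packet);
            }
        } catch (IOException e) {
            // Treated as fatal in this sketch, e.g. by triggering a server shutdown.
            onFatalError.accept(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In real use the loop would run on its own thread, for example `new Thread(sender).start()`, and the callback would trigger the shutdown path.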

 





[jira] [Created] (ZOOKEEPER-3574) Close quorum socket asynchronously to avoid shutdown stalled by long socket closing time

2019-10-12 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3574:


 Summary: Close quorum socket asynchronously to avoid shutdown 
stalled by long socket closing time
 Key: ZOOKEEPER-3574
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3574
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Affects Versions: 3.6.0
Reporter: Jie Huang


Since we can't use the SO_LINGER option or find a substitute to close a TLS
socket quickly in JDK 11, we call close() asynchronously so the shutdown can
proceed and a new leader election can be started while the socket is being
closed.
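
A minimal sketch of closing a resource off the calling thread is shown below. `AsyncCloser` is a hypothetical helper written for this example; it accepts any `java.io.Closeable`, which includes `java.net.Socket`.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper: runs close() on a daemon thread and returns at once,
// so a slow close cannot stall the caller.
class AsyncCloser {
    static Thread closeAsync(Closeable resource) {
        Thread closer = new Thread(() -> {
            try {
                resource.close();
            } catch (IOException e) {
                // A real implementation would log this; nothing else to do here.
            }
        }, "async-closer");
        closer.setDaemon(true);
        closer.start();
        return closer;
    }
}
```

The trade-off of this approach is that the caller no longer knows when, or whether, the close completed, so anything that must happen after the close needs its own synchronization.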

 

 





[jira] [Created] (ZOOKEEPER-3573) Dealing with long TLS connection closing time without SO_LINGER option

2019-10-12 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3573:


 Summary: Dealing with long TLS connection closing time without 
SO_LINGER option
 Key: ZOOKEEPER-3573
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3573
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Jie Huang


As described in ZOOKEEPER-3384, with SSL sockets, a close_notify is required to 
be sent before closing the write side of a connection. When the send buffer is
full and the writing is blocked, it will take a long time to send close_notify
and thus a long time to close the socket. The long closing time on followers with a
partitioned-away leader would stall the shutdown process and delay a new leader 
election to establish a new quorum. As a result, the ensemble would be 
unavailable for a long time.

In ZOOKEEPER-3384, the SO_LINGER option is used to close the socket quickly (and
potentially uncleanly). In JDK 11, however, the SO_LINGER option is not honored,
so we need a new way to avoid the long quorum unavailable time.





[jira] [Created] (ZOOKEEPER-3547) Add detailed documentation on throttling

2019-09-16 Thread Jie Huang (Jira)
Jie Huang created ZOOKEEPER-3547:


 Summary: Add detailed documentation on throttling
 Key: ZOOKEEPER-3547
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3547
 Project: ZooKeeper
  Issue Type: Improvement
  Components: documentation
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3503) Add server-side large request protection

2019-08-10 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3503:


 Summary: Add server-side large request protection
 Key: ZOOKEEPER-3503
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3503
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Jie Huang


This task adds a new request limiting mechanism to ZooKeeper that aims to 
protect ZooKeeper from accepting too many large requests and crashing because 
it runs out of memory. This is designed to augment the connection throttling 
(ZOOKEEPER-3242) and request throttling (ZOOKEEPER-3243), which focus on 
limiting the number rather than size of requests.
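
Limiting by size rather than by count can be sketched with a byte budget for requests that are currently being held in memory. `InflightBytesLimiter` is a hypothetical class for illustration; the actual mechanism and its configuration options may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: admits a request only if the total size of all
// in-flight requests stays within a configured byte limit.
class InflightBytesLimiter {
    private final long limitBytes;
    private final AtomicLong inflightBytes = new AtomicLong();

    InflightBytesLimiter(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserve space for a request; returns false if it would exceed the limit.
    boolean tryAdmit(long requestBytes) {
        while (true) {
            long current = inflightBytes.get();
            long next = current + requestBytes;
            if (next > limitBytes) {
                return false;
            }
            if (inflightBytes.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    // Give the space back once the request has been processed.
    void release(long requestBytes) {
        inflightBytes.addAndGet(-requestBytes);
    }

    long inflight() {
        return inflightBytes.get();
    }
}
```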





[jira] [Created] (ZOOKEEPER-3493) Deflake testConcurrentRequestProcessingInCommitProcessor in CommitProcessorMetricsTest

2019-08-04 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3493:


 Summary: Deflake testConcurrentRequestProcessingInCommitProcessor 
in CommitProcessorMetricsTest
 Key: ZOOKEEPER-3493
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3493
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.6.0
Reporter: Jie Huang








[jira] [Created] (ZOOKEEPER-3492) Add weights to server side connection throttling

2019-08-04 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3492:


 Summary: Add weights to server side connection throttling
 Key: ZOOKEEPER-3492
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3492
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


In ZOOKEEPER-3242, we introduced connection throttling to protect the server
from being overloaded. We realized that the costs of creating a local session,
creating a global session, and reconnecting are different, so we associate
weights with these costs when throttling. For example, for the same setting, the
throttler will allow more connections to be created if they are local. This
allows the server resources to be fully utilized.
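
The effect of weights can be illustrated with a simple token budget. `WeightedThrottle` and the weight values below are hypothetical and chosen only for this example; the sketch also omits token refill over time.

```java
// Hypothetical sketch: each connection type consumes a different number of
// tokens from a shared budget, so cheaper connections are admitted more often.
class WeightedThrottle {
    // Illustrative weights only, not ZooKeeper defaults.
    static final int LOCAL_SESSION_WEIGHT = 1;
    static final int RENEW_SESSION_WEIGHT = 2;
    static final int GLOBAL_SESSION_WEIGHT = 3;

    private long tokens;

    WeightedThrottle(long tokens) {
        this.tokens = tokens;
    }

    // Admit the connection if enough tokens remain for its weight.
    synchronized boolean tryAcquire(int weight) {
        if (weight > tokens) {
            return false;
        }
        tokens -= weight;
        return true;
    }
}
```

With a budget of 6 tokens and these weights, six local sessions fit where only two global sessions would.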





[jira] [Created] (ZOOKEEPER-3437) Improve sync throttling on a learner master

2019-06-20 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3437:


 Summary: Improve sync throttling on a learner master
 Key: ZOOKEEPER-3437
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3437
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Affects Versions: 3.6.0
Reporter: Jie Huang
 Fix For: 3.6.0


As described in ZOOKEEPER-1928, a leader can become overloaded if it sends too 
many snapshots concurrently during sync time.  Sending too many diffs at the 
same time can also cause the overloading issue. 

In this JIRA, we will:
 # add diff sync throttling in addition to snap sync throttling
 # extend the protection to followers that serve observers
 # improve the counting of concurrent snap syncs/diff syncs to avoid double
counting or missed counts
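
One way to keep such a count accurate is to hand out a permit whose release is idempotent, so that an error path and a normal path releasing the same sync cannot decrement twice. `SyncThrottleSketch` is a hypothetical class for illustration, not the implementation in this JIRA.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: bounds the number of concurrent syncs and guards
// against double counting on release.
class SyncThrottleSketch {
    private final int maxConcurrent;
    private int inProgress;

    SyncThrottleSketch(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Handle for one running sync; release() takes effect only once.
    final class Permit {
        private final AtomicBoolean released = new AtomicBoolean(false);

        void release() {
            if (released.compareAndSet(false, true)) {
                synchronized (SyncThrottleSketch.this) {
                    inProgress--;
                }
            }
        }
    }

    // Returns a permit, or null if too many syncs are already running.
    synchronized Permit tryBegin() {
        if (inProgress >= maxConcurrent) {
            return null;
        }
        inProgress++;
        return new Permit();
    }

    synchronized int inProgress() {
        return inProgress;
    }
}
```

Separate instances could be used for snap syncs and diff syncs, each with its own limit.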





[jira] [Created] (ZOOKEEPER-3401) Fix metric PROPOSAL_ACK_CREATION_LATENCY

2019-05-23 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3401:


 Summary: Fix metric PROPOSAL_ACK_CREATION_LATENCY
 Key: ZOOKEEPER-3401
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3401
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3383) Improve prep processor metric accuracy and de-flaky unit test

2019-05-09 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3383:


 Summary: Improve prep processor metric accuracy and de-flaky unit 
test
 Key: ZOOKEEPER-3383
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3383
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3379) De-flaky test in Quorum Packet Metrics

2019-05-07 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3379:


 Summary: De-flaky test in Quorum Packet Metrics
 Key: ZOOKEEPER-3379
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3379
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Resolved] (ZOOKEEPER-3316) Remove unused code in SyncRequestProcessor

2019-04-04 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang resolved ZOOKEEPER-3316.
--
Resolution: Invalid

> Remove unused code in SyncRequestProcessor
> --
>
> Key: ZOOKEEPER-3316
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3316
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Reporter: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> to make spotbugs happy





[jira] [Updated] (ZOOKEEPER-3324) Add read/write metrics for top level znodes

2019-04-04 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang updated ZOOKEEPER-3324:
-
Description: These metrics provide bytes read from each branch under the 
root and bytes written to each branch under the root. We use top level znodes 
not only to manage applications that share an ensemble but also to organize 
data on a dedicated ensemble. These metrics help us to do quota management, ACL 
management, etc at the top znode level.

> Add read/write metrics for top level znodes
> ---
>
> Key: ZOOKEEPER-3324
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3324
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> These metrics provide bytes read from each branch under the root and bytes 
> written to each branch under the root. We use top level znodes not only to 
> manage applications that share an ensemble but also to organize data on a 
> dedicated ensemble. These metrics help us to do quota management, ACL 
> management, etc at the top znode level.





[jira] [Resolved] (ZOOKEEPER-3328) misc metrics

2019-03-27 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang resolved ZOOKEEPER-3328.
--
Resolution: Not A Problem

It turns out that I don't have any leftover metrics intended for this category.

> misc metrics
> 
>
> Key: ZOOKEEPER-3328
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3328
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Resolved] (ZOOKEEPER-3325) Add unavailable time metrics for quorum peers

2019-03-19 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang resolved ZOOKEEPER-3325.
--
Resolution: Later

These two metrics require ZabState. They should be upstreamed together with ZabState.

> Add unavailable time metrics for quorum peers
> -
>
> Key: ZOOKEEPER-3325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3325
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Created] (ZOOKEEPER-3328) misc metrics

2019-03-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3328:


 Summary: misc metrics
 Key: ZOOKEEPER-3328
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3328
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3327) Add unrecoverable error count

2019-03-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3327:


 Summary: Add unrecoverable error count
 Key: ZOOKEEPER-3327
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3327
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3326) Add session/connection related metrics

2019-03-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3326:


 Summary: Add session/connection related metrics
 Key: ZOOKEEPER-3326
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3326
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3325) Add unavailable time metrics for quorum peers

2019-03-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3325:


 Summary: Add unavailable time metrics for quorum peers
 Key: ZOOKEEPER-3325
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3325
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3324) Add read/write metrics for top level znodes

2019-03-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3324:


 Summary: Add read/write metrics for top level znodes
 Key: ZOOKEEPER-3324
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3324
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3323) Add TxnSnapLog metrics

2019-03-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3323:


 Summary: Add TxnSnapLog metrics
 Key: ZOOKEEPER-3323
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3323
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3321) Add metrics for Leader

2019-03-17 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3321:


 Summary: Add metrics for Leader
 Key: ZOOKEEPER-3321
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3321
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3319) Add metrics for follower and observer

2019-03-16 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3319:


 Summary: Add metrics for follower and observer
 Key: ZOOKEEPER-3319
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3319
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Resolved] (ZOOKEEPER-3313) Upgrade a few metrics to percentile counter

2019-03-16 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang resolved ZOOKEEPER-3313.
--
Resolution: Not A Problem

> Upgrade a few metrics to percentile counter
> ---
>
> Key: ZOOKEEPER-3313
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3313
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Commented] (ZOOKEEPER-3313) Upgrade a few metrics to percentile counter

2019-03-16 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794407#comment-16794407
 ] 

Jie Huang commented on ZOOKEEPER-3313:
--

I was planning to update READ_LATENCY, UPDATE_LATENCY, and PROPAGATION_LATENCY,
but found out that they are already using percentile counters in master.

> Upgrade a few metrics to percentile counter
> ---
>
> Key: ZOOKEEPER-3313
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3313
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Created] (ZOOKEEPER-3316) Remove unused code in SyncRequestProcessor

2019-03-15 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3316:


 Summary: Remove unused code in SyncRequestProcessor
 Key: ZOOKEEPER-3316
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3316
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


to make spotbugs happy





[jira] [Created] (ZOOKEEPER-3313) Upgrade a few metrics to percentile counter

2019-03-14 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3313:


 Summary: Upgrade a few metrics to percentile counter
 Key: ZOOKEEPER-3313
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3313
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3310) Add metrics for prep processor

2019-03-12 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3310:


 Summary: Add metrics for prep processor
 Key: ZOOKEEPER-3310
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3310
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3309) Add sync processor metrics

2019-03-12 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3309:


 Summary: Add sync processor metrics
 Key: ZOOKEEPER-3309
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3309
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Created] (ZOOKEEPER-3305) Add Quorum Packet metrics

2019-03-10 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3305:


 Summary: Add Quorum Packet metrics
 Key: ZOOKEEPER-3305
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3305
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: metric system
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Commented] (ZOOKEEPER-3268) Add commit processor metrics

2019-02-04 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760024#comment-16760024
 ] 

Jie Huang commented on ZOOKEEPER-3268:
--

Add metrics for requests queued in the commit processor, time spent in the 
commit processor, and so on. 

> Add commit processor metrics
> 
>
> Key: ZOOKEEPER-3268
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3268
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3267) Add watcher metrics

2019-02-03 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759626#comment-16759626
 ] 

Jie Huang commented on ZOOKEEPER-3267:
--

Add metrics for fired watch counts, metrics for dead watchers (cleared count, 
cleaner latency, etc.) in DeadWatcherListener 

> Add watcher metrics
> ---
>
> Key: ZOOKEEPER-3267
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3267
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Issue Comment Deleted] (ZOOKEEPER-3268) Add commit processor metrics

2019-02-03 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang updated ZOOKEEPER-3268:
-
Comment: was deleted

(was: Add metrics for fired watch counts, metrics for dead watchers (cleared 
count, cleaner latency, etc) in DeadWatcherListener )

> Add commit processor metrics
> 
>
> Key: ZOOKEEPER-3268
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3268
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Commented] (ZOOKEEPER-3268) Add commit processor metrics

2019-02-03 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759618#comment-16759618
 ] 

Jie Huang commented on ZOOKEEPER-3268:
--

Add metrics for fired watch counts, metrics for dead watchers in 
DeadWatcherListener 

> Add commit processor metrics
> 
>
> Key: ZOOKEEPER-3268
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3268
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Comment Edited] (ZOOKEEPER-3268) Add commit processor metrics

2019-02-03 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759618#comment-16759618
 ] 

Jie Huang edited comment on ZOOKEEPER-3268 at 2/4/19 4:22 AM:
--

Add metrics for fired watch counts, metrics for dead watchers (cleared count, 
cleaner latency, etc.) in DeadWatcherListener 


was (Author: jiehuang):
Add metrics for fired watch counts, metrics for dead watchers in 
DeadWatcherListener 

> Add commit processor metrics
> 
>
> Key: ZOOKEEPER-3268
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3268
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Reporter: Jie Huang
>Priority: Minor
> Fix For: 3.6.0
>
>






[jira] [Updated] (ZOOKEEPER-3267) Add watcher metrics

2019-02-02 Thread Jie Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang updated ZOOKEEPER-3267:
-
Summary: Add watcher metrics  (was: Add watch metrics)

> Add watcher metrics
> ---
>
> Key: ZOOKEEPER-3267
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3267
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Jie Huang
>Priority: Minor
>






[jira] [Created] (ZOOKEEPER-3267) Add watch metrics

2019-02-01 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3267:


 Summary: Add watch metrics
 Key: ZOOKEEPER-3267
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3267
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Affects Versions: 3.6.0
Reporter: Jie Huang








[jira] [Created] (ZOOKEEPER-3268) Add commit processor metrics

2019-02-01 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3268:


 Summary: Add commit processor metrics
 Key: ZOOKEEPER-3268
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3268
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0








[jira] [Commented] (ZOOKEEPER-3245) Add useful metrics for ZK pipeline and request/server states

2019-01-18 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746924#comment-16746924
 ] 

Jie Huang commented on ZOOKEEPER-3245:
--

Splitting this Jira into smaller child tasks.

> Add useful metrics for ZK pipeline and request/server states
> 
>
> Key: ZOOKEEPER-3245
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3245
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add metrics to track time spent in the commit processor, watch counts and 
> fire rates, how long a ZooKeeper server is unavailable between elections, 
> quorum packet size and time spent in the queue, aggregate request 
> states/flow, request throttle, sync processor queue time, per-connection read 
> and write request counts, commit processor queue sizes (read/write/commit), 
> final request processor read/write times, watch manager cnxn/path counts, 
> latencies at different points in the pipeline for commits/informs, split up 
> request type counters for more request types, export sum metrics for all 
> AvgMinMax counters, per-connection watch fired counts, ack latency for each 
> follower, percentile metrics to zeus latency counters, proposal count, number 
> of outstanding changes, snapshot and txns loading time during startup, 
> number of non-voting followers, leader unavailable time, etc.
>  





[jira] [Created] (ZOOKEEPER-3251) Add new server metric types: percentile counter and counter set

2019-01-18 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3251:


 Summary: Add new server metric types: percentile counter and 
counter set
 Key: ZOOKEEPER-3251
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3251
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


This will add three metric types:

AvgMinMaxCounterSet

AvgMinMaxPercentileCounter

AvgMinMaxPercentileCounterSet

The percentile metrics allow us to get a better sense of the latency 
distribution. They are more expensive than AvgMinMax counters and are 
restricted to latency measurements for now.

The counter set allows the grouping of metrics, such as writes per namespace 
and reads per namespace.
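
To make the cost difference between the metric types concrete, here is a minimal sketch, assuming a naive design that keeps every sample in memory. It is not the ZooKeeper implementation: the class names LatencySketch and LatencySketchSet and all of their methods are hypothetical. Average, minimum and maximum are maintained as running values, while a percentile needs the stored samples and a sort. The set class groups counters under a key such as a namespace.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch, not ZooKeeper code: avg/min/max plus nearest-rank percentiles. */
final class LatencySketch {
    private final List<Long> samples = new ArrayList<>(); // only needed for percentiles
    private long sum = 0;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    synchronized void add(long latencyMs) {
        samples.add(latencyMs);
        sum += latencyMs;
        min = Math.min(min, latencyMs);
        max = Math.max(max, latencyMs);
    }

    synchronized double avg() {
        return samples.isEmpty() ? 0.0 : (double) sum / samples.size();
    }

    synchronized long min() {
        return samples.isEmpty() ? 0 : min;
    }

    synchronized long max() {
        return samples.isEmpty() ? 0 : max;
    }

    /** Nearest-rank percentile for p in (0, 100]; copying and sorting is the extra cost. */
    synchronized long percentile(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        if (samples.isEmpty()) {
            return 0;
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}

/** Hypothetical counter set: one LatencySketch per key, for example per namespace. */
final class LatencySketchSet {
    private final Map<String, LatencySketch> byKey = new ConcurrentHashMap<>();

    void add(String key, long latencyMs) {
        byKey.computeIfAbsent(key, k -> new LatencySketch()).add(latencyMs);
    }

    LatencySketch get(String key) {
        return byKey.get(key);
    }
}
```

A production counter would bound its memory, for instance with a sampling reservoir or a histogram, rather than store every sample as this sketch does.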

 





[jira] [Created] (ZOOKEEPER-3245) Add useful metrics for ZK pipeline and request/server states

2019-01-11 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3245:


 Summary: Add useful metrics for ZK pipeline and request/server 
states
 Key: ZOOKEEPER-3245
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3245
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Jie Huang
 Fix For: 3.6.0


Add metrics to track time spent in the commit processor, watch counts and fire 
rates, how long a ZooKeeper server is unavailable between elections, quorum 
packet size and time spent in the queue, aggregate request states/flow, request 
throttle, sync processor queue time, per-connection read and write request 
counts, commit processor queue sizes (read/write/commit), final request 
processor read/write times, watch manager cnxn/path counts, latencies at 
different points in the pipeline for commits/informs, split up request type 
counters for more request types, export sum metrics for all AvgMinMax counters, 
per-connection watch fired counts, ack latency for each follower, percentile 
metrics to zeus latency counters, proposal count, number of outstanding 
changes, snapshot and txns loading time during startup, number of non-voting 
followers, leader unavailable time, etc.





[jira] [Created] (ZOOKEEPER-3243) Add server side request throttling

2019-01-11 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3243:


 Summary: Add server side request throttling
 Key: ZOOKEEPER-3243
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


Ongoing performance investigation at Facebook has demonstrated that ZooKeeper 
is easily overwhelmed by spikes in connection rates and/or write request rates. 
ZooKeeper performance gets progressively worse, clients time out and try to 
reconnect (exacerbating the problem), and things enter a death spiral. To solve 
this problem, we need to add load protection to ZooKeeper via rate limiting and 
work shedding.

This JIRA task adds a new request throttling mechanism (RequestThrottler) to 
ZooKeeper in hopes of preventing ZooKeeper from becoming overwhelmed during 
request spikes.
 
When enabled, the RequestThrottler limits the number of outstanding requests 
currently submitted to the request processor pipeline. 
 
The throttler augments the limit imposed by the globalOutstandingLimit that is 
enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The 
connection layer limit applies backpressure against the TCP connection by 
disabling selection on connections once the request limit is reached. However, 
the connection layer always allows a connection to send at least one request 
before disabling selection on that connection. Thus, in a scenario with 4 
client connections, the total number of requests in flight may be as high as 
4 even if the globalOutstandingLimit was set lower.
 
The RequestThrottler addresses this issue by adding additional queueing. When 
enabled, client connections no longer submit requests directly to the request 
processor pipeline but instead to the RequestThrottler. The RequestThrottler is 
then responsible for issuing requests to the request processors, and enforces a 
separate maxRequests limit. If the total number of outstanding requests is 
higher than maxRequests, the throttler will repeatedly stall for stallTime 
milliseconds until the count is back under the limit.
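
The queueing described above can be sketched roughly as follows. This is a simplified illustration, not the actual RequestThrottler: the class RequestThrottleSketch, its methods, and the use of a Consumer as a stand-in for the first request processor are all assumptions. Only maxRequests and stallTime come from the description.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch of a throttle queue in front of a processing pipeline. */
final class RequestThrottleSketch<R> {
    private final int maxRequests;      // limit on requests inside the pipeline
    private final long stallTimeMs;     // pause before re-checking the limit
    private final Consumer<R> pipeline; // stand-in for the first request processor
    private final Queue<R> pending = new ArrayDeque<>();
    private int outstanding = 0;

    RequestThrottleSketch(int maxRequests, long stallTimeMs, Consumer<R> pipeline) {
        if (maxRequests <= 0 || stallTimeMs < 0) {
            throw new IllegalArgumentException("maxRequests must be > 0, stallTimeMs >= 0");
        }
        this.maxRequests = maxRequests;
        this.stallTimeMs = stallTimeMs;
        this.pipeline = pipeline;
    }

    /** Connections enqueue here instead of calling the pipeline directly. */
    synchronized void submit(R request) {
        pending.add(request);
    }

    /** Called when the pipeline has finished one request. */
    synchronized void requestFinished() {
        if (outstanding > 0) {
            outstanding--;
        }
    }

    /** Forwards queued requests while under the limit; returns how many were forwarded. */
    synchronized int forwardReady() {
        int forwarded = 0;
        while (!pending.isEmpty() && outstanding < maxRequests) {
            outstanding++;
            pipeline.accept(pending.poll());
            forwarded++;
        }
        return forwarded;
    }

    /** Throttler thread body: stall for stallTimeMs whenever nothing can be forwarded. */
    void runLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                if (forwardReady() == 0) {
                    Thread.sleep(stallTimeMs);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With maxRequests set to 2 and three queued requests, forwardReady() passes two through and holds the third until requestFinished() is called.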
 
The RequestThrottler can also optionally drop stale requests rather than submit 
them to the processor pipeline. A stale request is a request sent by a 
connection that is already closed, and/or a request whose latency will end up 
being higher than its associated session timeout.

To preserve ordering guarantees, if a request is ever dropped from a connection, 
that connection is closed and flagged as invalid. All subsequent requests 
in flight from that connection are then dropped as well.
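
The staleness rules can be expressed as a small predicate. The sketch below is only an illustration under stated assumptions: the types Conn and Req, the two boolean switches, and the exact comparison against the session timeout are hypothetical and are not taken from the ZooKeeper source.

```java
/** Hypothetical sketch of the stale-request drop decision. */
final class StaleDropSketch {
    /** Minimal stand-in for a client connection. */
    static final class Conn {
        final int sessionTimeoutMs;
        boolean closed;
        boolean invalid; // set once any request from this connection was dropped

        Conn(int sessionTimeoutMs) {
            this.sessionTimeoutMs = sessionTimeoutMs;
        }
    }

    /** Minimal stand-in for a queued request. */
    static final class Req {
        final Conn conn;
        final long createdAtMs;

        Req(Conn conn, long createdAtMs) {
            this.conn = conn;
            this.createdAtMs = createdAtMs;
        }
    }

    private final boolean dropStaleConnections; // connection staleness switch
    private final boolean dropStaleLatency;     // latency staleness switch

    StaleDropSketch(boolean dropStaleConnections, boolean dropStaleLatency) {
        this.dropStaleConnections = dropStaleConnections;
        this.dropStaleLatency = dropStaleLatency;
    }

    /** Returns true if the request should be dropped instead of processed. */
    boolean shouldDrop(Req req, long nowMs) {
        Conn conn = req.conn;
        if (conn.invalid) {
            return true; // an earlier drop poisons every later request
        }
        boolean connStale = dropStaleConnections && conn.closed;
        // Assumption: a request is "late" once it has waited longer than the timeout.
        boolean latencyStale = dropStaleLatency
                && nowMs - req.createdAtMs > conn.sessionTimeoutMs;
        if (connStale || latencyStale) {
            conn.closed = true;
            conn.invalid = true;
            return true;
        }
        return false;
    }
}
```

Marking the connection invalid on the first drop is what keeps ordering intact: no later request from that connection can be processed after an earlier one was discarded.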
 
The notion of staleness is configurable: connection staleness and latency 
staleness can each be enabled or disabled individually. Both these settings and 
the various throttle settings (limit, stall time, stale drop) can be configured 
via system properties as well as at runtime via JMX.
 
The throttler has been tested and benchmarked at Facebook.





[jira] [Created] (ZOOKEEPER-3242) Add server side connecting throttling

2019-01-11 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3242:


 Summary: Add server side connecting throttling
 Key: ZOOKEEPER-3242
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3242
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Jie Huang
 Fix For: 3.6.0


Ongoing performance investigation at Facebook has demonstrated that ZooKeeper 
is easily overwhelmed by spikes in connection rates and/or write request rates. 
ZooKeeper performance gets progressively worse, clients time out and try to 
reconnect (exacerbating the problem), and things enter a death spiral. To solve 
this problem, we need to add load protection to ZooKeeper via rate limiting and 
work shedding.
 
This Jira adds a new connection rate limiting mechanism to ZooKeeper in hopes 
of preventing ZooKeeper from becoming overwhelmed during connection spikes. 
The new throttle is focused on limiting connections per second. The throttle is 
implemented as a token bucket with optional probabilistic dropping based on the 
BLUE queue management algorithm.
 
This token-bucket design lets short bursts pass while still capping the total 
number of requests per second. However, an issue with a token-bucket approach 
is that the wall-clock arrival time of requests affects the probability of a 
request being allowed to pass. Under constant load this can lead to starvation 
for requests that constantly arrive later than the majority. The optional 
probabilistic dropping mechanism is designed to combat this, making rejections 
a random event with little skew based on arrival time.
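
For readers unfamiliar with token buckets, a minimal sketch follows. It is not the BlueThrottle code: the class TokenBucketSketch and its parameter names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and the BLUE-based probabilistic dropping is deliberately not modelled.

```java
/** Hypothetical token-bucket sketch; the caller supplies the current time. */
final class TokenBucketSketch {
    private final long capacity;         // maximum burst size
    private final long refillCount;      // tokens added per refill interval
    private final long refillIntervalMs; // length of one refill interval
    private long tokens;
    private long lastRefillMs;

    TokenBucketSketch(long capacity, long refillCount, long refillIntervalMs, long nowMs) {
        if (capacity <= 0 || refillCount <= 0 || refillIntervalMs <= 0) {
            throw new IllegalArgumentException("all parameters must be positive");
        }
        this.capacity = capacity;
        this.refillCount = refillCount;
        this.refillIntervalMs = refillIntervalMs;
        this.tokens = capacity; // start full so an initial burst can pass
        this.lastRefillMs = nowMs;
    }

    /** Returns true if a connection arriving at nowMs may proceed. */
    synchronized boolean tryAcquire(long nowMs) {
        long intervals = (nowMs - lastRefillMs) / refillIntervalMs;
        if (intervals > 0) {
            // Refill for every whole interval that has passed, capped at capacity.
            tokens = Math.min(capacity, tokens + intervals * refillCount);
            lastRefillMs += intervals * refillIntervalMs;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```

The sketch also shows the arrival-time skew mentioned above: right after a refill every caller succeeds, while callers that arrive late in an interval find the bucket empty.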
 
A more verbose description can be found in the comments in 
org.apache.zookeeper.server.BlueThrottle.
 
By default, both the token-bucket and probabilistic dropping mechanisms are 
disabled. The throttles can be enabled and tuned both via Java system 
properties and against a running node via JMX.
 
The throttle has been tested and benchmarked at Facebook.





[jira] [Created] (ZOOKEEPER-3239) Adding EnsembleAuthProvider to verify the ensemble name

2019-01-09 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3239:


 Summary: Adding EnsembleAuthProvider to verify the ensemble name
 Key: ZOOKEEPER-3239
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3239
 Project: ZooKeeper
  Issue Type: Improvement
Reporter: Jie Huang
 Fix For: 3.6.0


This AuthenticationProvider checks that the ensemble name the client intends 
to connect to matches the name that the server thinks it belongs to. If the 
names do not match, this provider will close the connection.

This AuthenticationProvider does not "authenticate" the client. It prevents the 
client from accidentally connecting to the wrong ensemble.
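
The check itself is small. The following is a sketch under stated assumptions rather than the provider's real code: the class EnsembleNameCheckSketch, the Result enum, and the choice to accept every client when the server has no name configured are all hypothetical.

```java
/** Hypothetical sketch of the ensemble-name comparison. */
final class EnsembleNameCheckSketch {
    enum Result { OK, CLOSE_CONNECTION }

    private final String serverEnsembleName; // name the server believes it belongs to

    EnsembleNameCheckSketch(String serverEnsembleName) {
        this.serverEnsembleName = serverEnsembleName;
    }

    /** Compares the name sent by the client with the server's own ensemble name. */
    Result check(String clientEnsembleName) {
        if (serverEnsembleName == null || serverEnsembleName.isEmpty()) {
            // Assumption: a server without a configured name accepts every client.
            return Result.OK;
        }
        return serverEnsembleName.equals(clientEnsembleName)
                ? Result.OK
                : Result.CLOSE_CONNECTION;
    }
}
```

No secret is involved, which is why this guards against misconfiguration rather than against a malicious client.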

This feature has been implemented in the Facebook internal branch and I'm going 
to upstream it to the trunk.





[jira] [Commented] (ZOOKEEPER-3216) Make init/sync limit tunable via JMX

2018-12-19 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725345#comment-16725345
 ] 

Jie Huang commented on ZOOKEEPER-3216:
--

Link to the PR: https://github.com/apache/zookeeper/pull/738

> Make init/sync limit tunable via JMX
> 
>
> Key: ZOOKEEPER-3216
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3216
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: jmx
>Reporter: Jie Huang
>Priority: Minor
>
> Add beans for initLimit and syncLimit so they can be adjusted through JMX





[jira] [Commented] (ZOOKEEPER-3216) Make init/sync limit tunable via JMX

2018-12-14 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722018#comment-16722018
 ] 

Jie Huang commented on ZOOKEEPER-3216:
--

This will allow us to fix syncing issues when they happen, without a restart.

> Make init/sync limit tunable via JMX
> 
>
> Key: ZOOKEEPER-3216
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3216
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: jmx
>Reporter: Jie Huang
>Priority: Minor
>
> Add beans for initLimit and syncLimit so they can be adjusted through JMX





[jira] [Commented] (ZOOKEEPER-3216) Make init/sync limit tunable via JMX

2018-12-13 Thread Jie Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720702#comment-16720702
 ] 

Jie Huang commented on ZOOKEEPER-3216:
--

This feature has been implemented in the Facebook internal branch and I'm going 
to upstream it to the trunk.

> Make init/sync limit tunable via JMX
> 
>
> Key: ZOOKEEPER-3216
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3216
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: jmx
>Reporter: Jie Huang
>Priority: Minor
>
> Add beans for initLimit and syncLimit so they can be adjusted through JMX





[jira] [Created] (ZOOKEEPER-3216) Make init/sync limit tunable via JMX

2018-12-13 Thread Jie Huang (JIRA)
Jie Huang created ZOOKEEPER-3216:


 Summary: Make init/sync limit tunable via JMX
 Key: ZOOKEEPER-3216
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3216
 Project: ZooKeeper
  Issue Type: Improvement
  Components: jmx
Reporter: Jie Huang


Add beans for initLimit and syncLimit so they can be adjusted through JMX
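
As an illustration of what exposing these two settings through JMX could look like, here is a minimal sketch built on the standard javax.management API. It is not the change proposed for ZooKeeper: the class QuorumLimitsJmxSketch, the nested LimitsMBean interface, the helper methods, and the object name used in the usage note are hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.ObjectName;

/** Hypothetical sketch: expose initLimit and syncLimit as writable JMX attributes. */
final class QuorumLimitsJmxSketch {
    /** Management interface; JMX derives the attributes InitLimit and SyncLimit from it. */
    public interface LimitsMBean {
        int getInitLimit();
        void setInitLimit(int ticks);
        int getSyncLimit();
        void setSyncLimit(int ticks);
    }

    /** Holder for the two limits; volatile so readers see JMX updates promptly. */
    public static final class Limits implements LimitsMBean {
        private volatile int initLimit;
        private volatile int syncLimit;

        public Limits(int initLimit, int syncLimit) {
            this.initLimit = initLimit;
            this.syncLimit = syncLimit;
        }

        @Override public int getInitLimit() { return initLimit; }
        @Override public void setInitLimit(int ticks) { this.initLimit = ticks; }
        @Override public int getSyncLimit() { return syncLimit; }
        @Override public void setSyncLimit(int ticks) { this.syncLimit = ticks; }
    }

    /** Registers a Limits bean under the given object name on the platform MBean server. */
    static Limits register(String objectName, int initLimit, int syncLimit) {
        try {
            Limits bean = new Limits(initLimit, syncLimit);
            ManagementFactory.getPlatformMBeanServer()
                    .registerMBean(bean, new ObjectName(objectName));
            return bean;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets an attribute by name, the way a JMX console would. */
    static void setAttribute(String objectName, String attribute, int value) {
        try {
            ManagementFactory.getPlatformMBeanServer()
                    .setAttribute(new ObjectName(objectName), new Attribute(attribute, value));
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After register("sketch.example:type=QuorumLimits", 10, 5), a call to setAttribute("sketch.example:type=QuorumLimits", "SyncLimit", 8) changes the value that getSyncLimit() returns, without restarting the process.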


