[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102708#comment-16102708
 ] 

Karan Mehta commented on ZOOKEEPER-2770:
----------------------------------------

[~tdunning]
Logging is currently not rate limited. If the threshold is set too low, a huge 
number of messages may get printed. I suggest the following options; please 
share your opinion.
1. Turn off this feature by default, so that we don't end up picking an 
arbitrary value.
I personally do not want this, since I believe that whatever your requirements 
or hardware are, it should be possible to put some upper bound on this value. 
More experienced people can comment on this better than I can. 
2. Add a rate limiter based on some logic:
  2.1 Time-based logic (limit the number of messages printed in a given amount 
  of time)
  2.2 Random sampling based on a configured probability.

I am not aware of how these things are typically implemented. It would be good 
if you could point to some existing code that does something similar. Thanks!
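
As a starting point for discussion, options 2.1 and 2.2 could be combined 
roughly as below. This is only a sketch under assumptions: SlowLogLimiter and 
all of its parameters are hypothetical names, not existing ZooKeeper code, and 
a fixed time window is just one of several ways to do time-based limiting.

```java
import java.util.Random;

/**
 * Hypothetical helper (not existing ZooKeeper code): decides whether a
 * slow-operation message may be logged right now.
 */
final class SlowLogLimiter {
    private final int maxPerWindow;         // 2.1: cap on messages per window
    private final long windowMillis;        // 2.1: window length in milliseconds
    private final double sampleProbability; // 2.2: chance in [0.0, 1.0] of keeping a message
    private final Random random;

    private long windowStartMillis;
    private int loggedInWindow;
    private long suppressed;

    SlowLogLimiter(int maxPerWindow, long windowMillis,
                   double sampleProbability, Random random) {
        // The negated range check also rejects NaN.
        if (maxPerWindow < 0 || windowMillis <= 0
                || !(sampleProbability >= 0.0 && sampleProbability <= 1.0)) {
            throw new IllegalArgumentException("invalid limiter settings");
        }
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.sampleProbability = sampleProbability;
        this.random = random;
    }

    /** nowMillis is passed in so the logic can be tested without sleeping. */
    synchronized boolean shouldLog(long nowMillis) {
        // 2.2: drop a random share of messages.
        if (random.nextDouble() >= sampleProbability) {
            suppressed++;
            return false;
        }
        // 2.1: start a fresh window once the current one has expired.
        if (nowMillis - windowStartMillis >= windowMillis) {
            windowStartMillis = nowMillis;
            loggedInWindow = 0;
        }
        if (loggedInWindow < maxPerWindow) {
            loggedInWindow++;
            return true;
        }
        suppressed++;
        return false;
    }

    /** Returns and resets the number of messages dropped since the last call. */
    synchronized long drainSuppressed() {
        long count = suppressed;
        suppressed = 0;
        return count;
    }
}
```

The logging site would call shouldLog(System.currentTimeMillis()) before 
printing, and could periodically report drainSuppressed() so that dropped 
messages are not lost silently. A fixed window still allows a short burst at 
each window boundary; a token bucket would smooth that out at the cost of a 
little more state.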

> ZooKeeper slow operation log
> ----------------------------
>
>                 Key: ZOOKEEPER-2770
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant, it is trivial to come to an understanding of the cause. 
> However, to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data and can 
> suffer intermittent performance degradation, so it should consider implementing 
> a 'slow query' log, a feature very common to services that persist 
> information on behalf of clients that may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 
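
The check described in the issue text above (current time minus arrival time 
compared against a configured threshold) might look roughly like the following. 
It is a sketch only: SlowRequestCheck, its method names, the message format, 
and the convention that a non-positive threshold disables the feature are all 
assumptions, not part of any attached patch.

```java
/** Hypothetical helper (not existing ZooKeeper code) for the slow-operation check. */
final class SlowRequestCheck {
    // Assumption: a threshold <= 0 means the feature is turned off.
    private final long thresholdMillis;

    SlowRequestCheck(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /**
     * Returns the elapsed time in milliseconds if the request took longer
     * than the threshold, or -1 if it should not be reported.
     */
    long slowElapsedMillis(long arrivalMillis, long nowMillis) {
        if (thresholdMillis <= 0) {
            return -1;
        }
        long elapsed = nowMillis - arrivalMillis;
        return elapsed > thresholdMillis ? elapsed : -1;
    }

    /** Builds the log line with client and request details. */
    String describe(String client, String requestType, long elapsedMillis) {
        return String.format(
                "Slow request: client=%s type=%s elapsed=%dms threshold=%dms",
                client, requestType, elapsedMillis, thresholdMillis);
    }
}
```

The server would run this when it finally processes a request, passing the 
arrival timestamp it recorded earlier, and log describe(...) only when the 
returned value is not -1.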



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
