wenbingshen opened a new issue, #3718:
URL: https://github.com/apache/bookkeeper/issues/3718

   **BP**
   
   ### Motivation
   
   We have an efficient online bookie cluster with hundreds of bookie nodes 
deployed on SSD disks.
   
   We separate the AutoRecovery cluster and the Bookie cluster for independent 
deployment.
   
   I observed that GC is very frequent in our AutoRecovery cluster. After 
investigating, I found that `limitStatsLogging` of the BookKeeper client 
PCBC (PerChannelBookieClient) is disabled by default, so a large number of 
per-channel metrics are generated. Because the bookie cluster has so many 
nodes, this metric data occupies a large amount of heap memory.
   
   A single `StringWriter` object occupies 16MB of memory, and nearly 70 such 
objects are waiting to be reclaimed by the next GC, occupying 1GB+ of heap 
memory in total.
   
   
![image](https://user-images.githubusercontent.com/35599757/209527835-de1b9ef0-38d7-4ea6-b14b-75ec5879d26a.png)
   
   
![image](https://user-images.githubusercontent.com/35599757/209527853-3fd829f1-ab4b-47f1-a73c-662abc49463c.png)
   
   ### Proposal
   
   In my usage so far, I have not found these PCBC metrics to be useful.
   
   If AutoRecovery and the Bookie run in the same process, these large 
objects will affect the performance and stability of the Bookie cluster.
   
   Since I see no benefit in having these metrics enabled by default, I 
suggest changing the default value of `limitStatsLogging` to true.
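   
   For reference, a minimal sketch of the explicit opt-in that is possible 
today, assuming the process that creates the BookKeeper client reads this key 
from its configuration file (the file name `conf/bk_server.conf` is an 
assumption and depends on the deployment):
   
   ```properties
   # Assumption: this file is the one loaded by the AutoRecovery process.
   # With the flag set to true, the per-channel (PCBC) metrics are limited.
   limitStatsLogging=true
   ```
   
   Under this proposal the same value would simply become the default, and 
setting it to `false` would bring the per-channel metrics back.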
   
   Users can still turn it on or off, but with the current default it is 
hard for them to discover what effect this parameter has. By the time their 
cluster has grown to hundreds or thousands of bookies and they notice the 
problem, fixing it can require a rolling restart of hundreds to thousands of 
bookies.
   
   At the same time, I observed that Pulsar turns off the various BookKeeper 
client metrics by default because they affect the performance of the Pulsar 
service. This suggests that we should change the default as well, especially 
for the highly redundant metrics that are created per channel.
   
   ### Compatibility, Deprecation, and Migration Plan
   
   Clients that rely on PCBC metrics need to pay attention to this upgrade. 
The change does not affect client functionality, only the metrics data, and 
users can enable the metrics again if they need them.
   
   <!-- add a proposal PR link below -->
   Proposal PR - #abc

