Hi BookKeepers,

I'd like to propose changing the default value of limitStatsLogging from false to true: BP-60 <https://github.com/apache/bookkeeper/issues/3718>
Motivation

We run an online bookie cluster with hundreds of bookie nodes on SSD disks, and we deploy the AutoRecovery cluster and the bookie cluster independently. I observed that GC on our AutoRecovery cluster is very frequent. After investigating, I found that limitStatsLogging for the BookKeeper client's PerChannelBookieClient (PCBC) is disabled by default, so a large number of per-channel metrics are generated. Because the bookie cluster has so many nodes, this metric data takes up a large amount of heap memory: a single StringWriter object occupies 16MB, and nearly 70 such StringWriter objects are waiting for the next GC, together occupying more than 1GB of heap.

Proposal

So far, I haven't found these PCBC metrics useful in practice. If AutoRecovery runs in the same process as the bookie, these large objects will also hurt the performance and stability of the bookie. Since I can't see the value of enabling these metrics by default, I suggest changing the default value of limitStatsLogging to true.

Everyone can still turn it on or off. With the current default, however, it is hard for users to discover what effect this parameter has, so by the time their cluster grows to hundreds or thousands of nodes and they notice the problem, they may have to do a rolling restart of hundreds or thousands of bookies.

I also observed that Pulsar turns off various BookKeeper client monitoring by default, because it really does affect the performance of the Pulsar service. That is a good reason for us to make this change as well, especially for the highly redundant metrics created per channel.

Compatibility, Deprecation, and Migration Plan

Clients that rely on PCBC metrics should take note of this upgrade. The change does not affect the client's actual functionality, only its metrics data, and users can re-enable these metrics if they need them.
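For users who still want the per-channel metrics after the default changes, opting back in should only require setting limitStatsLogging back to false. A minimal sketch, assuming the client reads its settings from a properties-style configuration file (which file that is depends on the deployment and is not specified here):

```properties
# limitStatsLogging=true (the proposed default) limits PCBC stats logging.
# Setting it to false restores the per-channel metrics from the old default.
limitStatsLogging=false
```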
What do you think?

Best,
Wenbing