Hi BookKeepers, I'd like to propose changing the default value of
limitStatsLogging from false to true:
BP-60 <https://github.com/apache/bookkeeper/issues/3718>

Motivation

We have an efficient online bookie cluster with hundreds of bookie nodes
deployed on SSD disks.

We separate the AutoRecovery cluster and the Bookie cluster for independent
deployment.

I observed that GC on our AutoRecovery cluster is very frequent. After
investigation, I found that limitStatsLogging is disabled by default for
the bookkeeper client's PCBC (PerChannelBookieClient), so a large number of
per-channel monitoring metrics are generated. Because the bookie cluster
has so many nodes, this metric data occupies a large amount of heap memory.

A single StringWriter object occupies 16MB of memory, and nearly 70 such
objects are waiting to be reclaimed by the next GC, occupying 1GB+ of heap
memory in total.

Proposal

In my own use, I have not found these PCBC monitoring metrics useful, at
least so far.

If AutoRecovery and Bookie run mixed in one process, these large objects
will affect the performance and stability of the Bookie cluster.

Since I can't see what value these metrics provide by default, I suggest
changing the default value of limitStatsLogging to true.

Everyone can still choose to turn it on or off. With the current default,
however, it is hard for users to discover what effect this parameter has.
By the time their cluster has grown to hundreds or thousands of nodes and
they notice the problem, fixing it may require a rolling restart of
hundreds to thousands of bookies.

At the same time, I observed that in Pulsar the various bookkeeper client
metrics are turned off by default because they really affect the
performance of the Pulsar service. That is enough to show that we should
try to change this, especially for the highly redundant metrics created
per channel.

Compatibility, Deprecation, and Migration Plan

Clients that rely on PCBC metrics need to pay attention to this upgrade.
The change does not affect the actual functions of the client, only the
metrics data, and users can choose to turn the metrics on again.
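
For anyone who depends on the per-channel metrics, opting back in after the
change would presumably be a one-line override. A minimal sketch, assuming
the key is set in whatever configuration file the AutoRecovery or client
process already loads (the file name in the comment is an assumption, not
something this proposal specifies):

```properties
# e.g. conf/bk_server.conf (assumed location; use the file your process loads)
# false = do not limit stats logging, i.e. keep the per-channel PCBC metrics
limitStatsLogging=false
```

Leaving the key unset would pick up the new default of true.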


What do you think about it?

Best.
Wenbing
