[ https://issues.apache.org/jira/browse/ZOOKEEPER-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940506#comment-17940506 ]
zhangtongr commented on ZOOKEEPER-4914:
---------------------------------------
I observed abnormal behavior in our ZooKeeper cluster that led to a
failure. Attempts to restart the cluster afterwards were also
unsuccessful.
> [QUESTION] Strategies for monitoring and preventing "unreasonable length"
> errors in session closure
> ---------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4914
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4914
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.3
> Reporter: zhangtongr
> Priority: Major
>
> Description:
> ===========
> I'm encountering potential risks with "unreasonable length" errors during
> session closure, particularly when sessions have numerous ephemeral nodes.
> I'd like to discuss possible monitoring and prevention strategies.
> Current Situation:
> -----------------
> 1. When sessions with many ephemeral nodes are closed, all node paths are
> collected into a single transaction
> 2. If the combined size exceeds jute.maxbuffer, it results in "unreasonable
> length" errors
> 3. We currently lack an effective way to predict or prevent this issue
> Questions:
> ---------
> 1. Monitoring Strategy:
> * What metrics should we monitor to predict potential issues?
> * Are there existing metrics for tracking session transaction sizes?
> * How can we monitor the growth of ephemeral nodes per session?
> 2. Prevention Approaches:
> * What are the recommended approaches to prevent this issue?
> * Is there a way to estimate transaction size before session closure?
> * Are there best practices for managing large numbers of ephemeral nodes?
> 3. Configuration Guidelines:
> * What's the recommended jute.maxbuffer setting for different scenarios?
> * Are there other relevant configuration parameters?
> ---------------
> I would appreciate insights on:
> 1. Additional metrics to monitor
> 2. Early warning indicators
> 3. Prevention strategies
> 4. Best practices for large-scale deployments
> Thank you for any guidance or suggestions.
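
The size check described under "Current Situation" can be sketched roughly as
follows. This is only an illustration, not ZooKeeper's own code. It assumes the
path list is serialized as a 4-byte count followed by length-prefixed UTF-8
strings, and it uses a guessed fixed overhead for the transaction header. The
function names, the overhead value, and the 80% warning ratio are all
hypothetical. The 0xfffff default for jute.maxbuffer is the documented one.

```python
# Rough estimate of how large a session-close transaction could become,
# based only on the ephemeral node paths owned by that session.

JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # documented default, just under 1 MB


def estimate_close_session_bytes(paths, fixed_overhead=64):
    """Estimate the serialized size in bytes of the list of paths to delete.

    Assumption: 4 bytes for the list count, then for each path a 4-byte
    length prefix plus its UTF-8 bytes. fixed_overhead is a guessed
    allowance for headers, not a measured value.
    """
    payload = sum(4 + len(p.encode("utf-8")) for p in paths)
    return fixed_overhead + 4 + payload


def is_at_risk(paths, max_buffer=JUTE_MAXBUFFER_DEFAULT, warn_ratio=0.8):
    """Return True when the estimate reaches warn_ratio of max_buffer."""
    return estimate_close_session_bytes(paths) >= max_buffer * warn_ratio


if __name__ == "__main__":
    # Hypothetical session holding 20,000 ephemeral nodes with 50-byte paths.
    sample = ["/services/workers/" + str(i).zfill(32) for i in range(20000)]
    print(estimate_close_session_bytes(sample), is_at_risk(sample))
```

A client or an external monitor could run a check like this against the
ephemeral paths it knows a session owns and raise an alert before the
estimate approaches the configured jute.maxbuffer.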
--
This message was sent by Atlassian Jira
(v8.20.10#820010)