[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17940506#comment-17940506
 ] 

zhangtongr commented on ZOOKEEPER-4914:
---------------------------------------

I observed abnormal behavior in our ZooKeeper cluster that resulted in a 
failure. Attempts to restart the cluster were also unsuccessful.

> [QUESTION] Strategies for monitoring and preventing "unreasonable length" 
> errors in session closure
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4914
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4914
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.3
>            Reporter: zhangtongr
>            Priority: Major
>
> Description:
> ===========
> I see a risk of "unreasonable length" errors during session closure, 
> particularly when a session owns a large number of ephemeral nodes. 
> I'd like to discuss possible monitoring and prevention strategies.
> Current Situation:
> -----------------
> 1. When a session with many ephemeral nodes is closed, all of its node 
> paths are collected into a single transaction.
> 2. If the combined size exceeds jute.maxbuffer, the result is an 
> "unreasonable length" error.
> 3. We currently lack an effective way to predict or prevent this issue.
> Questions:
> ---------
> 1. Monitoring Strategy:
>    * What metrics should we monitor to predict potential issues?
>    * Are there existing metrics for tracking session transaction sizes?
>    * How can we monitor the growth of ephemeral nodes per session?
> 2. Prevention Approaches:
>    * What are the recommended approaches to prevent this issue?
>    * Is there a way to estimate transaction size before session closure?
>    * Are there best practices for managing large numbers of ephemeral nodes?
> 3. Configuration Guidelines:
>    * What's the recommended jute.maxbuffer setting for different scenarios?
>    * Are there other relevant configuration parameters?
> ---------------
> Would appreciate insights on:
> 1. Additional metrics to monitor
> 2. Early warning indicators
> 3. Prevention strategies
> 4. Best practices for large-scale deployments
> Thank you for any guidance or suggestions.
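
To make the size concern under "Current Situation" concrete, here is a rough 
sketch of how the path list of one session could be sized against 
jute.maxbuffer. It assumes the usual Jute binary layout (a 4-byte count for 
the list, then a 4-byte length plus the UTF-8 bytes for each path) and the 
documented server default of 0xfffff for jute.maxbuffer. The function names 
and the 80% safety margin are hypothetical, and the real transaction carries 
extra header bytes, so treat the result as a lower bound rather than an exact 
figure.

```python
# Rough lower-bound estimate of the serialized size of the ephemeral-path
# list for one session. All names here are illustrative, not ZooKeeper APIs.

JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # documented server default, just under 1 MiB


def estimate_paths_size(paths):
    """Approximate Jute binary size of a list of path strings.

    Assumes: 4-byte element count, then per path a 4-byte length prefix
    followed by the UTF-8 encoded bytes. Transaction headers are ignored.
    """
    total = 4  # list element count
    for path in paths:
        total += 4 + len(path.encode("utf-8"))
    return total


def fits_in_buffer(paths, max_buffer=JUTE_MAXBUFFER_DEFAULT, safety=0.8):
    """Return (estimated_size, ok), where ok means the estimate stays under
    a safety fraction of max_buffer. The 0.8 margin is an arbitrary choice."""
    size = estimate_paths_size(paths)
    return size, size <= max_buffer * safety


if __name__ == "__main__":
    # Hypothetical session owning 10,000 ephemeral nodes with 100-byte paths.
    sample = ["/" + "x" * 99] * 10_000
    size, ok = fits_in_buffer(sample)
    print(f"estimated bytes: {size}, within margin: {ok}")
```

Feeding this the ephemeral paths owned by a session (collected by whatever 
means is available in the deployment) gives an early warning before the 
session is closed, which is when the size check would otherwise fail.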



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
