[
https://issues.apache.org/jira/browse/ZOOKEEPER-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360734#comment-17360734
]
Damien Diederen commented on ZOOKEEPER-4306:
--------------------------------------------
Hi [~Changrui Lin],
I am currently looking into fixing this.
bq. Is it a bug or just an unspecified feature? If so, how should we
judge the upper limit on creating nodes?
I would definitely put this in the "bug" category :)
It seems that we would want ephemeral node creation to start failing when the
session gets "too big" to fit in a transaction. [~eolivelli], [~lvfangmin]:
would you agree?
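
To make the idea concrete, here is a minimal sketch of such a check, not actual ZooKeeper code. The class {{EphemeralPathBudget}}, its {{tryAdd}}/{{remove}} methods and the byte limit passed to the constructor are all hypothetical names chosen for illustration. The sizing assumes jute's binary encoding (a 4-byte count for a vector, then a 4-byte length plus the UTF-8 bytes for each string) and ignores the transaction header and any other fields, so a real limit would need some headroom.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical per-session budget for ephemeral node paths.
 * Illustration only; this class does not exist in ZooKeeper.
 */
final class EphemeralPathBudget {
    // Assumption: a jute vector is written as a 4-byte element count.
    private static final int VECTOR_HEADER_BYTES = 4;
    // Assumption: a jute string is written as a 4-byte length plus UTF-8 bytes.
    private static final int STRING_HEADER_BYTES = 4;

    private final long maxBytes;
    private final Set<String> paths = new HashSet<>();
    private long usedBytes = VECTOR_HEADER_BYTES;

    EphemeralPathBudget(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Estimated serialized size of one path inside the path list. */
    static long serializedSize(String path) {
        return STRING_HEADER_BYTES + path.getBytes(StandardCharsets.UTF_8).length;
    }

    /**
     * Records the path and returns true if the session's path list still
     * fits in the budget; returns false (recording nothing) otherwise.
     */
    boolean tryAdd(String path) {
        if (paths.contains(path)) {
            return true; // already accounted for
        }
        long size = serializedSize(path);
        if (usedBytes + size > maxBytes) {
            return false; // creating this node would make the session "too big"
        }
        paths.add(path);
        usedBytes += size;
        return true;
    }

    /** Releases the budget held by a deleted ephemeral node. */
    void remove(String path) {
        if (paths.remove(path)) {
            usedBytes -= serializedSize(path);
        }
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

In this sketch, the server would call {{tryAdd}} before accepting an ephemeral create and fail the request when it returns false, so that the eventual close-session transaction stays within whatever limit is chosen.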
Here are a few related tickets which include ideas for minimizing
{{jute.maxbuffer}} annoyances:
# ZOOKEEPER-1162: Suggests (among others) controlling node size during child
creation (similar to what I am proposing above);
# ZOOKEEPER-1644: Suggests compressing some of the data, which would allow for
a larger {{CloseSessionTxn}} (related to your comment about "absolute paths").
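
As a rough illustration of why compression could help with lists of absolute paths, here is a small sketch using the JDK's {{GZIPOutputStream}}. It is not what either ticket specifies: the class {{PathListCompression}}, the newline-joined encoding and the choice of gzip are assumptions made only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; illustration only, not part of ZooKeeper. */
final class PathListCompression {

    /** Joins the paths with newlines and returns the raw UTF-8 bytes. */
    static byte[] rawBytes(List<String> paths) {
        return String.join("\n", paths).getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses the given bytes with gzip. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the gzip stream flushes the trailer before we read the buffer.
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Comparing {{rawBytes(paths).length}} with {{gzip(rawBytes(paths)).length}} for paths that share long parent prefixes shows how much of the payload is repeated prefix text.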
Note that it is currently possible to work around this issue by setting this
undocumented flag:
{noformat}
closeSessionTxn.enabled = false
{noformat}
(The flag was introduced as part of ZOOKEEPER-3145. Of course, disabling it
"unfixes" the "potential watch missing issue." Still, probably better than
suffering crashing ensembles.)
> CloseSessionTxn contains too many ephemal nodes cause cluster crash
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-4306
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4306
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.6.2
> Reporter: Lin Changrui
> Priority: Critical
> Attachments: cs.jpg, f.jpg, l1.png, l2.jpg, r.jpg
>
>
> We ran a test to see how many ephemeral nodes a client can create under one
> parent node with the default configuration. The test eventually crashed the
> cluster, with exception stack traces like these.
> follower:
> !f.jpg!
> leader:
> !l1.png!
> !l2.jpg!
> It seems that the leader sent a txn packet that was too large to the
> followers. When a follower tried to deserialize the txn, it found that the
> txn length exceeded its buffer size (default 1MB+1MB, jute.maxbuffer +
> jute.maxbuffer.extrasize). That caused the followers to crash. The leader
> then found that there were not enough synced followers, so it shut down.
> During shutdown, the leader called zkDb.fastForwardDataBase() and found that
> the txn read from the txnlog exceeded its buffer size, so it crashed too.
> After the servers crashed, they tried to restart the quorum, but they could
> not succeed because the last txn was too large. We lost the log from that
> moment, but the stack trace was the same as this one.
> !r.jpg|width=1468,height=598!
>
> *Root Cause*
> We used org.apache.zookeeper.server.LogFormatter (-Djute.maxbuffer=74827780)
> to visualize this log and found this. !cs.jpg|width=1400,height=581! So
> closeSessionTxn contains all ephemeral nodes with absolute paths. We know we
> will get a large getChildren response if we create too many child nodes
> under one parent node; that is limited by the client's jute.maxbuffer. If we
> create plenty of ephemeral nodes under different parent nodes within one
> session, it may not overflow the client's buffer, but when the session closes
> without deleting these nodes first, it will probably crash the cluster.
> Is it a bug or just an unspecified feature? If so, how should we judge
> the upper limit on creating nodes?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)