Slow replication can lead to too many znodes in ZK. These would not be direct children of the "/hbase/replication/rs" znode, but of each specific RS's replication queue. The "108" shown in your "stat" command output is only the number of RSes, and would not vary, but the number of znodes under each of those would. To get a better picture of how large your znode tree is, you should use ZK's org.apache.zookeeper.server.SnapshotFormatter tool, which prints the whole structure from a given snapshot file. The full stack trace from the jute buffer error may also hint at which znode is exceeding the buffer limit. Would you be able to share it?
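For instance, something along these lines (the snapshot file name and paths below are illustrative, and the classpath will depend on your ZK install):

    # list the per-RS queue znodes using the ZK CLI bundled with HBase
    hbase zkcli ls /hbase/replication/rs

    # dump the whole znode tree from a given ZK snapshot file
    java -cp zookeeper.jar:lib/* \
        org.apache.zookeeper.server.SnapshotFormatter \
        /var/lib/zookeeper/version-2/snapshot.<zxid>

From the formatter output you can then see which parent znode is accumulating the most children.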
On Tue, Mar 12, 2019 at 09:52, Asim Zafir <asim.za...@gmail.com> wrote:
>
> Hi Wellington,
>
> Thanks for the response. I greatly appreciate it. Yes, we have replication
> enabled, and if I stat /hbase/replication/rs I get the following:
>
> cZxid = 0x1000001c8
> ctime = Tue Jun 19 22:23:17 UTC 2018
> mZxid = 0x1000001c8
> mtime = Tue Jun 19 22:23:17 UTC 2018
> pZxid = 0x10000cd555
> cversion = 8050
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 0
> numChildren = 108
>
> How can I analyze the znode utilization in this case, and more
> specifically, how is it impacting the jute buffer size? I can see
> numChildren under /hbase/replication/rs is 108, but how does that
> correspond to the ZK jute buffer reaching its max value?
>
> Also, it is not clear why the timestamp on these znodes isn't
> incrementing. I see the timestamp is still showing a 2018 date.
>
> Thanks,
> asim
>
> -------------->>>>>
>
> This jute buffer len error generally means a given znode being
> watched/read has grown too large to fit into the buffer. It's not
> specific to the number of watches attached, but to the amount of info
> stored in it, for example, too many child znodes under a given znode.
> In order to understand what's behind the error, you should analyse your
> zookeeper znode tree; you may get a hint by looking at the zookeeper
> snapshot files. Do you have replication enabled on this cluster? A
> common cause for such errors in hbase is when replication is
> slow/stuck and the source cluster is under heavy write load, causing
> the replication queue to grow much faster than its ability to drain,
> which results in many znodes created under the "replication" znode.
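To make the quoted explanation above more concrete: the error fires when a single response, such as the child list returned by a getChildren call on one of the per-RS queue znodes, exceeds jute.maxbuffer (roughly 1 MB by default). Below is a rough sketch, not something tested against your cluster, that walks the replication queues and prints the child count and approximate child-list size per queue; the class name, connect string and size estimate are illustrative only. Note that if a queue is already over the limit, the getChildren call on it will fail with the same "Packet len ... is out of range" error, which by itself pinpoints the culprit:

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    public class ReplicationQueueSizes {
        public static void main(String[] args) throws Exception {
            // Adjust the connect string for your ZK quorum.
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, event -> { });
            String base = "/hbase/replication/rs";
            // Layout assumed: /hbase/replication/rs/<rs>/<peer>/<wal>
            for (String rs : zk.getChildren(base, false)) {
                for (String peer : zk.getChildren(base + "/" + rs, false)) {
                    List<String> wals =
                        zk.getChildren(base + "/" + rs + "/" + peer, false);
                    long bytes = 0;
                    for (String w : wals) {
                        // rough serialized size: 4-byte length prefix + name
                        bytes += 4 + w.length();
                    }
                    System.out.printf("%s/%s: %d wals, ~%d bytes%n",
                            rs, peer, wals.size(), bytes);
                }
            }
            zk.close();
        }
    }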