Hi Stack,

We are seeing excessive Region Server exits along with ZK connection teardowns ("Len error", i.e. the jute buffer threshold being reached). I want to see what is pushing the jute buffer toward its upper bound. So far, after investigating the code and studying the protocol itself, it appears to be a function of the number of watches that get set on the znodes. To bring stability to the ZK service, we had to increase jute.maxbuffer from 1 MB to 20 MB, then 32 MB, and it is now set to 128 MB.

To understand more, I dug a little deeper to see how many ZooKeeper watch objects are in the ZooKeeper JVM/instance. I ran jmap -histo:live against the ZooKeeper pid and got the output shown in the histogram at the end of this mail. I am not sure what [C and [B are there; they don't appear to refer to any class, and I don't see them on our dev instance of ZooKeeper. Should I suspect a memory leak or some other issue?

Please guide me through this, as I can't find a resource who can go that deep to give me any hint as to what may be happening on my end. Also, is it safe for the jute buffer to grow that much, and what is the impact of increasing it on HBase? I will greatly appreciate your feedback and help on this.
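For reference, here is how the watch counts can also be sampled without a heap dump, using ZooKeeper's standard four-letter commands (the host name and the default client port 2181 below are placeholders for our setup):

  # brief summary of watch counts on one ensemble member
  echo wchs | nc zk-host 2181

  # per-session and per-path breakdowns; these are heavier,
  # so use them sparingly on a busy server
  echo wchc | nc zk-host 2181
  echo wchp | nc zk-host 2181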
 num     #instances         #bytes  class name
----------------------------------------------
   1:        220810      140582448  [C
   2:        109370       34857168  [B
   3:        103842        7476624  org.apache.zookeeper.data.StatPersisted
   4:        220703        5296872  java.lang.String
   5:         28682        3783712  <constMethodKlass>
   6:         28682        3681168  <methodKlass>
   7:        111000        3552000  java.util.HashMap$Entry
   8:        107569        3442208  java.util.concurrent.ConcurrentHashMap$HashEntry
   9:        103842        3322944  org.apache.zookeeper.server.DataNode
  10:          2655        3179640  <constantPoolKlass>
  11:          2313        2017056  <constantPoolCacheKlass>
  12:          2655        1842456  <instanceKlassKlass>
  13:           318        1241568  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  14:          7526        1221504  [Ljava.util.HashMap$Entry;
  15:          1820         812976  <methodDataKlass>
  16:          8228         394944  java.util.HashMap
  17:          2903         348432  java.lang.Class
  18:          4077         229688  [S
  19:          4138         221848  [[I
  20:           231         125664  <objArrayKlassKlass>
  21:          7796         124736  java.util.HashSet
  22:          6771         108336  java.util.HashMap$KeySet
  23:          1263          62968  [Ljava.lang.Object;
  24:           746          59680  java.lang.reflect.Method
  25:          3570          57120  java.lang.Object
  26:           502          36144  org.apache.zookeeper.server.Request
  27:           649          25960  java.lang.ref.SoftReference
  28:           501          24048  org.apache.zookeeper.txn.TxnHeader
  29:           188          21704  [I
  30:           861          20664  java.lang.Long
  31:           276          19872  java.lang.reflect.Constructor
  32:           559          17888  java.util.concurrent.locks.ReentrantLock$NonfairSync
  33:           422          16880  java.util.LinkedHashMap$Entry
  34:           502          16064  org.apache.zookeeper.server.quorum.QuorumPacket
  35:           455          14560  java.util.Hashtable$Entry
  36:           495          14368  [Ljava.lang.String;
  37:           318          12720  java.util.concurrent.ConcurrentHashMap$Segment
  38:             3          12336  [Ljava.nio.ByteBuffer;
  39:           514          12336  javax.management.ObjectName$Property
  40:           505          12120  java.util.LinkedList$Node
  41:           501          12024  org.apache.zookeeper.server.quorum.Leader$Proposal
  42:           619          11920  [Ljava.lang.Class;
  43:            74          11840  org.apache.zookeeper.server.NIOServerCnxn
  44:           145          11672  [Ljava.util.Hashtable$Entry;
  45:           729          11664  java.lang.Integer
  46:           346          11072  java.lang.ref.WeakReference
  47:           449          10776  org.apache.zookeeper.txn.SetDataTxn
  48:           156           9984  com.cloudera.cmf.event.shaded.org.apache.avro.Schema$Props
  49:           266           8512  java.util.Vector
  50:            75           8400  sun.nio.ch.SocketChannelImpl
  51:           175           8400  java.nio.HeapByteBuffer
  52:           247           8320  [Ljavax.management.ObjectName$Property;
  53:           303           7272  com.cloudera.cmf.event.EventCode
  54:           300           7200  java.util.ArrayList
  55:           136           6528  java.util.Hashtable
  56:           156           6240  java.util.WeakHashMap$Entry
  57:           194           6208  com.sun.jmx.mbeanserver.ConvertingMethod
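For completeness, the limit in question is the jute.maxbuffer Java system property (ZooKeeper's default is 0xfffff bytes, roughly 1 MB), and as I understand it the same value has to be set on both the ZK servers and the clients (HBase, in our case). A minimal sketch of how we raise it, assuming a stock zookeeper-env.sh and hbase-env.sh; the property name itself is standard ZooKeeper:

  # ZK server side, e.g. in zookeeper-env.sh
  export JVMFLAGS="-Djute.maxbuffer=134217728"    # 128 MB, in bytes

  # client side, e.g. appended to the HBase daemon JVM options in hbase-env.sh
  export HBASE_OPTS="$HBASE_OPTS -Djute.maxbuffer=134217728"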