[ https://issues.apache.org/jira/browse/HBASE-15525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209775#comment-15209775 ]
Anoop Sam John commented on HBASE-15525:
----------------------------------------

We run out of off-heap memory for two main reasons:

1. The pool has already reached its max capacity of #BBs, and at a given point in time all of them are in use. When further Calls then ask the pool for BBs for their cell block creation, the pool happily makes new off-heap BBs, each sized at the running average length. All of these cell blocks stay tied to their Call until the Responder writes them to the socket. True, we won't keep them in the pool, but they are held for a long time, especially while the response queue is growing.

2. Even when a response's cell block only needs, say, 12 KB, we still consume 512 MB for it. Wasted, in the sense that nearly all of that 512 MB is unusable. The new BBs the pool creates on demand (which may never be pooled at all, since the pool is already at its max #BBs) also take 512 MB each.

So, put simply, it is really difficult for the user to predict how much max off-heap size to give. In deepankar's case, he applied a calculation based on the max #BBs in the pool and the max BB size, plus some additional GBs, and set the max off-heap size to 5 GB. But this goes wrong. To explain with an example: suppose the max #BBs in the pool is configured as 100 and the max per-item size as 1 MB, meaning this pool should consume at most 100 MB off-heap. Now there are lots of requests and the response queue is big. The first 100 responses use all the BBs from the pool. Requests keep coming, and say 100 more responses are added to the queue; each asks the pool, and the pool makes a new off-heap BB for each of them. So, outside the pool, we have allocated double the total max size we expected. I agree that we won't store all those BBs in the pool, and the GC may be able to clean them, but for some time (until we clear this response queue) the usage is higher. And one more thing about GC: is it only a full GC that can clean the off-heap area?
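The overshoot scenario above can be sketched in a few lines. This is a hypothetical, simplified model (names like LeakyPoolDemo are invented for illustration, not the real BoundedByteBufferPool code): the pool caps how many buffers it will keep, but not how many it will create while earlier ones are still tied to queued responses.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical simplified pool: bounded in what it KEEPS,
// unbounded in what it CREATES.
public class LeakyPoolDemo {
  public static final int MAX_POOLED = 100;        // max #BBs kept in pool
  public static final int BUF_SIZE = 1024 * 1024;  // 1 MB per buffer

  private final Queue<ByteBuffer> free = new ArrayDeque<>();

  public ByteBuffer getBuffer() {
    ByteBuffer bb = free.poll();
    if (bb != null) {
      return bb;
    }
    // Pool is empty: allocate a fresh direct BB no matter how many are
    // already in flight -- this is the unbounded part.
    return ByteBuffer.allocateDirect(BUF_SIZE);
  }

  public void putBuffer(ByteBuffer bb) {
    if (free.size() < MAX_POOLED) {
      free.offer(bb);
    }
    // else: dropped; its direct memory is reclaimed only by GC
  }

  public static void main(String[] args) {
    LeakyPoolDemo pool = new LeakyPoolDemo();
    // 200 responses queued at once: nothing has been returned yet,
    // so every getBuffer() call allocates a new direct BB.
    List<ByteBuffer> inFlight = new ArrayList<>();
    for (int i = 0; i < 200; i++) {
      inFlight.add(pool.getBuffer());
    }
    long offHeap = (long) inFlight.size() * BUF_SIZE;
    System.out.println("off-heap in flight: " + (offHeap >> 20) + " MB");
    // prints: off-heap in flight: 200 MB
    // i.e. double the 100 MB that MAX_POOLED * BUF_SIZE suggests
  }
}
```

The pool's configured capacity bounds only the idle buffers it retains; the in-flight total is bounded by nothing but the depth of the response queue.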
In other words, does this cause more full GCs once we run out of space in the off-heap area? That is why I am thinking of changing these temp BB creations: when they happen, they should be HBBs (heap ByteBuffers). We should make the pool such that it gives back a BB if it has a free one; when it has no free one and capacity has not been reached, it makes a new DBB and returns that; and if neither is the case, it returns nothing. The BBBPool will make, and take back, off-heap BBs only. If it cannot give one, let the caller do what it wants (make an on-heap BB and make sure not to give it back to the pool). As for fixing the size of the BBs from the pool, I will write about that in another comment; this one is too big already.

> OutOfMemory could occur when using BoundedByteBufferPool during RPC bursts
> --------------------------------------------------------------------------
>
>                 Key: HBASE-15525
>                 URL: https://issues.apache.org/jira/browse/HBASE-15525
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>            Reporter: deepankar
>            Assignee: Anoop Sam John
>            Priority: Critical
>
> After HBASE-13819, the system sometimes runs out of direct memory whenever
> there is network congestion or some client-side issue.
> This is because of pending RPCs in the RPCServer$Connection.responseQueue:
> since all the responses in this queue hold a buffer for their cell block from
> BoundedByteBufferPool, this can take up a lot of memory if the
> BoundedByteBufferPool's moving average settles towards a higher value.
> See the discussion here:
> [HBASE-13819-comment|https://issues.apache.org/jira/browse/HBASE-13819?focusedCommentId=15207822&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15207822]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
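The pool policy proposed in the comment above could be sketched roughly as follows. Class and method names here are hypothetical and for illustration only; this is not the actual HBase patch. The pool hands out direct buffers only up to a hard cap; past that it returns null, and the caller falls back to a temporary on-heap buffer that is never given back to the pool.

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the proposed policy: off-heap allocation is
// hard-capped; overflow during a burst lands on the heap instead.
public class CappedDirectBufferPool {
  private final int maxCount;
  private final int bufSize;
  private final AtomicInteger created = new AtomicInteger();
  private final Queue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

  public CappedDirectBufferPool(int maxCount, int bufSize) {
    this.maxCount = maxCount;
    this.bufSize = bufSize;
  }

  /** @return a pooled or new direct BB, or null if the cap is reached. */
  public ByteBuffer getBuffer() {
    ByteBuffer bb = free.poll();
    if (bb != null) {
      bb.clear();
      return bb;
    }
    // Reserve a slot before allocating so concurrent callers cannot
    // push the total number of direct BBs past maxCount.
    if (created.incrementAndGet() <= maxCount) {
      return ByteBuffer.allocateDirect(bufSize);
    }
    created.decrementAndGet();
    return null; // caller must fall back to an on-heap BB
  }

  /** Only direct BBs handed out by this pool should come back. */
  public void putBuffer(ByteBuffer bb) {
    if (bb.isDirect()) {
      free.offer(bb);
    }
  }

  /** Caller-side helper: pooled DBB if possible, else a throwaway HBB. */
  public ByteBuffer getBufferOrHeap() {
    ByteBuffer bb = getBuffer();
    return bb != null ? bb : ByteBuffer.allocate(bufSize);
  }
}
```

With this policy the off-heap footprint is hard-capped at maxCount * bufSize; whatever a burst needs beyond that lands on the heap, where ordinary young-generation GC can reclaim it, instead of accumulating direct memory that typically needs a full GC to free.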