[ https://issues.apache.org/jira/browse/HBASE-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16248339#comment-16248339 ]
Abhishek Singh Chouhan commented on HBASE-19215:
------------------------------------------------

Going to put up a patch on Monday [~apurtell]

> Incorrect exception handling on the client causes incorrect call timeouts and
> byte buffer allocations on the server
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-19215
>                 URL: https://issues.apache.org/jira/browse/HBASE-19215
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.1
>            Reporter: Abhishek Singh Chouhan
>            Assignee: Abhishek Singh Chouhan
>             Fix For: 2.0.0, 3.0.0, 1.4.0, 1.3.2
>
>
> Ran into an OOME on the client: java.lang.OutOfMemoryError: Direct buffer
> memory.
> When we encounter an unhandled exception during the channel write in
> RpcClientImpl
> {noformat}
> checkIsOpen(); // Now we're checking that it didn't became idle in between.
> try {
>   call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header, call.param,
>       cellBlock));
> } catch (IOException e) {
> {noformat}
> we end up leaving the connection open. This becomes especially problematic
> when we get an unhandled exception between writing the length of our request
> on the channel and subsequently writing the params and cellblocks
> {noformat}
> *dos.write(Bytes.toBytes(totalSize));*
> // This allocates a buffer that is the size of the message internally.
> header.writeDelimitedTo(dos);
> if (param != null) param.writeDelimitedTo(dos);
> if (cellBlock != null) dos.write(cellBlock.array(), 0, cellBlock.remaining());
> dos.flush();
> return totalSize;
> {noformat}
> After reading the length, the RS allocates a ByteBuffer and expects it to be
> filled with data. However, when we encounter an exception during the param
> write, we release the write lock in RpcClientImpl and do not close the
> connection; the exception is handled at AbstractRpcClient.callBlockingMethod
> and the call is retried.
> The next client request to the same RS then writes to the channel, but the
> server interprets it as part of the previous request and errors out during
> proto conversion when processing the request, since it is considered
> malformed (in the worst case, might this be misinterpreted as wrong data?).
> The remaining data of the current request is then read (the current request
> is larger than the previous request's allocated, partially filled
> ByteBuffer) and is misinterpreted as the size of a new request; in my case
> this was in GBs. All the client requests time out since this ByteBuffer is
> never completely filled. We should close the connection for any Throwable
> and not just IOException.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
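
The fix proposed in the description (close the connection on any Throwable, not only IOException) can be sketched as below. This is a minimal, self-contained illustration and not the HBase patch: FramedWriteSketch, Connection, Payload and writeRequest are hypothetical names, and a plain 4-byte length prefix stands in for the real RPC framing.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch of length-prefixed writes; not HBase code. */
public class FramedWriteSketch {

    /** Hypothetical stand-in for a client connection to a region server. */
    static final class Connection implements Closeable {
        final DataOutputStream out;
        private boolean closed;

        Connection(OutputStream sink) {
            this.out = new DataOutputStream(sink);
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
            try {
                out.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if closing fails.
            }
        }
    }

    /** Hypothetical request body whose serialization may fail. */
    interface Payload {
        int size();
        void writeTo(DataOutputStream out) throws IOException;
    }

    /**
     * Writes the length prefix, then the body. If anything at all goes wrong
     * after the prefix may have reached the wire, the stream is no longer at
     * a frame boundary, so the connection is closed instead of being reused.
     */
    static void writeRequest(Connection conn, Payload payload) throws IOException {
        try {
            conn.out.writeInt(payload.size()); // the peer now expects size() bytes
            payload.writeTo(conn.out);         // may throw a RuntimeException or Error
            conn.out.flush();
        } catch (Throwable t) {                // deliberately broader than IOException
            conn.close();
            throw t;                           // precise rethrow (Java 7+)
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        Connection conn = new Connection(wire);
        Payload broken = new Payload() {
            public int size() { return 16; }
            public void writeTo(DataOutputStream out) {
                throw new IllegalStateException("serialization failed");
            }
        };
        try {
            writeRequest(conn, broken);
        } catch (IllegalStateException expected) {
            // Only the 4-byte prefix was written; the connection must not be reused.
            System.out.println("bytes on wire: " + wire.size()
                + ", closed: " + conn.isClosed());
        }
    }
}
```

In this sketch the failing payload leaves only the length prefix on the wire, which is the half-written frame the description warns about. Catching Throwable ensures that a retry has to open a fresh connection rather than append to the broken frame.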