[ 
https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577592#comment-16577592
 ] 

Ted Yu commented on HBASE-21031:
--------------------------------

Looks good overall. 

> Memory leak if replay edits failed during region opening
> --------------------------------------------------------
>
>                 Key: HBASE-21031
>                 URL: https://issues.apache.org/jira/browse/HBASE-21031
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21031.branch-2.0.001.patch, 
> HBASE-21031.branch-2.0.002.patch, HBASE-21031.branch-2.0.003.patch, 
> HBASE-21031.branch-2.0.004.patch, memoryleak.png
>
>
> Due to HBASE-21029, when replaying edits with a lot of same cells, the 
> memstore won't flush,  a exception will throw when all heap space was used:
> {code}
> 2018-08-06 15:52:27,590 ERROR 
> [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] 
> handler.OpenRegionHandler(302): Failed open of 
> region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., 
> starting to roll back the global memstore size.
> java.lang.OutOfMemoryError: Java heap space
>         at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>         at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>         at 
> org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
>         at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
>         at 
> org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
>         at 
> org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
>         at 
> org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
>         at 
> org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
>         at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
>         at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
>         at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
> {code}
> After this exception, the memstore did not roll back, and since MSLAB is 
> used, all the chunk allocated won't release for ever. Those memory is leak 
> forever...
> We need to rollback the memory if open region fails(For now, only global 
> memstore size is decreased after failure).
> Another problem is that we use replayEditsPerRegion in RegionServerAccounting 
> to record how many memory used during replaying. And decrease the global 
> memstore size if replay fails. This is not right, since during replaying, we 
> may also flush the memstore, the size in the map of replayEditsPerRegion is 
> not accurate at all! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to