[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589965#comment-16589965 ]

Hudson commented on HBASE-21031:

Results for branch master [build #450 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/450/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/450//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/450//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/450//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}

> Memory leak if replay edits failed during region opening
>
> Key: HBASE-21031
> URL: https://issues.apache.org/jira/browse/HBASE-21031
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Fix For: 3.0.0, 2.1.1, 2.0.2
>
> Attachments: HBASE-21031.branch-2.0.001.patch, HBASE-21031.branch-2.0.002.patch, HBASE-21031.branch-2.0.003.patch, HBASE-21031.branch-2.0.004.patch, HBASE-21031.branch-2.0.005.patch, HBASE-21031.branch-2.0.006.patch, HBASE-21031.branch-2.0.006.patch, memoryleak.png
>
> Due to HBASE-21029, when replaying edits that contain many identical cells, the memstore won't flush, and an exception is thrown once all heap space is used:
> {code}
> 2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
> java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
>   at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
>   at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
>   at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
>   at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
>   at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
>   at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
>   at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
>   at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
>   at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
>   at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
>   at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
>   at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
> {code}
> After this exception, the memstore is not rolled back, and since MSLAB is in use, none of the allocated chunks are ever released. That memory is leaked permanently.
> We need to roll back the memstore memory if opening the region fails (for now, only the global memstore size is decreased after a failure).
> Another problem is that we use replayEditsPerRegion in RegionServerAccounting to record how much memory is used during replay, and use it to decrease the global memstore size if replay fails. This is not right: the memstore may also be flushed during replay, so the size recorded in the replayEditsPerRegion map is not accurate at all!

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
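To make the accounting problem in the issue description concrete, here is a minimal sketch in plain Java. It does not use any HBase classes: `ReplayAccountingSketch`, `globalMemStoreSize`, `regionMemStoreSize`, the two rollback methods, and all byte counts are hypothetical stand-ins chosen for illustration, and only the name `replayEditsPerRegion` is taken from the description. It assumes a flush during replay lowers the global counter but leaves the per-region replay total untouched, which is the situation the reporter describes.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a toy model of the accounting described in the issue,
// not the HBase implementation.
final class ReplayAccountingSketch {
    // Hypothetical server-wide counter (bytes held by all memstores).
    static long globalMemStoreSize;
    // Bytes added per region during replay; never reduced by a flush.
    static final Map<String, Long> replayEditsPerRegion = new HashMap<>();
    // Bytes a region's memstore actually holds right now.
    static final Map<String, Long> regionMemStoreSize = new HashMap<>();

    static void replayEdit(String region, long size) {
        globalMemStoreSize += size;
        regionMemStoreSize.merge(region, size, Long::sum);
        replayEditsPerRegion.merge(region, size, Long::sum);
    }

    // A flush in the middle of replay frees the memstore but not the replay total.
    static void flush(String region) {
        long flushed = regionMemStoreSize.getOrDefault(region, 0L);
        globalMemStoreSize -= flushed;
        regionMemStoreSize.put(region, 0L);
    }

    // Rollback as described in the issue: subtract everything replayed.
    static void rollbackUsingReplayMap(String region) {
        Long replayed = replayEditsPerRegion.remove(region);
        globalMemStoreSize -= (replayed == null ? 0L : replayed);
        regionMemStoreSize.remove(region);
    }

    // Alternative: subtract only what the region still holds.
    static void rollbackUsingActualSize(String region) {
        Long held = regionMemStoreSize.remove(region);
        globalMemStoreSize -= (held == null ? 0L : held);
        replayEditsPerRegion.remove(region);
    }

    // Replays 100 bytes, flushes, replays 40 more, then "fails" and rolls back.
    static long simulate(boolean useReplayMap) {
        globalMemStoreSize = 500; // bytes held by other, healthy regions
        replayEditsPerRegion.clear();
        regionMemStoreSize.clear();
        replayEdit("r1", 100);
        flush("r1");
        replayEdit("r1", 40);
        if (useReplayMap) {
            rollbackUsingReplayMap("r1");
        } else {
            rollbackUsingActualSize("r1");
        }
        return globalMemStoreSize;
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));  // 400: the flushed 100 bytes are subtracted twice
        System.out.println(simulate(false)); // 500: back to the pre-replay baseline
    }
}
```

In this toy model the replay-map rollback leaves the global counter 100 bytes too low, because the flushed bytes are subtracted once by the flush and again by the rollback. Rolling back by the size the region actually holds avoids that. Neither variant addresses the separate MSLAB chunk leak from the description, which requires releasing the memstore itself.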
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589814#comment-16589814 ]

Hudson commented on HBASE-21031:

Results for branch branch-2 [build #1148 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1148/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1148//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1148//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1148//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Something went wrong with this stage, [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1148//console].
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589621#comment-16589621 ]

Hudson commented on HBASE-21031:

Results for branch branch-2.1 [build #226 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/226/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/226//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/226//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/226//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589594#comment-16589594 ]

Hudson commented on HBASE-21031:

Results for branch branch-2.0 [build #715 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/715/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/715//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/715//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/715//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588076#comment-16588076 ]

Hadoop QA commented on HBASE-21031:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 48s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 55s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 52s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}130m 6s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}164m 38s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936503/HBASE-21031.branch-2.0.006.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 4193c8128c80 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / dcf8a23183 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14119/testReport/ |
| Max. process+thread count | 4579 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14119/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587891#comment-16587891 ] stack commented on HBASE-21031: --- Retry while [~allan163] is offline.
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587558#comment-16587558 ]

Hadoop QA commented on HBASE-21031:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 1s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 22s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 14s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 12s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 9m 6s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 2s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 45s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12936434/HBASE-21031.branch-2.0.006.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 2cc12df77480 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / efa54012b4 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14111/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14111/testReport/ |
| Max. process+thread count | 4149 (vs. ulimit of 1) |
| modules
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587483#comment-16587483 ]

Mike Drob commented on HBASE-21031:
-----------------------------------

+1 assuming QA agrees, thanks for working on this!

> Memory leak if replay edits failed during region opening
> --------------------------------------------------------
>
>                 Key: HBASE-21031
>                 URL: https://issues.apache.org/jira/browse/HBASE-21031
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.0.1
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Major
>         Attachments: HBASE-21031.branch-2.0.001.patch,
> HBASE-21031.branch-2.0.002.patch, HBASE-21031.branch-2.0.003.patch,
> HBASE-21031.branch-2.0.004.patch, HBASE-21031.branch-2.0.005.patch,
> HBASE-21031.branch-2.0.006.patch, memoryleak.png
>
>
> Due to HBASE-21029, when replaying edits that contain many identical cells,
> the memstore won't flush, and an exception is thrown once all heap space is used:
> {code}
> 2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
> java.lang.OutOfMemoryError: Java heap space
>         at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>         at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>         at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
>         at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
>         at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
>         at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
>         at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
>         at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
>         at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
>         at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
>         at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
>         at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
>         at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
>         at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
>         at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
> {code}
> After this exception, the memstore is not rolled back, and since MSLAB is
> used, none of the allocated chunks are ever released. That memory is leaked
> forever.
> We need to roll back the memory if opening the region fails (for now, only the
> global memstore size is decreased after a failure).
> Another problem is that we use replayEditsPerRegion in RegionServerAccounting
> to record how much memory is used during replay, and decrease the global
> memstore size if replay fails. This is not right: the memstore may also be
> flushed during replay, so the size in the replayEditsPerRegion map is
> not accurate at all!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
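The accounting problem raised at the end of the issue description can be illustrated with a small model. This is only a sketch under assumptions: `ReplayAccountingSketch` and its methods are hypothetical stand-ins rather than the real `RegionServerAccounting` code, the byte counts are invented, and it assumes the rollback subtracts the full per-region replay counter from the global size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; it does not reproduce HBase's RegionServerAccounting.
class ReplayAccountingSketch {
    // Bytes currently held by all memstores (illustrative).
    long globalMemstoreSize = 0;
    // Bytes added per region during replay (named after the map in the issue text).
    final Map<String, Long> replayEditsPerRegion = new HashMap<>();

    // Replaying an edit grows both the global size and the per-region counter.
    void replayEdit(String region, long bytes) {
        globalMemstoreSize += bytes;
        replayEditsPerRegion.merge(region, bytes, Long::sum);
    }

    // A flush during replay shrinks the global size but leaves the counter untouched.
    void flush(long bytesFlushed) {
        globalMemstoreSize -= bytesFlushed;
    }

    // Assumed rollback: subtract everything that was ever replayed for the region.
    void rollbackUsingReplayCounter(String region) {
        Long replayed = replayEditsPerRegion.remove(region);
        if (replayed != null) {
            globalMemstoreSize -= replayed;
        }
    }

    // Replays 100 bytes, flushes 60, replays 30 more, then rolls back.
    static long demo() {
        ReplayAccountingSketch acct = new ReplayAccountingSketch();
        acct.replayEdit("r1", 100);
        acct.flush(60);
        acct.replayEdit("r1", 30);
        acct.rollbackUsingReplayCounter("r1");
        return acct.globalMemstoreSize;
    }
}
```

In this model the memstore holds 70 bytes when the rollback happens, but the counter says 130, so the global size ends up negative instead of zero. That is the kind of drift the description warns about when a flush happens in the middle of replay.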
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16587350#comment-16587350 ]

Allan Yang commented on HBASE-21031:

{quote}
+ LOG.error("replay failed, that's expected", t);
I think the test is better without this since we already fail the test if the replay doesn't fail, so these are unneeded lines in the log.
{quote}
Done!
{quote}
please try to put line breaks in between words instead of in the middle of words.
{quote}
Done!
{quote}
Also, I responded to your other question about my comments for the try/catch/finally directly on RB. Let me know if that doesn't make sense or if you think it's better to leave them as it is.
{quote}
I think we'd better leave them there, since we only catch IOException here, but we need to abort the status whatever exception is thrown.

Thanks for reviewing, [~mdrob]. Could you take a look at HBASE-21041 too? It is another bug you mentioned on the review board.
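The reasoning in the reply above (only `IOException` is caught, yet the status must be aborted whatever is thrown) can be sketched as follows. Everything here is a hypothetical stand-in: `OpenRegionSketch`, `Status`, and `ReplayStep` are invented for illustration and are not the real HBase classes, and the `OutOfMemoryError` is thrown by hand to simulate the failure in the issue.

```java
import java.io.IOException;

// Hypothetical stand-ins; not the real HRegion or task-status classes.
class OpenRegionSketch {
    interface ReplayStep {
        void run() throws IOException;
    }

    static class Status {
        boolean aborted;
        boolean complete;
        void abort() { aborted = true; }
        void markComplete() { complete = true; }
    }

    static void open(ReplayStep replay, Status status) throws IOException {
        boolean opened = false;
        try {
            replay.run();
            status.markComplete();
            opened = true;
        } catch (IOException e) {
            // Only IOException is handled here; an Error such as
            // OutOfMemoryError never enters this block.
            throw new IOException("replay failed", e);
        } finally {
            // Runs for IOException, RuntimeException and Error alike.
            if (!opened) {
                status.abort();
            }
        }
    }

    // Simulates replay dying with an OutOfMemoryError and reports whether
    // the status was still aborted.
    static boolean abortedAfterError() {
        Status status = new Status();
        try {
            open(() -> { throw new OutOfMemoryError("simulated"); }, status);
        } catch (Throwable expected) {
            // The caller would see the Error; it is swallowed here for the demo.
        }
        return status.aborted;
    }
}
```

Because `OutOfMemoryError` extends `Error` rather than `Exception`, a cleanup placed only in the `catch (IOException e)` block would be skipped, which is the argument for keeping the `finally` block.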
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581501#comment-16581501 ]

Mike Drob commented on HBASE-21031:
-----------------------------------

{quote}
+ LOG.warn("Failed drop memstore of region= {}, s"
+ + "ome chunks may not released forever since MSLAB is enabled",
{quote}
Please try to put line breaks between words instead of in the middle of words.

Also, I responded to your other question about my comments for the try/catch/finally directly on RB. Let me know if that doesn't make sense or if you think it's better to leave them as they are.
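The line-break advice in the comment above can be shown with two string constants. This is a small illustration only: `LogMessageSketch` is a hypothetical class, the message text is copied from the quoted patch line, and the second split is one possible placement, not necessarily the one used in the revised patch.

```java
// Hypothetical holder class for the two ways of splitting the same message.
class LogMessageSketch {
    // Split in the middle of "some", as in the quoted patch line.
    // Searching the source for the word "some" would miss this message.
    static final String MID_WORD =
        "Failed drop memstore of region= {}, s"
        + "ome chunks may not released forever since MSLAB is enabled";

    // Same text, split at a word boundary so each fragment stays searchable.
    static final String BETWEEN_WORDS =
        "Failed drop memstore of region= {}, "
        + "some chunks may not released forever since MSLAB is enabled";
}
```

Both constants produce the same log output; only the readability and searchability of the source differ.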
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581499#comment-16581499 ]

Mike Drob commented on HBASE-21031:
-----------------------------------

bq. +LOG.error("replay failed, that's expected", t);

I think the test is better without this since we already fail the test if the replay doesn't fail, so these are unneeded lines in the log. In particular, a stack trace at error level will stick out and make other failures harder to diagnose.
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577798#comment-16577798 ]

Allan Yang commented on HBASE-21031:

Thanks for reviewing, [~yuzhih...@gmail.com]. Any other comments? [~mdrob]
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577686#comment-16577686 ]

Hadoop QA commented on HBASE-21031:
-----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 49s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 35s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 32s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 43s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}104m 25s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 30s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935287/HBASE-21031.branch-2.0.005.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 5c4c8edeed01 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / aa83594b84 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14013/testReport/ |
| Max. process+thread count | 4177 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14013/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577592#comment-16577592 ]

Ted Yu commented on HBASE-21031:

Looks good overall.
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577584#comment-16577584 ]

Hadoop QA commented on HBASE-21031:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 50s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 40s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 13s{color} | {color:red} hbase-server: The patch generated 1 new + 250 unchanged - 0 fixed = 251 total (was 250) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 7s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}108m 41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 43s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935281/HBASE-21031.branch-2.0.004.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 2ce66c307bf7 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / aa83594b84 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14010/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14010/testReport/ |
| Max. process+thread count | 4088 (vs. ulimit of 1) |
| modules | C: hbase-server U:
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577496#comment-16577496 ]

Allan Yang commented on HBASE-21031:

{quote}
Can you come up with a test which fails if dropMemStoreContents only rolls back single region (the region which encounters Throwable) ? Thanks
{quote}
Sorry, maybe the comment in the patch is misleading: we only roll back the problematic region's memstore if opening fails, not every region's.

> Memory leak if replay edits failed during region opening
>
> Key: HBASE-21031
> URL: https://issues.apache.org/jira/browse/HBASE-21031
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21031.branch-2.0.001.patch, HBASE-21031.branch-2.0.002.patch, HBASE-21031.branch-2.0.003.patch, memoryleak.png
>
> Due to HBASE-21029, when replaying edits that contain a lot of identical cells, the memstore won't flush, and an exception is thrown once all heap space is used:
> {code}
> 2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
> at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
> at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
> at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
> at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
> at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
> at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
> at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
> at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
> at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
> at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
> at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
> at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
> at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
> {code}
> After this exception, the memstore is not rolled back, and since MSLAB is used, none of the allocated chunks are ever released, so that memory is leaked permanently.
> We need to roll back the memory if opening the region fails (for now, only the global memstore size is decreased after a failure).
> Another problem is that we use replayEditsPerRegion in RegionServerAccounting to record how much memory is used during replay, and decrease the global memstore size by that amount if replay fails. This is not right: the memstore may also be flushed during replay, so the size recorded in the replayEditsPerRegion map is not accurate at all!

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
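The accounting problem described in the issue (a flush during replay makes the per-region replay counter stale) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the class `ReplayAccountingSketch` and all of its methods are hypothetical and are not the real RegionServerAccounting API; only the name `replayEditsPerRegion` is borrowed from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the accounting drift; NOT the real RegionServerAccounting. */
class ReplayAccountingSketch {
  // Hypothetical stand-in for the global memstore size counter.
  long globalMemStoreSize = 0;
  // Stand-in for the replayEditsPerRegion map: bytes added during replay.
  final Map<String, Long> replayEditsPerRegion = new HashMap<>();
  // What each region's memstore actually holds right now.
  final Map<String, Long> actualMemStoreSize = new HashMap<>();

  void addReplayedEdit(String region, long bytes) {
    globalMemStoreSize += bytes;
    replayEditsPerRegion.merge(region, bytes, Long::sum);
    actualMemStoreSize.merge(region, bytes, Long::sum);
  }

  /** A flush during replay frees memory, but the replay map is not updated. */
  void flush(String region) {
    long flushed = actualMemStoreSize.getOrDefault(region, 0L);
    globalMemStoreSize -= flushed;
    actualMemStoreSize.put(region, 0L);
  }

  /** Rollback driven by the replay map (the approach the issue calls inaccurate). */
  void rollbackUsingReplayMap(String region) {
    Long replayed = replayEditsPerRegion.remove(region);
    if (replayed != null) {
      globalMemStoreSize -= replayed;
    }
  }

  /** Rollback driven by what the memstore really holds at failure time. */
  void rollbackUsingActualSize(String region) {
    replayEditsPerRegion.remove(region);
    Long actual = actualMemStoreSize.remove(region);
    if (actual != null) {
      globalMemStoreSize -= actual;
    }
  }

  /** Replays 100 bytes, flushes, replays 30 more, then fails and rolls back. */
  static long globalAfterFailedReplay(boolean useReplayMap) {
    ReplayAccountingSketch acc = new ReplayAccountingSketch();
    acc.addReplayedEdit("r1", 100);
    acc.flush("r1");
    acc.addReplayedEdit("r1", 30);
    if (useReplayMap) {
      acc.rollbackUsingReplayMap("r1");
    } else {
      acc.rollbackUsingActualSize("r1");
    }
    return acc.globalMemStoreSize;
  }

  public static void main(String[] args) {
    System.out.println(globalAfterFailedReplay(true));  // prints -100
    System.out.println(globalAfterFailedReplay(false)); // prints 0
  }
}
```

In this toy model the replay-map rollback subtracts bytes that the flush already released, so the global counter goes negative, while a rollback based on the current memstore contents returns it to zero.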
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577482#comment-16577482 ] Allan Yang commented on HBASE-21031: [~yuzhih...@gmail.com], just checked, master branch and branch-2 has HBASE-20542, so the problem has fixed. I think we can check in this one only to branch-2.0 and branch-2.1?
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576494#comment-16576494 ] Ted Yu commented on HBASE-21031: Can you come up with a test which fails if {{dropMemStoreContents}} only rolls back single region (the region which encounters Throwable) ? Thanks
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576492#comment-16576492 ] Allan Yang commented on HBASE-21031: {quote} TestRecoveredEidtsReplayAndAbort passes with the above change. {quote} Sorry, [~yuzhih...@gmail.com], I missed your point.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576466#comment-16576466 ]

Ted Yu commented on HBASE-21031:

I modified the dropMemStoreContents() method by passing it the region which encounters Throwable.
{code}
public MemStoreSize dropMemStoreContents(HRegion r) throws IOException {
  ...
  for (HStore s : stores.values()) {
    if (!s.getHRegion().equals(r)) continue;
{code}
TestRecoveredEidtsReplayAndAbort passes with the above change.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
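The per-region rollback discussed in this thread can be pictured with a small self-contained model. This is only a sketch under stated assumptions: `ToyStore`, `ToyRegion`, and `RegionRollbackSketch` are hypothetical types invented for illustration, and the `dropMemStoreContents` method here merely mirrors the name used in the thread, not the patch's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of rolling back only the region that failed to open. */
class RegionRollbackSketch {

  /** Hypothetical store holding some bytes in its memstore. */
  static class ToyStore {
    long memStoreBytes;

    ToyStore(long bytes) {
      this.memStoreBytes = bytes;
    }

    /** Drops the in-memory contents and reports how many bytes were released. */
    long dropContents() {
      long dropped = memStoreBytes;
      memStoreBytes = 0;
      return dropped;
    }
  }

  /** Hypothetical region owning its own stores. */
  static class ToyRegion {
    final List<ToyStore> stores = new ArrayList<>();

    ToyRegion(long... storeSizes) {
      for (long size : storeSizes) {
        stores.add(new ToyStore(size));
      }
    }

    long memStoreBytes() {
      long total = 0;
      for (ToyStore s : stores) {
        total += s.memStoreBytes;
      }
      return total;
    }

    /** Only touches this region's stores, so other regions are unaffected. */
    long dropMemStoreContents() {
      long total = 0;
      for (ToyStore s : stores) {
        total += s.dropContents();
      }
      return total;
    }
  }

  /** Returns {global size, healthy region size, failed region size} after rollback. */
  static long[] rollbackFailedRegion() {
    ToyRegion healthy = new ToyRegion(40, 60);
    ToyRegion failing = new ToyRegion(25, 75);
    long global = healthy.memStoreBytes() + failing.memStoreBytes();
    // Opening 'failing' threw: release its memstore and shrink the global size.
    global -= failing.dropMemStoreContents();
    return new long[] {global, healthy.memStoreBytes(), failing.memStoreBytes()};
  }

  public static void main(String[] args) {
    long[] result = rollbackFailedRegion();
    System.out.println(result[0] + " " + result[1] + " " + result[2]); // prints 100 100 0
  }
}
```

Because the drop method iterates only over the failing region's own stores in this model, a filter on the owning region inside that loop would never skip anything.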
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576432#comment-16576432 ]

Hadoop QA commented on HBASE-21031:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 23s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 41s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 27s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 15s{color} | {color:red} hbase-server: The patch generated 1 new + 250 unchanged - 0 fixed = 251 total (was 250) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red} 3m 7s{color} | {color:red} patch has 10 errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}101m 41s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}141m 57s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935134/HBASE-21031.branch-2.0.003.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 2ad2799203a4 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 7ee4aa459c |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14002/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| shadedjars | https://builds.apache.org/job/PreCommit-HBASE-Build/14002/artifact/patchprocess/patch-shadedjars.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14002/testReport/ |
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576025#comment-16576025 ]

Hadoop QA commented on HBASE-21031:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 52s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 21s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 14s{color} | {color:red} hbase-server: The patch generated 1 new + 250 unchanged - 0 fixed = 251 total (was 250) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedjars {color} | {color:red} 3m 11s{color} | {color:red} patch has 10 errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 30s{color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}104m 30s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 20s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 58s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | Redundant nullcheck of region which is known to be null in org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion() Redundant null check at OpenRegionHandler.java:is known to be null in org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion() Redundant null check at OpenRegionHandler.java:[line 307] |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12935089/HBASE-21031.branch-2.0.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 72a3918f2f0b 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / 7ee4aa459c |
| maven | version: Apache Maven 3.5.4
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575858#comment-16575858 ] Allan Yang commented on HBASE-21031: [~mdrob], uploaded the new patch to the review board, feel free to give advices. Thanks!
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575027#comment-16575027 ]

Hadoop QA commented on HBASE-21031:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 51s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 10s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 15s{color} | {color:red} hbase-server: The patch generated 14 new + 303 unchanged - 0 fixed = 317 total (was 303) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 6s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 13s{color} | {color:red} hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 11s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m 29s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hbase-server |
| | Null pointer dereference of region in org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion() on exception path Dereferenced at OpenRegionHandler.java:in org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion() on exception path Dereferenced at OpenRegionHandler.java:[line 308] |
| Failed junit tests | hadoop.hbase.regionserver.TestHRegion |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21031 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12934976/HBASE-21031.branch-2.0.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux b022565dbae1 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 /
[jira] [Commented] (HBASE-21031) Memory leak if replay edits failed during region opening
[ https://issues.apache.org/jira/browse/HBASE-21031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574986#comment-16574986 ]

Mike Drob commented on HBASE-21031:
---

Very interesting failure scenario, Allan. Great job diagnosing it. I think I have feedback for the patch; would you mind uploading it to Review Board?

> Memory leak if replay edits failed during region opening
>
> Key: HBASE-21031
> URL: https://issues.apache.org/jira/browse/HBASE-21031
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Major
> Attachments: HBASE-21031.branch-2.0.001.patch, memoryleak.png
>
> Due to HBASE-21029, when replaying edits that contain many identical cells, the
> memstore won't flush, and an exception is thrown once all heap space is used:
> {code}
> 2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
> at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
> at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
> at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
> at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
> at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
> at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
> at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
> at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
> at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
> at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
> at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
> at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
> at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
> at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
> {code}
> After this exception, the memstore is not rolled back, and since MSLAB is
> used, none of the allocated chunks are ever released. That memory is leaked
> permanently.
> We need to roll back the memory if opening the region fails (for now, only the
> global memstore size is decreased after a failure).
> Another problem is that we use replayEditsPerRegion in RegionServerAccounting
> to record how much memory is used during replay, and to decrease the global
> memstore size if replay fails. This is not right: the memstore may also be
> flushed during replay, so the size recorded in the replayEditsPerRegion map is
> not accurate at all!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
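
For readers following the description above, here is a minimal Java sketch of the rollback idea, not the actual patch. Every name in it (ReplayRollbackSketch, GlobalAccounting, SketchMemstore, replay, and the failAt/flushEvery parameters) is hypothetical and none of it is HBase API. It assumes a memstore that tracks its own live size, so that a failed replay can give back exactly what is still held and drop its chunks, instead of relying on a separately recorded cumulative replay size that goes stale whenever a flush happens mid-replay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these types are HBase classes. */
class ReplayRollbackSketch {

    /** Stand-in for the server-wide memstore size counter. */
    static final class GlobalAccounting {
        private final AtomicLong globalMemstoreSize = new AtomicLong();
        void add(long delta) { globalMemstoreSize.addAndGet(delta); }
        long size() { return globalMemstoreSize.get(); }
    }

    /** Stand-in for a region memstore whose cells live in allocated chunks. */
    static final class SketchMemstore {
        private final GlobalAccounting global;
        private final List<byte[]> chunks = new ArrayList<>();
        private long liveSize;

        SketchMemstore(GlobalAccounting global) { this.global = global; }

        /** Adds one replayed edit and charges it to the global counter. */
        void add(int cellSize) {
            chunks.add(new byte[cellSize]);
            liveSize += cellSize;
            global.add(cellSize);
        }

        /** Simulated flush: the data leaves memory, so the accounting shrinks too. */
        void flush() { releaseAll(); }

        /** Drops every chunk and gives back exactly what is still held. */
        void releaseAll() {
            global.add(-liveSize);
            liveSize = 0;
            chunks.clear();
        }

        int chunkCount() { return chunks.size(); }
    }

    /**
     * Replays edits. The finally block runs for any Throwable (including an
     * OutOfMemoryError), so a failed replay never leaves chunks or size behind.
     */
    static void replay(SketchMemstore memstore, int[] editSizes, int failAt, int flushEvery) {
        boolean success = false;
        try {
            for (int i = 0; i < editSizes.length; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("simulated replay failure");
                }
                memstore.add(editSizes[i]);
                if (flushEvery > 0 && (i + 1) % flushEvery == 0) {
                    memstore.flush();
                }
            }
            success = true;
        } finally {
            if (!success) {
                memstore.releaseAll();
            }
        }
    }

    public static void main(String[] args) {
        GlobalAccounting global = new GlobalAccounting();
        SketchMemstore memstore = new SketchMemstore(global);
        try {
            // Flush after every third edit, fail before the fifth.
            replay(memstore, new int[] {10, 20, 30, 40, 50}, 4, 3);
        } catch (IllegalStateException expected) {
            // In a real server the region open would be reported as failed here.
        }
        System.out.println("global=" + global.size() + " chunks=" + memstore.chunkCount());
    }
}
```

In this run, 100 bytes are replayed in total before the failure, but 60 of them were already flushed. Subtracting the cumulative replayed figure on failure would push the global counter below zero, which is the inaccuracy the description attributes to a per-region replay size map.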