[ https://issues.apache.org/jira/browse/HBASE-26488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462174#comment-17462174 ]
Hudson commented on HBASE-26488: -------------------------------- Results for branch master [build #473 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/473/]: (x) *{color:red}-1 overall{color}* ---- details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/473/General_20Nightly_20Build_20Report/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/473/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/] (/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/master/473/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Memory leak when MemStore retry flushing > ---------------------------------------- > > Key: HBASE-26488 > URL: https://issues.apache.org/jira/browse/HBASE-26488 > Project: HBase > Issue Type: Bug > Components: regionserver > Affects Versions: 3.0.0-alpha-1, 2.4.8 > Reporter: chenglei > Assignee: chenglei > Priority: Major > Labels: regionserver > Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.10 > > > {{HStore.flushCache}} use following {{DefaultStoreFlusher.flushSnapshot}} to > flush memStore and retry this method when this method throws exception(the > default retry count is 10),but increasing reference count of > {{MemStoreLABImpl}} which used by {{DefaultMemStore.snapshot}} in following > line 55 and decreasing reference count of {{MemStoreLABImpl}} in following > line 82 is asymmetrical: > {code:java} > 47 public List<Path> flushSnapshot(MemStoreSnapshot snapshot, long > cacheFlushId, > 48 MonitoredTask status, ThroughputController throughputController, > 49 FlushLifeCycleTracker tracker) throws IOException { > 50 ArrayList<Path> result = new ArrayList<>(); > 51 int cellsCount = snapshot.getCellsCount(); > 52 if (cellsCount == 0) return result; // don't flush if there are no > entries > 53 > 54 // Use a store scanner to find which rows to flush. > 55 InternalScanner scanner = createScanner(snapshot.getScanners(), > tracker); > 56 StoreFileWriter writer; > 57 try { > 58 // TODO: We can fail in the below block before we complete adding > this flush to > 59 // list of store files. Add cleanup of anything put on > filesystem if we fail. > 60 synchronized (flushLock) { > 61 status.setStatus("Flushing " + store + ": creating writer"); > 62 // Write the map out to the disk > 63 writer = store.createWriterInTmp(cellsCount, > 64 store.getColumnFamilyDescriptor().getCompressionType(), false, > true, > 65 snapshot.isTagsPresent(), false); > 66 IOException e = null; > 67 try { > 68 performFlush(scanner, writer, throughputController); > 69 } catch (IOException ioe) { > 70 e = ioe; > 71 // throw the exception out > 72 throw ioe; > 73 } finally { > 74 if (e != null) { > 75 writer.close(); > 76 } else { > 77 finalizeWriter(writer, cacheFlushId, status); > 78 } > 79 } > 80 } > 81 } finally { > 82 scanner.close(); > 83 } > ...... > {code} > The asymmetry because above line 55 {{snapshot.getScanners}} ,which is > following {{MemStoreSnapshot.getScanners}}: > {code:java} > public List<KeyValueScanner> getScanners() { > return scanners; > } > {code} > {{MemStoreSnapshot.getScanners}} does not create new {{SegmentScanner}}, > instead it reuses the {{SegmentScanner}} in following line 43 created in > {{MemStoreSnapshot}} ctor: > {code:java} > 38 public MemStoreSnapshot(long id, ImmutableSegment snapshot) { > 39 this.id = id; > 40 this.cellsCount = snapshot.getCellsCount(); > 41 this.memStoreSize = snapshot.getMemStoreSize(); > 42 this.timeRangeTracker = snapshot.getTimeRangeTracker(); > 43 this.scanners = snapshot.getSnapshotScanners(); > 44 this.tagsPresent = snapshot.isTagsPresent(); > 45 } > {code} > So when flushing MemStore by {{DefaultStoreFlusher.flushSnapshot}} at first > time, the reference count of {{MemStoreLABImpl}} which used by > {{DefaultMemStore.snapshot}} is 1, and if > {{DefaultStoreFlusher.flushSnapshot}} throws exception, the > {{SegmentScanner}} of {{DefaultMemStore.snapshot}} is closed at > the end of {{DefaultStoreFlusher.flushSnapshot}} , and the reference count > of corresponding {{MemStoreLABImpl}} is decreased to 0. > When retry flushing {{MemStore}} by {{DefaultStoreFlusher.flushSnapshot}} at > second time, if it success, the {{SegmentScanner}} of > {{DefaultMemStore.snapshot}} is closed again at the end of > {{DefaultStoreFlusher.flushSnapshot}} , and the reference count of > corresponding {{MemStoreLABImpl}} is decreased to -1. When flush is > completed, {{DefaultMemStore.clearSnapshot}} is called to close > {{DefaultMemStore.snapshot}} and corresponding {{MemStoreLABImpl}} , but in > the following line 268, reference count is -1 not 0, so > {{MemStoreLABImpl.recycleChunks}} is not called and the Chunks used by this > {{MemStoreLABImpl}} is leaked. > {code:java} > 261 public void close() { > 262 if (!this.closed.compareAndSet(false, true)) { > 263 return; > 264 } > 265 // We could put back the chunks to pool for reusing only when there is > no > 266 // opening scanner which will read their data > 267 int count = openScannerCount.get(); > 268 if(count == 0) { > 269 recycleChunks(); > 270 } > 271 } > {code} > My fix is for {{MemStoreSnapshot.getScanners}}, we call > {{snapshot.getSnapshotScanners}} to create a new {{SegmentScanner}}, and we > do not create {{SegmentScanner}} in {{MemStoreSnapshot}} ctor. -- This message was sent by Atlassian Jira (v8.20.1#820001)