[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407182#comment-16407182 ]
Ted Yu commented on HBASE-20090: -------------------------------- I have updated both description and Release Note. Thanks > Properly handle Preconditions check failure in > MemStoreFlusher$FlushHandler.run > ------------------------------------------------------------------------------- > > Key: HBASE-20090 > URL: https://issues.apache.org/jira/browse/HBASE-20090 > Project: HBase > Issue Type: Bug > Reporter: Ted Yu > Assignee: Ted Yu > Priority: Major > Attachments: 20090-server-61260-01-000007.log, 20090.v10.txt, > 20090.v10.txt, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt > > > Copied the following from a comment since this was better description of the > race condition. > The original description was merged to the beginning of my first comment > below. > Observed the following in region server log (running on hadoop3 cluster): > {code} > 2018-02-26 16:06:50,044 WARN > [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] > regionserver.MemStoreFlusher: Memstore is above high water mark and block > 135ms > 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Cache flusher failed for entry > org.apache.hadoop.hbase.regionserver. > MemStoreFlusher$WakeupFlushThread@2adfadd7 > java.lang.IllegalStateException > at > org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69) > at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237) > at java.lang.Thread.run(Thread.java:748) > {code} > Here is the Precondition from MemStoreFlusher#flushOneForGlobalPressure() : > {code} > Preconditions.checkState( > (regionToFlush != null && regionToFlushSize > 0) || > bestRegionReplicaSize > 0); > {code} > When the Preconditions check fails, IllegalStateException would be raised. > With more debug logging, we can see the scenario where the exception was > triggered. > {code} > 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: > Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., > compaction_queue=(0:0), split_queue=1 > 2018-03-02 17:28:30,098 DEBUG > [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] > regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because > info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1 > 2018-03-02 17:28:30,296 INFO > [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] > regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK > 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Flush thread woke up because memory above low > water=381.5 M > 2018-03-02 17:28:30,297 INFO > [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] > regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK > 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: region > TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696 > 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] > regionserver.MemStoreFlusher: region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0 > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Flush of region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global > heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap > size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0 > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Nothing to flush for > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. > 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] > regionserver.MemStoreFlusher: Excluding unflushable region > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to > find a different region to flush. > {code} > Region 0453f29030757eedb6e6a1c57e88c085 was being split. > In HRegion#flushcache, the log from else branch can be seen in > 20090-server-61260-01-000007.log : > {code} > synchronized (writestate) { > if (!writestate.flushing && writestate.writesEnabled) { > this.writestate.flushing = true; > } else { > if (LOG.isDebugEnabled()) { > LOG.debug("NOT flushing memstore for region " + this > + ", flushing=" + writestate.flushing + ", writesEnabled=" > + writestate.writesEnabled); > } > {code} > Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving > memory pressure at high level. > When MemStoreFlusher ran to the following call, the region was no longer a > flush candidate: > {code} > HRegion bestFlushableRegion = > getBiggestMemStoreRegion(regionsBySize, excludedRegions, true); > {code} > So the other region, > atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. , was examined > next. Since the region was not receiving write, the (current) Precondition > check failed. > The proposed fix is to convert the Precondition to normal return. -- This message was sent by Atlassian JIRA (v7.6.3#76005)