[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398881#comment-16398881 ]

ramkrishna.s.vasudevan commented on HBASE-20090:
------------------------------------------------

+1. Thanks [~yuzhih...@gmail.com] for working on this.

> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>         Attachments: 20090-server-61260-01-000007.log, 20090.v6.txt, 20090.v7.txt, 20090.v8.txt, 20090.v9.txt
>
>
> Copied the following from a comment since it is a better description of the race condition. The original description was merged into the beginning of my first comment below.
> With more debug logging, we can see the scenario in which the exception was triggered:
> {code}
> 2018-03-02 17:28:30,097 DEBUG [MemStoreFlusher.0] regionserver.CompactSplit: Splitting TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085., compaction_queue=(0:0), split_queue=1
> 2018-03-02 17:28:30,098 DEBUG [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] regionserver.IncreasingToUpperBoundRegionSplitPolicy: ShouldSplit because info size=6.9G, sizeToCheck=256.0M, regionsWithCommonTable=1
> 2018-03-02 17:28:30,296 INFO [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,297 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=381.5 M
> 2018-03-02 17:28:30,297 INFO [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: wake up flusher due to ABOVE_ONHEAP_LOWER_MARK
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region TestTable,,1520011528142.0453f29030757eedb6e6a1c57e88c085. with size 400432696
> 2018-03-02 17:28:30,298 DEBUG [MemStoreFlusher.1] regionserver.MemStoreFlusher: region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. with size 0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_LOWER_MARKTotal Memstore Heap size=381.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: wake up by WAKEUPFLUSH_INSTANCE
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
> 2018-03-02 17:28:30,298 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush.
> {code}
> Region 0453f29030757eedb6e6a1c57e88c085 was being split.
> In HRegion#flushcache, the log from the else branch can be seen in 20090-server-61260-01-000007.log:
> {code}
>     synchronized (writestate) {
>       if (!writestate.flushing && writestate.writesEnabled) {
>         this.writestate.flushing = true;
>       } else {
>         if (LOG.isDebugEnabled()) {
>           LOG.debug("NOT flushing memstore for region " + this
>               + ", flushing=" + writestate.flushing + ", writesEnabled="
>               + writestate.writesEnabled);
>         }
> {code}
> Meaning, region 0453f29030757eedb6e6a1c57e88c085 couldn't flush, leaving memory pressure at a high level.
> When MemStoreFlusher reached the following call, that region was no longer a flush candidate:
> {code}
>       HRegion bestFlushableRegion =
>           getBiggestMemStoreRegion(regionsBySize, excludedRegions, true);
> {code}
> So the other region, atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae., was examined next. Since that region was not receiving writes, the (current) Precondition check failed.
> The proposed fix is to convert the Precondition to a normal return.
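
To make that last point concrete, below is a minimal, self-contained sketch of what converting such a Preconditions check into a normal return can look like. It is not the attached patch: the class and member names (FlushSketch, FlushCandidate, flushOneBefore, flushOneAfter, flushRegion) are hypothetical stand-ins for illustration, and only com.google.common.base.Preconditions is the real Guava utility that the existing check uses. Compiling the sketch requires Guava on the classpath.

{code}
// Hedged sketch only: FlushSketch, FlushCandidate, flushOneBefore/flushOneAfter
// and flushRegion are hypothetical names for illustration, not HBase code.
import com.google.common.base.Preconditions;

public class FlushSketch {

  /** Hypothetical stand-in for the region the flusher has picked. */
  static final class FlushCandidate {
    final String name;
    final long memStoreSize;

    FlushCandidate(String name, long memStoreSize) {
      this.name = name;
      this.memStoreSize = memStoreSize;
    }
  }

  /**
   * Before: a violated precondition surfaces as an IllegalStateException
   * thrown out of the flush attempt.
   */
  static boolean flushOneBefore(FlushCandidate candidate) {
    Preconditions.checkState(candidate != null && candidate.memStoreSize > 0);
    return flushRegion(candidate);
  }

  /**
   * After: the same situation ("above the memory mark, but no flushable
   * region right now") is treated as a normal outcome, and the method
   * simply reports that nothing was flushed.
   */
  static boolean flushOneAfter(FlushCandidate candidate) {
    if (candidate == null || candidate.memStoreSize <= 0) {
      System.out.println("No flushable region found; will retry on next wake-up.");
      return false;
    }
    return flushRegion(candidate);
  }

  private static boolean flushRegion(FlushCandidate candidate) {
    System.out.println("Flushing " + candidate.name);
    return true;
  }

  public static void main(String[] args) {
    // Mirrors the logged race: the only remaining candidate has memstore size 0
    // because the big region is mid-split and excluded from flushing.
    FlushCandidate empty = new FlushCandidate("atlas_janus", 0L);

    System.out.println(flushOneAfter(empty));   // prints a message, returns false

    try {
      flushOneBefore(empty);                    // old behaviour: throws
    } catch (IllegalStateException e) {
      System.out.println("Precondition check failed: " + e);
    }
  }
}
{code}

The shape of the change is the point: "nothing flushable at this moment" becomes an expected result the caller can act on later, rather than a failed state check inside MemStoreFlusher$FlushHandler.run.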