[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383145#comment-16383145 ]
Ted Yu edited comment on HBASE-20090 at 3/2/18 10:11 AM:
---------------------------------------------------------

I added a log statement (the first line below) for the related variables in the Preconditions check:
{code}
2018-03-02 00:58:18,880 DEBUG [MemStoreFlusher.0] regionserver.MemStoreFlusher: regionToFlush ATLAS_ENTITY_AUDIT_EVENTS,,1519927487389.6b67b274d95d61fcf4c5ab91e102994d. regionToFlushSize=0 bestRegionReplica null bestRegionReplicaSize=0
2018-03-02 00:58:18,881 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.MemStoreFlusher$1@2a
java.lang.IllegalStateException
        at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:259)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$700(MemStoreFlusher.java:69)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:345)
{code}
We can see that bestRegionReplica was null and the region for ATLAS_ENTITY_AUDIT_EVENTS had a flush size of 0 (because TestTable was being written to, not ATLAS_ENTITY_AUDIT_EVENTS).

It seems the Preconditions check can be converted to a normal condition check.

[~ram_krish] [~anoop.hbase] [~anastas] : Can you take a look at the patch?

Here is a snippet from the region server log during PE randomWrite:
{code}
2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_HIGHER_MARKTotal Memstore Heap size=403.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush.
{code}
Note that atlas_janus was not the table being written to; TestTable was.


was (Author: yuzhih...@gmail.com):

It seems the Preconditions check can be converted to a normal condition check.

[~ram_krish] [~anoop.hbase] [~anastas] : Can you take a look at the patch ?

Here was snippet from region server log during PE randomWrite:
{code}
2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global heap pressure. Flush type=ABOVE_ONHEAP_HIGHER_MARKTotal Memstore Heap size=403.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
2018-03-02 03:55:19,232 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. - trying to find a different region to flush.
{code}
Note atlas_janus was not the table being written. TestTable was being written to.

> Properly handle Preconditions check failure in
> MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>         Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2 :
> {code}
>       try {
>         wakeupPending.set(false); // allow someone to wake us up again
>         fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>         if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>           if (!flushOneForGlobalPressure()) {
> ...
>         FlushRegionEntry fre = (FlushRegionEntry) fqe;
>         if (!flushRegion(fre)) {
>           break;
> ...
>       } catch (Exception ex) {
>         LOG.error("Cache flusher failed for entry " + fqe, ex);
>         if (!server.checkFileSystem()) {
>           break;
>         }
>       }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>       Preconditions.checkState(
>           (regionToFlush != null && regionToFlushSize > 0) ||
>           (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
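
The suggestion in the comment, turning the Preconditions check into a normal condition check, can be illustrated with a small standalone sketch. This is not the attached patch. The class and method names (FlushCheckSketch, checkState, hasFlushableCandidate, flushWithPrecondition, flushWithConditionCheck) are hypothetical, checkState is a local stand-in for the Guava call, and treating a false return as "nothing to flush" is an assumption about how the caller would react.

```java
/**
 * Hypothetical, simplified model of the check in flushOneForGlobalPressure().
 * Nothing here is taken from the attached patch; regions are plain Objects.
 */
class FlushCheckSketch {

    /** Local stand-in for Guava's Preconditions.checkState(boolean). */
    static void checkState(boolean expression) {
        if (!expression) {
            throw new IllegalStateException();
        }
    }

    /** Same boolean expression as the one quoted in the issue description. */
    static boolean hasFlushableCandidate(Object regionToFlush, long regionToFlushSize,
                                         Object bestRegionReplica, long bestRegionReplicaSize) {
        return (regionToFlush != null && regionToFlushSize > 0)
            || (bestRegionReplica != null && bestRegionReplicaSize > 0);
    }

    /** Behaviour described in the issue: throws when no candidate has data. */
    static boolean flushWithPrecondition(Object regionToFlush, long regionToFlushSize,
                                         Object bestRegionReplica, long bestRegionReplicaSize) {
        checkState(hasFlushableCandidate(regionToFlush, regionToFlushSize,
                                         bestRegionReplica, bestRegionReplicaSize));
        return true; // the real flush would happen here
    }

    /** Sketch of a normal condition check: report "nothing flushed" instead of throwing. */
    static boolean flushWithConditionCheck(Object regionToFlush, long regionToFlushSize,
                                           Object bestRegionReplica, long bestRegionReplicaSize) {
        if (!hasFlushableCandidate(regionToFlush, regionToFlushSize,
                                   bestRegionReplica, bestRegionReplicaSize)) {
            return false; // assumption: the caller treats false as "nothing to flush"
        }
        return true; // the real flush would happen here
    }

    public static void main(String[] args) {
        // Values from the DEBUG line in the comment:
        // regionToFlushSize=0, bestRegionReplica null, bestRegionReplicaSize=0
        Object region = "ATLAS_ENTITY_AUDIT_EVENTS";
        try {
            flushWithPrecondition(region, 0L, null, 0L);
        } catch (IllegalStateException e) {
            System.out.println("precondition variant threw " + e.getClass().getSimpleName());
        }
        System.out.println("condition-check variant returned "
            + flushWithConditionCheck(region, 0L, null, 0L));
    }
}
```

With the logged values, the first variant throws IllegalStateException while the second simply returns false, so the flusher loop would not reach its catch block for this case.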