[ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377843#comment-16377843 ]
Ted Yu edited comment on HBASE-20090 at 3/5/18 1:48 PM: -------------------------------------------------------- Here is the code in branch-2 : {code} try { wakeupPending.set(false); // allow someone to wake us up again fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS); if (fqe == null || fqe instanceof WakeupFlushThread) { ... if (!flushOneForGlobalPressure()) { ... FlushRegionEntry fre = (FlushRegionEntry) fqe; if (!flushRegion(fre)) { break; ... } catch (Exception ex) { LOG.error("Cache flusher failed for entry " + fqe, ex); if (!server.checkFileSystem()) { break; } } {code} Inside flushOneForGlobalPressure(): {code} Preconditions.checkState( (regionToFlush != null && regionToFlushSize > 0) || (bestRegionReplica != null && bestRegionReplicaSize > 0)); {code} When the Preconditions check fails, IllegalStateException is caught by the catch block shown above. However, the fqe is not flushed, resulting in potential data loss. Observed the following in region server log (in hadoop3 cluster): {code} 2018-02-26 16:06:49,962 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 1/1 column families, memstore=804.67 KB 2018-02-26 16:06:50,028 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=5448, memsize=804.7 K, hasBloomFilter=true, into tmp file hdfs:// mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/.tmp/info/81721cc57fee43ebb55ba430f5730c25 2018-02-26 16:06:50,042 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/info/ 81721cc57fee43ebb55ba430f5730c25, entries=784, sequenceid=5448, filesize=813.9 K 2018-02-26 16:06:50,044 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~804.67 KB/823984, currentsize=0 B/0 for region TestTable, 000000000000000000155728,1519661093622.3552368c92476437cb96e357d2c7d618. in 82ms, sequenceid=5448, compaction requested=true 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 185ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 163ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=22,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 158ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 151ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 147ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver. MemStoreFlusher$WakeupFlushThread@2adfadd7 java.lang.IllegalStateException at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237) at java.lang.Thread.run(Thread.java:748) {code} Unfortunately the DEBUG logging was not on. Will see if I can reproduce the exception next time. was (Author: yuzhih...@gmail.com): Observed the following in region server log (in hadoop3 cluster): {code} 2018-02-26 16:06:49,962 INFO [MemStoreFlusher.1] regionserver.HRegion: Flushing 1/1 column families, memstore=804.67 KB 2018-02-26 16:06:50,028 INFO [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed, sequenceid=5448, memsize=804.7 K, hasBloomFilter=true, into tmp file hdfs:// mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/.tmp/info/81721cc57fee43ebb55ba430f5730c25 2018-02-26 16:06:50,042 INFO [MemStoreFlusher.1] regionserver.HStore: Added hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/info/ 81721cc57fee43ebb55ba430f5730c25, entries=784, sequenceid=5448, filesize=813.9 K 2018-02-26 16:06:50,044 INFO [MemStoreFlusher.1] regionserver.HRegion: Finished memstore flush of ~804.67 KB/823984, currentsize=0 B/0 for region TestTable, 000000000000000000155728,1519661093622.3552368c92476437cb96e357d2c7d618. in 82ms, sequenceid=5448, compaction requested=true 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 185ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 163ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=22,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 158ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 151ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 147ms 2018-02-26 16:06:50,044 WARN [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms 2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver. MemStoreFlusher$WakeupFlushThread@2adfadd7 java.lang.IllegalStateException at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237) at java.lang.Thread.run(Thread.java:748) {code} Unfortunately the DEBUG logging was not on. Will see if I can reproduce the exception next time. > Properly handle Preconditions check failure in > MemStoreFlusher$FlushHandler.run > ------------------------------------------------------------------------------- > > Key: HBASE-20090 > URL: https://issues.apache.org/jira/browse/HBASE-20090 > Project: HBase > Issue Type: Bug > Reporter: Ted Yu > Assignee: Ted Yu > Priority: Major > Attachments: 20090-server-61260-01-000007.log, 20090.v1.txt, > 20090.v4.txt, 20090.v5.txt > > > Here is the code in branch-2 : > {code} > try { > wakeupPending.set(false); // allow someone to wake us up again > fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS); > if (fqe == null || fqe instanceof WakeupFlushThread) { > ... > if (!flushOneForGlobalPressure()) { > ... > FlushRegionEntry fre = (FlushRegionEntry) fqe; > if (!flushRegion(fre)) { > break; > ... > } catch (Exception ex) { > LOG.error("Cache flusher failed for entry " + fqe, ex); > if (!server.checkFileSystem()) { > break; > } > } > {code} > Inside flushOneForGlobalPressure(): > {code} > Preconditions.checkState( > (regionToFlush != null && regionToFlushSize > 0) || > (bestRegionReplica != null && bestRegionReplicaSize > 0)); > {code} > When the Preconditions check fails, IllegalStateException is caught by the > catch block shown above. > However, the fqe is not flushed, resulting in potential data loss. -- This message was sent by Atlassian JIRA (v7.6.3#76005)