[ 
https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377843#comment-16377843
 ] 

Ted Yu edited comment on HBASE-20090 at 3/5/18 1:48 PM:
--------------------------------------------------------

Here is the code in branch-2 :
{code}
        try {
          wakeupPending.set(false); // allow someone to wake us up again
          fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
          if (fqe == null || fqe instanceof WakeupFlushThread) {
...
              if (!flushOneForGlobalPressure()) {
...
          FlushRegionEntry fre = (FlushRegionEntry) fqe;
          if (!flushRegion(fre)) {
            break;
...
        } catch (Exception ex) {
          LOG.error("Cache flusher failed for entry " + fqe, ex);
          if (!server.checkFileSystem()) {
            break;
          }
        }
{code}
Inside flushOneForGlobalPressure():
{code}
      Preconditions.checkState(
        (regionToFlush != null && regionToFlushSize > 0) ||
        (bestRegionReplica != null && bestRegionReplicaSize > 0));
{code}
When the Preconditions check fails, IllegalStateException is caught by the 
catch block shown above.
However, the fqe is not flushed, resulting in potential data loss.

Observed the following in region server log (in hadoop3 cluster):
{code}
2018-02-26 16:06:49,962 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing 1/1 column families, memstore=804.67 KB
2018-02-26 16:06:50,028 INFO  [MemStoreFlusher.1] 
regionserver.DefaultStoreFlusher: Flushed, sequenceid=5448, memsize=804.7 K, 
hasBloomFilter=true, into tmp file hdfs://      
mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/.tmp/info/81721cc57fee43ebb55ba430f5730c25
2018-02-26 16:06:50,042 INFO  [MemStoreFlusher.1] regionserver.HStore: Added 
hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/info/
    81721cc57fee43ebb55ba430f5730c25, entries=784, sequenceid=5448, 
filesize=813.9 K
2018-02-26 16:06:50,044 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Finished memstore flush of ~804.67 KB/823984, currentsize=0 B/0 for region 
TestTable,                  
000000000000000000155728,1519661093622.3552368c92476437cb96e357d2c7d618. in 
82ms, sequenceid=5448, compaction requested=true
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 185ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 163ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=22,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 158ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 151ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 147ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms

2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.            
               MemStoreFlusher$WakeupFlushThread@2adfadd7
java.lang.IllegalStateException
        at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
        at java.lang.Thread.run(Thread.java:748)
{code}
Unfortunately the DEBUG logging was not on.
Will see if I can reproduce the exception next time.


was (Author: yuzhih...@gmail.com):
Observed the following in region server log (in hadoop3 cluster):
{code}
2018-02-26 16:06:49,962 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Flushing 1/1 column families, memstore=804.67 KB
2018-02-26 16:06:50,028 INFO  [MemStoreFlusher.1] 
regionserver.DefaultStoreFlusher: Flushed, sequenceid=5448, memsize=804.7 K, 
hasBloomFilter=true, into tmp file hdfs://      
mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/.tmp/info/81721cc57fee43ebb55ba430f5730c25
2018-02-26 16:06:50,042 INFO  [MemStoreFlusher.1] regionserver.HStore: Added 
hdfs://mycluster/apps/hbase/data/data/default/TestTable/3552368c92476437cb96e357d2c7d618/info/
    81721cc57fee43ebb55ba430f5730c25, entries=784, sequenceid=5448, 
filesize=813.9 K
2018-02-26 16:06:50,044 INFO  [MemStoreFlusher.1] regionserver.HRegion: 
Finished memstore flush of ~804.67 KB/823984, currentsize=0 B/0 for region 
TestTable,                  
000000000000000000155728,1519661093622.3552368c92476437cb96e357d2c7d618. in 
82ms, sequenceid=5448, compaction requested=true
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 185ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 163ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=22,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=25,queue=1,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 160ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=23,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 158ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 151ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=24,queue=0,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 147ms
2018-02-26 16:06:50,044 WARN  
[RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=16020] 
regionserver.MemStoreFlusher: Memstore is above high water mark and block 135ms

2018-02-26 16:06:50,049 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: 
Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.            
               MemStoreFlusher$WakeupFlushThread@2adfadd7
java.lang.IllegalStateException
        at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:174)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$600(MemStoreFlusher.java:69)
        at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
        at java.lang.Thread.run(Thread.java:748)
{code}
Unfortunately the DEBUG logging was not on.
Will see if I can reproduce the exception next time.

> Properly handle Preconditions check failure in 
> MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>         Attachments: 20090-server-61260-01-000007.log, 20090.v1.txt, 
> 20090.v4.txt, 20090.v5.txt
>
>
> Here is the code in branch-2 :
> {code}
>         try {
>           wakeupPending.set(false); // allow someone to wake us up again
>           fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>           if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>               if (!flushOneForGlobalPressure()) {
> ...
>           FlushRegionEntry fre = (FlushRegionEntry) fqe;
>           if (!flushRegion(fre)) {
>             break;
> ...
>         } catch (Exception ex) {
>           LOG.error("Cache flusher failed for entry " + fqe, ex);
>           if (!server.checkFileSystem()) {
>             break;
>           }
>         }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>       Preconditions.checkState(
>         (regionToFlush != null && regionToFlushSize > 0) ||
>         (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, IllegalStateException is caught by the 
> catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to