Stopping a HRegionServer with unflushed cache causes data loss from 
org.apache.hadoop.hbase.DroppedSnapshotException 
---------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-1052
                 URL: https://issues.apache.org/jira/browse/HBASE-1052
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.18.1, 0.18.0
            Reporter: Cosmin Lehene
            Priority: Critical
             Fix For: 0.18.2


 1.  Start a Hbase cluster
 2.  Create a table t1: create 't1', {NAME => 'f1'}
 3.  Put a cell in the table: put 't1', 'r1', 'f1:', 'value'
 4.  Scan it, see it's fine
 5.  Stop the HRegionSever hosting the t1 region: hbase/bin/hbase-daemon.sh 
stop regionserver.
 6.  Watch the region being reassigned from the original HRegionServer
 7.  Scan the t1 table again. It's empty now.

If between step 4 and step 5 the cache is flushed (e.g. Hbase cluster restart) 
no data is loss. However it means that if you stop a region server with dirty 
cache you will loose some data. 



HRegionServer log after issuing hbase-daemon.sh stop regionserver:

2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 
on 60020: exiting
2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 
on 60020: exiting
2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 
on 60020: exiting
2008-12-09 06:37:46,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 
on 60020: exiting
2008-12-09 06:37:46,874 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2008-12-09 06:37:46,874 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2008-12-09 06:37:46,874 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor 
ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=60030]
2008-12-09 06:37:46,886 INFO org.mortbay.http.SocketListener: Stopped 
SocketListener on 0.0.0.0:60030
2008-12-09 06:37:46,948 INFO org.mortbay.util.Container: Stopped 
HttpContext[/static,/static]
2008-12-09 06:37:47,007 INFO org.mortbay.util.Container: Stopped 
HttpContext[/logs,/logs]
2008-12-09 06:37:47,007 INFO org.mortbay.util.Container: Stopped [EMAIL 
PROTECTED]
2008-12-09 06:37:47,094 INFO org.mortbay.util.Container: Stopped 
WebApplicationContext[/,/]
2008-12-09 06:37:47,094 INFO org.mortbay.util.Container: Stopped [EMAIL 
PROTECTED]
2008-12-09 06:37:47,094 DEBUG 
org.apache.hadoop.hbase.regionserver.HRegionServer: closing region 
t1,,1228833363456
2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Compactions and cache flushes disabled for region t1,,1228833363456
2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Scanners disabled for region t1,,1228833363456
2008-12-09 06:37:47,094 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No 
more active scanners for region t1,,1228833363456
2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Updates disabled for region t1,,1228833363456
2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No 
more row locks outstanding on region t1,,1228833363456
2008-12-09 06:37:47,095 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Started memcache flush for region t1,,1228833363456. Current region memcache 
size 18.0
2008-12-09 06:37:47,095 INFO org.apache.hadoop.hbase.regionserver.Flusher: 
regionserver/0:0:0:0:0:0:0:0:60020.cacheFlusher exiting
2008-12-09 06:37:47,096 INFO org.apache.hadoop.hbase.regionserver.LogRoller: 
LogRoller exiting.
2008-12-09 06:37:47,096 INFO 
org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
regionserver/0:0:0:0:0:0:0:0:60020.compactor exiting
2008-12-09 06:37:47,099 ERROR 
org.apache.hadoop.hbase.regionserver.HRegionServer: error closing region 
t1,,1228833363456
org.apache.hadoop.hbase.DroppedSnapshotException: region: t1,,1228833363456
    at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1071)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:619)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.closeAllRegions(HRegionServer.java:951)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Filesystem closed
    at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
    at org.apache.hadoop.dfs.DFSClient.getFileInfo(DFSClient.java:564)
    at 
org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
    at 
org.apache.hadoop.hbase.regionserver.HStoreFile.<init>(HStoreFile.java:152)
    at 
org.apache.hadoop.hbase.regionserver.HStore.internalFlushCache(HStore.java:599)
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:577)
    at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1058)
    ... 4 more
2008-12-09 06:37:47,100 DEBUG org.apache.hadoop.hbase.regionserver.HLog: 
closing log writer in 
hdfs://h1:54310/hbase/log_10.131.237.51_1228833326838_60020
2008-12-09 06:37:47,101 ERROR 
org.apache.hadoop.hbase.regionserver.HRegionServer: Close and delete failed
java.io.IOException: Filesystem closed
    at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:196)
    at org.apache.hadoop.dfs.DFSClient.access$600(DFSClient.java:59)
    at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2689)
    at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2655)
    at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
    at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:962)
    at org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:349)
    at org.apache.hadoop.hbase.regionserver.HLog.closeAndDelete(HLog.java:333)
    at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:461)
    at java.lang.Thread.run(Thread.java:619)
2008-12-09 06:37:47,102 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: telling master that region 
server is shutting down at: 10.131.237.51:60020
2008-12-09 06:37:47,104 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server at: 
10.131.237.51:60020
2008-12-09 06:37:47,882 INFO org.apache.hadoop.hbase.Leases: 
regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closing leases
2008-12-09 06:37:47,882 INFO org.apache.hadoop.hbase.Leases: 
regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker closed leases
2008-12-09 06:37:54,919 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2008-12-09 06:37:54,920 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: 
regionserver/0:0:0:0:0:0:0:0:60020 exiting
2008-12-09 06:37:54,920 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to