[
https://issues.apache.org/jira/browse/HBASE-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731061#action_12731061
]
Jean-Daniel Cryans commented on HBASE-1439:
-------------------------------------------
This log from a recent trunk makes me think we should still do that special
watcher (the master split logs around 14:11:58):
{code}
2009-07-13 14:10:44,219 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region t3,,1247502416090 in 0sec
2009-07-13 14:10:46,735 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_CLOSE:
t3,******************,1247507770509: Overloaded
2009-07-13 14:10:46,743 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_CLOSE:
t3******************,1247507770509: Overloaded
2009-07-13 14:10:46,743 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t3,******************,1247507770509: compactions & flushes disabled
2009-07-13 14:10:46,743 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region, no outstanding scanners on
t3,**************,1247507770509
2009-07-13 14:10:46,744 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No
more row locks outstanding on region t3,U**************9,1247507770509
2009-07-13 14:10:46,744 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed block
2009-07-13 14:10:46,744 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t3,,******************,1247507770509
2009-07-13 14:11:54,283 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache:
Cache Stats: Sizes: Total=336.81897MB (353180272), Free=59.856064MB (62763632),
Max=396.67502MB (415943904), Counts: Blocks=5251, Access=259404, Hit=5232,
Miss=2
54172, Evictions=13, Evicted=8223, Ratios: Hit Ratio=2.016931027173996%, Miss
Ratio=97.98306822776794%, Evicted/Run=632.5384521484375
2009-07-13 14:11:54,375 WARN org.apache.zookeeper.ClientCnxn: Exception closing
session 0x12274aa40060001 to sun.nio.ch.selectionkeyi...@2b20bf2c
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:11:54,375 WARN org.apache.zookeeper.ClientCnxn: Exception closing
session 0x22274aa40700002 to sun.nio.ch.selectionkeyi...@12746ad0
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:11:58,996 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper
event, state: Disconnected, type: None, path: null
2009-07-13 14:11:59,002 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message (Retry:
0)
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
...
2009-07-13 14:11:59,097 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded max retries: 10
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:561)
at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:11:59,262 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server **************/**************:2181
2009-07-13 14:11:59,262 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/**************53663 remote=**************/**************:2181]
2009-07-13 14:11:59,262 INFO org.apache.zookeeper.ClientCnxn: Server connection
successful
2009-07-13 14:11:59,635 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server **************/**************:2181
2009-07-13 14:11:59,635 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/**************:44760 remote=**************/**************:2181]
2009-07-13 14:11:59,635 INFO org.apache.zookeeper.ClientCnxn: Server connection
successful
2009-07-13 14:12:00,103 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state:
Expired, type: None, path: null
2009-07-13 14:12:00,104 WARN org.apache.zookeeper.ClientCnxn: Exception closing
session 0x12274aa40060001 to sun.nio.ch.selectionkeyi...@5e297e83
java.io.IOException: Session Expired
at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
2009-07-13 14:12:00,104 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
2009-07-13 14:12:00,104 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Restarting Region Server
2009-07-13 14:12:00,105 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded max retries: 10
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:561)
at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:12:00,106 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=4, stores=4, storefiles=7, storefileIndexSize=4,
memstoreSize=96, usedHeap=1153, maxHeap=1983, blockCacheSize=353180272,
blockCacheFree=62763632, blockCacheCount=5251, blockCacheHitRatio=2
2009-07-13 14:12:00,110 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC
Server handler 9 on 60020 took 1033ms appending an edit to hlog; editcount=60781
2009-07-13 14:12:04,256 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC
Server handler 9 on 60020 took 4128ms appending an edit to hlog; editcount=62375
2009-07-13 14:12:04,257 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stop was requested,
clearing the toDo despite of the exception
2009-07-13 14:12:04,257 WARN org.apache.zookeeper.ClientCnxn: Exception closing
session 0x22274aa40700002 to sun.nio.ch.selectionkeyi...@54f015d5
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:12:04,257 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2009-07-13 14:12:04,257 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020
2009-07-13 14:12:04,257 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting
2009-07-13 14:12:04,257 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: exiting
2009-07-13 14:12:04,258 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server listener on 60020
2009-07-13 14:12:04,282 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-07-13 14:12:04,286 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
regionserver/**************:60020.cacheFlusher exiting
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
LogRoller exiting.
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.Leases:
regionserver/**************:60020.leaseChecker closing leases
2009-07-13 14:12:04,286 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
2009-07-13 14:12:04,286 INFO org.apache.hadoop.hbase.Leases:
regionserver/**************:60020.leaseChecker closed leases
2009-07-13 14:12:04,318 INFO org.apache.hadoop.hbase.regionserver.LogFlusher:
regionserver/**************:60020.logFlusher exiting
2009-07-13 14:12:04,484 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
Server Responder
2009-07-13 14:12:04,494 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60020: exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60020: exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60020: exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60020: exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60020: exiting
2009-07-13 14:12:04,512 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60020: exiting
2009-07-13 14:12:04,513 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60020: exiting
2009-07-13 14:12:05,076 DEBUG org.apache.hadoop.hbase.regionserver.HLog:
closing hlog writer in
hdfs://**************:50000/hbase/.logs/**************,60020,1247497897306
2009-07-13 14:12:05,076 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Cannot
append; log is closed
2009-07-13 14:12:05,076 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/**************:60020.compactor exiting
2009-07-13 14:12:05,076 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
regionserver/**************:60020.majorCompactionChecker exiting
2009-07-13 14:12:05,138 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in abort
java.io.IOException: java.io.IOException: Could not complete write to file
/hbase/.logs/**************60020,1247497897306/hlog.dat.1247508029747 by
DFSClient_51605845
at
org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:449)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
at
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:626)
at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:12:05,138 INFO org.apache.hadoop.fs.FileSystem: Could not cancel
cleanup thread, though no FileSystems are open
2009-07-13 14:12:05,138 ERROR org.apache.hadoop.hbase.util.FSUtils: file system
close failed:
java.lang.NullPointerException
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeThreads(DFSClient.java:3127)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3170)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3115)
at
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:1002)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:208)
at
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:269)
at
org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:119)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:861)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:809)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:792)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1793)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
2009-07-13 14:12:05,138 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: closing region
t3,**************
2009-07-13 14:12:05,138 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down
HRegionServer: file system not available
java.io.IOException: File system is not available
at
org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:123)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:861)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:809)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:792)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1793)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
Caused by: java.io.IOException: Call to **************1/**************:50000
failed on local exception: java.nio.channels.ClosedChannelException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
at org.apache.hadoop.ipc.Client.call(Client.java:742)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:587)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
at
org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:112)
... 9 more
Caused by: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156)
at
org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:276)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
2009-07-13 14:12:05,138 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t3,**************: compactions & flushes disabled
2009-07-13 14:12:05,143 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region, no outstanding scanners on t3,**************
2009-07-13 14:12:05,143 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=4, stores=4, storefiles=7, storefileIndexSize=4,
memstoreSize=96, usedHeap=1156, maxHeap=1983, blockCacheSize=353180272,
blockCacheFree=62763632, blockCacheCount=5251, blockCacheHitRatio=2
2009-07-13 14:12:05,143 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No
more row locks outstanding on region t3,**************
2009-07-13 14:12:05,144 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020, call put([...@1df68388,
[Lorg.apache.hadoop.hbase.client.Put;@23356516) from **************:48126:
error: java.io.IOException: Cannot append; log is
closed
java.io.IOException: Cannot append; log is closed
at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1446)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1246)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1209)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1785)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:913)
2009-07-13 14:12:05,151 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed block
2009-07-13 14:12:05,152 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t3,**************
2009-07-13 14:12:05,152 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: closing region
t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t3,**************: compactions & flushes disabled
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region, no outstanding scanners on t3**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No
more row locks outstanding on region t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed block
2009-07-13 14:12:05,152 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t3,**************
2009-07-13 14:12:05,152 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: closing region
t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t3,,1247502416090: compactions & flushes disabled
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region, no outstanding scanners on t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No
more row locks outstanding on region t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed block
2009-07-13 14:12:05,152 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t3,,1247502416090
2009-07-13 14:12:05,152 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: closing region
t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Closing t3,**************: compactions & flushes disabled
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Updates disabled for region, no outstanding scanners on t3,**************
2009-07-13 14:12:05,152 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: No
more row locks outstanding on region t3,**************
-07-13 14:12:05,153 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call put([...@1df68388,
[Lorg.apache.hadoop.hbase.client.Put;@23356516) from **************:48126:
output error
2009-07-13 14:12:05,156 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020 caught: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1122)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:613)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:677)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:941)
2009-07-13 14:12:05,156 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60020: exiting
2009-07-13 14:12:05,156 DEBUG org.apache.hadoop.hbase.regionserver.Store:
closed block
2009-07-13 14:12:05,156 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Closed t3,**************
2009-07-13 14:12:05,156 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
**************:60020
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ZooKeeper: Closing session:
0x12274aa40060001
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ClientCnxn: Closing
ClientCnxn for session: 0x12274aa40060001
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
ClientCnxn for session: 0x12274aa40060001
2009-07-13 14:12:05,157 INFO org.apache.zookeeper.ZooKeeper: Session:
0x12274aa40060001 closed
2009-07-13 14:12:05,157 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with
ZooKeeper
2009-07-13 14:12:05,157 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver/**************:60020 exiting
2009-07-13 14:12:05,172 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics:
Initializing RPC Metrics with hostName=HRegionServer, port=60020
2009-07-13 14:12:05,181 INFO
org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
globalMemStoreLimit=793.4m, globalMemStoreLimitLowMark=495.8m, maxHeap=1.9g
2009-07-13 14:12:05,181 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
2009-07-13 14:12:05,195 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
2009-07-13 14:12:05,195 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every 10000000ms
2009-07-13 14:12:05,219 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server **************/**************:2181
2009-07-13 14:12:05,219 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/**************:33151 remote=**************/**************:2181]
2009-07-13 14:12:05,219 INFO org.apache.zookeeper.ClientCnxn: Server connection
successful
2009-07-13 14:12:05,380 INFO org.apache.zookeeper.ZooKeeper: Initiating client
connection, host=*****************:2181 sessionTimeout=30000
watcher=org.apache.hadoop.hbase.regionserver.hregionser...@47315d34
2009-07-13 14:12:10,636 INFO org.apache.zookeeper.ClientCnxn: EventThread shut
down
2009-07-13 14:12:10,636 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server **************/**************:2181
2009-07-13 14:12:10,636 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.lang.NullPointerException
at
org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:710)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:425)
at java.lang.Thread.run(Thread.java:619)
2009-07-13 14:12:10,636 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
Aborting...
java.io.IOException: Region server startup failed
at
org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:829)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:748)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:425)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
at
org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:710)
... 2 more
2009-07-13 14:12:10,636 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
request=0.0, regions=4, stores=4, storefiles=7, storefileIndexSize=4,
memstoreSize=96, usedHeap=1173, maxHeap=1983, blockCacheSize=353180272,
blockCacheFree=62763632, blockCacheCount=5251, blockCacheHitRatio=2
2009-07-13 14:12:10,636 DEBUG org.apache.hadoop.hbase.RegionHistorian: Offlined
2009-07-13 14:12:11,084 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/**************:52015 remote=**************/**************:2181]
2009-07-13 14:12:11,084 INFO org.apache.zookeeper.ClientCnxn: Server connection
successful
2009-07-13 14:12:11,086 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2009-07-13 14:12:11,086 WARN org.apache.zookeeper.ClientCnxn: Exception closing
session 0x22274aa40700002 to sun.nio.ch.selectionkeyi...@411cac1d
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:858)
2009-07-13 14:12:11,090 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2009-07-13 14:12:11,135 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
**************:60020
2009-07-13 14:12:11,135 INFO org.apache.zookeeper.ZooKeeper: Closing session:
0x0
2009-07-13 14:12:11,135 INFO org.apache.zookeeper.ClientCnxn: Closing
ClientCnxn for session: 0x0
2009-07-13 14:12:11,365 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state:
SyncConnected, type: None, path: null
2009-07-13 14:12:11,365 DEBUG
org.apache.hadoop.hbase.regionserver.HRegionServer: Ignoring ZooKeeper event
while shutting down
2009-07-13 14:12:11,417 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server **************/**************:2181
2009-07-13 14:12:11,421 INFO org.apache.zookeeper.ClientCnxn: Priming
connection to java.nio.channels.SocketChannel[connected
local=/**************52016 remote=**************/**************:2181]
2009-07-13 14:12:11,421 INFO org.apache.zookeeper.ClientCnxn: Server connection
successful
2009-07-13 14:12:11,422 WARN org.apache.zookeeper.ClientCnxn: Exception closing
session 0x22274aa40700002 to sun.nio.ch.selectionkeyi...@44a45d2c
java.io.IOException: Session Expired
at
org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:548)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:661)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897)
2009-07-13 14:12:11,422 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper
event, state: Expired, type: None, path: null
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ZooKeeper: Closing session:
0x22274aa40700002
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ClientCnxn: Closing
ClientCnxn for session: 0x22274aa40700002
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
ClientCnxn for session: 0x22274aa40700002
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ZooKeeper: Session:
0x22274aa40700002 closed
2009-07-13 14:12:11,422 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with
ZooKeeper
2009-07-13 14:12:11,422 INFO org.apache.zookeeper.ClientCnxn: EventThread shut
down
2009-07-13 14:12:11,997 INFO org.apache.zookeeper.ClientCnxn: Exception while
closing send thread for session 0x12274aa40060005 : Read error rc = -1
java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
2009-07-13 14:12:12,101 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
ClientCnxn for session: 0x12274aa40060005
2009-07-13 14:12:12,101 INFO org.apache.zookeeper.ZooKeeper: Session:
0x12274aa40060005 closed
2009-07-13 14:12:12,101 INFO org.apache.zookeeper.ClientCnxn: EventThread shut
down
2009-07-13 14:12:12,101 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with
ZooKeeper
2009-07-13 14:12:12,149 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver/**************:60020 exiting
{code}
> race between master and regionserver after missed heartbeat
> -----------------------------------------------------------
>
> Key: HBASE-1439
> URL: https://issues.apache.org/jira/browse/HBASE-1439
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.19.1
> Environment: CentOS 5.2 x86_64, HBase 0.19.1, Hadoop 0.19.1
> Reporter: Andrew Purtell
> Priority: Blocker
> Fix For: 0.20.1
>
>
> Seen on one of our 0.19.1 clusters:
> {code}
> java.io.FileNotFoundException: File does not exist:
> hdfs://jdc2-atr-dc-2.atr.trendmicro.com:50000
> /data/hbase/log_10.3.134.207_1242286427894_60020/hlog.dat.1242528291898
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
> at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
> at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
> at
> org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
> at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:753)
> at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:716)
> at
> org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:249)
> at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:442)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:377)
> 2009-05-17 04:05:55,481 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: process
> shutdown of server 10.3.134.207:60020: logSplit: false, rootRescanned: false,
> numberOfMetaRegions: 1,
> onlineMetaRegions.size(): 1
> {code}
> I do not have the region server log yet, but here is my conjecture:
> Here, the write ahead log for 10.3.134.207 is missing in DFS:
> java.io.FileNotFoundException:
> hdfs://jdc2-atr-dc-2.atr.trendmicro.com:50000/data/hbase/log_10.3.134.207_1242286427894_60020/hlog.dat.1242528291898
> when the master tries to split it after declaring the region server crashed.
> There have been recent trouble reports on this cluster that indicate severe
> memory stress, e.g. kernel panics due to OOM. Based on that I think it is
> likely that the region server here missed a heartbeat so the master declared
> it crashed and began to split the log. But, the log was then deleted out from
> underneath the master's split thread. I think the region server was actually
> still running but partially swapped out or the node was otherwise overloaded
> so it missed its heartbeat. Then, when the region server came back after
> being swapped in, it realized it missed its heartbeat and shut down, deleting
> the log as is normally done.
> Even if the above is not the actual cause in this case, I think the scenario
> is plausible. What do you think?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.