[ https://issues.apache.org/jira/browse/HBASE-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell resolved HBASE-9635.
----------------------------------------
    Resolution: Incomplete

> HBase Table regions are not getting re-assigned to the new region server when
> it comes up (when the existing region server is not able to handle the load)
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-9635
>                 URL: https://issues.apache.org/jira/browse/HBASE-9635
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.94.11
>         Environment: SuSE11
>            Reporter: shankarlingayya
>            Priority: Major
>
> {noformat}
> HBase Table regions are not getting assigned to the new region server for a
> period of 30 minutes (when the existing region server is not able to handle
> the load).
>
> Procedure:
> 1. Set up a non-HA Hadoop cluster with two nodes (Node1-XX.XX.XX.XX,
>    Node2-YY.YY.YY.YY)
> 2. Install ZooKeeper & HRegionServer on Node1
> 3. Install HMaster & HRegionServer on Node2
> 4. From Node2, create an HBase table (table name 't1' with one column
>    family 'cf1')
> 5. Perform addrecord to insert 99649 rows
> 6. Kill the region servers on both nodes and limit the Node1 region
>    server's FD limit to 600
> 7. Start only the Node1 region server ==> so that FD exhaustion occurs on
>    the Node1 region server
> 8. After some 5-10 minutes, start the Node2 region server
> ===> A huge number of regions of table 't1' are stuck in OPENING state and
> are not getting re-assigned to the Node2 region server, which is free.
> ===> When the new region server comes up, the master should detect it and
> assign the open-failed regions to that region server (here they stay in
> OPENING state for 30 minutes, which has a huge impact on user applications
> that use this table).
> 2013-09-23 18:46:12,160 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated t1,row507465,1379937224590.2d9fad2aee78103f928d8c7fe16ba6cd.
> 2013-09-23 18:46:12,160 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=t1,row507465,1379937224590.2d9fad2aee78103f928d8c7fe16ba6cd., starting to roll back the global memstore size.
> 2013-09-23 18:50:55,284 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 309 seconds. Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too many open files; Host Details : local host is: "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at $Proxy13.renewLease(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at $Proxy13.renewLease(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:522)
>     at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:679)
>     at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
>     at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
>     at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>     at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketException: Too many open files
>     at sun.nio.ch.Net.socket0(Native Method)
>     at sun.nio.ch.Net.socket(Net.java:97)
>     at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>     at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>     at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
>     at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1318)
>     ... 16 more
> 2013-09-23 18:50:56,285 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 310 seconds. Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too many open files; Host Details : local host is: "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at $Proxy13.renewLease(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at $Proxy13.renewLease(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:522)
>     at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:679)
>     at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
>     at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
>     at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>     at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketException: Too many open files
>     at sun.nio.ch.Net.socket0(Native Method)
>     at sun.nio.ch.Net.socket(Net.java:97)
>     at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>     at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>     at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>     at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
>     at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1318)
>     ... 16 more
> 2013-09-23 18:50:57,287 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 311 seconds. Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too many open files; Host Details : local host is: "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
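For anyone reproducing step 6 of the procedure above, the file-descriptor cap can be applied like this minimal sketch. The cap is set in a subshell so only the region server process inherits it; the `/opt/hbase` install path is an assumption, not taken from the report.

```shell
# Sketch of step 6: cap the region server's open-file limit at 600.
# Running inside ( ... ) keeps the lowered limit local to the subshell,
# so the parent shell's FD limit is untouched.
(
  ulimit -n 600    # lower the soft file-descriptor limit to 600
  ulimit -n        # verify the effective limit: prints 600
  # Hypothetical start command; path assumed from a default HBase layout:
  # /opt/hbase/bin/hbase-daemon.sh start regionserver
)
```

With the limit at 600, the region server exhausts its descriptors while opening regions, which matches the "Too many open files" SocketExceptions in the log above.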