[ 
https://issues.apache.org/jira/browse/HBASE-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789004#comment-13789004
 ] 

rajeshbabu commented on HBASE-9635:
-----------------------------------

[~shankarlingayya]
bq. In my case when open failed in RegionServer1 at that the RegionServer2 was 
not up, and after that RegionServer2 came up successfully (means there is some 
time gap around 1-3 min between the open failure and the new RegionServer2 
coming up)
Before the second region server was started, the retries for all the region 
assignments could have been exhausted, so the regions are waiting for the 
timeout monitor to pick them up for re-assignment (after 30 minutes). After 
starting the new RS, you can run the hbck tool to fix the assignments.
command : $HBASE_HOME/bin/hbase hbck -fixAssignments
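
For reference, a sequence you can use (assuming a 0.94-era hbck; the exact 
options may vary by version) is to first report inconsistencies without 
changing anything, then fix only the assignments:

{noformat}
# report inconsistencies (read-only, no repairs)
$HBASE_HOME/bin/hbase hbck

# re-assign regions stuck in transition (e.g. OPENING)
$HBASE_HOME/bin/hbase hbck -fixAssignments
{noformat}

It is also worth checking the file-descriptor limit on the RS node before 
restarting it (ulimit -n), since the open failures here are caused by FD 
exhaustion.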


> HBase Table regions are not getting re-assigned to the new region server when 
> it comes up (when the existing region server is not able to handle the load) 
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9635
>                 URL: https://issues.apache.org/jira/browse/HBASE-9635
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.94.11
>         Environment: SuSE11
>            Reporter: shankarlingayya
>
> {noformat}
> HBase Table regions are not getting assigned to the new region server for a 
> period of 30 minutes (when the existing region server is not able to handle 
> the load)
> Procedure:
> 1. Setup Non HA Hadoop Cluster with two nodes (Node1-XX.XX.XX.XX,  
> Node2-YY.YY.YY.YY)
> 2. Install Zookeeper & HRegionServer in Node-1
> 3. Install HMaster & HRegionServer in Node-2
> 4. From Node2 create HBase Table ( table name 't1' with one column family 
> 'cf1' )
> 5. Perform addrecord 99649 rows 
> 6. Kill the Region Servers on both nodes and limit the Node1 Region Server's 
> FD limit to 600
> 7. Start only the Node1 Region Server ==> so that FD exhaustion can happen in 
> the Node1 Region Server
> 8. After some 5-10 minutes, start the Node2 Region Server
> ===> A huge number of regions of table 't1' are stuck in the OPENING state 
> and are not getting re-assigned to the Node2 region server, which is free. 
> ===> When the new region server comes up, the master should detect it and 
> allocate the open-failed regions to that region server (here the regions stay 
> in the OPENING state for 30 minutes, which has a huge impact on user 
> applications that use this table)
> 2013-09-23 18:46:12,160 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
> Instantiated t1,row507465,1379937224590.2d9fad2aee78103f928d8c7fe16ba6cd.
> 2013-09-23 18:46:12,160 ERROR 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open 
> of region=t1,row507465,1379937224590.2d9fad2aee78103f928d8c7fe16ba6cd., 
> starting to roll back the global memstore size.
> 2013-09-23 18:50:55,284 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to 
> renew lease for 
> [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 309 
> seconds.  Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too 
> many open files; Host Details : local host is: 
> "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at $Proxy13.renewLease(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at $Proxy13.renewLease(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:522)
>         at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:679)
>         at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
>         at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
>         at 
> org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>         at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:97)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>         at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>         at 
> org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1318)
>         ... 16 more
> 2013-09-23 18:50:56,285 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to 
> renew lease for 
> [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 310 
> seconds.  Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too 
> many open files; Host Details : local host is: 
> "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>        at $Proxy13.renewLease(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at $Proxy13.renewLease(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:522)
>         at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:679)
>         at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
>         at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
>         at 
> org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
>         at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:97)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>         at 
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>         at 
> org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1318)
>         ... 16 more
> 2013-09-23 18:50:57,287 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to 
> renew lease for 
> [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 311 
> seconds.  Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too 
> many open files; Host Details : local host is: 
> "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
