Re: Where is HBase failed servers list stored

2015-03-05 Thread Bryan Beaudreault
You should run with a backup master in a production cluster.  The failover
process works very well and will cause no downtime.  I've done it literally
hundreds of times across our multiple production hbase clusters.

Even if you don't have a backup master, you should still be fine with
restarting the master.  It can handle a brief blip without any problems,
from what I've seen.  The master is really only used for coordination such
as region moves, RS failovers, etc.  Your clients can still retrieve data
from your regionservers, as long as no servers die in the brief moment you
are masterless.

Re: Where is HBase failed servers list stored

2015-03-05 Thread Nicolas Liochon
As Bryan said.

RE: Where is HBase failed servers list stored

2015-03-05 Thread Sandeep Reddy
Since ours is a production cluster, we can't restart the master.
In our test cluster I tested this scenario, and it was resolved after
restarting the master.
Other than restarting the master, I couldn't find any solution.
Thanks,
Sandeep.

Re: Where is HBase failed servers list stored

2015-03-04 Thread Nicolas Liochon
If I understand the issue correctly, restarting the master should solve the
problem.

Where is HBase failed servers list stored

2015-03-03 Thread Sandeep L
Hi,
While trying to run the hbase balancer I am getting the error message "This
server is in the failed servers list". Because of this, the cluster is not
getting balanced.
Even though the regionserver is up and running, HMaster is unable to connect
to it.
The odd thing is that HMaster is able to start the regionserver, and it is
detected as up and running, but HMaster is unable to assign regions to it.
Can someone suggest a solution for this?
Following is the full stack trace:

org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
    at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
    at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
    at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Thanks,
Sandeep.

Re: Where is HBase failed servers list stored

2015-03-03 Thread Nicolas Liochon
It's in local memory. When HBase cannot connect to a server, it puts it
into the failedServerList for 2 seconds. This is to avoid having all the
threads going into a potentially long socket timeout. Are you sure that you
can connect from the master to this machine/port?
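
One way to answer the connectivity question is a plain TCP connect from the
master host to the region server's RPC port. The following is a minimal
sketch using only JDK classes, not anything from HBase: the class name
PortCheck and the 2000 ms timeout are assumptions for illustration, and the
default host1:60020 is simply the address shown in the stack trace from the
original post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of HBase.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // The hostname is resolved when the InetSocketAddress is constructed.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hostnames.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "host1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 2000));
    }
}
```

Run on the master host, this shows whether the port is reachable at the
address the local resolver currently returns for that hostname.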

You can change the time it stays in the list with
hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not
help.
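
The behaviour described above (an in-memory list whose entries expire after a
configurable number of milliseconds) can be sketched roughly as follows. This
is an illustration only, not HBase's actual code: the class
ExpiringFailedServers, its method names and the explicit nowMs clock
parameter are hypothetical, and the 2000 ms value mirrors the 2 seconds
mentioned above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of an expiring "failed servers" list; not HBase code.
final class ExpiringFailedServers {
    private final long expiryMs;
    // Maps a server address to the time (ms) at which its entry expires.
    private final Map<String, Long> expiresAt = new HashMap<>();

    ExpiringFailedServers(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Records a failed connection attempt made at time nowMs. */
    synchronized void addFailure(String address, long nowMs) {
        expiresAt.put(address, nowMs + expiryMs);
    }

    /** Returns true if the address is still listed as failed at time nowMs. */
    synchronized boolean isFailed(String address, long nowMs) {
        // Drop every entry whose expiry time has passed before answering.
        Iterator<Map.Entry<String, Long>> it = expiresAt.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= nowMs) {
                it.remove();
            }
        }
        return expiresAt.containsKey(address);
    }

    public static void main(String[] args) {
        ExpiringFailedServers list = new ExpiringFailedServers(2000);
        list.addFailure("host1/192.168.2.20:60020", 0);
        System.out.println(list.isFailed("host1/192.168.2.20:60020", 1000)); // true
        System.out.println(list.isFailed("host1/192.168.2.20:60020", 2500)); // false
    }
}
```

In this sketch every new failed attempt re-adds the entry, which is
consistent with the remark that changing the expiry should not help while the
underlying connection keeps failing.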

You should have another exception before this one in the logs (the one that
initially put this region server in this failedServerList).

RE: Where is HBase failed servers list stored

2015-03-03 Thread Sandeep L
Hi nkeywal,
While trying to get more details about this issue, I found that HMaster is
trying to connect to the wrong IP address.
Here is the exact issue:
For unavoidable reasons we were forced to change the IP address of a
regionserver, and we then updated the new IP address in the /etc/hosts file
across all HBase servers. I started the RegionServer from the master with the
start-hbase.sh script, and jps output on the regionserver shows that the
regionserver process is up and running.
But when running the hbase balancer, HMaster is trying to connect to the old
IP address instead of the new one.
One more thing: when I checked the regionserver status on port 60010, it is
shown as up and running.
Thanks,
Sandeep.

Re: Where is HBase failed servers list stored

2015-03-03 Thread Ted Yu
Please see HBASE-13067, "Fix caching of stubs to allow IP address changes of
restarted remote servers".

Cheers
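
The general kind of problem named in that issue title can be illustrated with
plain JDK classes. A java.net.InetSocketAddress built from a hostname resolves
it at construction time and keeps that IP, so a long-lived cached object does
not notice a later change in /etc/hosts. The sketch below is not HBase code
and does not claim to show how HBASE-13067 was fixed; the class
AddressCacheDemo and its two helper methods are hypothetical, and "localhost"
is used only so the example runs anywhere.

```java
import java.net.InetSocketAddress;

// Hypothetical demo; uses only JDK classes, not HBase.
final class AddressCacheDemo {

    /** Keeps only host name and port, so no IP address is stored in the object. */
    static InetSocketAddress unresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    /**
     * Builds a new address from the stored host name, asking the JVM to
     * resolve it again (subject to the JVM's own lookup cache).
     */
    static InetSocketAddress resolveNow(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    public static void main(String[] args) {
        InetSocketAddress key = unresolved("localhost", 60020);
        System.out.println(key.isUnresolved()); // true: no IP stored yet

        // Resolve at the moment of use instead of reusing an old resolved object.
        InetSocketAddress target = resolveNow(key);
        System.out.println(target.getHostString() + " -> " + target.getAddress());
    }
}
```

The design point is that anything cached for a long time is keyed by host
name and port only, and the IP is looked up when a connection is actually
opened.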
