Re: Where is HBase failed servers list stored

2015-03-05 Thread Bryan Beaudreault

You should run with a backup master in a production cluster. The failover
process works very well and will cause no downtime. I've done it literally
hundreds of times across our multiple production hbase clusters.

Even if you don't have a backup master, you should still be fine with
restarting the master. It can handle a brief blip without any problems,
from what I've seen. The master is really only used for coordination such
as region moves, RS failovers, etc. Your clients can still retrieve data
from your regionservers, as long as no servers die in the brief moment you
are masterless.

On Thu, Mar 5, 2015 at 5:53 AM, Sandeep Reddy sandeepvre...@outlook.com
wrote:

Since ours is a production cluster, we can't restart the master.
In our test cluster I tested this scenario, and it was resolved after
restarting the master.
Other than restarting the master, I couldn't find any solution.
Thanks,
Sandeep.

From: nkey...@gmail.com
Date: Wed, 4 Mar 2015 14:55:03 +0100
Subject: Re: Where is HBase failed servers list stored
To: user@hbase.apache.org

If I understand the issue correctly, restarting the master should solve
the problem.

On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu yuzhih...@gmail.com wrote:

Please see HBASE-13067, "Fix caching of stubs to allow IP address changes
of restarted remote servers".

Cheers

On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L sandeepvre...@outlook.com
wrote:

Hi nkeywal,
While trying to get more details about this issue, I found that HMaster is
trying to connect to the wrong IP address.
Here is the exact issue:
For unavoidable reasons we had to change the IP address of a regionserver,
and then updated the new IP address in the /etc/hosts file across all HBase
servers. I started the RegionServer from the master with the start-hbase.sh
script; jps output on the regionserver shows that the regionserver process
is up and running.
But when running the hbase balancer, HMaster tries to connect to the old IP
address instead of the new one.
One more thing: when I checked the regionserver status on port 60010, it
shows as up and running.
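
The symptom described above (a long-running process keeps using the old IP after /etc/hosts changes) is what one would expect from any cache that resolves a hostname once and reuses the result. The following is a minimal sketch of that general failure mode, not HBase code and not the HBASE-13067 patch. The class name `CachedAddressDemo`, its methods, and the injected `resolver` function (standing in for an /etc/hosts lookup) are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical illustration of a resolve-once address cache.
 * The resolver stands in for a hostname lookup such as /etc/hosts.
 */
public class CachedAddressDemo {
    private final Function<String, String> resolver;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public CachedAddressDemo(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /** Returns the cached IP for the hostname, resolving only on first use. */
    public String lookup(String hostname) {
        return cache.computeIfAbsent(hostname, resolver);
    }

    /** Drops the cached entry so that the next lookup resolves again. */
    public void invalidate(String hostname) {
        cache.remove(hostname);
    }
}
```

In this sketch, a change made in the resolver's backing data is not visible through `lookup` until `invalidate` is called or the process (and with it the cache) is restarted, which is consistent with the observation that restarting the master clears the problem.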
Thanks,
Sandeep.

From: nkey...@gmail.com
Date: Tue, 3 Mar 2015 19:01:01 +0100
Subject: Re: Where is HBase failed servers list stored
To: user@hbase.apache.org

It's in local memory. When HBase cannot connect to a server, it puts it
into the failedServerList for 2 seconds. This is to avoid having all the
threads going into a potentially long socket timeout. Are you sure that you
can connect from the master to this machine/port?

You can change the time it stays in the list with
hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not
help.
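
To make the mechanism concrete, here is a minimal sketch of an in-memory list whose entries expire after a fixed time. It is an illustration only, not HBase's actual implementation: the class `ExpiringFailedServers`, its method names, and the injectable clock are assumptions made for the example. The `expiryMillis` argument plays the role of hbase.ipc.client.failed.servers.expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a failed-servers list kept in local memory.
 * Entries are keyed by "host:port" and expire after expiryMillis.
 */
public class ExpiringFailedServers {
    private final Map<String, Long> expiryByAddress = new ConcurrentHashMap<>();
    private final long expiryMillis;
    private final LongSupplier clock; // supplies the current time in milliseconds

    public ExpiringFailedServers(long expiryMillis, LongSupplier clock) {
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    /** Records a connection failure; callers fail fast until the entry expires. */
    public void markFailed(String hostAndPort) {
        expiryByAddress.put(hostAndPort, clock.getAsLong() + expiryMillis);
    }

    /** Returns true while the server is still inside its expiry window. */
    public boolean isFailed(String hostAndPort) {
        Long expiry = expiryByAddress.get(hostAndPort);
        if (expiry == null) {
            return false;
        }
        if (clock.getAsLong() >= expiry) {
            // The window has passed: forget the entry so a real connect is attempted.
            expiryByAddress.remove(hostAndPort, expiry);
            return false;
        }
        return true;
    }
}
```

With `new ExpiringFailedServers(2000L, System::currentTimeMillis)`, a server marked as failed is rejected immediately for two seconds and then tried again. If the underlying connection attempt keeps failing, the server is simply re-added each time, which is why a longer or shorter expiry does not fix the root cause.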

You should have another exception before this one in the logs (the one that
initially put this region server in this failedServerList).

On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L sandeepvre...@outlook.com
wrote:

Hi,
While trying to run the hbase balancer I am getting the error message "This
server is in the failed servers list". Because of this, the cluster is not
getting balanced.
Even though the regionserver is up and running, HMaster is unable to connect
to it.
The odd thing is that HMaster is able to start the regionserver, and it is
detected as up and running, but HMaster is unable to assign regions to it.
Can someone suggest a solution for this?
Following is the full stack trace:

org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
        at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
        at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
        at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion

Re: Where is HBase failed servers list stored

2015-03-05 Thread Nicolas Liochon

As Bryan said.
On 5 March 2015 at 17:55, Bryan Beaudreault bbeaudrea...@hubspot.com
wrote:
