(Moved this conversation off the vote thread)

On Sat, May 12, 2012 at 3:14 PM, Mikael Sitruk <mikael.sit...@gmail.com> wrote:
> So in case an RS goes down, the master will split the log and reassign the
> regions to other RSs, then each RS will replay the log; during this step the
> regions are unavailable, and clients will get exceptions.

To be clear, log splitting results in each region having under it only
its own edits to replay.
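
For example (paths illustrative; exact layout depends on your version),
the split edits for a region land in a recovered.edits directory under
that region's directory, and the region replays them when it is next
opened:

  /hbase/<table>/<encoded-region-name>/recovered.edits/0000000000000012345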

> 1. how the master will choose a RS to assign a region?

Random, currently.  It picks from the list of live RSs.

> 2. how many RS will be involved in this reassignment

All remaining live regionservers.

> 3. Should clients that got an exception renew their connections, or can
> they reuse the same one?

When the client gets a NotServingRegionException, it goes back to
.META. to find the location of the region and then retries that
location.  The location may still be the downed server.  The client
keeps at this until it either times out or the address in .META. is
updated with the new location.
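
If you want clients to give up sooner (or hang on longer) while that
happens, the retry count and pause are tunable in the client-side
config.  A minimal sketch, values illustrative; property names are the
0.92/0.94-era ones, so check hbase-default.xml for your version:

  <!-- Client-side hbase-site.xml (values illustrative).
       How many times to retry before failing out to the application. -->
  <property>
    <name>hbase.client.retries.number</name>
    <value>10</value>
  </property>
  <!-- Base pause, in milliseconds, between retries. -->
  <property>
    <name>hbase.client.pause</name>
    <value>1000</value>
  </property>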

> 4. Is there a way to figure out how long this split+replay will take
> (either by a formula at deployment design time, or at runtime via an
> API asking the master, for example)?
>

Usually it's a function of how many WAL files the regionserver was
carrying when it went down.  (You'll see in the logs where we sometimes
force flushes to clear the memstores carrying the oldest edits, just so
we can clear out old WAL files.  The log roller figures out what needs
flushing; see
http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/LogRoller.html#95.)
You can set the max number of WALs a regionserver carries; grep
'hbase.regionserver.maxlogs' (we don't seem to doc this one -- we
should fix that).
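
If you want to bound the worst case, something like the below in the
regionserver's hbase-site.xml caps the WAL count.  Value illustrative;
fewer WALs means less log to split and replay after a crash, at the
price of more frequent flushes:

  <!-- Regionserver hbase-site.xml: cap on WAL files carried before we
       force flushes so old logs can be cleaned (value illustrative;
       I believe the default is 32, but check your version). -->
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>16</value>
  </property>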

St.Ack
