[ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114991#comment-13114991
 ] 

Aravind Gottipati commented on HBASE-4298:
------------------------------------------

@Ted: Thank you for the review.  I made some changes and updated my patch (in 
github).  Notes in line.

* I think we only need to log the number of draining servers.
* The javadoc should state that keys of the map are region servers.
* For DrainingServerTracker.java, please remove year.
* For nodeChildrenChanged(), please change the sentence for catch black of 
IOException, it mentioned zk exception.
* Should remove the createNode rather than just comment it out.
* serverManager methods is different between add() and remove(): one inside 
synchronized block, one outside.
* I think a better name maybe "zookeeper.znode.draining.rs"

- I agree with all of these and they are all fixed in my latest code push (on 
github).

* I wonder if Map is needed for drainingServers because it is private and 
getDrainingServersList() only returns the keySet.
- The map isn't required, but I followed the example of onlineServers and 
serverConnections.  For code in trunk, I have changed it to a ArrayList.  A 
similar change does not work (easily) in the 0.90 branch.  Code in 
AssignmentManager uses HServerInfo in 0.90, and changing drainingServers to an 
array list will mean key lookups etc.  I have left it as a Map in 0.90, but I 
changed it to a list in trunk.

* removeServerFromDrainList / addServerToDrainList should return a boolean.
- The remove and add methods are called from DrainingServerTracker.  The 
context is a ZK callback, and the corresponding remove and add functions there 
simply return voids.  I changed the code to return booleans in trunk, but left 
it as void in the 0.90 branch. I figured they might actually be used in trunk, 
but I doubt they will be in 0.90.

* Unit tests..
- I will work with Stack and get the tests to you.

* Share your experience from using this in your environment.
- To reboot the cluster, we currently drain one server at a time (using the 
graceful stop shell script).  This process takes forever to go through all the 
servers.  The goal here is to enable us to drain multiple servers 
simultaneously.  Doing this by keeping track of servers externally makes the 
programming painful, and we'd have to share state somehow between different 
scripts that all aim to drain different servers.  Leaving this list in ZK and 
having HBase keep them from getting new regions seems like the right way to go 
about it.  I have tested this in a test cluster of about 14 servers.  This code 
by itself only solves one part of our problem.  The rest of it will be solved 
by command line scripts that will create nodes to be shut down under 
/draining/rs in ZK, and then move regions out from them.

Please let me know if you have any other questions about this stuff.

                
> Support to drain RS nodes through ZK
> ------------------------------------
>
>                 Key: HBASE-4298
>                 URL: https://issues.apache.org/jira/browse/HBASE-4298
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.90.4
>         Environment: all
>            Reporter: Aravind Gottipati
>            Priority: Critical
>              Labels: patch
>             Fix For: 0.92.0, 0.90.5
>
>
> HDFS currently has a way to exclude certain datanodes and prevent them from 
> getting new blocks.  HDFS goes one step further and even drains these nodes 
> for you.  This enhancement is a step in that direction.
> The idea is that we mark nodes in zookeeper as draining nodes.  This means 
> that they don't get any more new regions.  These draining nodes look exactly 
> the same as the corresponding nodes in /rs, except they live under /draining.
> Eventually, support for draining them can be added.  I am submitting two 
> patches for review - one for the 0.90 branch and one for trunk (in git).
> Here are the two patches
> 0.90 - 
> https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
> trunk - 
> https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
> I have tested both these patches and they work as advertised.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to