[jira] [Updated] (HBASE-7386) Investigate providing some supervisor support for znode deletion

Samir Ahmic (JIRA) Wed, 08 Jan 2014 01:59:54 -0800

     [ 
https://issues.apache.org/jira/browse/HBASE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Samir Ahmic updated HBASE-7386:
-------------------------------

    Attachment: HBASE-7386-conf-v2.patch
                HBASE-7386-bin-v2.patch

Here is a summary of the v2 patches:
* Removed unnecessary comments.
* Modified graceful_stop.sh to support the case when supervisord is used (to 
reduce copy/paste). The script also had an issue with restoring the balancer 
state, which is now fixed.
* Added a "clean_znode" option to hbase-daemon.sh that calls cleanZNode(). It 
is used by the zk_cleaner.py listener script.
* Added zk_cleaner.py, a supervisord event listener that removes the znode 
when a regionserver crashes and sends a mail notification about the event. 
Sending email is optional.
* I have verified that the supervisor approach improves master failover: in my 
testing, failover takes ~7s when using supervisor and ~40s when using the 
standard scripts.
* Since we have 'autorestart=true' in the supervisord config, supervisor will 
automatically restart any process that fails unexpectedly.
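
For readers unfamiliar with supervisord event listeners, here is a minimal 
sketch of the general shape such a listener can take. It is not the code from 
the attached zk_cleaner.py. It hand-rolls the documented listener protocol 
(READY, header line, payload, RESULT) and reacts to unexpected 
PROCESS_STATE_EXITED events. The function names and the exact arguments passed 
to hbase-daemon.sh are assumptions for illustration only, and mail 
notification is left out.

```python
import subprocess
import sys


def parse_tokens(text):
    """Parse supervisord's space-separated 'key:value' tokens into a dict."""
    return dict(tok.split(":", 1) for tok in text.split() if ":" in tok)


def is_unexpected_exit(headers, payload):
    """True when a process exited and supervisord did not expect it to."""
    return (headers.get("eventname") == "PROCESS_STATE_EXITED"
            and payload.get("expected") == "0")


def clean_znode(process_name, daemon_script="hbase-daemon.sh"):
    # ASSUMPTION: the argument order below is hypothetical; the patch only
    # says that hbase-daemon.sh gained a "clean_znode" option.
    # Child output must go to stderr, because the listener's stdout is
    # reserved for the supervisord protocol.
    subprocess.call([daemon_script, "clean_znode", process_name],
                    stdout=sys.stderr, stderr=sys.stderr)


def run_listener(stdin, stdout, on_crash):
    """Event loop: announce READY, read one event, act on it, acknowledge."""
    while True:
        stdout.write("READY\n")
        stdout.flush()
        header_line = stdin.readline()
        if not header_line:  # supervisord closed our stdin
            break
        headers = parse_tokens(header_line)
        # 'len' is the payload size; the payload here is plain ASCII tokens.
        payload = parse_tokens(stdin.read(int(headers["len"])))
        if is_unexpected_exit(headers, payload):
            on_crash(payload["processname"])
        # "RESULT 2\nOK" tells supervisord the event was handled.
        stdout.write("RESULT 2\nOK")
        stdout.flush()
```

A real listener script would end by calling 
run_listener(sys.stdin, sys.stdout, clean_znode) and would be registered in an 
[eventlistener:x] section of the supervisord config, subscribed to 
PROCESS_STATE_EXITED events.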
 
 

> Investigate providing some supervisor support for znode deletion
> ----------------------------------------------------------------
>
>                 Key: HBASE-7386
>                 URL: https://issues.apache.org/jira/browse/HBASE-7386
>             Project: HBase
>          Issue Type: Task
>          Components: master, regionserver, scripts
>            Reporter: Gregory Chanan
>            Assignee: stack
>            Priority: Blocker
>         Attachments: HBASE-7386-bin-v2.patch, HBASE-7386-bin.patch, 
> HBASE-7386-conf-v2.patch, HBASE-7386-conf.patch, HBASE-7386-src.patch, 
> HBASE-7386-v0.patch, supervisordconfigs-v0.patch
>
>
> There are a couple of JIRAs for deleting the znode on a process failure:
> HBASE-5844 (RS)
> HBASE-5926 (Master)
> which are pretty neat; on process failure, they delete the znode of the 
> underlying process so HBase can recover faster.
> These JIRAs were implemented via the startup scripts; i.e. the script hangs 
> around and waits for the process to exit, then deletes the znode.
> There are a few problems associated with this approach, as listed in the 
> below JIRAs:
> 1) Hides startup output in script
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 2) two hbase processes listed per launched daemon
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 3) Not run by a real supervisor
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463409&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463409
> 4) Weird output after kill -9 actual process in standalone mode
> https://issues.apache.org/jira/browse/HBASE-5926?focusedCommentId=13506801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506801
> 5) Can kill existing RS if called again
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13463401&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13463401
> 6) Hides stdout/stderr
> https://issues.apache.org/jira/browse/HBASE-5844?focusedCommentId=13506832&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13506832
> I suspect running it via something like supervisord can solve these issues 
> if we provide the right support.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
