[jira] Commented: (HBASE-2998) rolling-restart.sh shouldn't rely on zoo.cfg

HBase Review Board (JIRA) Thu, 21 Oct 2010 01:58:43 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923373#action_12923373 ]


HBase Review Board commented on HBASE-2998:
-------------------------------------------

Message from: [email protected]


bq.  On 2010-10-20 17:54:04, Jonathan Gray wrote:
bq.  > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ShutdownHook.java, line 103
bq.  > <http://review.cloudera.org/r/1057/diff/1/?file=15040#file15040line103>
bq.  >
bq.  >
bq.  
bq.  Jonathan Gray wrote:
bq.      Not sure where my comment went :)  Just wondering if in the rolling restart case, we aren't aborting, right?
bq.      
bq.      Is the stop() now a hard stop and that's why we need to close regions first?

Looking into this more (prompted by your comment), this extra callout to shut 
down user regions in the shutdown hook shouldn't be needed.  Something else is 
going on.  I'm digging in.

Right, we are not aborting during a rolling restart.  We are intercepting the 
JVM kill signal.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1057/#review1594
-----------------------------------------------------------





> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
>                 Key: HBASE-2998
>                 URL: https://issues.apache.org/jira/browse/HBASE-2998
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is 
> configured with zoo.cfg for zookeeper, and it worked pretty well. Then I 
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered 
> some downtime (no biggie tho, nothing critical was running). When the script 
> calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It directly runs a ZooKeeperMain which isn't modified to read from the HBase 
> configuration files. What happens next if ZK isn't running on the master node 
> is that it receives a ConnectionRefused, ignores it, proceeds to restart the 
> master (which waits on the znode), and then starts restarting the region 
> servers. They can't shut down properly within 60 seconds, since they need a 
> master, so they get killed. What follows is pretty ugly and pretty much 
> requires a whole restart.
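
The failure described above comes from the script ignoring a failed probe 
and carrying on. A minimal sketch of a guard that stops instead is below. It 
is not the attached patch and not what rolling-restart.sh does: check_probe 
and PROBE_FAIL_PATTERN are hypothetical names, and the "refused" pattern is 
only an assumption about what the probe prints when it cannot connect.

```shell
#!/usr/bin/env bash
# Sketch only: guard a restart step behind a probe command.
# check_probe is a hypothetical helper, not part of rolling-restart.sh.

# Run the probe given as arguments. Fail if it exits non-zero, or if its
# combined stdout/stderr matches the failure pattern. The default pattern
# is an assumed error text; adjust it to what the probe actually prints.
check_probe() {
  local pattern="${PROBE_FAIL_PATTERN:-refused}"
  local out
  if ! out="$("$@" 2>&1)"; then
    echo "probe failed: $*" >&2
    return 1
  fi
  if printf '%s\n' "$out" | grep -qi -- "$pattern"; then
    echo "probe reported an error: $*" >&2
    return 1
  fi
  return 0
}

# Possible use before restarting the master (left commented out here):
#   check_probe bin/hbase zkcli stat "$zmaster" || exit 1
```

Checking the output as well as the exit status is deliberate: a client that 
logs a connection error and still exits 0 would otherwise pass the guard.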

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
