[ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923269#action_12923269 ]
HBase Review Board commented on HBASE-2998:
-------------------------------------------

Message from: "Jonathan Gray" <jg...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1057/#review1594
-----------------------------------------------------------

Looking good!

trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ShutdownHook.java
<http://review.cloudera.org/r/1057/#comment5394>

- Jonathan

> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
>                 Key: HBASE-2998
>                 URL: https://issues.apache.org/jira/browse/HBASE-2998
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.0
>
>         Attachments: 2998.txt
>
>
> I tried the rolling-restart script on our dev environment, which is
> configured with zoo.cfg for zookeeper, and it worked pretty well. Then I
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered
> some downtime (no biggie tho, nothing critical was running). When the script
> calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> it directly runs a ZooKeeperMain, which isn't modified to read from the HBase
> configuration files. If ZK isn't running on the master node, the script
> receives a ConnectionRefused, ignores it, proceeds to restart the master
> (which waits on the znode), and then starts restarting the region servers.
> They can't shut down properly within 60 seconds, since they need a master,
> so they get killed. What follows is pretty ugly and pretty much requires a
> whole restart.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
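
The failure described in the quoted issue comes down to the script carrying on after a ZooKeeper check that could not have succeeded. Below is a minimal sketch of the kind of guard that would stop it, written in shell like the script itself. `require_zk` and the probe command handed to it are hypothetical names, not part of rolling-restart.sh or of the attached patch, and the sketch assumes the probe's exit status really reflects whether ZooKeeper answered (per the description, a plain `bin/hbase zkcli` call may not give that guarantee).

```shell
#!/usr/bin/env bash
# Hypothetical guard, not taken from rolling-restart.sh.
# Runs the probe command given as arguments and reports failure
# instead of silently ignoring it.
require_zk() {
  # Discard the probe's own output; only its exit status matters here.
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  echo "ZooKeeper probe failed: $*; not restarting anything" >&2
  return 1
}
```

A script would call it as `require_zk <probe command> || exit 1` before touching the master, so that an unreachable ZooKeeper aborts the run rather than leading to killed region servers.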