[ https://issues.apache.org/jira/browse/HBASE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922854#action_12922854 ]
stack commented on HBASE-2998:
------------------------------
So, ZooKeeperMain (ZKMain) takes a -server argument of the form host:port. I
was thinking of doing something like the following in bin/hbase, so that when
it is passed zkcli we add the -server argument with the host:port read from
the hbase Configuration (or from zoo.cfg if present):
{code}
Index: bin/hbase
===================================================================
--- bin/hbase (revision 1024523)
+++ bin/hbase (working copy)
@@ -262,7 +262,8 @@
HBASE_OPTS="$HBASE_OPTS $HBASE_ZOOKEEPER_OPTS"
fi
elif [ "$COMMAND" = "zkcli" ] ; then
- CLASS='org.apache.zookeeper.ZooKeeperMain'
+ SERVERPORT=`"$bin"/hbase org.apache.hadoop.hbase.zookeeper.ZKServerTool -hostport | grep '^ZK hostport:' | sed 's,^ZK hostport:,,'`
+ CLASS="org.apache.zookeeper.ZooKeeperMain -server ${SERVERPORT}"
elif [ "$COMMAND" = "classpath" ] ; then
echo $CLASSPATH
exit 0
{code}
I need to mangle the ZKServerTool some to output what I want.
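
The extraction step in the patch can be tried in isolation. The sketch below is only an illustration: it assumes the tool prints a line starting with "ZK hostport:" (the format the patch greps for, which ZKServerTool would first have to be changed to emit), and the function name extract_zk_hostport and the host zk1.example.com are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: read tool output on stdin and print the host:port
# that follows the "ZK hostport:" prefix assumed by the patch above.
extract_zk_hostport() {
  # sed -n with the p flag prints only lines where the prefix was stripped;
  # [[:space:]]* tolerates an optional space after the colon.
  # head keeps the first match in case the line appears more than once.
  sed -n 's/^ZK hostport:[[:space:]]*//p' | head -n 1
}

# Canned output stands in for the (not yet modified) ZKServerTool.
SERVERPORT=$(printf 'some log line\nZK hostport: zk1.example.com:2181\n' | extract_zk_hostport)

# An empty result means no matching line was found, so do not build a
# -server argument from it.
if [ -z "$SERVERPORT" ]; then
  echo "no ZK hostport found" >&2
  exit 1
fi
echo "$SERVERPORT"
```

Doing the filtering in a single sed call replaces the grep plus sed pair from the patch, and the empty check keeps zkcli from being started with a blank -server value.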
> rolling-restart.sh shouldn't rely on zoo.cfg
> --------------------------------------------
>
> Key: HBASE-2998
> URL: https://issues.apache.org/jira/browse/HBASE-2998
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.90.0
>
>
> I tried the rolling-restart script on our dev environment, which is
> configured with zoo.cfg for zookeeper, and it worked pretty well. Then I
> tried it on our MR cluster, which doesn't have a zoo.cfg, and we suffered
> some downtime (no biggie tho, nothing critical was running). When the script
> calls this line:
> {code}
> bin/hbase zkcli stat $zmaster
> {code}
> It directly runs a ZooKeeperMain which isn't modified to read from the HBase
> configuration files. What happens next if ZK isn't running on the master node
> is that it receives a ConnectionRefused, ignores it, proceeds to restart the
> master (which waits on the znode), and then starts restarting the region
> servers. They can't shut down properly within 60 seconds, since they need a
> master, so they get killed. What follows is pretty ugly and pretty much
> requires a whole restart.