Great. Thanks Ted.

On Tue, Mar 20, 2012 at 10:09 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Patrick:
> I logged https://issues.apache.org/jira/browse/ZOOKEEPER-1428
>
> If you feel there is anything missing in the JIRA, feel free to add it.
>
> Thanks for your help on this issue.
>
> Cheers
>
> On Tue, Mar 20, 2012 at 9:42 AM, Patrick Hunt <ph...@apache.org> wrote:
>
>> On Tue, Mar 20, 2012 at 9:32 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > Near term, if we can find a way for a shell script to detect the absence
>> > of a particular zookeeper node, rolling-restart.sh can be restored.
>> > Otherwise we may need to remove it.
>>
>> I just tested this out with 3.4, and I see the following for statting
>> a nonexistent znode:
>>
>> [zk: (CONNECTED) 1] stat /foobar
>> Node does not exist: /foobar
>>
>> vs statting one that does exist:
>>
>> [zk: (CONNECTED) 2] stat /
>> cZxid = 0x0
>> ctime = Wed Dec 31 16:00:00 PST 1969
>> mZxid = 0x0
>> mtime = Wed Dec 31 16:00:00 PST 1969
>> pZxid = 0x0
>> cversion = -1
>> dataVersion = 0
>> aclVersion = 0
>> ephemeralOwner = 0x0
>> dataLength = 0
>> numChildren = 1
>>
>> You can look for "^Node does not exist" in the stat output instead of
>> checking the exit code. This would get around the problem until a more
>> permanent solution could be found.
>>
>> I hear you re time bound (I'd love to work on this myself). In that
>> case, would you mind creating a jira based on my suggestion of having
>> a new command line tool, give your hbase case as an example and any
>> requirements you might think of. Perhaps Hartmut or one of the other
>> contributors might be interested to work on this.
>> https://issues.apache.org/jira/browse/ZOOKEEPER
>>
>> Patrick
>>
>> >
>> > On Tue, Mar 20, 2012 at 9:16 AM, Patrick Hunt <ph...@apache.org> wrote:
>> >
>> >> On Tue, Mar 20, 2012 at 6:57 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>> >> > I looked at the patch for ZOOKEEPER-1059 which should have converted the
>> >> > NPE to KeeperException.NoNodeException
>> >> >
>> >> > Why would the 'zkcli stat' command return 0 when the hbase master znode
>> >> > expires?
>> >> >
>> >> > Advice is appreciated.
>> >>
>> >> Hi Ted, sorry to see you're having troubles. I think I see the
>> >> disconnect. ZooKeeperMain is first and foremost a user shell. As such
>> >> it should not exit unless the quit command is run (or killed
>> >> explicitly, etc...). In this case ZOOKEEPER-1059 is fixing a bug in
>> >> the shell. It indeed is converting the NPE into a NoNodeException,
>> >> which the shell then converts into an error message to the user, and
>> >> continues. Prior to this patch the shell was failing on the NPE, which
>> >> then generated the non-0 exit from the process.
>> >>
>> >> Note that trunk has some further improvements along these lines that
>> >> you might also run into at some point in the future (3.5+):
>> >>
>> >> https://issues.apache.org/jira/browse/ZOOKEEPER-271
>> >> https://issues.apache.org/jira/browse/ZOOKEEPER-1391
>> >> https://issues.apache.org/jira/browse/ZOOKEEPER-1307
>> >>
>> >> I think what we need is to have a tool that's intended for use both
>> >> programmatically and by humans, with more strict requirements about
>> >> input, output formatting and command handling, etc... Please see the
>> >> work Hartmut has been doing as part of 271 on trunk (3.5.0). Perhaps
>> >> we can augment these new classes to also support such a tool. However
>> >> it should instead be a true command line tool, rather than a shell.
>> >> Would you be available to work on this?
>> >>
>> >> Patrick
>> >>
>> >> ps. bigtop is now helping to verify cross project compatibility, it
>> >> would be great if you could introduce some hbase tests that would
>> >> flag these breakages in future. When bigtop does its integration (ie
>> >> runs the hbase tests using the corresponding version of zk) it would
>> >> find these problems. We'd catch it much earlier. Thanks!
>> >>
>> >>
>> >> > FYI Jon filed a JIRA for the issue below which is a blocker for HBase trunk.
>> >> >
>> >> > On Tue, Mar 20, 2012 at 12:36 AM, Jonathan Hsieh <j...@cloudera.com> wrote:
>> >> >
>> >> >> I'm trying to test HBASE-5589 -- to see if I can add an API call to
>> >> >> HMasterInterface and do a rolling-restart / upgrade on a live cluster --
>> >> >> which led me down another rabbit hole.
>> >> >>
>> >> >> I'm wondering how the rolling-restart.sh script worked in the past (I can
>> >> >> spend more time setting up an older version to test this, but figured I'd
>> >> >> ask).
>> >> >>
>> >> >> I'm getting stuck when bin/rolling-restart.sh tries to wait until the
>> >> >> Master ZNode expires.  In this particular case, the script seems to hang
>> >> >> there forever (even after the /hbase/master ephemeral node expires).
>> >> >>
>> >> >> Here's the code in the script:
>> >> >> ----
>> >> >> # make sure the master znode has been deleted before continuing
>> >> >>    zparent=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.parent`
>> >> >>    if [ "$zparent" == "null" ]; then zparent="/hbase"; fi
>> >> >>    zmaster=`$bin/hbase org.apache.hadoop.hbase.util.HBaseConfTool zookeeper.znode.master`
>> >> >>    if [ "$zmaster" == "null" ]; then zmaster="master"; fi
>> >> >>    zmaster=$zparent/$zmaster
>> >> >>    echo -n "Waiting for Master ZNode ${zmaster} to expire"
>> >> >>    while bin/hbase zkcli stat $zmaster >/dev/null 2>&1; do
>> >> >>      echo -n "."
>> >> >>      sleep 1
>> >> >>    done
>> >> >>    echo #force a newline
>> >> >> ----
>> >> >>
>> >> >> The problem is that 'bin/hbase zkcli stat /hbase/master ...' seems to
>> >> >> always return with $? == 0 regardless of whether the znode is present or
>> >> >> not!  I've checked with Patrick Hunt (ZK committer) and this is the
>> >> >> expected behavior.  The only non-zero retcodes are for abnormal exits
>> >> >> (exceptions thrown).
>> >> >>
>> >> >> Here's the ZK code I was looking through
>> >> >>
>> >> >> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeperMain.java#L736
>> >> >>
>> >> >> https://github.com/apache/zookeeper/blob/release-3.4.3/src/java/main/org/apache/zookeeper/ZooKeeper.java#L980
>> >> >>
>> >> >> Thoughts?
>> >> >>
>> >> >> Jon.
>> >> >>
>> >> >> --
>> >> >> // Jonathan Hsieh (shay)
>> >> >> // Software Engineer, Cloudera
>> >> >> // j...@cloudera.com
>> >> >>
>> >>
>>
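
For reference, the near-term workaround suggested in the quoted thread (look for "^Node does not exist" in the stat output rather than relying on the exit code) could be sketched roughly as below. This is only an illustration, not the actual fix that went into rolling-restart.sh. The function names znode_missing and wait_for_znode_expiry are made up for this sketch, and it assumes the message appears at the start of a line, as in the 3.4 shell session shown in the thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (returns 0) when the captured stat output
# contains the "Node does not exist" message at the start of a line.
znode_missing() {
  printf '%s\n' "$1" | grep -q '^Node does not exist'
}

# Hypothetical helper: polls until the znode is reported missing.
# $1 is the znode path; the remaining arguments are the stat command to run
# (the znode path is appended as its last argument). Passing the command in
# lets the loop be exercised without a live cluster.
wait_for_znode_expiry() {
  local znode="$1"
  shift
  local out
  while true; do
    # The exit code is deliberately ignored; only the output text is inspected.
    out=$("$@" "$znode" 2>&1) || true
    if znode_missing "$out"; then
      break
    fi
    printf '.'
    sleep 1
  done
  echo # force a newline
}
```

In the script quoted in the thread, the existing while loop might then become something like: wait_for_znode_expiry "$zmaster" bin/hbase zkcli stat. Note that this sketch has no timeout, so it would still wait indefinitely if the znode never goes away or if the message format differs.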
