David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnectionLoss exceptions
......................................................................

Patch Set 2:

(4 comments)

Success output:

Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7fc349f3c210>
Waiting for HBase node: /hbase/master
Success: /hbase/master
Waiting for HBase node: /hbase/rs
Success: /hbase/rs
Stopping Zookeeper client

Success with connection retry output:

Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f0840813210>
Waiting for HBase node: /hbase/master
Zookeeper connection loss: retrying connection (1 of 3 attempts)
Stopping Zookeeper client
Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f084371ad50>
Waiting for HBase node: /hbase/master
Success: /hbase/master
Waiting for HBase node: /hbase/rs
Success: /hbase/rs
Stopping Zookeeper client
HBase startup scripts succeeded

Error output with ConnectionLoss:

Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f66198f7210>
Waiting for HBase node: /hbase/master
Zookeeper connection loss: retrying connection (1 of 3 attempts)
Stopping Zookeeper client
Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f661c7fdd50>
Waiting for HBase node: /hbase/master
Zookeeper connection loss: retrying connection (2 of 3 attempts)
Stopping Zookeeper client
Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f661929fcd0>
Waiting for HBase node: /hbase/master
Zookeeper connection loss: retrying connection (3 of 3 attempts)
Stopping Zookeeper client
Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f66192ac490>
Waiting for HBase node: /hbase/master
Stopping Zookeeper client
Traceback (most recent call last):
  File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 193, in <module>
    errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
  File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 141, in check_znodes_list_for_errors
    errors = sum([check_znode(node, zk_client, timeout) for node in nodes])
  File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
    raise ConnectionLoss
kazoo.exceptions.ConnectionLoss

Random error:

Connecting to Zookeeper host(s).
Success: <kazoo.client.KazooClient object at 0x7f703e3bc210>
Waiting for HBase node: /hbase/master
Unexpected error checking HBase node:
Stopping Zookeeper client
Traceback (most recent call last):
  File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 188, in <module>
    errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
  File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 140, in check_znodes_list_for_errors
    return sum([check_znode(node, zk_client, timeout) for node in nodes])
  File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
    raise RuntimeError
RuntimeError

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS2, Line 134: errors
> error counting seems to be off? You are overwriting it in L140. Perhaps use
Actually, I don't think we need to increment errors in the case of exceptions. Thanks for pointing this out.


Line 143:         LOGGER.warn("Zookeeper connection loss: retrying connection")
> Might be worth logging the current connection attempt and the maximum attem
Done


Line 143:         LOGGER.warn("Zookeeper connection loss: retrying connection")
> Log the exception trace here as well, just in case we need it for debugging?
Done


Line 145:         errors += 1
> I suggest waiting a little before retrying to connect. How about 1s?
Done


--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp <dkn...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: David Knupp <dkn...@cloudera.com>
Gerrit-HasComments: Yes
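
The retry behaviour discussed in the review comments (log the current attempt and the maximum, include the exception trace, wait about 1s before reconnecting, and let the exception propagate once retries run out) could look roughly like the sketch below. This is not the code from the patch: check_with_retries, its parameters, and the local ConnectionLoss class (a stand-in for kazoo.exceptions.ConnectionLoss so the sketch runs without kazoo installed) are assumptions made for illustration.

```python
import logging
import time

LOGGER = logging.getLogger(__name__)


class ConnectionLoss(Exception):
    """Hypothetical stand-in for kazoo.exceptions.ConnectionLoss."""


def check_with_retries(check, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Run check(); on ConnectionLoss, wait and retry up to max_attempts times.

    `check` is an assumed callable that opens a fresh client, checks the
    znodes, stops the client, and returns an error count.
    """
    attempt = 0
    while True:
        try:
            return check()
        except ConnectionLoss:
            if attempt >= max_attempts:
                # Retries exhausted: let the exception reach the caller.
                raise
            attempt += 1
            # exc_info=True attaches the current traceback to the log record.
            LOGGER.warning(
                "Zookeeper connection loss: retrying connection "
                "(%d of %d attempts)", attempt, max_attempts, exc_info=True)
            # Pause before reconnecting, as suggested in the review.
            sleep(delay)
```

With max_attempts=3 this makes one initial attempt plus up to three retries, which matches the four "Connecting to Zookeeper host(s)." lines in the error output above. Passing sleep as a parameter is only a convenience so the delay can be skipped in tests.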