[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 4: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed a transient Zookeeper connection error on RHEL7. This patch introduces a retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance and confirming that all HBase nodes start up.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Reviewed-on: http://gerrit.cloudera.org:8080/5554
Reviewed-by: Alex Behm
Tested-by: Internal Jenkins
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 26 insertions(+), 8 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Internal Jenkins
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
Alex Behm has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 4: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
> Looks like you missed this? err_msg.format(..., str(e))?
Thanks. Cleaned up the logging.

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 143:
> Done
Looks like you missed this? err_msg.format(..., str(e))?

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 2:

Oh, I guess there's one other possible outcome -- NoNodeError:

  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Waiting for HBase node: /hbase/master
  [...]
  Waiting for HBase node: /hbase/master
  Waiting for HBase node: /hbase/master
  Failed while checking for HBase node: /hbase/master
  Waiting for HBase node: /hbase/rs
  Waiting for HBase node: /hbase/rs
  [...]
  Waiting for HBase node: /hbase/rs
  Waiting for HBase node: /hbase/rs
  Failed while checking for HBase node: /hbase/rs
  Stopping Zookeeper client
  Could not get one or more nodes. Exiting with errors: 2

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-HasComments: No
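The NoNodeError outcome above (poll each node until the timeout, count one error per node that never appears, report the total) can be sketched roughly as follows. This is not the code from check-hbase-nodes.py: `FakeZkClient`, `wait_for_node`, and `count_missing` are hypothetical names, and the in-memory client stands in for a real Zookeeper client so the sketch runs without a cluster.

```python
import time


class FakeZkClient:
    """Hypothetical in-memory stand-in for a Zookeeper client (assumption)."""

    def __init__(self, existing_nodes):
        self._nodes = set(existing_nodes)

    def exists(self, path):
        # True if the znode path is known to this fake client.
        return path in self._nodes


def wait_for_node(client, path, timeout_seconds, poll_seconds=0.01):
    """Poll until path exists; return 0 on success, 1 if the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if client.exists(path):
            return 0
        if time.monotonic() >= deadline:
            return 1
        time.sleep(poll_seconds)


def count_missing(client, paths, timeout_seconds):
    """Sum the per-node results, giving the number of nodes that never appeared."""
    return sum(wait_for_node(client, p, timeout_seconds) for p in paths)


if __name__ == "__main__":
    client = FakeZkClient([])
    errors = count_missing(client, ["/hbase/master", "/hbase/rs"], 0.05)
    print("Exiting with errors: %d" % errors)
```

With neither node present, both waits time out and the total matches the "Exiting with errors: 2" line in the output above.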
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 2:

(4 comments)

Success output:

  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Success: /hbase/master
  Waiting for HBase node: /hbase/rs
  Success: /hbase/rs
  Stopping Zookeeper client

Success with connection retry output:

  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (1 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Success: /hbase/master
  Waiting for HBase node: /hbase/rs
  Success: /hbase/rs
  Stopping Zookeeper client
  HBase startup scripts succeeded

Error output with ConnectionLoss:

  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (1 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (2 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (3 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Stopping Zookeeper client
  Traceback (most recent call last):
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 193, in <module>
      errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 141, in check_znodes_list_for_errors
      errors = sum([check_znode(node, zk_client, timeout) for node in nodes])
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
      raise ConnectionLoss
  kazoo.exceptions.ConnectionLoss

Random error:

  Connecting to Zookeeper host(s).
  Success:
  Waiting for HBase node: /hbase/master
  Unexpected error checking HBase node:
  Stopping Zookeeper client
  Traceback (most recent call last):
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 188, in <module>
      errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 140, in check_znodes_list_for_errors
      return sum([check_znode(node, zk_client, timeout) for node in nodes])
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
      raise RuntimeError
  RuntimeError

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS2, Line 134: errors
> error counting seems to be off? You are overwriting it in L140. Perhaps use
Actually, I don't think we need to increment errors in the case of exceptions. Thanks for pointing this out.

Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
> Might be worth logging the current connection attempt and the maximum attem
Done

Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
> Log the exception trace here as well, just in case we need it for debugging?
Done

Line 145: errors += 1
> I suggest waiting a little before retrying to connect. How about 1s?
Done

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
Gerrit-Reviewer: David Knupp
Gerrit-HasComments: Yes
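One way to act on the "log the exception trace" and `err_msg.format(..., str(e))` review comments is to put the exception text in the message and attach the traceback to the same log record. The sketch below is only an illustration, not the patch itself: `make_logger` and `log_connection_loss` are hypothetical helpers, and the local `ConnectionLoss` class stands in for `kazoo.exceptions.ConnectionLoss` so the example runs without kazoo installed.

```python
import io
import logging


class ConnectionLoss(Exception):
    """Stand-in for kazoo.exceptions.ConnectionLoss (assumption, avoids the kazoo import)."""


def make_logger(stream):
    """Build a logger that writes only to the given stream (hypothetical helper)."""
    logger = logging.getLogger("check_hbase_nodes_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers = [logging.StreamHandler(stream)]
    logger.propagate = False
    return logger


def log_connection_loss(logger, error, attempt, max_attempts):
    """Log the retry message with the exception text and its traceback."""
    err_msg = ("Zookeeper connection loss: retrying connection "
               "({0} of {1} attempts): {2}")
    # Passing the (type, value, traceback) tuple makes logging append the trace.
    exc_info = (type(error), error, error.__traceback__)
    logger.warning(err_msg.format(attempt, max_attempts, str(error)),
                   exc_info=exc_info)


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    try:
        raise ConnectionLoss("session dropped")
    except ConnectionLoss as e:
        log_connection_loss(log, e, 1, 3)
    print(buf.getvalue())
```

Keeping the traceback on the warning means a transient failure that is later retried successfully still leaves enough detail in the log for debugging.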
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
David Knupp has uploaded a new patch set (#3).

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed a transient Zookeeper connection error on RHEL7. This patch introduces a retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance and confirming that all HBase nodes start up.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 26 insertions(+), 8 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5554/3

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Bharath Vissapragada
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
Alex Behm has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
Might be worth logging the current connection attempt and the maximum attempts. (here or elsewhere)

Line 145: errors += 1
I suggest waiting a little before retrying to connect. How about 1s?

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: Alex Behm
Gerrit-HasComments: Yes
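The two suggestions above (log the current and maximum attempt, and wait about 1s before reconnecting) could be combined along these lines. This is a minimal sketch under stated assumptions rather than the actual patch: `run_with_retries` and its parameters are hypothetical, the `sleep` argument exists only so the wait can be stubbed out, and the local `ConnectionLoss` class stands in for `kazoo.exceptions.ConnectionLoss`.

```python
import logging
import time

LOGGER = logging.getLogger("retry_sketch")


class ConnectionLoss(Exception):
    """Stand-in for kazoo.exceptions.ConnectionLoss (assumption, avoids the kazoo import)."""


def run_with_retries(operation, max_retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(); on ConnectionLoss, log the attempt, wait, and retry.

    Once max_retries retries have been used, the next ConnectionLoss
    propagates to the caller instead of being retried.
    """
    retries = 0
    while True:
        try:
            return operation()
        except ConnectionLoss as e:
            if retries >= max_retries:
                raise
            retries += 1
            LOGGER.warning(
                "Zookeeper connection loss: retrying connection "
                "(%d of %d attempts): %s", retries, max_retries, e)
            # Give a transient connection problem a moment to clear.
            sleep(delay_seconds)
```

A caller would pass the connect-and-check step as `operation`, for example `run_with_retries(lambda: check_all_nodes())`, where `check_all_nodes` is a hypothetical function that raises `ConnectionLoss` on a dropped session. With the defaults, an operation that always fails is called four times in total: the initial attempt plus three retries.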
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
David Knupp has uploaded a new patch set (#2).

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed a transient Zookeeper connection error on RHEL7. This patch introduces a retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance and confirming that all HBase nodes start up.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 24 insertions(+), 7 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5554/2

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
David Knupp has uploaded a new change for review.

http://gerrit.cloudera.org:8080/5554

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed a transient Zookeeper connection error on RHEL7. This patch introduces a retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 24 insertions(+), 7 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5554/1

--
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp