[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Internal Jenkins
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed
a transient Zookeeper connection error on RHEL7. This patch introduces a
retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance and confirming that
all HBase nodes start up.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Reviewed-on: http://gerrit.cloudera.org:8080/5554
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 26 insertions(+), 8 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved




Gerrit-MessageType: merged
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Internal Jenkins


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 4: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
> Looks like you missed this? err_msg.format(..., str(e))?
Thanks. Cleaned up the logging.



Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 143: 
> Done
Looks like you missed this? err_msg.format(..., str(e))?
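
For illustration, a minimal sketch of what the suggested err_msg.format(..., str(e)) call could look like. This is not the code from the patch: the helper name log_node_error and the logger name are hypothetical, the message text is modeled on the "Unexpected error checking HBase node:" line in the sample output, and the sketch targets Python 3.

```python
import logging

# Hypothetical logger name; the real script defines its own LOGGER.
LOGGER = logging.getLogger("check-hbase-nodes-sketch")


def log_node_error(node, exc, logger=LOGGER):
    """Log an error for `node`, including the exception text and its trace.

    Returns the formatted message so callers (and tests) can inspect it.
    """
    err_msg = "Unexpected error checking HBase node {0}: {1}"
    message = err_msg.format(node, str(exc))
    # exc_info attaches the traceback of the caught exception to the record.
    logger.error(message, exc_info=(type(exc), exc, exc.__traceback__))
    return message
```

Passing exc_info keeps the stack trace in the log next to the formatted message, which covers both the "format in str(e)" and the "log the exception trace" requests in one call.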



Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 2:

Oh, I guess there's one other possible outcome -- NoNodeError:

  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Waiting for HBase node: /hbase/master
  [...]
  Waiting for HBase node: /hbase/master
  Waiting for HBase node: /hbase/master
  Failed while checking for HBase node: /hbase/master
  Waiting for HBase node: /hbase/rs
  Waiting for HBase node: /hbase/rs
  [...]
  Waiting for HBase node: /hbase/rs
  Waiting for HBase node: /hbase/rs
  Failed while checking for HBase node: /hbase/rs
  Stopping Zookeeper client
  Could not get one or more nodes. Exiting with errors: 2
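
The error count in the last line above could be produced roughly as follows. This is a sketch under assumptions, not the script itself: NoNodeError is a local stand-in for kazoo.exceptions.NoNodeError so the example runs without kazoo, and get_node, max_polls, poll_s and count_errors are hypothetical names (the real script takes a timeout rather than a poll count).

```python
import time


class NoNodeError(Exception):
    """Stand-in for kazoo.exceptions.NoNodeError (assumption, keeps the sketch kazoo-free)."""


def check_znode(node, get_node, max_polls=5, poll_s=1.0, sleep=time.sleep):
    """Return 0 if get_node(node) succeeds within max_polls tries, else 1."""
    for _ in range(max_polls):
        try:
            get_node(node)
            return 0
        except NoNodeError:
            # Node not registered yet; wait and poll again.
            sleep(poll_s)
    return 1


def count_errors(nodes, get_node, **kwargs):
    """Sum the per-node results, so the total is the number of missing nodes."""
    return sum(check_znode(node, get_node, **kwargs) for node in nodes)
```

With both /hbase/master and /hbase/rs missing, count_errors returns 2, which would correspond to "Exiting with errors: 2".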


Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread David Knupp (Code Review)
David Knupp has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 2:

(4 comments)

Success output:
  Connecting to Zookeeper host(s). 
  Success: 
  Waiting for HBase node: /hbase/master
  Success: /hbase/master
  Waiting for HBase node: /hbase/rs
  Success: /hbase/rs
  Stopping Zookeeper client

Success with connection retry output:
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (1 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Success: /hbase/master
  Waiting for HBase node: /hbase/rs
  Success: /hbase/rs
  Stopping Zookeeper client
  HBase startup scripts succeeded

Error output with ConnectionLoss:
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (1 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (2 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Zookeeper connection loss: retrying connection (3 of 3 attempts)
  Stopping Zookeeper client
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Stopping Zookeeper client
  Traceback (most recent call last):
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 193, in <module>
      errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 141, in check_znodes_list_for_errors
      errors = sum([check_znode(node, zk_client, timeout) for node in nodes])
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
      raise ConnectionLoss
  kazoo.exceptions.ConnectionLoss

Random error:
  Connecting to Zookeeper host(s).
  Success: 
  Waiting for HBase node: /hbase/master
  Unexpected error checking HBase node:
  Stopping Zookeeper client
  Traceback (most recent call last):
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 188, in <module>
      errors = check_znodes_list_for_errors(args.nodes, args.zookeeper_hosts, args.timeout)
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 140, in check_znodes_list_for_errors
      return sum([check_znode(node, zk_client, timeout) for node in nodes])
    File "/home/dknupp/Impala/testdata/bin/check-hbase-nodes.py", line 112, in check_znode
      raise RuntimeError
  RuntimeError

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

PS2, Line 134: errors
> error counting seems to be off? You are overwriting it in L140. Perhaps use
Actually, I don't think we need to increment errors in the case of exceptions.
Thanks for pointing this out.


Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
> Might be worth logging the current connection attempt and the maximum attem
Done


Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
> Log the exception trace here as well, just in case we need it for debugging?
Done


Line 145: errors += 1
> I suggest waiting a little before retrying to connect. How about 1s?
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: David Knupp 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-21 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#3).

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed
a transient Zookeeper connection error on RHEL7. This patch introduces a
retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance and confirming that
all HBase nodes start up.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 26 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5554/3

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Bharath Vissapragada 


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-20 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5554/2/testdata/bin/check-hbase-nodes.py
File testdata/bin/check-hbase-nodes.py:

Line 143: LOGGER.warn("Zookeeper connection loss: retrying connection")
Might be worth logging the current connection attempt and the maximum attempts
(here or elsewhere).


Line 145: errors += 1
I suggest waiting a little before retrying to connect. How about 1s?
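
To make the two suggestions concrete, here is a rough sketch of a retry loop that logs the attempt number and waits before reconnecting. It is an illustration under assumptions, not the patch: ConnectionLoss is a local stand-in for kazoo.exceptions.ConnectionLoss so the example runs without kazoo, and check_with_retry, check, max_attempts and delay_s are hypothetical names.

```python
import logging
import time

# Hypothetical logger name; the real script defines its own LOGGER.
LOGGER = logging.getLogger("check-hbase-nodes-sketch")


class ConnectionLoss(Exception):
    """Stand-in for kazoo.exceptions.ConnectionLoss (assumption, keeps the sketch kazoo-free)."""


def check_with_retry(check, max_attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call check(); on ConnectionLoss, retry up to max_attempts more times.

    Returns whatever check() returns. Re-raises ConnectionLoss once the
    retries are exhausted.
    """
    attempt = 0
    while True:
        try:
            return check()
        except ConnectionLoss as e:
            if attempt >= max_attempts:
                raise
            attempt += 1
            LOGGER.warning(
                "Zookeeper connection loss: retrying connection "
                "(%d of %d attempts): %s", attempt, max_attempts, e)
            # Give Zookeeper a moment before reconnecting.
            sleep(delay_s)
```

The sleep function is injectable only so the loop can be exercised without real delays; a caller would normally rely on the time.sleep default.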



Gerrit-MessageType: comment
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-20 Thread David Knupp (Code Review)
David Knupp has uploaded a new patch set (#2).

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed
a transient Zookeeper connection error on RHEL7. This patch introduces a
retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance and confirming that
all HBase nodes start up.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 24 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5554/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp 


[Impala-ASF-CR] IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

2016-12-20 Thread David Knupp (Code Review)
David Knupp has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5554

Change subject: IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions
..

IMPALA-4684: Handle Zookeeper ConnentionLoss exceptions

This is the second patch to address IMPALA-4684. The first patch exposed
a transient Zookeeper connection error on RHEL7. This patch introduces a
retry (up to 3 times), and somewhat better logging.

Tested by running tests against an RHEL7 instance.

Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
---
M testdata/bin/check-hbase-nodes.py
1 file changed, 24 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/5554/1

Gerrit-MessageType: newchange
Gerrit-Change-Id: I44b4eec342addcfe489f94c332bbe14225c9968c
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp