[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/802


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-16 Thread danielschonfeld
Github user danielschonfeld commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148807939
  
@Parth-Brahmbhatt that's a tricky one. I haven't found a way to reproduce it,
but if we leave nimbus running for a day or so with more than one nimbus and a
good load on the system, we see the number of ZK nodes/keys under /leader-lock
go up to (X*nimbuses)+1. When that happens, we have problems trying to do
anything, as no nimbus thinks it's the leader, which is exactly what's
described in CURATOR-202.
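Below is a simplified Java model of that symptom. It is not Curator's LeaderLatch implementation and not a confirmed account of the CURATOR-202 root cause. It assumes that (1) a participant leads only when the node it tracks as its own has the lowest sequence number under /leader-lock, and (2) after a same-session reconnect one participant creates a fresh node while its old ephemeral node survives, because the ZooKeeper session never expired. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical in-memory model of a ZooKeeper leader latch; NOT Curator's code.
 * Each participant owns a sequential node under /leader-lock and considers
 * itself leader only if the node it remembers as its own sorts lowest.
 */
public class StaleLeaderLockModel {
    private final List<String> children = new ArrayList<>(); // children of /leader-lock
    private int nextSeq = 0;

    /** Creates a new sequential node (zero-padded, so names sort by sequence). */
    public String createNode() {
        String name = String.format("latch-%010d", nextSeq++);
        children.add(name);
        return name;
    }

    /** True if ourPath is the lowest-sequence child, i.e. its owner leads. */
    public boolean isLeader(String ourPath) {
        return !children.isEmpty() && Collections.min(children).equals(ourPath);
    }

    public int nodeCount() {
        return children.size();
    }

    public static void main(String[] args) {
        StaleLeaderLockModel lock = new StaleLeaderLockModel();
        String pathA = lock.createNode(); // nimbus A: latch-0000000000
        String pathB = lock.createNode(); // nimbus B: latch-0000000001
        System.out.println("before reconnect: A leader=" + lock.isLeader(pathA)
                + ", B leader=" + lock.isLeader(pathB));

        // Assumed failure mode: after a same-session reconnect, A creates a fresh
        // node and forgets its old one. The old node is ephemeral, but its session
        // never expired, so it stays and still sorts lowest, owned by nobody.
        pathA = lock.createNode(); // latch-0000000002
        System.out.println("after reconnect: nodes=" + lock.nodeCount()
                + ", A leader=" + lock.isLeader(pathA)
                + ", B leader=" + lock.isLeader(pathB));
    }
}
```

With two nimbuses the model ends with three nodes and no leader, which happens to match the (X*nimbuses)+1 pattern for X = 1.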

If you can think of a way to programmatically disconnect the ZK connection but
reconnect using the same session, you'll have a reproduction of this bug, as it
always starts showing up after log lines like the following:

```
2015-10-16 18:16:13 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
2015-10-16 18:16:14 o.a.s.s.o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 6668ms for sessionid 0x1506caf14ab005f, closing socket connection and attempting reconnect
2015-10-16 18:16:14 o.a.s.s.o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 6672ms for sessionid 0x1506caf14ab0060, closing socket connection and attempting reconnect
2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server 10.101.1.2/10.101.1.2:2181. Will not attempt to authenticate using SASL (unknown error)
2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Socket connection established to 10.101.1.2/10.101.1.2:2181, initiating session
2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server 10.101.1.2/10.101.1.2:2181, sessionid = 0x1506caf14ab005f, negotiated timeout = 2
2015-10-16 18:16:15 o.a.s.s.o.a.c.f.s.ConnectionStateManager [INFO] State change: RECONNECTED
```
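In the excerpt, session 0x1506caf14ab005f times out and is then re-established with the same session id, so the ZooKeeper session, and any ephemeral nodes it owns, survived the disconnect. The Java sketch below flags that pattern in a log. It assumes one log entry per line and that the message text matches the ZooKeeper ClientCnxn wording shown above, which may differ across versions; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner for sessions that timed out and came back with the same id. */
public class SameSessionReconnectScan {
    // Patterns assume the ClientCnxn message wording seen in the excerpt.
    private static final Pattern TIMED_OUT =
            Pattern.compile("Client session timed out.*sessionid (0x[0-9a-f]+)");
    private static final Pattern ESTABLISHED =
            Pattern.compile("Session establishment complete.*sessionid = (0x[0-9a-f]+)");

    /** Returns session ids that timed out and were later re-established unchanged. */
    public static Set<String> reusedSessions(List<String> lines) {
        Set<String> timedOut = new HashSet<>();
        Set<String> reused = new LinkedHashSet<>(); // keeps first-seen order
        for (String line : lines) {
            Matcher m = TIMED_OUT.matcher(line);
            if (m.find()) {
                timedOut.add(m.group(1));
                continue;
            }
            m = ESTABLISHED.matcher(line);
            if (m.find() && timedOut.contains(m.group(1))) {
                reused.add(m.group(1));
            }
        }
        return reused;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2015-10-16 18:16:14 o.a.s.s.o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 6668ms for sessionid 0x1506caf14ab005f, closing socket connection and attempting reconnect",
            "2015-10-16 18:16:14 o.a.s.s.o.a.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 6672ms for sessionid 0x1506caf14ab0060, closing socket connection and attempting reconnect",
            "2015-10-16 18:16:15 o.a.s.s.o.a.z.ClientCnxn [INFO] Session establishment complete on server 10.101.1.2/10.101.1.2:2181, sessionid = 0x1506caf14ab005f, negotiated timeout = 2");
        // Prints: re-established with same session id: [0x1506caf14ab005f]
        System.out.println("re-established with same session id: " + reusedSessions(log));
    }
}
```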




[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-16 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148763402
  
@revans2 The log concerns were about the original version of @danielschonfeld's
PR. He has fixed them, but I guess he force-pushed the branch. I am +1 on this
change too.

@danielschonfeld On a side note, can you provide any steps to reproduce 
this locally?




[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-16 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148734244
  
@danielschonfeld It looks fine to me. Personally I am +1 on it, but I want
to hear back from @Parth-Brahmbhatt. He has a concern about unnecessary log
statements. I assume that these are coming directly out of curator itself, so
I am not really sure how he wants to handle this.




[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-16 Thread danielschonfeld
Github user danielschonfeld commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148731922
  
@revans2 what can I do to make this PR ready for approval?




[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-16 Thread revans2
Github user revans2 commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148728541
  
@danielschonfeld the test failure is because we have too many tests that
don't use ephemeral ports. In all likelihood the JDK8 test was running on the
same box at the same time and got the port before this test could. JDK8 tends
to run through tests faster than JDK7 does.
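A minimal Java sketch of the ephemeral-port approach: binding a ServerSocket to port 0 asks the OS for a free port, so two builds on the same box do not race for one hard-coded port. How Storm's test harness would pass such a port to the DRPC server is not shown; `findFreePort` is a hypothetical helper.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortExample {
    /** Binds to port 0 so the OS assigns a free ephemeral port, then releases it. */
    public static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        // Preferred: keep the socket bound and hand it (or its port) to the server under test.
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println("server bound to ephemeral port " + server.getLocalPort());
        }
        // Weaker: probe for a free port; another process could grab it before it is reused.
        System.out.println("probed free port " + findFreePort());
    }
}
```

Binding the server itself to port 0 and reading the port back avoids the small window between `findFreePort` closing its socket and the test binding that port.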




[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-15 Thread danielschonfeld
Github user danielschonfeld commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148527966
  
Any idea why the JDK7 test fails? I'm not entirely clear on why the DRPC server
is having problems starting up.




[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-15 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the pull request:

https://github.com/apache/storm/pull/802#issuecomment-148524703
  
There are a lot of unnecessary log statements; can you remove them?





[GitHub] storm pull request: [STORM-1115] Stale leader-lock key effectively...

2015-10-15 Thread danielschonfeld
GitHub user danielschonfeld opened a pull request:

https://github.com/apache/storm/pull/802

[STORM-1115] Stale leader-lock key effectively bans all nodes from becoming 
leaders

From the issue:

```
I believe this curator bug is what's in play causing the above described 
situation.
https://issues.apache.org/jira/browse/CURATOR-202
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/schonfeld/storm update-curator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #802


commit 580de48c91ab32a1777f0e375d065d3a01b8e3e9
Author: Michael Schonfeld 
Date:   2015-10-06T17:55:17Z

more logging to nimbusinfo

commit ea2c3a81855dded8c48d946a7017c5ee1ded5e5b
Author: Michael Schonfeld 
Date:   2015-10-09T19:21:52Z

debug logging

commit 25ad78e3c6c47e3da3c6cab86d155274ce5ad561
Author: Michael Schonfeld 
Date:   2015-10-09T19:57:45Z

wrong encasing

commit b7db4434f10b78c6259b82f6b6edaac676900bef
Author: Michael Schonfeld 
Date:   2015-10-09T20:36:53Z

wrap in do()

commit 9ca4d5d4821959b7b9df002fbdc589f6232c37da
Author: Michael Schonfeld 
Date:   2015-10-09T20:39:49Z

missing )

commit 24c4ce0056d40f82f17c15603e5ab83a3b9596ab
Author: Michael Schonfeld 
Date:   2015-10-09T21:15:21Z

more debug

commit 8253f83ece8a63c076a9a5e833e2871728419c83
Author: Michael Schonfeld 
Date:   2015-10-09T21:33:23Z

no need for ()

commit 278206baf7567f270b5bcca41dbeec75c765aaf4
Author: Michael Schonfeld 
Date:   2015-10-09T21:39:05Z

blha

commit 9f3e8d55b9083d59423ac9153f0aff8c912242bb
Author: Michael Schonfeld 
Date:   2015-10-09T23:11:28Z

dont use local var

commit faf9bda82e1c4e1875c87a4920fb2c1c2a33ba56
Author: Daniel Schonfeld 
Date:   2015-10-15T21:11:45Z

update curator to version 2.9.0 to combat stale leader issue as described 
in https://issues.apache.org/jira/browse/CURATOR-202



