[jira] [Commented] (ZOOKEEPER-2807) Flaky test: org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278053#comment-16278053
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2807:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/300#discussion_r154851878

--- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java ---
@@ -240,84 +240,14 @@ public void run() {
 }

 // Process committed head
-if ((request = committedRequests.poll()) == null) {
-    throw new IOException("Error: committed head is null");
-}
-
-/*
- * Check if request is pending, if so, update it with the committed info
- */
-LinkedList sessionQueue = pendingRequests.get(request.sessionId);
-if (sessionQueue != null) {
-    // If session queue != null, then it is also not empty.
-    Request topPending = sessionQueue.poll();
-    if (request.cxid != topPending.cxid) {
-        /*
-         * TL;DR - we should not encounter this scenario often under normal load.
-         * We pass the commit to the next processor and put the pending back with a warning.
-         *
-         * Generally, we can get commit requests that are not at the queue head after
-         * a session moved (see ZOOKEEPER-2684). Let's denote the previous server of the session
-         * with A, and the server that the session moved to with B (keep in mind that it is
-         * possible that the session already moved from B to a new server C, and maybe C=A).
-         * 1. If request.cxid < topPending.cxid: this means that the session requested this update
-         * from A, then moved to B (i.e., us), and now B receives the commit for the update
-         * after the session already performed several operations in B (and therefore its
-         * cxid is higher than that old request's).
-         * 2. If request.cxid > topPending.cxid: this means that the session requested an update
-         * from B with a cxid that is bigger than the one we know; therefore in this case we
-         * are A, and we lost the connection to the session. Given that we are waiting for a
-         * commit for that update, it means that we already sent the request to the leader and
-         * it will be committed at some point (in this case the order of cxid won't follow
-         * zxid, since zxid is an increasing order). It is not safe for us to delete the
-         * session's queue at this point, since it is possible that the session has newer
-         * requests in it after it moved back to us. We just leave the queue as it is, and
-         * once the commit arrives (for the old request), the finalRequestProcessor will see
-         * a closed cnxn handle, and just won't send a response.
-         * Also note that we don't have a local session, therefore we treat the request
-         * like any other commit for a remote request, i.e., we perform the update without
-         * sending a response.
-         */
-        LOG.warn("Got request " + request +
-                " but we are expecting request " + topPending);
-        sessionQueue.addFirst(topPending);
-    } else {
-        /*
-         * Generally, we want to send to the next processor our version of the request,
-         * since it contains the session information that is needed for post update processing.
-         * In more detail, when a request is in the local queue, there is (or could be) a client
-         * attached to this server waiting for a response, and there is other bookkeeping of
-         * requests that are outstanding and have originated from this server
-         * (e.g., for setting the max outstanding requests) - we need to update this info when an
-         * outstanding request completes. Note that in the other case (above), the operation
-         * originated from a different 

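The head-of-queue check quoted in the diff above can be sketched as follows. This is a simplified illustration only; the Request class and the method name here are hypothetical, not the real CommitProcessor code:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the head-of-queue check discussed above; the Request
// class and matchCommitted are hypothetical names, not the real CommitProcessor.
public class CommitHeadSketch {
    static class Request {
        final long sessionId;
        final int cxid;
        Request(long sessionId, int cxid) { this.sessionId = sessionId; this.cxid = cxid; }
    }

    final Map<Long, ArrayDeque<Request>> pendingRequests = new HashMap<>();

    // Returns the request to hand to the next processor: the locally pending
    // one on a cxid match (it carries the session bookkeeping), otherwise the
    // committed one, with the pending request put back at the queue head.
    Request matchCommitted(Request committed) {
        ArrayDeque<Request> sessionQueue = pendingRequests.get(committed.sessionId);
        if (sessionQueue == null) {
            return committed; // remote session: nothing pending locally
        }
        Request topPending = sessionQueue.poll();
        if (committed.cxid != topPending.cxid) {
            // Session moved between servers (see ZOOKEEPER-2684): pass the
            // commit on and put the pending request back.
            sessionQueue.addFirst(topPending);
            return committed;
        }
        return topPending;
    }
}
```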
[jira] [Commented] (ZOOKEEPER-2807) Flaky test: org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278048#comment-16278048
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2807:
---

Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/300
  
@afine You're right. I missed that the syncWithLeader() call is on the same path, and in the same thread, as adding commits to the queue.

In which case this must be the sequence:
- syncWithLeader() blocks the follower,
- the Leader sends commits for the sync process,
- the Leader sends UPTODATE at the very end,
- the Follower drains the commit queue,
- the Follower starts following.
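The sequence above can be illustrated with a minimal single-threaded sketch (illustrative only, with hypothetical names, not ZooKeeper code): because the thread that drains the queue is the only one that enqueues commits, nothing can arrive between the drain and the start of following.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch (hypothetical names, not ZooKeeper code): the same
// thread that enqueues commits during sync is the one that drains them,
// so no commit can slip in between the drain and the start of following.
public class SingleThreadDrain {
    static final Queue<String> committedRequests = new ArrayDeque<>();

    static void syncWithLeader() {
        // During sync, commits arrive and are queued by this same thread.
        committedRequests.add("commit-1");
        committedRequests.add("commit-2");
        // UPTODATE received at the very end: drain everything queued so far.
        while (committedRequests.poll() != null) { /* apply commit */ }
    }

    public static void main(String[] args) {
        syncWithLeader();
        // Only this thread enqueues, so the queue must be empty here,
        // before the follower starts reading and processing new packets.
        System.out.println(committedRequests.isEmpty());
    }
}
```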




> Flaky test: 
> org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged
> -
>
> Key: ZOOKEEPER-2807
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2807
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Abraham Fine
>Assignee: Abraham Fine
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)



[jira] [Commented] (ZOOKEEPER-2807) Flaky test: org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged

2017-12-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277625#comment-16277625
 ] 

Hadoop QA commented on ZOOKEEPER-2807:
--

-1 overall.  GitHub Pull Request Build

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

+1 release audit.  The applied patch does not increase the total number of release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333//console

This message is automatically generated.



Failed: ZOOKEEPER- PreCommit Build #1333

2017-12-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 87.12 MB...]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1333//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] 3a2389afceab9fd16ddece611a0bf7bce7555d70 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: ‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’ and ‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’ are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722:
 exec returned: 1

Total time: 19 minutes 7 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
[description-setter] Description set: ZOOKEEPER-2807
Putting comment on the pull request
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
2 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.StandaloneDisabledTest.startSingleServerTest

Error Message:
Timeout occurred. Please note the time in the report does not reflect the time until the timeout.

Stack Trace:
junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
at java.lang.Thread.run(Thread.java:745)


FAILED:  org.apache.zookeeper.test.ReconfigTest.testQuorumSystemChange

Error Message:
client could not connect to reestablished quorum: giving up after 30+ seconds.

Stack Trace:
junit.framework.AssertionFailedError: client could not connect to reestablished quorum: giving up after 30+ seconds.
at org.apache.zookeeper.test.ReconfigTest.reconfig(ReconfigTest.java:93)
at org.apache.zookeeper.test.ReconfigTest.testQuorumSystemChange(ReconfigTest.java:870)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[jira] [Commented] (ZOOKEEPER-2807) Flaky test: org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277582#comment-16277582
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2807:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/300#discussion_r154786233
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java ---
@@ -327,6 +257,95 @@ public void run() {
 LOG.info("CommitProcessor exited loop!");
 }

+private void processCommittedRequest() throws IOException, InterruptedException {
+    // In case of a spurious wakeup in waitForCommittedRequests we should not
+    // remove the request from the queue until it has been processed
+    Request request = committedRequests.peek();
+
+    if (request == null) {
+        committedRequests.poll();
--- End diff --

we don't





Re: Proposal for reworked ZK web site generation: CMS -> jekyll

2017-12-04 Thread Patrick Hunt
Hi folks, final update on this one. INFRA has completed the switch over and
we are now live on the new website generation process. I've updated the
"how to release" page as well as the cwiki describing how our website is
managed.

https://cwiki.apache.org/confluence/display/ZOOKEEPER/WebSiteSetup

I made a small update this morning and pushed the changes live w/o any
issues.

Regards,

Patrick


On Fri, Dec 1, 2017 at 10:26 AM, Patrick Hunt  wrote:

> Thanks for all the feedback everyone. I didn't mean to exclude
> non-committers from providing feedback, all is welcome (just that I do need
> some committers to +1 this thing, thanks for that).
>
> I've pushed the website and asf-site branches to the official git repo and
> created the following jira with INFRA if you want to track:
> https://issues.apache.org/jira/browse/INFRA-15589
> Once this is live I will update the "how to release" instructions. If
> there are any other docs/etc... that I should update lmk.
>
> I looked into Tamas's feedback and oddly "github-markup" tool
> https://github.com/github/markup
> was working fine to convert the tables however the github site itself was
> not rendering properly as Tamas highlighted. Very odd. I fiddled with the
> formatting a bit and now seems to be working properly in both cases (github
> and jekyll rendering of the site html pages). Thanks Tamas!
>
> Patrick
>
>
> On Fri, Dec 1, 2017 at 3:57 AM, Tamas Penzes  wrote:
>
>> Hi Patrick,
>>
>> +1 is just my non-counting vote.
>>
>> Looks really good.
>>
>> I've just found some broken tables in the following file:
>> https://github.com/phunt/zookeeper/blob/website/credits.md
>> which might be because github's markdown is not the same as other
>> implementations.
>>
>> Regards, Tamaas
>>
>> On Fri, Dec 1, 2017 at 1:51 AM, Michael Han  wrote:
>>
>> > +1, the new publish process sounds much better. Thanks Pat.
>> >
>> > On Thu, Nov 30, 2017 at 5:00 AM, Camille Fournier 
>> > wrote:
>> >
>> > > +1 good idea to get this modernized thanks pat
>> > >
>> > > On Nov 30, 2017 2:17 AM, "Patrick Hunt"  wrote:
>> > >
>> > > Hi folks. After the issues a few weeks ago during the 3.4.11 release
>> > trying
>> > > to get the site published and INFRA no longer supporting CMS I've gone
>> > > through the effort to look at what other options are available. I
>> > reviewed
>> > > a number of other ASF sites and it looks like jekyll with markdown is
>> > very
>> > > popular. Additionally INFRA is currently supporting gitpubsub - which
>> > means
>> > > that if we can generate a static site and commit the results to git
>> INFRA
>> > > will take that and update the live production site. Basic workflow
>> would
>> > > then be:
>> > >
>> > > 1) manually edit the markdown pages which are the source of the
>> website
>> > > (similar to what we do today)
>> > > 2) generate the static website using jekyll, review this as the
>> "staged"
>> > > site (locally)
>> > > 3) once we're happy with it commit/push the changes to the markdown
>> > source
>> > > 4) commit/push the changes to the generated/static site content -
>> > gitpubsub
>> > > will then push those live to zookeeper.apache.org.
>> > >
>> > > This is pretty close to what we are doing today as part of a release
>> but
>> > > it's streamlined and takes CMS out of the equation (the old content
>> > > management which is no longer supported by INFRA).
>> > >
>> > > I've converted the current website over to this new model and staged
>> the
>> > > change in my personal github repo. Please take a look as I'd like to
>> move
>> > > over to this new model soon - let's say about a week from today for
>> > > feedback.
>> > >
>> > > Any committers out there please give this a +1 if you're on board -
>> otw
>> > let
>> > > me know your concerns.
>> > >
>> > > This would be the source of the website, markdown/jekyll based:
>> > > https://github.com/phunt/zookeeper/tree/website
>> > >
>> > > This is the generated site (html) - pushing this branch would cause
>> asf
>> > > INFRA to re-publish the site (gitpubsub):
>> > > https://github.com/phunt/zookeeper/tree/asf-site
>> > >
>> > > Notice these are orphan branches (no history aside from the recent
>> docs
>> > > changes) and both of these branches would live within the
>> > existing/current
>> > > zookeeper git repo. So if you clone the zookeeper repo you'll have the
>> > > website as well - no longer necessary to checkout multiple repos in
>> order
>> > > to update the website.
>> > >
>> > > Patrick
>> > >
>> >
>>
>>
>>
>> --
>>
>> *Tamás **Pénzes* | Engineering Manager
>> e. tam...@cloudera.com

[jira] [Commented] (ZOOKEEPER-2807) Flaky test: org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277541#comment-16277541
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2807:
---

Github user afine commented on the issue:

https://github.com/apache/zookeeper/pull/300
  
@anmolnar With respect to the code path above, shouldn't that be blocked on `syncWithLeader`?

> Even if you drain the committedRequests, I'm not sure that guarantees that there are no more that will arrive.

I'm not sure I understand how we don't have this guarantee. My understanding is that `syncWithLeader` loops until an `UPTODATE` message is received by the follower. Incoming packets from the leader are read by:
```java
syncWithLeader(newEpochZxid);
QuorumPacket qp = new QuorumPacket();
while (this.isRunning()) {
    readPacket(qp);
    processPacket(qp);
}
```

In addition, my understanding is that requests are only added to `CommitProcessor`'s `committedRequests` in `processPacket`. What am I missing?






Re: ZK Jenkins job updates.

2017-12-04 Thread Abraham Fine
Thank you Patrick!

This should make debugging flaky tests much less painful.

Abe

On Sun, Dec 3, 2017, at 17:33, Patrick Hunt wrote:
> I made a few updates to the jenkins job configs. Abe pointed out that while
> we are now generating individual log files for tests (one per junit test
> class) we were not capturing them as artifacts of the jenkins jobs. I've
> updated the jobs to now capture these files as artifacts, which should
> simplify debugging.
> 
> We are running with multiple junit test execution threads now (typ. 8 in
> 3.5+ version jobs). Having a single "console log" output of the jenkins job
> with all of these test logs inter-mingled is not useful, so I've configured
> test.output=no for the jobs. This simplifies the console output; given we
> now have the individual log files as artifacts, the inter-mingled output
> wasn't very useful anyway.
> 
> If there are any questions lmk.
> 
> Regards,
> 
> Patrick


[jira] [Commented] (ZOOKEEPER-2915) Use "strict" conflict management in ivy

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277327#comment-16277327
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2915:
---

Github user afine closed the pull request at:

https://github.com/apache/zookeeper/pull/426


> Use "strict" conflict management in ivy
> ---
>
> Key: ZOOKEEPER-2915
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2915
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.11, 3.5.4, 3.6.0
>Reporter: Abraham Fine
>Assignee: Abraham Fine
> Fix For: 3.4.11, 3.5.4, 3.6.0
>
>
> Currently it is very difficult to tell exactly which dependencies make it 
> into the final classpath of zookeeper. We do not perform any conflict 
> resolution between the test and default classpaths (this has resulted in 
> strange behavior with the slf4j-log4j12 binding) and have no way of telling 
> if a change to the dependencies has altered the transitive dependencies 
> pulled down by the project. 
> Our dependency list is relatively small so we should use "strict" conflict 
> management (break the build when we try to pull two versions of the same 
> dependency) so we can exercise maximum control over the classpath. 
> Note: I also attempted to find a way to see if I could always prefer 
> transitive dependencies from the default configuration over those pulled by 
> the test configuration (to make sure that the zookeeper we test against has 
> the same dependencies as the one we ship) but this appears to be impossible 
> (or at least incredibly difficult) with ivy. Any opinions here would be 
> greatly appreciated.







[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276730#comment-16276730
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2184:
---

Github user riccardofreixo commented on the issue:

https://github.com/apache/zookeeper/pull/150
  
@sslavic thanks for the suggestion.

We haven't tried that approach, and as far as I can tell it sounds like it would work. You'd still have the re-resolution problem if you deleted/recreated the service, but that should be quite rare. Had we thought of that before, we probably wouldn't have patched the client. Now that we have, though, we'll keep it patched.

I still think this should be fixed on the zk-client, as there are circumstances other than Kubernetes where the IP addresses may change and you wouldn't have an easy solution such as ClusterIP.


> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 
> 3.5.3, 3.4.11
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>Priority: Blocker
>  Labels: easyfix, patch
> Fix For: 3.5.4, 3.4.12
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.
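A minimal sketch of the proposed fix, with hypothetical class and method names (not the actual StaticHostProvider API): resolve the hostname on every connection attempt instead of caching the resolved address at creation time.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the proposed fix: re-resolve a host each time a
// connection attempt is made, instead of caching the address forever.
public class ReResolvingAddress {
    private final String host;
    private final int port;

    public ReResolvingAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Resolve on every call, so a changed DNS record is picked up on retry.
    public InetSocketAddress resolve() {
        try {
            InetAddress addr = InetAddress.getAllByName(host)[0];
            return new InetSocketAddress(addr, port);
        } catch (UnknownHostException e) {
            // Fall back to an unresolved address; the caller can retry later.
            return InetSocketAddress.createUnresolved(host, port);
        }
    }
}
```

Note that the JDK itself caches successful lookups for a while (`networkaddress.cache.ttl`), so re-resolution helps once the cached entry expires or DNS is updated.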







[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276711#comment-16276711
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2184:
---

Github user sslavic commented on the issue:

https://github.com/apache/zookeeper/pull/150
  
@riccardofreixo have you tried using a ClusterIP Service for the ZooKeeper StatefulSet and providing that ClusterIP (or the service hostname) to Kafka / ZooKeeper clients as the sole ZooKeeper hostname?

A StatefulSet can have multiple replicas, but to ZooKeeper clients all of the members, no matter how many of them there are (1, 3, 5, ...), would be accessible under a single ClusterIP.

Even when Pods of the StatefulSet die and get re-scheduled for whatever reason, they will likely get a new IP, but the IP of the ClusterIP Service remains stable, so ZooKeeper clients should be able to reconnect without needing to re-resolve the IP address of the host.

If there's a quorum, a Pod that died does not necessarily have to become available quickly; clients should still be able to connect without even losing their session.


> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 
> 3.5.3, 3.4.11
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Flavio Junqueira
>Priority: Blocker
>  Labels: easyfix, patch
> Fix For: 3.5.4, 3.4.12
>
> Attachments: ZOOKEEPER-2184.patch
>
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.
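The fix the report proposes — re-resolving hostnames when connection attempts fail — can be sketched roughly as follows. This is an illustrative stand-in, not the actual StaticHostProvider change in the attached patch; the class and method names are invented:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep the connect string as unresolved host:port
// pairs and resolve a host freshly each time a connection is attempted,
// so a changed DNS record (e.g. a restarted container) is picked up.
public class ReResolvingHostProvider {
    private final List<String> hosts = new ArrayList<>();
    private int next = 0;

    public ReResolvingHostProvider(String connectString) {
        for (String h : connectString.split(",")) {
            hosts.add(h.trim());
        }
    }

    // Returns the next server to try, re-resolving its hostname now
    // rather than reusing an address cached at construction time.
    public InetSocketAddress next() throws UnknownHostException {
        String hostPort = hosts.get(next);
        next = (next + 1) % hosts.size();
        int colon = hostPort.lastIndexOf(':');
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        // Fresh lookup on every call; the old behavior resolved once
        // in the constructor and never again.
        InetAddress resolved = InetAddress.getAllByName(host)[0];
        return new InetSocketAddress(resolved, port);
    }
}
```

A client loop using this would simply call next() before each connection attempt, so a failed attempt naturally retries against a freshly resolved address.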



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276675#comment-16276675
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2184:
---

Github user riccardofreixo commented on the issue:

https://github.com/apache/zookeeper/pull/150
  
We're running Kafka in Kubernetes, so this bug was biting us regularly.
We applied the patch to our client's Kafka clusters and are running it in 
prod. It solves our problem and has created no additional problems for us.









[GitHub] zookeeper issue #150: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...

2017-12-04 Thread jorgheymans
Github user jorgheymans commented on the issue:

https://github.com/apache/zookeeper/pull/150
  
Just got stung by this as well; I assumed zk clients would be clever enough 
to re-resolve :-/ 

Since there is a lot of interest in this, why not just rebase-merge and let 
people test out the snapshot builds? 


---






[jira] [Commented] (ZOOKEEPER-2939) Deal with maxbuffer as it relates to proposals

2017-12-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276570#comment-16276570
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2939:
---

Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/415
  
@afine @phunt Findbugs issues have been resolved. Please review & commit.


> Deal with maxbuffer as it relates to proposals
> --
>
> Key: ZOOKEEPER-2939
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2939
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: jute, server
>Reporter: Andor Molnar
>Assignee: Andor Molnar
> Fix For: 3.5.4, 3.6.0
>
>



