Re: ZooKeeper 3.4 to 3.5.x upgrade: "No snapshot found, but there are log entries. Something is broken!"

2018-06-04 Thread Matteo Merli
That is correct, there are only a few transactions, so the snapshot has not
been triggered yet.

The question is more about how to plan for a seamless upgrade from 3.4.10 to
3.5.x, from an end user's perspective.

On Mon, Jun 4, 2018 at 11:15 PM Michael Han  wrote:

> Hi Matteo,
>
> Maybe your ZK instance did not take a snapshot at all - it's possible if
> your total number of transactions is less than the configured snapCount
> (default value is 100,000) at the time you are doing the upgrade. You could
> check your transaction log file and the snapCount configuration to see if
> this is the case or not.
>
>
> On Mon, Jun 4, 2018 at 10:02 PM, Matteo Merli  wrote:
>
>>
>>> >> Also can you advise on the steps for people who are using 3.4.x to upgrade
>>> >> to 3.5.4-beta
>>>
>>> The only catch I remember is that if you are using a version older than
>>> 3.4.6, you'd need to upgrade through 3.4.6 first before upgrading to 3.5.x,
>>> if you are doing a rolling upgrade and want to keep the liveness of the
>>> quorum. See more
>>> https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#ch_reconfig_upgrade
>>> .
>>>
>>
>>
>> Hi Michael,
>>
>> This is happening when upgrading from 3.4.10 to 3.5.4. It's a single-node
>> embedded ZK server, part of a self-contained standalone service.
>>
>> Definitely, in 3.4.10 single-node mode, the snapshot doesn't get created on
>> bootstrap.
>>
>> Thanks,
>> Matteo
>> --
>> Matteo Merli
>> 
>>
>
> --
Matteo Merli



Re: ZooKeeper 3.4 to 3.5.x upgrade: "No snapshot found, but there are log entries. Something is broken!"

2018-06-04 Thread Michael Han
Hi Matteo,

Maybe your ZK instance did not take a snapshot at all - it's possible if
your total number of transactions is less than the configured snapCount
(default value is 100,000) at the time you are doing the upgrade. You could
check your transaction log file and the snapCount configuration to see if
this is the case or not.
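
One way to run that check is to look at what is actually on disk. The sketch below is only an illustration: it assumes snapshots and transaction logs share the same `version-2` directory (that is, `dataLogDir` is not configured separately) and relies on the usual `snapshot.` and `log.` file name prefixes. The class and method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DataDirInspector {

    /** Counts the entries in dir whose file name starts with the given prefix. */
    static int countWithPrefix(Path dir, String prefix) throws IOException {
        int count = 0;
        // The glob "<prefix>*" matches on the file name only.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path ignored : files) {
                count++;
            }
        }
        return count;
    }

    /** True when the directory holds transaction logs but no snapshot file. */
    static boolean hasLogsButNoSnapshot(Path versionDir) throws IOException {
        return countWithPrefix(versionDir, "log.") > 0
                && countWithPrefix(versionDir, "snapshot.") == 0;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the configured dataDir.
        Path versionDir = Paths.get(args[0], "version-2");
        System.out.println("logs:      " + countWithPrefix(versionDir, "log."));
        System.out.println("snapshots: " + countWithPrefix(versionDir, "snapshot."));
        System.out.println("logs but no snapshot: " + hasLogsButNoSnapshot(versionDir));
    }
}
```

If the last line prints `true`, the data directory is in the state that triggers the error discussed in this thread.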


On Mon, Jun 4, 2018 at 10:02 PM, Matteo Merli  wrote:

>
>> >> Also can you advise on the steps for people who are using 3.4.x to upgrade to
>> >> 3.5.4-beta
>>
>> The only catch I remember is that if you are using a version older than
>> 3.4.6, you'd need to upgrade through 3.4.6 first before upgrading to 3.5.x,
>> if you are doing a rolling upgrade and want to keep the liveness of the
>> quorum. See more
>> https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#ch_reconfig_upgrade
>>
>
>
> Hi Michael,
>
> This is happening when upgrading from 3.4.10 to 3.5.4. It's a single-node
> embedded ZK server, part of a self-contained standalone service.
>
> Definitely, in 3.4.10 single-node mode, the snapshot doesn't get created on
> bootstrap.
>
> Thanks,
> Matteo
> --
> Matteo Merli
> 
>


Re: ZooKeeper 3.4 to 3.5.x upgrade: "No snapshot found, but there are log entries. Something is broken!"

2018-06-04 Thread Matteo Merli
> >> Also can you advise on the steps for people who are using 3.4.x to upgrade to
> >> 3.5.4-beta
>
> The only catch I remember is that if you are using a version older than
> 3.4.6, you'd need to upgrade through 3.4.6 first before upgrading to 3.5.x,
> if you are doing a rolling upgrade and want to keep the liveness of the
> quorum. See more
> https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#ch_reconfig_upgrade
> .
>


Hi Michael,

This is happening when upgrading from 3.4.10 to 3.5.4. It's a single-node
embedded ZK server, part of a self-contained standalone service.

Definitely, in 3.4.10 single-node mode, the snapshot doesn't get created on
bootstrap.

Thanks,
Matteo
-- 
Matteo Merli



Re: ZooKeeper 3.4 to 3.5.x upgrade: "No snapshot found, but there are log entries. Something is broken!"

2018-06-04 Thread Michael Han
Hi Sijie,

>> I am just curious why the change was made in such a way.

It's a safety guarantee. Consider this case:

* An ensemble of servers A, B, and C. A and B have the most up-to-date
transactions (let's say zxid + 1) while C is lagging one transaction behind
(C has zxid). A is the current leader.
* A is partitioned away. And for some reason B loses its snapshot file (for
example, an admin runs 'rm -rf' on the entire dataDir by mistake) at the same
time.
* Now with B and C, if we don't do the check, B will be elected as leader
as it has the most up-to-date transaction (zxid + 1). The state of the ensemble
will be set to B's state, which is incorrect: although B has the most
up-to-date transactions, it lost the old state along with the missing snapshot
file.
* In this case, we'd rather have the system stop working, by disallowing B
from participating in leader election, than have a working system with
incorrect state.

Note the only case in which we allow an empty snapshot file is when B is
bootstrapped as a new server joining the quorum.
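
The check described above can be sketched roughly as follows. This is a simplified illustration, not the actual `FileTxnSnapLog.restore` code: the class `RestoreCheckSketch`, the method signature, and the use of -1 to mean "nothing found" are assumptions made for the example. Only the exception message is taken from the stack trace in this thread.

```java
import java.io.IOException;

final class RestoreCheckSketch {

    /** Hypothetical marker meaning "no snapshot" or "no logged transaction". */
    static final long NONE = -1L;

    /**
     * Returns the zxid the server would start from.
     *
     * @param lastSnapshotZxid zxid of the newest snapshot, or NONE
     * @param lastLoggedZxid   zxid of the newest transaction log entry, or NONE
     */
    static long restore(long lastSnapshotZxid, long lastLoggedZxid) throws IOException {
        if (lastSnapshotZxid == NONE) {
            if (lastLoggedZxid != NONE) {
                // Log entries without any snapshot: refuse to start rather
                // than rebuild a possibly incomplete state.
                throw new IOException(
                        "No snapshot found, but there are log entries. Something is broken!");
            }
            // Neither snapshot nor log: a brand-new server, start empty.
            return 0L;
        }
        // Normal case: load the snapshot, then replay newer log entries.
        return Math.max(lastSnapshotZxid, lastLoggedZxid);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(restore(NONE, NONE)); // new server
        System.out.println(restore(3L, 5L));     // snapshot plus newer log entries
        try {
            restore(NONE, 5L);                   // the case hit in this thread
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The third call models the situation reported in this thread: a server that wrote a few transactions but never reached a snapshot looks, to this check, the same as a server that lost its snapshot.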

>> Also can you advise on the steps for people who are using 3.4.x to upgrade to
>> 3.5.4-beta

The only catch I remember is that if you are using a version older than
3.4.6, you'd need to upgrade through 3.4.6 first before upgrading to 3.5.x,
if you are doing a rolling upgrade and want to keep the liveness of the
quorum. See more
https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html#ch_reconfig_upgrade
.


On Mon, Jun 4, 2018 at 5:40 PM, Sijie Guo  wrote:

> Hi zookeeper team,
>
>
> We hit an issue when upgrading from 3.4.x to 3.5.4-beta and need some
> help/advice from the community.
>
> ```
> 10:14:55.607 [main] INFO  org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2181
> 10:14:55.623 [main] ERROR org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble - Exception while instantiating ZooKeeper
> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:206) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:284) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:444) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runZookeeper(LocalBookkeeperEnsemble.java:126) [pulsar-zookeeper-utils.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startStandalone(LocalBookkeeperEnsemble.java:242) [pulsar-zookeeper-utils.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.PulsarStandaloneStarter.start(PulsarStandaloneStarter.java:171) [pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
>     at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:266) [pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
> ```
>
>
> Looking into the source code,
> https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206
>
> A fix was introduced in https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> to throw an exception when there are no snapshots and the txn log is not empty.
>
> I am just curious why the change was made in such a way. My feeling is that in a
> snapshotting-based store, if there are no snapshots but there are log
> entries, it usually doesn't mean the state was corrupted. I guess I might
> be missing some context behind ZOOKEEPER-2325.
>
>
> Also, can you advise on the steps for people who are using 3.4.x to upgrade to
> 3.5.4-beta?
>
> Thanks,
> Sijie
>


ZooKeeper 3.4 to 3.5.x upgrade: "No snapshot found, but there are log entries. Something is broken!"

2018-06-04 Thread Sijie Guo
Hi zookeeper team,


We hit an issue when upgrading from 3.4.x to 3.5.4-beta and need some
help/advice from the community.

```
10:14:55.607 [main] INFO  org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port 0.0.0.0/0.0.0.0:2181
10:14:55.623 [main] ERROR org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble - Exception while instantiating ZooKeeper
java.io.IOException: No snapshot found, but there are log entries. Something is broken!
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:206) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:284) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:444) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98) ~[pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runZookeeper(LocalBookkeeperEnsemble.java:126) [pulsar-zookeeper-utils.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startStandalone(LocalBookkeeperEnsemble.java:242) [pulsar-zookeeper-utils.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.pulsar.PulsarStandaloneStarter.start(PulsarStandaloneStarter.java:171) [pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
    at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:266) [pulsar-broker.jar:2.1.0-incubating-SNAPSHOT]
```


Looking into the source code,
https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206

A fix was introduced in https://issues.apache.org/jira/browse/ZOOKEEPER-2325
to throw an exception when there are no snapshots and the txn log is not empty.

I am just curious why the change was made in such a way. My feeling is that in a
snapshotting-based store, if there are no snapshots but there are log
entries, it usually doesn't mean the state was corrupted. I guess I might
be missing some context behind ZOOKEEPER-2325.


Also, can you advise on the steps for people who are using 3.4.x to upgrade to
3.5.4-beta?

Thanks,
Sijie


Success: ZOOKEEPER- PreCommit Build #1789

2018-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 80.39 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16500128 added to ZOOKEEPER-2184.
 [exec] Session logged out. Session was 
JSESSIONID=AD7883286BD8F22888CCF33BBB81363B.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 18 minutes 7 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-2184
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500128#comment-16500128
 ] 

Hadoop QA commented on ZOOKEEPER-2184:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1789//console

This message is automatically generated.

> Zookeeper Client should re-resolve hosts when connection attempts fail
> --
>
> Key: ZOOKEEPER-2184
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2184
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.6, 3.4.7, 3.4.8, 3.4.9, 3.4.10, 3.5.0, 3.5.1, 3.5.2, 
> 3.5.3, 3.4.11
> Environment: Ubuntu 14.04 host, Docker containers for Zookeeper & 
> Kafka
>Reporter: Robert P. Thille
>Assignee: Andor Molnar
>Priority: Blocker
>  Labels: easyfix, patch, pull-request-available
> Fix For: 3.6.0, 3.4.13, 3.5.5
>
> Attachments: ZOOKEEPER-2184.patch
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Testing in a Docker environment with a single Kafka instance using a single 
> Zookeeper instance. Restarting the Zookeeper container will cause it to 
> receive a new IP address. Kafka will never be able to reconnect to Zookeeper 
> and will hang indefinitely. Updating DNS or /etc/hosts with the new IP 
> address will not help the client to reconnect as the 
> zookeeper/client/StaticHostProvider resolves the connection string hosts at 
> creation time and never re-resolves.
> A solution would be for the client to notice that connection attempts fail 
> and attempt to re-resolve the hostnames in the connectString.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Failed: ZOOKEEPER- PreCommit Build #1788

2018-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 84.92 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16500057 added to ZOOKEEPER-2184.
 [exec] Session logged out. Session was 
JSESSIONID=8BBF711EC22F65DC18724A3F3A23ECBE.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1804:
 exec returned: 1

Total time: 14 minutes 48 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-2184
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss

Error Message:
expected:<1> but was:<2>

Stack Trace:
junit.framework.AssertionFailedError: expected:<1> but was:<2>
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss(QuorumPeerMainTest.java:969)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500057#comment-16500057
 ] 

Hadoop QA commented on ZOOKEEPER-2184:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1788//console

This message is automatically generated.



Success: ZOOKEEPER- PreCommit Build #1787

2018-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 83.05 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16500027 added to ZOOKEEPER-2184.
 [exec] Session logged out. Session was 
JSESSIONID=E652C3F896178FD673D23042DF0B8427.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 6 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-2184
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2184) Zookeeper Client should re-resolve hosts when connection attempts fail

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500027#comment-16500027
 ] 

Hadoop QA commented on ZOOKEEPER-2184:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1787//console

This message is automatically generated.



[GitHub] zookeeper pull request #534: ZOOKEEPER-2184 Zookeeper Client should re-resol...

2018-06-04 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/534#discussion_r192696652
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java ---
@@ -96,36 +115,46 @@ public StaticHostProvider(Collection<InetSocketAddress> serverAddresses) {
      */
     public StaticHostProvider(Collection<InetSocketAddress> serverAddresses,
         long randomnessSeed) {
-        sourceOfRandomness = new Random(randomnessSeed);
+        init(serverAddresses, randomnessSeed, new Resolver() {
+            @Override
+            public InetAddress[] getAllByName(String name) throws UnknownHostException {
+                return InetAddress.getAllByName(name);
+            }
+        });
+    }
 
-        this.serverAddresses = resolveAndShuffle(serverAddresses);
-        if (this.serverAddresses.isEmpty()) {
+    private void init(Collection<InetSocketAddress> serverAddresses, long randomnessSeed, Resolver resolver) {
+        this.sourceOfRandomness = new Random(randomnessSeed);
+        this.resolver = resolver;
+        if (serverAddresses.isEmpty()) {
             throw new IllegalArgumentException(
                     "A HostProvider may not be empty!");
-        }
+        }
+        this.serverAddresses = shuffle(serverAddresses);
         currentIndex = -1;
-        lastIndex = -1;
+        lastIndex = -1;
     }
 
-    private List<InetSocketAddress> resolveAndShuffle(Collection<InetSocketAddress> serverAddresses) {
-        List<InetSocketAddress> tmpList = new ArrayList<InetSocketAddress>(serverAddresses.size());
-        for (InetSocketAddress address : serverAddresses) {
-            try {
-                InetAddress ia = address.getAddress();
-                String addr = (ia != null) ? ia.getHostAddress() : address.getHostString();
-                InetAddress resolvedAddresses[] = InetAddress.getAllByName(addr);
-                for (InetAddress resolvedAddress : resolvedAddresses) {
-                    InetAddress taddr = InetAddress.getByAddress(address.getHostString(), resolvedAddress.getAddress());
-                    tmpList.add(new InetSocketAddress(taddr, address.getPort()));
-                }
-            } catch (UnknownHostException ex) {
-                LOG.warn("No IP address found for server: {}", address, ex);
+    private InetSocketAddress resolve(InetSocketAddress address) {
+        try {
+            String curHostString = address.getHostString();
+            List<InetAddress> resolvedAddresses = new ArrayList<>(Arrays.asList(this.resolver.getAllByName(curHostString)));
+            if (resolvedAddresses.isEmpty()) {
+                return address;
             }
+            Collections.shuffle(resolvedAddresses);
+            return new InetSocketAddress(resolvedAddresses.get(0), address.getPort());
+        } catch (UnknownHostException e) {
--- End diff --

That's correct. The caller will end up getting an UnresolvedAddressException
when trying to open the socket to the unresolvable address:
```
2018-06-04 12:31:26,022 [myid:huhuuhujkdshgfjksgd.com:2181] - WARN  [main-SendThread(huhuuhujkdshgfjksgd.com:2181):ClientCnxn$SendThread@1237] - Session 0x0 for server huhuuhujkdshgfjksgd.com:2181, unexpected error, closing socket connection and attempting reconnect
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:101)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
    at org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:275)
    at org.apache.zookeeper.ClientCnxnSocketNIO.connect(ClientCnxnSocketNIO.java:285)
    at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1091)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1133)
```

Logging makes sense, I added an error log entry to make it clear.
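
For reference, the behavior in the trace can be reproduced outside ZooKeeper with plain JDK classes. This is a minimal sketch: the class name `UnresolvedConnectDemo` and the host name `no-such-host.invalid` are made up for the example. An address created with `InetSocketAddress.createUnresolved` is never looked up, so the connect call is rejected before any network traffic happens.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

final class UnresolvedConnectDemo {

    /** Returns true if connecting fails because the address is unresolved. */
    static boolean connectFailsAsUnresolved(InetSocketAddress address) throws IOException {
        try (SocketChannel channel = SocketChannel.open()) {
            channel.connect(address);
            return false;
        } catch (UnresolvedAddressException e) {
            // Thrown by the address check inside connect(); no socket is opened
            // to the remote host.
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // createUnresolved skips the DNS lookup, so this works offline.
        InetSocketAddress address =
                InetSocketAddress.createUnresolved("no-such-host.invalid", 2181);
        System.out.println("unresolved: " + address.isUnresolved());
        System.out.println("connect rejected: " + connectFailsAsUnresolved(address));
    }
}
```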


---


[GitHub] zookeeper pull request #534: ZOOKEEPER-2184 Zookeeper Client should re-resol...

2018-06-04 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/534#discussion_r192694788
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java ---
@@ -314,7 +340,7 @@ public InetSocketAddress next(long spinDelay) {
 addr = nextHostInReconfigMode();
 if (addr != null) {
currentIndex = serverAddresses.indexOf(addr);
-   return addr;
+   return resolve(addr);
--- End diff --

I disagree. We should not do any caching in our codebase, because there are
multiple levels of caching already present in the DNS infrastructure: JVM
caching, OS-level caching, DNS server caching, etc. `resolve()` will
effectively become a no-op if any of these caches gets a hit.
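
To make the idea concrete, here is a stripped-down sketch of resolving on every call instead of once at construction time. It mirrors the `resolve()` method and the `Resolver` callback from the diff, but the surrounding class `ReResolvingSketch` is hypothetical and omits everything else `StaticHostProvider` does.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ReResolvingSketch {

    /** Pluggable lookup so tests can avoid real DNS. */
    interface Resolver {
        InetAddress[] getAllByName(String name) throws UnknownHostException;
    }

    private final Resolver resolver;

    ReResolvingSketch(Resolver resolver) {
        this.resolver = resolver;
    }

    /**
     * Looks the host name up again and returns one of its current addresses.
     * Falls back to the configured address if the lookup yields nothing.
     */
    InetSocketAddress resolve(InetSocketAddress address) {
        try {
            List<InetAddress> resolved = new ArrayList<>(
                    Arrays.asList(resolver.getAllByName(address.getHostString())));
            if (resolved.isEmpty()) {
                return address;
            }
            Collections.shuffle(resolved);
            return new InetSocketAddress(resolved.get(0), address.getPort());
        } catch (UnknownHostException e) {
            // Leave the address as it was; the caller sees the failure on connect.
            return address;
        }
    }

    public static void main(String[] args) {
        // Production-style resolver: delegates to the JDK (and its caches).
        ReResolvingSketch provider = new ReResolvingSketch(InetAddress::getAllByName);
        InetSocketAddress configured = InetSocketAddress.createUnresolved("localhost", 2181);
        System.out.println(provider.resolve(configured));
    }
}
```

Because the lookup is delegated to `InetAddress.getAllByName` in the production-style resolver, any caching policy stays with the JVM and the layers below it rather than with the host provider.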


---


[GitHub] zookeeper pull request #534: ZOOKEEPER-2184 Zookeeper Client should re-resol...

2018-06-04 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/534#discussion_r192694235
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java ---
@@ -73,15 +80,27 @@
      * if serverAddresses is empty or resolves to an empty list
      */
     public StaticHostProvider(Collection<InetSocketAddress> serverAddresses) {
-        sourceOfRandomness = new Random(System.currentTimeMillis() ^ this.hashCode());
+        init(serverAddresses,
--- End diff --

Sorry @lvfangmin, I might be missing the point here. Shall I change the 
signature to use a non-generic Collection?


---


[GitHub] zookeeper issue #534: ZOOKEEPER-2184 Zookeeper Client should re-resolve host...

2018-06-04 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/534
  
@hanm Thanks, I updated the comments to be consistent with the original PR.
Unfortunately the comment on `HostProvider` hadn't been updated, so I made 
changes there as well.


---


Success: ZOOKEEPER- PreCommit Build #1785

2018-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 35.75 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 15 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16500012 added to ZOOKEEPER-3019.
 [exec] Session logged out. Session was 
JSESSIONID=50A2EC2BFC8E9DD8626A0EEA529EA007.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 54 minutes 7 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3019
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-3019) Add a metric to track number of slow fsyncs

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500012#comment-16500012
 ] 

Hadoop QA commented on ZOOKEEPER-3019:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1785//console

This message is automatically generated.

> Add a metric to track number of slow fsyncs
> ---
>
> Key: ZOOKEEPER-3019
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3019
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: jmx, server
>Affects Versions: 3.5.3, 3.4.11, 3.6.0
>Reporter: Norbert Kalmar
>Assignee: Norbert Kalmar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.4.13, 3.5.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Add jmx bean and Command to ZooKeeper server to expose the number of slow 
> fsyncs as a metric.
> FileTxnLog.commit() should count the number of times fsync exceeds 
> fsyncWarningThresholdMS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #533: ZOOKEEPER-2989:IPv6 literal address causes problems fo...

2018-06-04 Thread nkalmar
Github user nkalmar commented on the issue:

https://github.com/apache/zookeeper/pull/533
  
I agree with @anmolnar that the methods are a little robust and test 
multiple things. Whether they should be refactored whenever we touch a unit test 
is a tough question. 

But I also agree with what @maoling said about the inner class: generally, 
unit tests should not be separated when dealing with inner classes, as they 
only make sense in the context of the outer class.

Anyway, the change looks good to me! 


---


Success: ZOOKEEPER- PreCommit Build #1786

2018-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 85.34 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 9 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16499989 added to ZOOKEEPER-3019.
 [exec] Session logged out. Session was 
JSESSIONID=62998C9C015F67E145A63F62E30889CF.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD SUCCESSFUL
Total time: 20 minutes 14 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3019
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-3019) Add a metric to track number of slow fsyncs

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499989#comment-16499989
 ] 

Hadoop QA commented on ZOOKEEPER-3019:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1786//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-3040) flaky test EphemeralNodeDeletionTest

2018-06-04 Thread Norbert Kalmar (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499959#comment-16499959
 ] 

Norbert Kalmar commented on ZOOKEEPER-3040:
---

Yes, I was afraid you were gonna say that :)
Well, I won't run the whole test locally hundreds of times; I'll see if I can 
get some distributed testing working to verify the change.

> flaky test EphemeralNodeDeletionTest
> 
>
> Key: ZOOKEEPER-3040
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3040
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.5.4, 3.6.0, 3.4.12
>Reporter: Patrick Hunt
>Assignee: Norbert Kalmar
>Priority: Major
>  Labels: flaky
> Fix For: 3.6.0, 3.4.13, 3.5.5
>
>
> Flakey test EphemeralNodeDeletionTest
> {noformat}
> java.lang.AssertionError: After session close ephemeral node must be deleted 
> expected null, but 
> was:<4294967302,4294967302,1525988536834,1525988536834,0,0,0,144127862257483776,1,0,4294967302
>  {noformat}





Failed: ZOOKEEPER- PreCommit Build #1784

2018-06-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 85.13 MB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16499949 added to ZOOKEEPER-3019.
 [exec] Session logged out. Session was 
JSESSIONID=6EB5050BA1972280DB54E3A69E2DD1A0.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1806:
 exec returned: 1

Total time: 11 minutes 46 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3019
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig

Error Message:
waiting for server 3 being up

Stack Trace:
junit.framework.AssertionFailedError: waiting for server 3 being up
    at org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig(ReconfigRecoveryTest.java:224)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[jira] [Commented] (ZOOKEEPER-3019) Add a metric to track number of slow fsyncs

2018-06-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499949#comment-16499949
 ] 

Hadoop QA commented on ZOOKEEPER-3019:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1784//console

This message is automatically generated.



[GitHub] zookeeper pull request #510: ZOOKEEPER-3019 add metric for slow fsyncs count

2018-06-04 Thread nkalmar
Github user nkalmar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/510#discussion_r192668700
  
--- Diff: src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java ---
@@ -320,6 +332,11 @@ public synchronized void commit() throws IOException {
                 long syncElapsedMS =
                     TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startSyncNS);
                 if (syncElapsedMS > fsyncWarningThresholdMS) {
+                    if(serverStats != null) {
+                        serverStats.incrementFsyncThresholdExceedCount();
+                    } else {
+                        LOG.warn("fsyncWarningThresholdMS exceeded, but serverStats not added in FileTxnLog!");
--- End diff --

Okay, sounds fair. I will remove it. 
Thanks for pointing this out!
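
The counting behaviour this diff adds to `FileTxnLog.commit()` can be sketched in isolation. The class below is hypothetical (`SlowFsyncCounterSketch`, `record`, and `getExceedCount` are names invented for this example, not ZooKeeper APIs); it only shows the threshold comparison from the diff combined with a thread-safe counter that a JMX bean or admin command could later read.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SlowFsyncCounterSketch {

    private final long thresholdMs;
    // AtomicLong so a monitoring thread can read while the sync thread writes
    private final AtomicLong exceedCount = new AtomicLong();

    public SlowFsyncCounterSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Records one fsync duration; returns true if it counted as slow. */
    public boolean record(long elapsedMs) {
        // Same strict comparison as in the diff: only values above the threshold count
        if (elapsedMs > thresholdMs) {
            exceedCount.incrementAndGet();
            return true;
        }
        return false;
    }

    public long getExceedCount() {
        return exceedCount.get();
    }

    public static void main(String[] args) {
        SlowFsyncCounterSketch counter = new SlowFsyncCounterSketch(1000);
        counter.record(20);    // fast, not counted
        counter.record(1500);  // slow, counted
        counter.record(1000);  // equal to the threshold, not counted
        System.out.println(counter.getExceedCount()); // prints 1
    }
}
```

Keeping the counter in its own object means the code that measures the fsync needs no null check or fallback logging, at the cost of wiring one more dependency into the transaction log.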


---