[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129963#comment-16129963
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2872:
---

Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/333
  
We contemplated doing an fsync for every snapshot and decided against it. 
You're taking a guaranteed I/O spike each time. That's fine when you're just 
syncing with the quorum, but during normal operation it seems best to keep 
snapshot-taking a lighter-weight operation.
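
For reference, a minimal sketch of the fix direction described in the issue below: write the leader-supplied snapshot to a temporary file, fsync it, and only then publish it under its final name. This is not ZooKeeper's actual Learner/snapshot code; the class, method, and file-naming scheme are illustrative assumptions.

{code}
// Hypothetical sketch: persist a snapshot blob received from the leader and
// fsync it before the learner starts logging new txns. Names are illustrative.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class SnapshotSyncSketch {
    public static void writeAndSyncSnapshot(File snapDir, long zxid, byte[] snapshotBytes)
            throws IOException {
        File tmp = new File(snapDir, "snapshot." + Long.toHexString(zxid) + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp)) {
            fos.write(snapshotBytes);
            // Force data and metadata to disk so a power loss after this point
            // cannot leave the learner replaying an older snapshot plus newer
            // txn logs (the scenario described in the issue).
            fos.getChannel().force(true);
        }
        // Atomically publish the fully synced file under its final name.
        Files.move(tmp.toPath(),
                   new File(snapDir, "snapshot." + Long.toHexString(zxid)).toPath(),
                   StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}

The force(true) call is the guaranteed I/O spike mentioned above, which is why the proposal applies it only to snapshots received while syncing with the quorum rather than to every locally taken snapshot.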


> Interrupted snapshot sync causes data loss
> --
>
> Key: ZOOKEEPER-2872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Brian Nixon
>
> There is a way for observers to permanently lose data from their local data 
> tree while remaining members of good standing with the ensemble and 
> continuing to serve client traffic when the following chain of events occurs.
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
> catch up.
> 3. The machine powers off before the snapshot is synced to disk and after 
> some txns have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot 
> (epoch <= N) as well as the txn logs (epoch N+1). 
> 5. A diff sync is requested from the leader and the observer broadcasts 
> availability.
> In this scenario, any commits from epoch N that the observer did not receive 
> before it died the first time will never be exposed to the observer and no 
> part of the ensemble will complain. 
> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader will avoid the 
> case of missing snapshots causing data loss.





[GitHub] zookeeper issue #333: ZOOKEEPER-2872: Interrupted snapshot sync causes data ...

2017-08-16 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/333
  
We contemplated doing an fsync for every snapshot and decided against it. 
You're taking a guaranteed I/O spike each time. That's fine when you're just 
syncing with the quorum, but during normal operation it seems best to keep 
snapshot-taking a lighter-weight operation.




ZooKeeper_branch34_jdk7 - Build # 1620 - Failure

2017-08-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1620/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 31.12 MB...]
[junit] 2017-08-17 03:13:52,892 [myid:] - INFO  
[main:PrepRequestProcessor@769] - Shutting down
[junit] 2017-08-17 03:13:52,892 [myid:] - INFO  
[main:SyncRequestProcessor@208] - Shutting down
[junit] 2017-08-17 03:13:52,892 [myid:] - INFO  [ProcessThread(sid:0 
cport:11221)::PrepRequestProcessor@144] - PrepRequestProcessor exited loop!
[junit] 2017-08-17 03:13:52,893 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@186] - SyncRequestProcessor exited!
[junit] 2017-08-17 03:13:52,893 [myid:] - INFO  
[main:FinalRequestProcessor@403] - shutdown of request processor complete
[junit] 2017-08-17 03:13:52,893 [myid:] - INFO  
[main:FourLetterWordMain@65] - connecting to 127.0.0.1 11221
[junit] 2017-08-17 03:13:52,894 [myid:] - INFO  [main:JMXEnv@147] - 
ensureOnly:[]
[junit] 2017-08-17 03:13:52,895 [myid:] - INFO  [main:ClientBase@489] - 
STARTING server
[junit] 2017-08-17 03:13:52,895 [myid:] - INFO  [main:ClientBase@410] - 
CREATING server instance 127.0.0.1:11221
[junit] 2017-08-17 03:13:52,896 [myid:] - INFO  
[main:ServerCnxnFactory@116] - Using 
org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
[junit] 2017-08-17 03:13:52,896 [myid:] - INFO  
[main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:11221
[junit] 2017-08-17 03:13:52,896 [myid:] - INFO  [main:ClientBase@385] - 
STARTING server instance 127.0.0.1:11221
[junit] 2017-08-17 03:13:52,897 [myid:] - INFO  [main:ZooKeeperServer@173] 
- Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 
6 datadir 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/build/test/tmp/test7087755801297679199.junit.dir/version-2
 snapdir 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/build/test/tmp/test7087755801297679199.junit.dir/version-2
[junit] 2017-08-17 03:13:52,900 [myid:] - ERROR [main:ZooKeeperServer@468] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-08-17 03:13:52,901 [myid:] - INFO  
[main:FourLetterWordMain@65] - connecting to 127.0.0.1 11221
[junit] 2017-08-17 03:13:52,901 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] - 
Accepted socket connection from /127.0.0.1:58141
[junit] 2017-08-17 03:13:52,901 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@892] - Processing 
stat command from /127.0.0.1:58141
[junit] 2017-08-17 03:13:52,902 [myid:] - INFO  
[Thread-4:NIOServerCnxn$StatCommand@683] - Stat command output
[junit] 2017-08-17 03:13:52,902 [myid:] - INFO  
[Thread-4:NIOServerCnxn@1040] - Closed socket connection for client 
/127.0.0.1:58141 (no session established for client)
[junit] 2017-08-17 03:13:52,902 [myid:] - INFO  [main:JMXEnv@230] - 
ensureParent:[InMemoryDataTree, StandaloneServer_port]
[junit] 2017-08-17 03:13:52,904 [myid:] - INFO  [main:JMXEnv@247] - 
expect:InMemoryDataTree
[junit] 2017-08-17 03:13:52,904 [myid:] - INFO  [main:JMXEnv@251] - 
found:InMemoryDataTree 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221,name1=InMemoryDataTree
[junit] 2017-08-17 03:13:52,905 [myid:] - INFO  [main:JMXEnv@247] - 
expect:StandaloneServer_port
[junit] 2017-08-17 03:13:52,905 [myid:] - INFO  [main:JMXEnv@251] - 
found:StandaloneServer_port 
org.apache.ZooKeeperService:name0=StandaloneServer_port11221
[junit] 2017-08-17 03:13:52,905 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@58] - Memory used 35692
[junit] 2017-08-17 03:13:52,905 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@63] - Number of threads 20
[junit] 2017-08-17 03:13:52,905 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@78] - FINISHED TEST METHOD testQuota
[junit] 2017-08-17 03:13:52,906 [myid:] - INFO  [main:ClientBase@566] - 
tearDown starting
[junit] 2017-08-17 03:13:52,927 [myid:] - INFO  [main:ZooKeeper@687] - 
Session: 0x100dad98111 closed
[junit] 2017-08-17 03:13:52,927 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down for 
session: 0x100dad98111
[junit] 2017-08-17 03:13:52,927 [myid:] - INFO  [main:ClientBase@536] - 
STOPPING server
[junit] 2017-08-17 03:13:52,929 [myid:] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@242] - 
NIOServerCnxn factory exited run method
[junit] 2017-08-17 03:13:52,930 [myid:] - INFO  [main:ZooKeeperServer@501] 
- shutting down
[junit] 2017-08-17 03:13:52,930 [myid:] - ERROR [main:ZooKeeperServer@468] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR 

Re: why is reviewbot no longer -1 on patches with no tests?

2017-08-16 Thread Michael Han
Good catch. After a brief look, I think this is a bug in the GitHub pull
request test script. Filed https://issues.apache.org/jira/browse/ZOOKEEPER-2876
for the fix.

On Wed, Aug 16, 2017 at 12:33 PM, Camille Fournier 
wrote:

> It seems like every patch without tests is being marked as a
> "documentation" patch but they clearly are not. Who should we ping to look
> at this?
>
> C
>



-- 
Cheers
Michael.


[jira] [Created] (ZOOKEEPER-2876) Github pull request test script should output -1 when there are no tests provided in the patch, unless the subject under test is a documentation JIRA

2017-08-16 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-2876:
--

 Summary: Github pull request test script should output -1 when 
there are no tests provided in the patch, unless the subject under test is a 
documentation JIRA
 Key: ZOOKEEPER-2876
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2876
 Project: ZooKeeper
  Issue Type: Bug
  Components: build-infrastructure, tests
Reporter: Michael Han


The GitHub pull request test script (which is invoked as part of the pre-commit 
workflow) should output -1 on a patch that does not include any tests, unless 
the patch is a documentation-only patch.

We had this expected behavior before, when we used the old PATCH approach:
{noformat}
-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.
{noformat}

A quick look at the 
[script|https://github.com/apache/zookeeper/blob/master/src/java/test/bin/test-github-pr.sh#L224]
 indicates that we do not set up the $PATCH/jira directory in the GitHub pull 
request test script, so it always thinks the incoming pull request is a 
documentation-only patch. This should be fixed so we get the old behavior back 
and enforce that any new pull request must have tests unless it is explicitly 
justified why none are needed.





Re: Process for reviewing submitted patches?

2017-08-16 Thread Patrick Hunt
I typically refer to the HTC on questions like this; it currently says "We
are currently discussing on the list how to adapt our workflow." Perhaps
it's just a matter of someone cleaning up the doc?
https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute

Also, I noticed Dan's JIRA has a patch attached, and I can't get to JIRA at
the moment (JIRA is down again), but GitHub lists the PR as created 7 days
ago. What happens to folks who submitted a patch prior to the cutover? They
are in limbo, etc.

JIRA used to allow us to prioritize "patch availables", given our limited
resources. GitHub is just one big long list, which makes it difficult to do
the same.

Patrick

On Wed, Aug 16, 2017 at 12:54 PM, Jordan Zimmerman <
jor...@jordanzimmerman.com> wrote:

> I thought we've moved to Pull Requests on Github. I've stopped posting
> patches.
>
> -JZ
>
> > On Aug 16, 2017, at 7:15 PM, Patrick Hunt  wrote:
> >
> > On Wed, Aug 16, 2017 at 9:51 AM, Jordan Zimmerman <
> > jor...@jordanzimmerman.com> wrote:
> >
> >> * Review other people's patch. If you help out, others will be more
> willing
> >> to do the same for you. If someone is kind enough to review your code,
> you
> >> should return the favor for someone else.
> >>
> >>
> >> That's fair - I should personally try to do more of this. I'll make an
> >> effort here.
> >>
> >>
> > It's not clear to me how we are identifying patches for review today. We
> > used to have a very clear process -  a jira needed to be in the "patch
> > available" state in order to be considered for commit.
> >
> > See "contribute" section here, notice that it's watered down from what it
> > used to be:
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
> >
> > Dan's patch is not in "patch available" state, is that one of the reasons
> > why it's not being moved forward?
> >
> > Patrick
> >
> >
> >> -Jordan
> >>
>
>


Re: Process for reviewing submitted patches?

2017-08-16 Thread Jordan Zimmerman
I thought we've moved to Pull Requests on Github. I've stopped posting patches.

-JZ

> On Aug 16, 2017, at 7:15 PM, Patrick Hunt  wrote:
> 
> On Wed, Aug 16, 2017 at 9:51 AM, Jordan Zimmerman <
> jor...@jordanzimmerman.com> wrote:
> 
>> * Review other people's patch. If you help out, others will be more willing
>> to do the same for you. If someone is kind enough to review your code, you
>> should return the favor for someone else.
>> 
>> 
>> That's fair - I should personally try to do more of this. I'll make an
>> effort here.
>> 
>> 
> It's not clear to me how we are identifying patches for review today. We
> used to have a very clear process -  a jira needed to be in the "patch
> available" state in order to be considered for commit.
> 
> See "contribute" section here, notice that it's watered down from what it
> used to be:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
> 
> Dan's patch is not in "patch available" state, is that one of the reasons
> why it's not being moved forward?
> 
> Patrick
> 
> 
>> -Jordan
>> 





Re: Process for reviewing submitted patches?

2017-08-16 Thread Camille Fournier
A few thoughts:
1) It is impossible for us to set SLAs for ZK patches to be reviewed. If we
were a company making money on ZK and guaranteeing support for customers
who paid us, perhaps we could do that (and for all I know, it's possible
that customers with contracts at various companies that rely on ZK for
their products do get this). I've been watching this project for many years
now, and because there is no one company that is "The ZooKeeper Company",
it's always a hit-or-miss level of participation.
2) Part of the reason it's a hit-or-miss project is that it is, for better
or worse, somewhat complex, and very mission-critical. Especially when we
get new features, determining whether these features make sense for the
operational boundaries of the system is non-trivial. I don't think the
community wants us to rush in patches to just see the project change
(although if you do, please let us hear it).
3) If you want to get your patches committed, you should expect to
follow up with the group until it happens. This is a community where polite
reminders can be effectively used to cause movement. Again, see: many of us
are truly volunteers. It is also helpful if you make sure that your patches
have tests if at all possible, and generally follow the coding standards.
If you commit something and you get a -1 from reviewbot, actually
addressing that -1 will help. Explaining what you're doing and why helps a
lot. Many of you do this, but it's certainly not something we always see in
every patch.

We're doing better now than we have been in the past, largely thanks to a
lot of attention recently from a subset of the committers (not including me
sorry, I'm writing this email from my vacation which is about the only time
I ever have to focus on the project). Michael had some great comments on
how the community can help, so follow his lead.

Thanks,
C

On Wed, Aug 16, 2017 at 1:26 PM, Michael Han  wrote:

> We are using github pull request instead of the old patch approach since
> last October. So the status of JIRA is irrelevant now (in particular, Patch
> Available will not trigger Jenkins pre-commit workflow now.). This was
> discussed on dev list when we moved to github, the thread's name is "[VOTE]
> move Apache Zookeeper to git".
>
> As for how to identify available patches for review, they should be all
> here:
> https://github.com/apache/zookeeper/pulls
>
> To get notified for new incoming pull requests:
> * Watch our github repo: https://github.com/apache/zookeeper
> Or
> * Subscribe to dev mailing list. Because we have git -> JIRA hook any new
> pull request will get cross posted to JIRA which will then be forwarded to
> dev mailing list.
>
> On Wed, Aug 16, 2017 at 10:15 AM, Patrick Hunt  wrote:
>
> > On Wed, Aug 16, 2017 at 9:51 AM, Jordan Zimmerman <
> > jor...@jordanzimmerman.com> wrote:
> >
> > > * Review other people's patch. If you help out, others will be more
> > willing
> > > to do the same for you. If someone is kind enough to review your code,
> > you
> > > should return the favor for someone else.
> > >
> > >
> > > That's fair - I should personally try to do more of this. I'll make an
> > > effort here.
> > >
> > >
> > It's not clear to me how we are identifying patches for review today. We
> > used to have a very clear process -  a jira needed to be in the "patch
> > available" state in order to be considered for commit.
> >
> > See "contribute" section here, notice that it's watered down from what it
> > used to be:
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
> >
> > Dan's patch is not in "patch available" state, is that one of the reasons
> > why it's not being moved forward?
> >
> > Patrick
> >
> >
> > > -Jordan
> > >
> >
>
>
>
> --
> Cheers
> Michael.
>


[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-08-16 Thread Jordan Zimmerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129317#comment-16129317
 ] 

Jordan Zimmerman commented on ZOOKEEPER-1416:
-

Regarding the performance numbers above... They should be balanced by the 
enormous effort Curator's TreeCache class goes through to emulate 
Persistent/Recursive watches (which is essentially what it does). I argue that 
this change will be much more performant and efficient than what TreeCache is 
doing now.

> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, when a client (re)connects, it would have to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently a naming service that consumes the db shard 
> definition issues thousands of watch requests each time the service starts 
> and changes client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called, then the Recursive Watch 
> automatically applies the watch on the znode. This maintains the existing Watch 
> semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch Watcher callback is the one 
> receiving the event and event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convenience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of no intermediate watch event until read guarantee 
> is required. If the server can send watch events regardless of whether one has 
> already been fired without a corresponding read, then the server can simply 
> fire watch events without tracking.





[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129304#comment-16129304
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
---

Github user Randgalt commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
Per 1. I posted some performance numbers in the issue. There's a definite 
hit but it's worth it in my view. We should discuss this.

Per 2. What this PR is aimed at is users of Curator's TreeCache - one of 
the most widely used "recipes" in the library. Many users want to know 
everything that happens to a tree of ZNodes. With the current APIs this is 
extraordinarily difficult (thus the complexity of the TreeCache code) and 
inefficient. You must set 2 watches for every single node in the tree (data and 
children) and then work very hard to keep those watches set as they trigger, 
through network issues, etc.

Per 3. This PR does not guarantee that you will see all events. I'll double 
check the doc to make sure that's clear. These watches behave exactly like 
other watches in ZK, except that they don't remove themselves when triggered.
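
To make the burden described above concrete, here is a rough sketch against the plain ZooKeeper client API of what per-node caching with one-shot watches involves; error handling, session events, and the cache bookkeeping are omitted, and the class/method names are illustrative rather than actual TreeCache code.

{code}
// Sketch: what TreeCache-style code must do per node with one-shot watches.
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class PerNodeWatchSketch {
    private final ZooKeeper zk;

    public PerNodeWatchSketch(ZooKeeper zk) {
        this.zk = zk;
    }

    // Called once per node in the tree, and again after every watch fire.
    public void armWatches(final String path) throws KeeperException, InterruptedException {
        Watcher rearm = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                try {
                    // Each watch is one-shot, so it must be re-registered here,
                    // for this node and for any newly discovered children.
                    armWatches(path);
                } catch (Exception e) {
                    // Real code must also handle retries and connection loss.
                }
            }
        };
        Stat stat = new Stat();
        byte[] data = zk.getData(path, rearm, stat);          // data watch
        List<String> children = zk.getChildren(path, rearm);  // child watch
        // ... update the local cache with data/children and recurse into children ...
    }
}
{code}

A persistent recursive watch set once at the root, as proposed in this PR, removes the need for this constant re-arming loop.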


> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, when a client (re)connects, it would have to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently a naming service that consumes the db shard 
> definition issues thousands of watch requests each time the service starts 
> and changes client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called, then the Recursive Watch 
> automatically applies the watch on the znode. This maintains the existing Watch 
> semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch Watcher callback is the one 
> receiving the event and event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convenience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of no intermediate watch event until read guarantee 
> is required. If the server can send watch events regardless of whether one has 
> already been fired without a corresponding read, then the server can simply 
> fire watch events without tracking.





[GitHub] zookeeper issue #136: [ZOOKEEPER-1416] Persistent Recursive Watch

2017-08-16 Thread Randgalt
Github user Randgalt commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
Per 1. I posted some performance numbers in the issue. There's a definite 
hit but it's worth it in my view. We should discuss this.

Per 2. What this PR is aimed at is users of Curator's TreeCache - one of 
the most widely used "recipes" in the library. Many users want to know 
everything that happens to a tree of ZNodes. With the current APIs this is 
extraordinarily difficult (thus the complexity of the TreeCache code) and 
inefficient. You must set 2 watches for every single node in the tree (data and 
children) and then work very hard to keep those watches set as they trigger, 
through network issues, etc.

Per 3. This PR does not guarantee that you will see all events. I'll double 
check the doc to make sure that's clear. These watches behave exactly like 
other watches in ZK, except that they don't remove themselves when triggered.




why is reviewbot no longer -1 on patches with no tests?

2017-08-16 Thread Camille Fournier
It seems like every patch without tests is being marked as a
"documentation" patch but they clearly are not. Who should we ping to look
at this?

C


[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129299#comment-16129299
 ] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
---

Github user skamille commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
Questions I have about this from a high level design perspective:
1. As I asked on the mailing list, have we done load/performance testing or 
addressed what that might look like in the design? (Jordan to get back to us on 
that)
2. I'm not sure I understand why persistent watches are both persistent and 
always set for all children of a node. Is it not useful to imagine that I would 
want a persistent watch on some node but not care about its children? Some 
clarification on that choice would be helpful.
3. What does it really mean to guarantee sending of all watch events? What 
are the implications for a disconnected client upon reconnect? How much do we 
expect ZK to potentially be storing in order to be able to fulfill this 
guarantee? Will this potentially cause unbounded memory overhead or lead to 
full GC? Can we realistically bound this guarantee in order to provide the 
other operational guarantees people expect from ZK such as generally 
predictable memory usage based on size of data tree?


> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, when a client (re)connects, it would have to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently a naming service that consumes the db shard 
> definition issues thousands of watch requests each time the service starts 
> and changes client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called, then the Recursive Watch 
> automatically applies the watch on the znode. This maintains the existing Watch 
> semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch Watcher callback is the one 
> receiving the event and event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convenience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of no intermediate watch event until read guarantee 
> is required. If the server can send watch events regardless of whether one has 
> already been fired without a corresponding read, then the server can 

[GitHub] zookeeper issue #136: [ZOOKEEPER-1416] Persistent Recursive Watch

2017-08-16 Thread skamille
Github user skamille commented on the issue:

https://github.com/apache/zookeeper/pull/136
  
Questions I have about this from a high level design perspective:
1. As I asked on the mailing list, have we done load/performance testing or 
addressed what that might look like in the design? (Jordan to get back to us on 
that)
2. I'm not sure I understand why persistent watches are both persistent and 
always set for all children of a node. Is it not useful to imagine that I would 
want a persistent watch on some node but not care about its children? Some 
clarification on that choice would be helpful.
3. What does it really mean to guarantee sending of all watch events? What 
are the implications for a disconnected client upon reconnect? How much do we 
expect ZK to potentially be storing in order to be able to fulfill this 
guarantee? Will this potentially cause unbounded memory overhead or lead to 
full GC? Can we realistically bound this guarantee in order to provide the 
other operational guarantees people expect from ZK such as generally 
predictable memory usage based on size of data tree?




[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

2017-08-16 Thread Jordan Zimmerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129128#comment-16129128
 ] 

Jordan Zimmerman commented on ZOOKEEPER-1416:
-

FYI - I did some micro-benchmarking with JMH. The test iterates over a set of 
paths using PathParentIterator. The paths are:

{code}
"/a",
"/a/b",
"/a/b/c",
"/a really long path",
"/a really long path/with more than stuff",
"/a really long path/with more than stuff/and more",
"/a really long path/with more than stuff/and more/and more",
"/a really long path/with more than stuff/and more/and more/and more"
{code}

I did a test using {{PathParentIterator.forPathOnly()}} as a baseline and then 
with {{PathParentIterator.forAll()}}. Results:

* forPathOnly - avg 47,627,862 ops/s
* forAll - avg 22,677,073 ops/s

So, that's a significant difference but still 22+ million ops per second seems 
reasonable to me. I'd be curious what others think. FYI - I played around with 
optimizing PathParentIterator but haven't found a way yet to make it faster. 
Maybe we can just document that using Persistent watches can slightly slow 
overall server performance. Or is this a showstopper? In my view, the small 
performance hit is worth the feature. Importantly, the feature is optimized so 
that those who don't want it don't pay the performance penalty.
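
For context, a JMH harness along these lines could look like the sketch below. It assumes the PathParentIterator.forPathOnly/forAll factories from this patch (package name assumed) and is not the exact benchmark that produced the numbers above.

{code}
// Sketch of a JMH micro-benchmark comparing PathParentIterator modes.
import org.apache.zookeeper.common.PathParentIterator; // package assumed per the PR
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class PathParentIteratorBench {
    private final String[] paths = {
        "/a",
        "/a/b",
        "/a/b/c",
        "/a really long path/with more than stuff/and more/and more/and more"
    };

    @Benchmark
    public void forPathOnly(Blackhole bh) {
        for (String path : paths) {
            PathParentIterator it = PathParentIterator.forPathOnly(path);
            while (it.hasNext()) {
                bh.consume(it.next()); // baseline: yields only the path itself
            }
        }
    }

    @Benchmark
    public void forAll(Blackhole bh) {
        for (String path : paths) {
            PathParentIterator it = PathParentIterator.forAll(path);
            while (it.hasNext()) {
                bh.consume(it.next()); // also walks every parent path
            }
        }
    }
}
{code}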

> Persistent Recursive Watch
> --
>
> Key: ZOOKEEPER-1416
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: c client, documentation, java client, server
>Reporter: Phillip Liu
>Assignee: Jordan Zimmerman
> Attachments: ZOOKEEPER-1416.patch, ZOOKEEPER-1416.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode and when the znode changes 
> a Watch event is sent to the client. If there are thousands of znodes being 
> watched, when a client (re)connects, it would have to send thousands of watch 
> requests. At Facebook, we have this problem storing information for thousands 
> of db shards. Consequently a naming service that consumes the db shard 
> definition issues thousands of watch requests each time the service starts 
> and changes client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent 
> means no Watch reset is necessary after a watch-fire. Recursive means the 
> Watch applies to the node and descendant nodes. A Persistent Recursive Watch 
> behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. 
> Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a 
> corresponding getData(..) on the znode is called, then the Recursive Watch 
> automatically applies the watch on the znode. This maintains the existing Watch 
> semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. 
> Practically this means the Recursive Watch Watcher callback is the one 
> receiving the event and event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no 
> intermediate watch event until data is read will be maintained. The only 
> difference is we will automatically re-add the watch after read. At the same 
> time we add the convenience of reducing the need to add multiple watches for 
> sibling znodes and in turn reduce the number of watch messages sent from the 
> client to the server.
> There are some implementation details that need to be hashed out. Initial 
> thinking is to have the Recursive Watch create per-node watches. This will 
> cause a lot of watches to be created on the server side. Currently, each 
> watch is stored as a single bit in a bit set relative to a session - up to 3 
> bits per client per znode. If there are 100m znodes with 100k clients, each 
> watching all nodes, then this strategy will consume approximately 3.75TB of 
> ram distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths to not send Watches regardless of Watch 
> setting can be set each time a watch event from a Recursive Watch is fired. 
> The memory utilization is relative to the number of outstanding reads and at 
> worst case it's 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of no intermediate watch event until read guarantee 
> is required. If the server can send watch events regardless of whether one has 
> already been fired without a corresponding 

Re: Process for reviewing submitted patches?

2017-08-16 Thread Michael Han
We have been using GitHub pull requests instead of the old patch approach since
last October, so the status of the JIRA is irrelevant now (in particular, Patch
Available will no longer trigger the Jenkins pre-commit workflow). This was
discussed on the dev list when we moved to GitHub; the thread's name is "[VOTE]
move Apache Zookeeper to git".

As for how to identify available patches for review, they should be all
here:
https://github.com/apache/zookeeper/pulls

To get notified for new incoming pull requests:
* Watch our github repo: https://github.com/apache/zookeeper
Or
* Subscribe to the dev mailing list. Because we have a git -> JIRA hook, any new
pull request will get cross-posted to JIRA, which will then be forwarded to the
dev mailing list.

On Wed, Aug 16, 2017 at 10:15 AM, Patrick Hunt  wrote:

> On Wed, Aug 16, 2017 at 9:51 AM, Jordan Zimmerman <
> jor...@jordanzimmerman.com> wrote:
>
> > * Review other people's patch. If you help out, others will be more
> willing
> > to do the same for you. If someone is kind enough to review your code,
> you
> > should return the favor for someone else.
> >
> >
> > That's fair - I should personally try to do more of this. I'll make an
> > effort here.
> >
> >
> It's not clear to me how we are identifying patches for review today. We
> used to have a very clear process -  a jira needed to be in the "patch
> available" state in order to be considered for commit.
>
> See "contribute" section here, notice that it's watered down from what it
> used to be:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
>
> Dan's patch is not in "patch available" state, is that one of the reasons
> why it's not being moved forward?
>
> Patrick
>
>
> > -Jordan
> >
>



-- 
Cheers
Michael.


Re: Process for reviewing submitted patches?

2017-08-16 Thread Patrick Hunt
On Wed, Aug 16, 2017 at 9:51 AM, Jordan Zimmerman <
jor...@jordanzimmerman.com> wrote:

> * Review other people's patch. If you help out, others will be more willing
> to do the same for you. If someone is kind enough to review your code, you
> should return the favor for someone else.
>
>
> That's fair - I should personally try to do more of this. I'll make an
> effort here.
>
>
It's not clear to me how we are identifying patches for review today. We
used to have a very clear process -  a jira needed to be in the "patch
available" state in order to be considered for commit.

See "contribute" section here, notice that it's watered down from what it
used to be:
https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute

Dan's patch is not in "patch available" state, is that one of the reasons
why it's not being moved forward?

Patrick


> -Jordan
>


Re: Process for reviewing submitted patches?

2017-08-16 Thread Jordan Zimmerman
> * Review other people's patch. If you help out, others will be more willing
> to do the same for you. If someone is kind enough to review your code, you
> should return the favor for someone else.

That's fair - I should personally try to do more of this. I'll make an effort 
here.

-Jordan



Re: Process for reviewing submitted patches?

2017-08-16 Thread Jordan Zimmerman
I have to agree with your sentiments. I don't want to overstate it - I'm 
involved with several OSS projects myself - but it does seem that ZooKeeper 
needs either more committers or more engagement from the existing committers. 
It's been very difficult to get traction on issues recently. I've had to be a 
pest to get responses. To be fair, if you keep at it eventually there is a 
response but I think it should be easier. To be clear, I know from personal 
experience how hard this is given that none of us get paid to do this and it's 
usually done in our spare time.

-Jordan

> On Aug 16, 2017, at 5:30 PM, Dan Benediktson 
>  wrote:
> 
> Hi there,
> 
>  Does the Zookeeper project have any formal process for ensuring submitted
> patches get reviewed and subsequently committed?
> 
>  About a week ago I again submitted a patch for
> https://issues.apache.org/jira/browse/ZOOKEEPER-2471. This is something
> like the third time I've submitted a patch to Apache Zookeeper over the
> past year, and none of them has ever been reviewed. While they have all
> fixed real bugs we've seen in production while running Zookeeper, I have
> never urgently needed them to be committed because we maintain a fork where
> we have already taken the bug fixes we need, so I have been inclined to not
> make a nuisance of myself and let the Zookeeper PMC decide the best course
> of action, but this is honestly somewhat frustrating. I would much rather
> run Apache Zookeeper than run a private fork of it, but given the
> experience so far in pushing our patches upstream and the sheer number and
> scope of patches we have, this is a pretty daunting thought right now.
> 
>  I realize this is a volunteer operation and that we all have day jobs,
> but I feel like this situation needs some improvement. Would it be possible
> for the committers to set up some sort of regular review cadence and
> provide some sort of loose expected SLA for reviewing, and assuming review
> is approved, subsequently committing, submitted patches? To be clear, I
> don't want to push a lot of work or strict timelines or anything: like I
> said, I realize this is a volunteer project and that we're all quite busy.
> But if we could even get something like a 1-month intended SLA for
> reviewing a submitted patch, and then a 1-month intended SLA for committing
> after a patch was accepted in review, I think it would be hugely beneficial
> for contributors.
> 
> Thanks,
> Dan





Re: Process for reviewing submitted patches?

2017-08-16 Thread Michael Han
Thanks for bringing this issue up. I think it's an important issue for the
ZooKeeper community.

The fundamental issue here is that we don't have enough active code
reviewers and committers, which limits the throughput of the code reviews,
since a patch has to be reviewed and approved by at least one committer to
land. With this constraint the SLA is likely not going to work, unless we
grow the community by increasing the number of code reviewers and committers.

To improve the current situation, my thoughts are:
* Any developer here should participate in code reviews as both reviewer
and reviewee. You don't need to be a committer to do code reviews.
* Review other people's patch. If you help out, others will be more willing
to do the same for you. If someone is kind enough to review your code, you
should return the favor for someone else.
* Ping the dev list about your patch. If it's urgent, provide reasons why
and then ping the dev list every couple of days. If it's not urgent, ping the
dev list every one or two weeks.
* Ping individual developers directly and / or privately for escalation.
It's less likely such ping will be ignored.

On the side of growing new committers, the PMC is actively working on bringing
in new blood who demonstrate passion and effort in helping out with patch
reviews, among other contributions.


On Wed, Aug 16, 2017 at 8:30 AM, Dan Benediktson  wrote:

> Hi there,
>
>   Does the Zookeeper project have any formal process for ensuring submitted
> patches get reviewed and subsequently committed?
>
>   About a week ago I again submitted a patch for
> https://issues.apache.org/jira/browse/ZOOKEEPER-2471. This is something
> like the third time I've submitted a patch to Apache Zookeeper over the
> past year, and none of them has ever been reviewed. While they have all
> fixed real bugs we've seen in production while running Zookeeper, I have
> never urgently needed them to be committed because we maintain a fork where
> we have already taken the bug fixes we need, so I have been inclined to not
> make a nuisance of myself and let the Zookeeper PMC decide the best course
> of action, but this is honestly somewhat frustrating. I would much rather
> run Apache Zookeeper than run a private fork of it, but given the
> experience so far in pushing our patches upstream and the sheer number and
> scope of patches we have, this is a pretty daunting thought right now.
>
>   I realize this is a volunteer operation and that we all have day jobs,
> but I feel like this situation needs some improvement. Would it be possible
> for the committers to set up some sort of regular review cadence and
> provide some sort of loose expected SLA for reviewing, and assuming review
> is approved, subsequently committing, submitted patches? To be clear, I
> don't want to push a lot of work or strict timelines or anything: like I
> said, I realize this is a volunteer project and that we're all quite busy.
> But if we could even get something like a 1-month intended SLA for
> reviewing a submitted patch, and then a 1-month intended SLA for committing
> after a patch was accepted in review, I think it would be hugely beneficial
> for contributors.
>
> Thanks,
> Dan
>



-- 
Cheers
Michael.


Re: ZOOKEEPER-1416

2017-08-16 Thread Jordan Zimmerman
Yeah - that's a fair question. To be honest, I should have done it and I will. 
I'll run PathParentIterator through the java benchmark tool and report back in 
the Issue.

-Jordan

> On Aug 16, 2017, at 2:29 PM, Camille Fournier  wrote:
> 
> A question on this as I begin to look at it:
> Have you done any performance testing of the feature, or written anything
> about what you think the performance considerations might be?
> 
> Thanks,
> C
> 
> On Thu, Aug 10, 2017 at 8:58 PM, Jordan Zimmerman <
> jor...@jordanzimmerman.com> wrote:
> 
>> I really appreciate it
>> 
>>> On Aug 10, 2017, at 7:56 PM, Camille Fournier 
>> wrote:
>>> 
>>> I plan on looking at this soon (within a week) if no one else gets to it.
>>> 
>>> On Thu, Aug 10, 2017 at 1:44 PM, Jordan Zimmerman <
>>> jordan.zimmer...@elastic.co> wrote:
>>> 
 Friendly request for more movement on ZOOKEEPER-1416 - it's now been
 reviewed by multiple people and even has a backport to 3.5.x
 (ZOOKEEPER-2871). This feature will be a huge boon to large Zookeeper
>> users
 and make the project more competitive with etcd et al.
 
 -Jordan
>> 
>> 





Process for reviewing submitted patches?

2017-08-16 Thread Dan Benediktson
Hi there,

  Does the Zookeeper project have any formal process for ensuring submitted
patches get reviewed and subsequently committed?

  About a week ago I again submitted a patch for
https://issues.apache.org/jira/browse/ZOOKEEPER-2471. This is something
like the third time I've submitted a patch to Apache Zookeeper over the
past year, and none of them has ever been reviewed. While they have all
fixed real bugs we've seen in production while running Zookeeper, I have
never urgently needed them to be committed because we maintain a fork where
we have already taken the bug fixes we need, so I have been inclined to not
make a nuisance of myself and let the Zookeeper PMC decide the best course
of action, but this is honestly somewhat frustrating. I would much rather
run Apache Zookeeper than run a private fork of it, but given the
experience so far in pushing our patches upstream and the sheer number and
scope of patches we have, this is a pretty daunting thought right now.

  I realize this is a volunteer operation and that we all have day jobs,
but I feel like this situation needs some improvement. Would it be possible
for the committers to set up some sort of regular review cadence and
provide some sort of loose expected SLA for reviewing, and assuming review
is approved, subsequently committing, submitted patches? To be clear, I
don't want to push a lot of work or strict timelines or anything: like I
said, I realize this is a volunteer project and that we're all quite busy.
But if we could even get something like a 1-month intended SLA for
reviewing a submitted patch, and then a 1-month intended SLA for committing
after a patch was accepted in review, I think it would be hugely beneficial
for contributors.

Thanks,
Dan


Re: ZOOKEEPER-1416

2017-08-16 Thread Camille Fournier
A question on this as I begin to look at it:
Have you done any performance testing of the feature, or written anything
about what you think the performance considerations might be?

Thanks,
C

On Thu, Aug 10, 2017 at 8:58 PM, Jordan Zimmerman <
jor...@jordanzimmerman.com> wrote:

> I really appreciate it
>
> > On Aug 10, 2017, at 7:56 PM, Camille Fournier 
> wrote:
> >
> > I plan on looking at this soon (within a week) if no one else gets to it.
> >
> > On Thu, Aug 10, 2017 at 1:44 PM, Jordan Zimmerman <
> > jordan.zimmer...@elastic.co> wrote:
> >
> >> Friendly request for more movement on ZOOKEEPER-1416 - it's now been
> >> reviewed by multiple people and even has a backport to 3.5.x
> >> (ZOOKEEPER-2871). This feature will be a huge boon to large Zookeeper
> users
> >> and make the project more competitive with etcd et al.
> >>
> >> -Jordan
>
>


ZooKeeper_branch35_jdk8 - Build # 637 - Failure

2017-08-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/637/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 68.76 MB...]
[junit] 2017-08-16 12:15:30,883 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30031ms for sessionid 0x0, closing 
socket connection and attempting reconnect
[junit] 2017-08-16 12:15:32,579 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:27386. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:15:32,580 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:39974, server: 
127.0.0.1/127.0.0.1:27386
[junit] 2017-08-16 12:15:32,580 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:27386:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:39974
[junit] 2017-08-16 12:15:53,485 [myid:127.0.0.1:27386] - WARN  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30014ms for sessionid 0x20730bbf60c
[junit] 2017-08-16 12:15:53,486 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30014ms for sessionid 
0x20730bbf60c, closing socket connection and attempting reconnect
[junit] 2017-08-16 12:15:55,163 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:27386. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:15:55,165 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:27386:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:40432
[junit] 2017-08-16 12:15:55,167 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:40432, server: 
127.0.0.1/127.0.0.1:27386
[junit] 2017-08-16 12:16:02,606 [myid:127.0.0.1:27386] - WARN  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30027ms for sessionid 0x0
[junit] 2017-08-16 12:16:02,607 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30027ms for sessionid 0x0, closing 
socket connection and attempting reconnect
[junit] 2017-08-16 12:16:04,023 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:27386. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:16:04,024 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:40608, server: 
127.0.0.1/127.0.0.1:27386
[junit] 2017-08-16 12:16:04,024 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:27386:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:40608
[junit] 2017-08-16 12:16:25,196 [myid:127.0.0.1:27386] - WARN  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30030ms for sessionid 0x20730bbf60c
[junit] 2017-08-16 12:16:25,198 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30030ms for sessionid 
0x20730bbf60c, closing socket connection and attempting reconnect
[junit] 2017-08-16 12:16:26,337 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:27386. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:16:26,339 [myid:127.0.0.1:27386] - INFO  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:41070, server: 
127.0.0.1/127.0.0.1:27386
[junit] 2017-08-16 12:16:26,340 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:27386:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:41070
[junit] 2017-08-16 12:16:34,054 [myid:127.0.0.1:27386] - WARN  
[main-SendThread(127.0.0.1:27386):ClientCnxn$SendThread@1181] - Client 

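The repeated "Client session timed out, have not heard from server in ~30000ms ... closing socket connection and attempting reconnect" entries in the excerpt above are the client's SendThread giving up on the current socket once the session timeout elapses without any server traffic, then cycling to another reconnect attempt. Below is a minimal, hypothetical Java sketch of a client built the same way, using the standard org.apache.zookeeper.ZooKeeper constructor and a default Watcher to surface the Disconnected/SyncConnected transitions; the 127.0.0.1:27386 address and the 30000 ms timeout are copied from the log purely for illustration and are not taken from the test source.

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooKeeper;

    public class SessionTimeoutProbe {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);

            // 30000 ms matches the ~30s timeouts reported in the log above.
            // The effective timeout is negotiated with the server and bounded
            // by the server's minSessionTimeout/maxSessionTimeout settings.
            ZooKeeper zk = new ZooKeeper("127.0.0.1:27386", 30000, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    // The client reports Disconnected when it times out locally,
                    // then keeps reconnecting until it succeeds or the server
                    // declares the session Expired.
                    KeeperState state = event.getState();
                    System.out.println("state change: " + state);
                    if (state == KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                }
            });

            if (!connected.await(60, TimeUnit.SECONDS)) {
                System.out.println("never connected, client state: " + zk.getState());
            }
            zk.close();
        }
    }

The "Will not attempt to authenticate using SASL (unknown error)" lines are typically just the client noting that no SASL/JAAS login configuration is present; they are informational and not the cause of the timeouts.
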
ZooKeeper-trunk-jdk8 - Build # 1165 - Still Failing

2017-08-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-jdk8/1165/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.71 MB...]
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-08-16 12:00:50,527 [myid:127.0.0.1:19304] - INFO  
[main-SendThread(127.0.0.1:19304):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19304. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:00:50,528 [myid:127.0.0.1:19304] - WARN  
[main-SendThread(127.0.0.1:19304):ClientCnxn$SendThread@1235] - Session 
0x105644ecb480001 for server 127.0.0.1/127.0.0.1:19304, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-08-16 12:00:50,835 [myid:127.0.0.1:19301] - INFO  
[main-SendThread(127.0.0.1:19301):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19301. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:00:50,836 [myid:127.0.0.1:19301] - WARN  
[main-SendThread(127.0.0.1:19301):ClientCnxn$SendThread@1235] - Session 
0x5644ec8d9 for server 127.0.0.1/127.0.0.1:19301, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-08-16 12:00:52,049 [myid:127.0.0.1:19304] - INFO  
[main-SendThread(127.0.0.1:19304):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19304. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:00:52,050 [myid:127.0.0.1:19304] - WARN  
[main-SendThread(127.0.0.1:19304):ClientCnxn$SendThread@1235] - Session 
0x105644ecb480001 for server 127.0.0.1/127.0.0.1:19304, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-08-16 12:00:52,123 [myid:127.0.0.1:19301] - INFO  
[main-SendThread(127.0.0.1:19301):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19301. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:00:52,123 [myid:127.0.0.1:19301] - WARN  
[main-SendThread(127.0.0.1:19301):ClientCnxn$SendThread@1235] - Session 
0x5644ec8d9 for server 127.0.0.1/127.0.0.1:19301, unexpected error, closing 
socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-08-16 12:00:52,321 [myid:127.0.0.1:19304] - INFO  
[main-SendThread(127.0.0.1:19304):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19304. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 12:00:52,321 [myid:127.0.0.1:19304] - WARN  
[main-SendThread(127.0.0.1:19304):ClientCnxn$SendThread@1235] - Session 
0x105644ecb48 for server 127.0.0.1/127.0.0.1:19304, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 

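The excerpt above shows a different failure mode: every reconnect ends in java.net.ConnectException: Connection refused inside ClientCnxnSocketNIO.doTransport, i.e. nothing is listening on 127.0.0.1:19301/19304 any more, so the SendThread simply loops through the connect string. One common way for a harness to tell "server still starting" apart from "server gone" is to probe the client port with the ruok four-letter word before (re)attaching clients. The sketch below is a hypothetical probe of that kind, assuming ruok is permitted (on 3.5+ the server restricts four-letter words via 4lw.commands.whitelist); the host, port, and timeout values are illustrative only.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class RuokProbe {
        // Returns true once the server answers the "ruok" four-letter word
        // with "imok"; returns false if it stays unreachable for timeoutMs.
        static boolean waitForServer(String host, int port, long timeoutMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (System.currentTimeMillis() < deadline) {
                try (Socket sock = new Socket()) {
                    sock.connect(new InetSocketAddress(host, port), 1000);
                    try (OutputStream out = sock.getOutputStream();
                         InputStream in = sock.getInputStream()) {
                        out.write("ruok".getBytes(StandardCharsets.US_ASCII));
                        out.flush();
                        sock.shutdownOutput();
                        // A single read is usually enough for the 4-byte reply;
                        // a more robust probe would loop until EOF.
                        byte[] buf = new byte[4];
                        int n = in.read(buf);
                        if (n == 4 && "imok".equals(new String(buf, StandardCharsets.US_ASCII))) {
                            return true;
                        }
                    }
                } catch (Exception e) {
                    // "Connection refused" lands here, just as in the log above;
                    // back off briefly and try again.
                }
                try {
                    Thread.sleep(250);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            System.out.println(waitForServer("127.0.0.1", 19304, 30_000));
        }
    }
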
ZooKeeper_branch35_jdk7 - Build # 1078 - Failure

2017-08-16 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/1078/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 67.00 MB...]
[junit] 2017-08-16 08:52:39,498 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30029ms for sessionid 0x0, closing 
socket connection and attempting reconnect
[junit] 2017-08-16 08:52:40,722 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:22000. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 08:52:40,722 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:22000:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:42274
[junit] 2017-08-16 08:52:40,723 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:42274, server: 
127.0.0.1/127.0.0.1:22000
[junit] 2017-08-16 08:52:59,337 [myid:127.0.0.1:22000] - WARN  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30028ms for sessionid 0x201683bc93a
[junit] 2017-08-16 08:52:59,338 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30028ms for sessionid 
0x201683bc93a, closing socket connection and attempting reconnect
[junit] 2017-08-16 08:53:00,594 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:22000. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 08:53:00,594 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:42284, server: 
127.0.0.1/127.0.0.1:22000
[junit] 2017-08-16 08:53:00,594 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:22000:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:42284
[junit] 2017-08-16 08:53:10,747 [myid:127.0.0.1:22000] - WARN  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30025ms for sessionid 0x0
[junit] 2017-08-16 08:53:10,748 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30025ms for sessionid 0x0, closing 
socket connection and attempting reconnect
[junit] 2017-08-16 08:53:12,123 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:22000. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 08:53:12,124 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:22000:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:42290
[junit] 2017-08-16 08:53:12,125 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:42290, server: 
127.0.0.1/127.0.0.1:22000
[junit] 2017-08-16 08:53:30,617 [myid:127.0.0.1:22000] - WARN  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1181] - Client session 
timed out, have not heard from server in 30023ms for sessionid 0x201683bc93a
[junit] 2017-08-16 08:53:30,618 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1229] - Client session 
timed out, have not heard from server in 30023ms for sessionid 
0x201683bc93a, closing socket connection and attempting reconnect
[junit] 2017-08-16 08:53:32,605 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:22000. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-08-16 08:53:32,605 [myid:127.0.0.1:22000] - INFO  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@946] - Socket 
connection established, initiating session, client: /127.0.0.1:42302, server: 
127.0.0.1/127.0.0.1:22000
[junit] 2017-08-16 08:53:32,605 [myid:2] - INFO  
[NIOServerCxnFactory.AcceptThread:localhost/127.0.0.1:22000:NIOServerCnxnFactory$AcceptThread@296]
 - Accepted socket connection from /127.0.0.1:42302
[junit] 2017-08-16 08:53:42,147 [myid:127.0.0.1:22000] - WARN  
[main-SendThread(127.0.0.1:22000):ClientCnxn$SendThread@1181] - Client