[jira] [Updated] (ZOOKEEPER-2614) Port ZOOKEEPER-1576 to branch3.4

2017-07-25 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-2614:
---
Fix Version/s: (was: 3.4.9)
   3.4.11

> Port ZOOKEEPER-1576 to branch3.4
> 
>
> Key: ZOOKEEPER-2614
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2614
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.9
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
> Fix For: 3.4.11
>
> Attachments: ZOOKEEPER-2614.branch-3.4.00.patch
>
>
> ZOOKEEPER-1576 handles UnknownHostException, and it would be good to have this 
> change on the 3.4 branch as well. Porting the changes to 3.4 after resolving the 
> conflicts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2849) Quorum port binding needs exponential back-off retry

2017-07-25 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101185#comment-16101185
 ] 

Michael Han commented on ZOOKEEPER-2849:


+1 on the idea. I think this will be a good improvement to the resilience of 
cloud deployments. [~brian.linin...@gmail.com]: are you interested in 
contributing a patch for this?
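
A minimal sketch of the kind of capped exponential back-off being proposed for 
the election-port bind loop (method name, retry bounds, and the SLF4J logger are 
illustrative assumptions, not the actual QuorumCnxManager.Listener code):

{noformat}
// Illustrative sketch only: retry the bind with a capped exponential back-off.
// Assumes java.net.ServerSocket/InetSocketAddress and an SLF4J logger LOG.
void bindWithBackoff(ServerSocket serverSocket, InetSocketAddress electionAddr)
        throws InterruptedException {
    final int maxRetries = 10;
    long backoffMs = 1000;            // start at 1s, as the current code does
    final long maxBackoffMs = 60000;  // cap the sleep so a retry never stalls for minutes
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            serverSocket.bind(electionAddr);
            return;                   // bound successfully
        } catch (IOException e) {
            LOG.warn("Bind attempt {}/{} failed, retrying in {} ms",
                    attempt, maxRetries, backoffMs, e);
            Thread.sleep(backoffMs);
            backoffMs = Math.min(backoffMs * 2, maxBackoffMs);  // double, up to the cap
        }
    }
}
{noformat}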

> Quorum port binding needs exponential back-off retry
> 
>
> Key: ZOOKEEPER-2849
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2849
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Affects Versions: 3.4.6, 3.5.3
>Reporter: Brian Lininger
>Priority: Minor
>
> Recently we upgraded the AWS instance type we use for running our ZooKeeper 
> nodes, and by doing so we're intermittently hitting an issue where ZooKeeper 
> cannot bind to the server election port because the IP is incorrect.  This is 
> due to name resolution in Route53 not being in sync when ZooKeeper starts on 
> the more powerful EC2 instances.  Currently in QuorumCnxManager.Listener, we 
> only attempt to bind 3 times with a 1s sleep between retries, which is not 
> long enough.  
> I'm proposing to change this to follow an exponential back-off type strategy 
> where each failed attempt causes a longer sleep between retry attempts.  This 
> would allow ZooKeeper to gracefully recover when the host is 
> misconfigured, and subsequently corrected, without requiring the process to 
> be restarted while also minimizing the impact to the running instance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException

2017-07-25 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101183#comment-16101183
 ] 

Michael Han commented on ZOOKEEPER-2856:


[~panyuxuan] Thanks for the patch. It's a good improvement. 
Instead of uploading a patch, you can file a pull request; that is our 
recommended approach for new contributions. The benefit is that you get your 
GitHub contribution credit when the patch is merged. Please refer to 
https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute for more 
details. In this case, since the change is trivial, I can also commit your patch 
directly - it's up to you.

For the patch itself, can you use parameterized logging instead?
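
For reference, a minimal sketch of what parameterized logging could look like 
here, with the exception passed as the last argument so its message and stack 
trace end up in the log (assuming an SLF4J logger; this is a sketch, not the 
submitted patch):

{noformat}
// Sketch only: SLF4J parameterized logging that also records the SaslException.
catch (SaslException e) {
    LOG.error("SASL authentication failed using login context '{}'.",
            this.getLoginContext(), e);
    saslState = SaslState.FAILED;
    gotLastPacket = true;
}
{noformat}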

> ZooKeeperSaslClient#respondToServer should log exception message of 
> SaslException
> -
>
> Key: ZOOKEEPER-2856
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
>Priority: Minor
> Attachments: ZOOKEEPER-2856-1.patch
>
>
> When an upstream project like HBase calls ZooKeeperSaslClient with security 
> enabled, we sometimes get errors in the HBase logs like:
> {noformat}
> SASL authentication failed using login context 'Client'.
> {noformat}
> This error occurs when a SaslException is caught in 
> ZooKeeperSaslClient#respondToServer:
> {noformat}
>  catch (SaslException e) {
> LOG.error("SASL authentication failed using login context '" +
> this.getLoginContext() + "'.");
> saslState = SaslState.FAILED;
> gotLastPacket = true;
>   }
> {noformat}
> Without the explicit exception message this error is confusing, so I think 
> we can add the exception message to the log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException

2017-07-25 Thread Michael Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-2856:
--

Assignee: Pan Yuxuan

> ZooKeeperSaslClient#respondToServer should log exception message of 
> SaslException
> -
>
> Key: ZOOKEEPER-2856
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Pan Yuxuan
>Assignee: Pan Yuxuan
>Priority: Minor
> Attachments: ZOOKEEPER-2856-1.patch
>
>
> When an upstream project like HBase calls ZooKeeperSaslClient with security 
> enabled, we sometimes get errors in the HBase logs like:
> {noformat}
> SASL authentication failed using login context 'Client'.
> {noformat}
> This error occurs when a SaslException is caught in 
> ZooKeeperSaslClient#respondToServer:
> {noformat}
>  catch (SaslException e) {
> LOG.error("SASL authentication failed using login context '" +
> this.getLoginContext() + "'.");
> saslState = SaslState.FAILED;
> gotLastPacket = true;
>   }
> {noformat}
> Without the explicit exception message this error is confusing, so I think 
> we can add the exception message to the log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101161#comment-16101161
 ] 

Ted Dunning commented on ZOOKEEPER-2770:



Btw I note that there is no metering on this logging.

That raises an obligatory question: is there a plausible circumstance where 
thousands of nearly identical messages might be logged?
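
One simple shape such metering could take - suppress repeats inside a window and 
report a suppression count - sketched here with illustrative names (this is not 
part of the posted patch):

{noformat}
// Illustrative sketch only: emit at most one slow-op warning per second and
// count how many similar warnings were suppressed in between. Assumes an
// SLF4J logger LOG.
private long lastSlowOpLogMs = 0;
private long suppressedSlowOps = 0;

void maybeLogSlowOp(String requestInfo, long latencyMs) {
    long now = System.currentTimeMillis();
    if (now - lastSlowOpLogMs >= 1000) {
        LOG.warn("Slow request {} took {} ms ({} similar warnings suppressed)",
                requestInfo, latencyMs, suppressedSlowOps);
        lastSlowOpLogMs = now;
        suppressedSlowOps = 0;
    } else {
        suppressedSlowOps++;
    }
}
{noformat}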



> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101158#comment-16101158
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


{quote}
With that said, is 300 ms a good value, or is even less better?
{quote}

I would suggest that getting a real time-varying histogram is the right answer. 
I suggested that early on for just this kind of reason.
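
As a rough illustration of that suggestion (bucket bounds and names are made up 
for the sketch; no such class exists in ZooKeeper today):

{noformat}
// Illustrative sketch only: count request latencies into fixed buckets so the
// distribution, not just a single threshold, can be observed over time.
private static final long[] BUCKET_UPPER_BOUND_MS = {1, 10, 100, 1000, 10000};
private final java.util.concurrent.atomic.AtomicLongArray buckets =
        new java.util.concurrent.atomic.AtomicLongArray(BUCKET_UPPER_BOUND_MS.length + 1);

void record(long latencyMs) {
    int i = 0;
    while (i < BUCKET_UPPER_BOUND_MS.length && latencyMs > BUCKET_UPPER_BOUND_MS[i]) {
        i++;
    }
    buckets.incrementAndGet(i);   // last bucket catches everything above 10s
}
{noformat}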



> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Karan Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101114#comment-16101114
 ] 

Karan Mehta commented on ZOOKEEPER-2770:


bq. Operations over 100ms should be vanishingly rare, but I wouldn't leap up to 
find out what is happening. I would be fairly unhappy, though, and would start 
checking.
Let's take this as a motivation. :) 
With that said, is 300 ms a good value, or is even less better?

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper-trunk - Build # 3476 - Still Failing

2017-07-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/3476/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 64.61 MB...]
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-25 23:26:04,269 [myid:] - INFO  [ProcessThread(sid:0 
cport:19547)::PrepRequestProcessor@614] - Processed session termination for 
sessionid: 0x10068c16e94
[junit] 2017-07-25 23:26:04,270 [myid:] - INFO  
[SyncThread:0:MBeanRegistry@128] - Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port19547,name1=Connections,name2=127.0.0.1,name3=0x10068c16e94]
[junit] 2017-07-25 23:26:04,270 [myid:] - INFO  [main:ZooKeeper@1329] - 
Session: 0x10068c16e94 closed
[junit] 2017-07-25 23:26:04,270 [myid:] - INFO  
[main-EventThread:ClientCnxn$EventThread@513] - EventThread shut down for 
session: 0x10068c16e94
[junit] 2017-07-25 23:26:04,270 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@82] - Memory used 152360
[junit] 2017-07-25 23:26:04,270 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@87] - Number of threads 1648
[junit] 2017-07-25 23:26:04,271 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@102] - FINISHED TEST METHOD 
testWatcherAutoResetWithLocal
[junit] 2017-07-25 23:26:04,271 [myid:] - INFO  [main:ClientBase@601] - 
tearDown starting
[junit] 2017-07-25 23:26:04,271 [myid:] - INFO  [main:ClientBase@571] - 
STOPPING server
[junit] 2017-07-25 23:26:04,271 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:19547
[junit] 2017-07-25 23:26:04,277 [myid:] - INFO  [main:ZooKeeperServer@541] 
- shutting down
[junit] 2017-07-25 23:26:04,277 [myid:] - ERROR [main:ZooKeeperServer@505] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-07-25 23:26:04,277 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-07-25 23:26:04,279 [myid:] - INFO  
[main:PrepRequestProcessor@1008] - Shutting down
[junit] 2017-07-25 23:26:04,279 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [ProcessThread(sid:0 
cport:19547)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
[junit] 2017-07-25 23:26:04,280 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-07-25 23:26:04,280 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port19547,name1=InMemoryDataTree]
[junit] 2017-07-25 23:26:04,280 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port19547]
[junit] 2017-07-25 23:26:04,281 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 19547
[junit] 2017-07-25 23:26:04,281 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2017-07-25 23:26:04,287 [myid:] - INFO  [main:ClientBase@626] - 
fdcount after test is: 4837 at start it was 4837
[junit] 2017-07-25 23:26:04,288 [myid:] - INFO  [main:ZKTestCase$1@68] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2017-07-25 23:26:04,288 [myid:] - INFO  [main:ZKTestCase$1@63] - 
FINISHED testWatcherAutoResetWithLocal
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
403.914 sec, Thread: 4, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] 2017-07-25 23:26:04,541 [myid:127.0.0.1:19430] - INFO  
[main-SendThread(127.0.0.1:19430):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:19430. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-25 23:26:04,542 [myid:127.0.0.1:19430] - WARN  
[main-SendThread(127.0.0.1:19430):ClientCnxn$SendThread@1235] - Session 
0x30068be339e for server 127.0.0.1/127.0.0.1:19430, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1338: The 
following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1219: The 
following error 

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100945#comment-16100945
 ] 

Ted Dunning commented on ZOOKEEPER-2770:



On second thought, I could imagine that startup transients could cause a long 
operation. Once you have your quorum in a groove, however, >1 second is very 
bad, especially if you don't have something like a quorum leader change 
happening.


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100942#comment-16100942
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


To put some color on Camille's surprise, I would consider any operation over a 
second to be indicative of gross failure in the quorum. Operations over 100ms 
should be vanishingly rare, but I wouldn't leap up to find out what is 
happening. I would be fairly unhappy, though, and would start checking.


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100940#comment-16100940
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user karanmehta93 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129450258
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

To be frank, I am a newbie and haven't debugged this in detail. This value is 
purely based on the 'stat' command on our test cluster. @apurtell might be 
able to suggest more practical values.

@skamille I would prefer turning this on by default, although the default 
value needs to be discussed. In my understanding, this helps in situations where 
we see timeouts at the application level; such a log might help narrow down 
the cause.


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-07-25 Thread karanmehta93
Github user karanmehta93 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129450258
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

To be frank, I am a newbie and haven't debugged this in detail. This value is 
purely based on the 'stat' command on our test cluster. @apurtell might be 
able to suggest more practical values.

@skamille I would prefer turning this on by default, although the default 
value needs to be discussed. In my understanding, this helps in situations where 
we see timeouts at the application level; such a log might help narrow down 
the cause.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


ZooKeeper_branch34_jdk8 - Build # 1073 - Failure

2017-07-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk8/1073/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 8.11 KB...]
 > git config remote.origin.url git://git.apache.org/zookeeper.git # timeout=10
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
No valid HEAD. Skipping the resetting
 > git clean -fdx # timeout=10
Fetching upstream changes from git://git.apache.org/zookeeper.git
 > git --version # timeout=10
 > git fetch --tags --progress git://git.apache.org/zookeeper.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
git://git.apache.org/zookeeper.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:812)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1079)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1110)
at hudson.scm.SCM.checkout(SCM.java:495)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1276)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:560)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:485)
at hudson.model.Run.execute(Run.java:1735)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:405)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress git://git.apache.org/zookeeper.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: fatal: unable to connect to git.apache.org:
git.apache.org: Temporary failure in name resolution


at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1903)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1622)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:71)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:348)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:153)
at hudson.remoting.UserRequest.perform(UserRequest.java:50)
at hudson.remoting.Request$2.run(Request.java:336)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
at ..remote call to cassandra11(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1545)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:253)
at hudson.remoting.Channel.call(Channel.java:830)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
at sun.reflect.GeneratedMethodAccessor864.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
at com.sun.proxy.$Proxy104.execute(Unknown Source)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:810)
... 11 more
ERROR: Error fetching remote repo 'origin'
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100906#comment-16100906
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user skamille commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129444784
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

You've seen 2.3 seconds latency within the ZK quorum operations? That seems 
worthy of posting to the mailing list along with some information about what 
was happening and why.
I think it sounds like @hanm wants to turn this off by default, which makes 
this moot, and I'm supportive of that, so I'll let him make the call.


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-07-25 Thread skamille
Github user skamille commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129444784
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

You've seen 2.3 seconds latency within the ZK quorum operations? That seems 
worthy of posting to the mailing list along with some information about what 
was happening and why.
I think it sounds like @hanm wants to turn this off by default, which makes 
this moot, and I'm supportive of that, so I'll let him make the call.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100881#comment-16100881
 ] 

Michael Han commented on ZOOKEEPER-2770:


By "hardcoded" I meant the default value of "requestWarnThresholdMs" baked into 
the code.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100852#comment-16100852
 ] 

Andrew Purtell commented on ZOOKEEPER-2770:
---

From the original patch the warning threshold has been configurable. Calling 
it 'hardcoded' isn't correct. Maybe you meant a simple threshold only? That's 
true. It's better than nothing. FWIW I also like Ted's suggestion as a 
followup, and in fact would like to carry that over to HBase if it works out 
well here.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Michael Han (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100846#comment-16100846
 ] 

Michael Han commented on ZOOKEEPER-2770:


A hardcoded default value in code is unlikely to work for everyone, and it is 
possible to have false negatives if the value is too small. I am leaning 
towards making this an opt-in feature with a default value of -1; those who 
want to use it can tune the parameter for their deployment, but it has to be 
enabled explicitly.
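
For illustration, the opt-in check described above might look roughly like the 
sketch below (the field and variable names are assumptions for the sketch, not 
necessarily what the patch uses; a negative threshold disables the logging):

{noformat}
// Illustrative sketch only: warn when the time from request arrival to final
// processing exceeds the configured threshold; requestWarnThresholdMs < 0
// means the feature is disabled. Assumes an SLF4J logger LOG.
long latencyMs = System.currentTimeMillis() - requestArrivalTimeMs;
if (requestWarnThresholdMs >= 0 && latencyMs > requestWarnThresholdMs) {
    LOG.warn("Request {} exceeded warn threshold of {} ms: took {} ms",
            requestInfo, requestWarnThresholdMs, latencyMs);
}
{noformat}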

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2829) Interface usability / compatibility improvements through Java annotation.

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100843#comment-16100843
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2829:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/316
  
Yes, please split - that would make it easier to land the current patch, and I 
expect it will take some discussion to nail down the complete set of new APIs 
to be exposed.


> Interface usability / compatibility improvements through Java annotation.
> -
>
> Key: ZOOKEEPER-2829
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2829
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Michael Han
>Assignee: Abraham Fine
>  Labels: annotation
>
> Hadoop has interface classification regarding the interfaces' scope and 
> stability. ZK should do something similar, which not only provides the additional 
> benefit of making API compatibility easier between releases (or even 
> commits, by automating the checks via some tooling), but is also consistent with 
> the rest of the Hadoop ecosystem.
> See HADOOP-5073 for more context.
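
For reference, this is roughly what Hadoop-style classification annotations look 
like when applied to a class (the annotation names follow the Hadoop convention 
from HADOOP-5073; the exact names and packages ZooKeeper would adopt are what 
this JIRA is meant to decide):

{noformat}
// Illustrative only: Hadoop-style audience/stability classification.
@InterfaceAudience.Public
@InterfaceStability.Stable
public class ZooKeeper {
    // ...
}
{noformat}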



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper issue #316: ZOOKEEPER-2829: Interface usability / compatibility im...

2017-07-25 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/316
  
Yes, please split - that would make it easier to land the current patch, and I 
expect it will take some discussion to nail down the complete set of new APIs 
to be exposed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100820#comment-16100820
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user karanmehta93 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129431091
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

Is 2 or 3 seconds reasonable? I have seen 2.3 seconds as the max latency 
sometimes; however, I don't have much experience. 


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-07-25 Thread karanmehta93
Github user karanmehta93 commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129431091
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

Is 2 or 3 seconds reasonable? I have seen 2.3 seconds as the max latency 
sometimes; however, I don't have much experience. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] zookeeper issue #316: ZOOKEEPER-2829: Interface usability / compatibility im...

2017-07-25 Thread afine
Github user afine commented on the issue:

https://github.com/apache/zookeeper/pull/316
  
@hanm I am happy to split it up if you insist. My concern is that just 
adding the annotations to our "normal" Java classes does not actually do much 
since it is technically incomplete.  

I thought it would be a good idea to do the javadoc generation change here 
because it provides us a reasonably foolproof way of verifying that every 
class that should be labeled public has been labeled public. Otherwise it would 
be rather tedious to make sure that we have labeled all of our classes 
appropriately. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2829) Interface usability / compatibility improvements through Java annotation.

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100797#comment-16100797
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2829:
---

Github user afine commented on the issue:

https://github.com/apache/zookeeper/pull/316
  
@hanm I am happy to split it up if you insist. My concern is that just 
adding the annotations to our "normal" Java classes does not actually do much 
since it is technically incomplete.  

I thought it would be a good idea to do the javadoc generation change here 
because it provides us a reasonably foolproof way of verifying that every 
class that should be labeled public has been labeled public. Otherwise it would 
be rather tedious to make sure that we have labeled all of our classes 
appropriately. 


> Interface usability / compatibility improvements through Java annotation.
> -
>
> Key: ZOOKEEPER-2829
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2829
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Michael Han
>Assignee: Abraham Fine
>  Labels: annotation
>
> Hadoop has interface classification regarding the interfaces' scope and 
> stability. ZK should do something similar, which not only provides the additional 
> benefit of making API compatibility easier between releases (or even 
> commits, by automating the checks via some tooling), but is also consistent with 
> the rest of the Hadoop ecosystem.
> See HADOOP-5073 for more context.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100780#comment-16100780
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2770:
---

Github user skamille commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129424302
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

If we're going to implement this, let's at least put in some sort of realistic 
threshold. 10s is basically saying "don't enable this feature" - is that what we 
want?


> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper pull request #307: ZOOKEEPER-2770 ZooKeeper slow operation log

2017-07-25 Thread skamille
Github user skamille commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/307#discussion_r129424302
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java ---
@@ -61,6 +61,7 @@
 
 private static boolean standaloneEnabled = true;
 private static boolean reconfigEnabled = false;
+private static int requestWarnThresholdMs = 1;
--- End diff --

If we're going to implement this, let's at least put in some sort of realistic 
threshold. 10s is basically saying "don't enable this feature" - is that what we 
want?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (ZOOKEEPER-2829) Interface usability / compatibility improvements through Java annotation.

2017-07-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100737#comment-16100737
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2829:
---

Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/316
  
>> The javadoc generated by this patch should be identical to our javadoc 
before with a few extra classes (that I think should have been included before 
anyway).

I suggest we scope this JIRA so it only focuses on the first part:  "The 
javadoc generated by this patch should be identical to our javadoc before".  
The remaining part, such as whether or not to include jute and other new APIs, can 
be discussed on the dev list and done in a separate JIRA.


> Interface usability / compatibility improvements through Java annotation.
> -
>
> Key: ZOOKEEPER-2829
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2829
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, server
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Michael Han
>Assignee: Abraham Fine
>  Labels: annotation
>
> Hadoop has interface classification regarding the interfaces' scope and 
> stability. ZK should do something similar, which not only provides the additional 
> benefit of making API compatibility easier between releases (or even 
> commits, by automating the checks via some tooling), but is also consistent with 
> the rest of the Hadoop ecosystem.
> See HADOOP-5073 for more context.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] zookeeper issue #316: ZOOKEEPER-2829: Interface usability / compatibility im...

2017-07-25 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/316
  
>> The javadoc generated by this patch should be identical to our javadoc 
before with a few extra classes (that I think should have been included before 
anyway).

I suggest we scope this JIRA so it only focuses on the first part:  "The 
javadoc generated by this patch should be identical to our javadoc before".  
The remaining part, such as whether or not to include jute and other new APIs, can 
be discussed on the dev list and done in a separate JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100706#comment-16100706
 ] 

Andrew Purtell edited comment on ZOOKEEPER-2770 at 7/25/17 8:26 PM:


The originally proposed change is hardly complex. I don't understand that 
aspect of this discussion. Whether or not the metric is useful, on the other 
hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op 
takes longer than a second to complete, and how often that might happen, and on 
what host(s)/quorum it is happening. We have fleet of thousands of servers. We 
have tens of ZooKeeper installations, each on five servers. Hardware does funny 
things from time to time. We'd like to be proactive. 

Edit: More like 160 quorums, I think. 


was (Author: apurtell):
The originally proposed change is hardly complex. I don't understand that 
aspect of this discussion. Whether or not the metric is useful, on the other 
hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op 
takes longer than a second to complete, and how often that might happen, and on 
what host it is happening. We have a fleet of thousands of servers. We have tens 
of ZooKeeper installations, each on five servers. Hardware does funny things 
from time to time. We'd like to be proactive. 

Edit: More like 160 quorums, I think. 

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100706#comment-16100706
 ] 

Andrew Purtell edited comment on ZOOKEEPER-2770 at 7/25/17 8:23 PM:


The originally proposed change is hardly complex. I don't understand that 
aspect of this discussion. Whether or not the metric is useful, on the other 
hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op 
takes longer than a second to complete, and how often that might happen, and on 
what host it is happening. We have a fleet of thousands of servers. We have tens 
of ZooKeeper installations, each on five servers. Hardware does funny things 
from time to time. We'd like to be proactive. 

Edit: More like 160 quorums, I think. 


was (Author: apurtell):
The originally proposed change is hardly complex. I don't understand that 
aspect of this discussion. Whether or not the metric is useful, on the other 
hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op 
takes longer than a second to complete, and how often that might happen, and on 
what host it is happening. We have a fleet of thousands of servers. We have tens 
of ZooKeeper installations, each on five servers. Hardware does funny things 
from time to time. We'd like to be proactive. 

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100706#comment-16100706
 ] 

Andrew Purtell commented on ZOOKEEPER-2770:
---

The originally proposed change is hardly complex. I don't understand that 
aspect of this discussion. Whether or not the metric is useful, on the other 
hand... ok. That is a matter of opinion. I think we'd like to know if any ZK op 
takes longer than a second to complete, and how often that might happen, and on 
what host it is happening. We have a fleet of thousands of servers. We have tens 
of ZooKeeper installations, each on five servers. Hardware does funny things 
from time to time. We'd like to be proactive. 

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


ZooKeeper-trunk-openjdk7 - Build # 1556 - Still Failing

2017-07-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1556/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 64.60 MB...]
[junit] 2017-07-25 20:14:56,652 [myid:] - INFO  [main:ClientBase@601] - 
tearDown starting
[junit] 2017-07-25 20:14:56,652 [myid:] - INFO  [main:ClientBase@571] - 
STOPPING server
[junit] 2017-07-25 20:14:56,652 [myid:] - INFO  
[main:NettyServerCnxnFactory@464] - shutdown called 0.0.0.0/0.0.0.0:11468
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  [main:ZooKeeperServer@541] 
- shutting down
[junit] 2017-07-25 20:14:56,658 [myid:] - ERROR [main:ZooKeeperServer@505] 
- ZKShutdownHandler is not registered, so ZooKeeper server won't take any 
action on ERROR or SHUTDOWN server state changes
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  
[main:SessionTrackerImpl@232] - Shutting down
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  
[main:PrepRequestProcessor@1008] - Shutting down
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  
[main:SyncRequestProcessor@191] - Shutting down
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  
[SyncThread:0:SyncRequestProcessor@169] - SyncRequestProcessor exited!
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  [ProcessThread(sid:0 
cport:11468)::PrepRequestProcessor@155] - PrepRequestProcessor exited loop!
[junit] 2017-07-25 20:14:56,658 [myid:] - INFO  
[main:FinalRequestProcessor@481] - shutdown of request processor complete
[junit] 2017-07-25 20:14:56,659 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean 
[org.apache.ZooKeeperService:name0=StandaloneServer_port11468,name1=InMemoryDataTree]
[junit] 2017-07-25 20:14:56,660 [myid:] - INFO  [main:MBeanRegistry@128] - 
Unregister MBean [org.apache.ZooKeeperService:name0=StandaloneServer_port11468]
[junit] 2017-07-25 20:14:56,660 [myid:] - INFO  
[main:FourLetterWordMain@85] - connecting to 127.0.0.1 11468
[junit] 2017-07-25 20:14:56,661 [myid:] - INFO  [main:JMXEnv@146] - 
ensureOnly:[]
[junit] 2017-07-25 20:14:56,672 [myid:] - INFO  [main:ClientBase@626] - 
fdcount after test is: 7141 at start it was 7141
[junit] 2017-07-25 20:14:56,672 [myid:] - INFO  [main:ZKTestCase$1@68] - 
SUCCEEDED testWatcherAutoResetWithLocal
[junit] 2017-07-25 20:14:56,672 [myid:] - INFO  [main:ZKTestCase$1@63] - 
FINISHED testWatcherAutoResetWithLocal
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
405.159 sec, Thread: 1, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] 2017-07-25 20:14:56,694 [myid:127.0.0.1:11351] - INFO  
[main-SendThread(127.0.0.1:11351):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11351. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-25 20:14:56,694 [myid:127.0.0.1:11351] - WARN  
[main-SendThread(127.0.0.1:11351):ClientCnxn$SendThread@1235] - Session 
0x305a5ff5baa for server 127.0.0.1/127.0.0.1:11351, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-25 20:14:56,716 [myid:127.0.0.1:11222] - INFO  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11222. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-25 20:14:56,717 [myid:127.0.0.1:11222] - WARN  
[main-SendThread(127.0.0.1:11222):ClientCnxn$SendThread@1235] - Session 
0x105a5fc70c2 for server 127.0.0.1/127.0.0.1:11222, unexpected error, 
closing socket connection and attempting reconnect
[junit] java.net.ConnectException: Connection refused
[junit] at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
[junit] at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
[junit] at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:357)
[junit] at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1214)
[junit] 2017-07-25 20:14:56,892 [myid:127.0.0.1:11271] - INFO  
[main-SendThread(127.0.0.1:11271):ClientCnxn$SendThread@1113] - Opening socket 
connection to server 127.0.0.1/127.0.0.1:11271. Will not attempt to 
authenticate using SASL (unknown error)
[junit] 2017-07-25 20:14:56,892 [myid:127.0.0.1:11271] - WARN  
[main-SendThread(127.0.0.1:11271):ClientCnxn$SendThread@1235] - Session 
0x105a5fcbee20001 for server 127.0.0.1/127.0.0.1:11271, unexpected error, 

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100640#comment-16100640
 ] 

Camille Fournier commented on ZOOKEEPER-2770:
-

Are there really 10s long slow requests? It's defaults like this that make me 
skeptical about the usefulness of this particular implementation. If we have a 
request through ZK that takes 10s to process, your whole system is completely 
effed. 

I don't think we should add complexity to the code base without suitable 
justification for the value of the new feature. With that in mind, I'd like to 
understand what, specifically, the circumstances are that we're trying to measure. 
It looks like processing time for a request through the ZK quorum alone, 
correct? The only network time that might be captured would be, in the case of 
a write, the quorum voting time.

I'm all for making ZK more operable and exposing metrics but I don't think 
exposing low-value metrics is worth the additional code complexity without 
justification.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Failed: ZOOKEEPER-2614 PreCommit Build #3639

2017-07-25 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2614
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3639/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 884 B...]
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision f60928787a908f358a64763f802a6d0371ad4404 
(refs/remotes/origin/master)
Commit message: "ZOOKEEPER-2841: ZooKeeper public include files leak porting 
changes"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f60928787a908f358a64763f802a6d0371ad4404
 > git rev-list f60928787a908f358a64763f802a6d0371ad4404 # timeout=10
No emails were triggered.
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/jenkins7698180655943185224.sh
/home/jenkins/tools/java/latest1.7/bin/java
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 386177
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 6
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 10240
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
Exception in thread "main" java.lang.UnsupportedClassVersionError: 
org/apache/tools/ant/launch/Launcher : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2841
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100603#comment-16100603
 ] 

Ted Dunning commented on ZOOKEEPER-2770:


[~fournc],

I am not so sure that *I* agree with me at this point.

It is fair to say that on occasion there are slow operations in ZK and it would 
be good to know about them. 

This kind of problem is almost always due, in my own vicarious experience, to 
bad configuration. Often the bad configuration is simply collocation with a 
noisy neighbor on a deficient storage layer.  There might be situations where 
an operation is slow due to the content of the query itself, but I cannot 
imagine what those situations might be.  Writing a large value (but that is 
strictly limited in size), or even doing a huge multi-op (which has the same 
limited size in aggregate) should never take very long.

As such, I would expect that the highest diagnostic value would not be 
something that dumped the contents of slow queries, but rather a capability 
that characterizes the entire distribution of query times. The frequency of 
slow queries is a diagnostic of sorts, but is one that could be inferred from 
the time-varying distributional information I was suggesting.

That said, I don't think that a slow query log is a BAD thing (except a bit bad 
in terms of security if it logs the actual query). And I wouldn't want the BEST 
thing (a distribution log) to stop somebody from contributing something.
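
For what it's worth, characterizing the distribution does not have to be 
heavyweight; a bucketed counter along the following lines would do. The bucket 
boundaries here are arbitrary assumptions for illustration, not a proposal for 
the actual metric.

{noformat}
// Illustrative sketch of a latency distribution: fixed buckets, lock-free counters.
import java.util.concurrent.atomic.AtomicLongArray;

public class LatencyHistogram {
    // Upper bounds in milliseconds; the final bucket catches everything >= 1s.
    private static final long[] BOUNDS_MS = {1, 4, 16, 64, 256, 1000};
    private final AtomicLongArray counts = new AtomicLongArray(BOUNDS_MS.length + 1);

    /** Record one operation's latency. */
    public void record(long latencyMs) {
        int bucket = 0;
        while (bucket < BOUNDS_MS.length && latencyMs >= BOUNDS_MS[bucket]) {
            bucket++;
        }
        counts.incrementAndGet(bucket);
    }

    /** Dump the distribution, e.g. periodically or from a stat command. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= BOUNDS_MS.length; i++) {
            String label = (i < BOUNDS_MS.length) ? "<" + BOUNDS_MS[i] + "ms" : ">=1s";
            sb.append(label).append('=').append(counts.get(i)).append(' ');
        }
        return sb.toString().trim();
    }
}
{noformat}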




> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2614) Port ZOOKEEPER-1576 to branch3.4

2017-07-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100569#comment-16100569
 ] 

Thomas Schüttel commented on ZOOKEEPER-2614:


I just tested the patch with my ZooKeeper ensemble running in Kubernetes. It 
works fine now. Previously, without the patch, my Kafka cluster failed as 
soon as one ZooKeeper node died, even though a healthy ensemble was still 
present.
Please merge this patch and release it.

> Port ZOOKEEPER-1576 to branch3.4
> 
>
> Key: ZOOKEEPER-2614
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2614
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.9
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
> Fix For: 3.4.9
>
> Attachments: ZOOKEEPER-2614.branch-3.4.00.patch
>
>
> ZOOKEEPER-1576 handles UnknownHostException and it is good to have this change 
> for the 3.4 branch as well. Porting the changes to 3.4 after resolving the 
> conflicts



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100551#comment-16100551
 ] 

Camille Fournier commented on ZOOKEEPER-2770:
-

I completely agree with [~tdunning]; I don't understand the motivation for this. 
Are we just timing the internal processing time for the request? ZK is not the 
same type of system as HBase, so trying to cross-implement this feature may be 
comparing apples to oranges.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Failed: ZOOKEEPER-2856 PreCommit Build #3638

2017-07-25 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3638/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 1.25 KB...]
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision f60928787a908f358a64763f802a6d0371ad4404 
(refs/remotes/origin/master)
Commit message: "ZOOKEEPER-2841: ZooKeeper public include files leak porting 
changes"
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f60928787a908f358a64763f802a6d0371ad4404
 > git rev-list f60928787a908f358a64763f802a6d0371ad4404 # timeout=10
No emails were triggered.
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[PreCommit-ZOOKEEPER-Build] $ /bin/bash /tmp/jenkins4918952795639117506.sh
/home/jenkins/tools/java/latest1.7/bin/java
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 386172
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 6
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 10240
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited
Exception in thread "main" java.lang.UnsupportedClassVersionError: 
org/apache/tools/ant/launch/Launcher : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2841
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
No tests ran.

Re: Is there a benchmark performance test to reveal how the disk's iops affect zookeeper's tps/qps?

2017-07-25 Thread Michael Han
For public benchmark:
* ZK has a systest in src/java/systest.
* Check out https://coreos.com/blog/performance-of-etcd.html. There is a
GitHub link to the benchmark.

If you end up writing your own benchmark, please consider contributing it
back to open source :)
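
As a very rough starting point, a single-client probe like the sketch below 
shows how per-create latency tracks the fsync cost of the log device. The 
connect string, payload size, and op count are illustrative assumptions, not a 
recommended methodology.

{noformat}
// Rough, illustrative write-latency probe against one server.
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class WriteLatencyProbe {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Counts down on the first event; good enough for a throwaway probe.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> connected.countDown());
        connected.await();

        byte[] payload = new byte[128];   // small payload: latency is dominated by the log fsync
        int ops = 1000;
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            zk.create("/bench-", payload, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT_SEQUENTIAL);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%d creates in %d ms (~%.1f ops/s)%n",
                ops, elapsedMs, ops * 1000.0 / elapsedMs);
        zk.close();
    }
}
{noformat}

Running it once with the transaction log on a busy shared disk and once on a 
dedicated device gives a feel for the effect the admin guide describes.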

On Mon, Jul 24, 2017 at 10:35 PM, gp  wrote:

> As the document says
> "incorrect placement of transasction log
> The most performance critical part of ZooKeeper is the transaction log.
> ZooKeeper syncs transactions to media before it returns a response. A
> dedicated transaction log device is key to consistent good performance.
> Putting the log on a busy device will adversely affect performance..."
> However, is there a benchmark perf test to reveal how the disk's iops
> affect zookeeper's tps/qps?
>
>
>
>
>
>
>
>
>
>


-- 
Cheers
Michael.


[jira] [Updated] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException

2017-07-25 Thread Pan Yuxuan (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pan Yuxuan updated ZOOKEEPER-2856:
--
Attachment: ZOOKEEPER-2856-1.patch

Attaching a simple patch.

> ZooKeeperSaslClient#respondToServer should log exception message of 
> SaslException
> -
>
> Key: ZOOKEEPER-2856
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.4.10, 3.5.3
>Reporter: Pan Yuxuan
>Priority: Minor
> Attachments: ZOOKEEPER-2856-1.patch
>
>
> When an upstream project like HBase calls ZooKeeperSaslClient with security 
> enabled, we sometimes get an error in the HBase logs like:
> {noformat}
> SASL authentication failed using login context 'Client'.
> {noformat}
> This error occurs when a SaslException is caught in 
> ZooKeeperSaslClient#respondToServer:
> {noformat}
>  catch (SaslException e) {
> LOG.error("SASL authentication failed using login context '" +
> this.getLoginContext() + "'.");
> saslState = SaslState.FAILED;
> gotLastPacket = true;
>   }
> {noformat}
> This error confuses users because the exception message is not included, so I 
> think we can add the exception message to the log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2856) ZooKeeperSaslClient#respondToServer should log exception message of SaslException

2017-07-25 Thread Pan Yuxuan (JIRA)
Pan Yuxuan created ZOOKEEPER-2856:
-

 Summary: ZooKeeperSaslClient#respondToServer should log exception 
message of SaslException
 Key: ZOOKEEPER-2856
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2856
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.5.3, 3.4.10
Reporter: Pan Yuxuan
Priority: Minor


When an upstream project like HBase calls ZooKeeperSaslClient with security 
enabled, we sometimes get an error in the HBase logs like:
{noformat}
SASL authentication failed using login context 'Client'.
{noformat}
This error occurs when a SaslException is caught in 
ZooKeeperSaslClient#respondToServer:
{noformat}
 catch (SaslException e) {
LOG.error("SASL authentication failed using login context '" +
this.getLoginContext() + "'.");
saslState = SaslState.FAILED;
gotLastPacket = true;
  }
{noformat}
This error confuses users because the exception message is not included, so I 
think we can add the exception message to the log.
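
An illustrative sketch of the kind of change meant here (the class and method 
names are assumed; this is not the attached patch):

{noformat}
// Illustrative only: pass the SaslException to the logger so its message
// (and stack trace) show up alongside the existing error line.
import javax.security.sasl.SaslException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SaslFailureLogging {
    private static final Logger LOG = LoggerFactory.getLogger(SaslFailureLogging.class);

    // Stand-in for the error path in ZooKeeperSaslClient#respondToServer.
    void onSaslFailure(String loginContext, SaslException e) {
        LOG.error("SASL authentication failed using login context '{}'.", loginContext, e);
    }
}
{noformat}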



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2770) ZooKeeper slow operation log

2017-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099714#comment-16099714
 ] 

Hadoop QA commented on ZOOKEEPER-2770:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//console

This message is automatically generated.

> ZooKeeper slow operation log
> 
>
> Key: ZOOKEEPER-2770
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2770
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Assignee: Karan Mehta
> Attachments: ZOOKEEPER-2770.001.patch, ZOOKEEPER-2770.002.patch, 
> ZOOKEEPER-2770.003.patch
>
>
> ZooKeeper is a complex distributed application. There are many reasons why 
> any given read or write operation may become slow: a software bug, a protocol 
> problem, a hardware issue with the commit log(s), a network issue. If the 
> problem is constant it is trivial to come to an understanding of the cause. 
> However in order to diagnose intermittent problems we often don't know where, 
> or when, to begin looking. We need some sort of timestamped indication of the 
> problem. Although ZooKeeper is not a datastore, it does persist data, and can 
> suffer intermittent performance degradation, and should consider implementing 
> a 'slow query' log, a feature very common to services which persist 
> information on behalf of clients which may be sensitive to latency while 
> waiting for confirmation of successful persistence.
> Log the client and request details if the server discovers, when finally 
> processing the request, that the current time minus arrival time of the 
> request is beyond a configured threshold. 
> Look at the HBase {{responseTooSlow}} feature for inspiration. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Failed: ZOOKEEPER- PreCommit Build #900

2017-07-25 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 70.91 MB...]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/900//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment added.
 [exec] c45b420d4ece535ee9e0f92e46aaf8893d3b7735 logged out
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1642:
 exec returned: 2

Total time: 12 minutes 20 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2770
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
4 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest.testDuringLeaderSync

Error Message:
zoo.cfg.dynamic.next is not deleted.

Stack Trace:
junit.framework.AssertionFailedError: zoo.cfg.dynamic.next is not deleted.
at 
org.apache.zookeeper.server.quorum.ReconfigDuringLeaderSyncTest.testDuringLeaderSync(ReconfigDuringLeaderSyncTest.java:165)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)


FAILED:  org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput(FourLetterWordsTest.java:158)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.lang.Thread.run(Thread.java:745)


FAILED:  org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.zookeeper.test.FourLetterWordsTest.testValidateStatOutput(FourLetterWordsTest.java:158)
at 

Is there a benchmark performance test to reveal how the disk's iops affect zookeeper's tps/qps?

2017-07-25 Thread gp
As the document says
"incorrect placement of transasction log
The most performance critical part of ZooKeeper is the transaction log. 
ZooKeeper syncs transactions to media before it returns a response. A dedicated 
transaction log device is key to consistent good performance. Putting the log 
on a busy device will adversely affect performance..."
However, is there a benchmark perf test to reveal how the disk's iops affect 
zookeeper's tps/qps?
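
For reference, giving the transaction log its own device is controlled in 
zoo.cfg; an illustrative fragment (the paths are examples only):

{noformat}
# dataDir holds snapshots; dataLogDir puts the transaction log on a dedicated
# device so fsyncs do not compete with snapshot and other I/O.
dataDir=/var/lib/zookeeper/data
dataLogDir=/mnt/dedicated-disk/zookeeper/txnlog
{noformat}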