Re: svn commit: r1660498 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

2015-02-18 Thread Rémy Maucherat
2015-02-17 22:52 GMT+01:00 Mark Thomas ma...@apache.org:

 On 17/02/2015 21:02, ma...@apache.org wrote:
  Author: markt
  Date: Tue Feb 17 21:02:09 2015
  New Revision: 1660498
 
  URL: http://svn.apache.org/r1660498
  Log:
  Possible fix for occasional NIO2 CI failures. Without the sync it is
 possible for a write registration to get lost.

 I still see the error but less frequently. So I think this patch is a
 step in the right direction. The logs still indicate that a write
 registration is being lost somewhere so my plan is to continue the code
 review.

 I'm a bit puzzled as to why blocking is related to that. Anyway, last time
this failure occurred with the same symptom, this was caused by
SecureNio2Channel: r1586789. Since this wasn't changed during the
refactoring, I don't think it needs to be suspected this time though.

Following the NPE fix I made, there are signs of double closing of the
socket. I doubt this can be avoided simply by adding a null check.
17-Feb-2015 22:00:20.936 WARNING [https-nio2-127.0.0.1-auto-2-Acceptor-0]
org.apache.tomcat.util.net.AbstractEndpoint.countDownConnection Incorrect
connection count, multiple socket.close called on the same socket.
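
As an illustration of why a null check alone is unlikely to be enough, here is a minimal sketch of an atomic close-once guard. It is not Tomcat's code: the class CloseOnceSketch, the nested Connection type and the counter are hypothetical names made up for this example, and whether such a guard fits the NIO2 code paths is not established in this thread. Two threads can both pass a null check before either of them clears the reference, whereas compareAndSet lets exactly one caller through.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Tomcat code: count a connection down exactly once
// even when close() is called from several threads.
class CloseOnceSketch {

    static class Connection {
        private final AtomicInteger counter;
        private final AtomicBoolean closed = new AtomicBoolean(false);

        Connection(AtomicInteger counter) {
            this.counter = counter;
            counter.incrementAndGet(); // connection opened
        }

        void close() {
            // Only the first caller flips the flag; later callers are no-ops,
            // so the connection count cannot be decremented twice.
            if (closed.compareAndSet(false, true)) {
                counter.decrementAndGet();
            }
        }
    }

    // Opens one connection, closes it from several threads, returns the final count.
    static int closeFromManyThreads(int threads) {
        AtomicInteger counter = new AtomicInteger();
        Connection connection = new Connection(counter);
        Thread[] closers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            closers[i] = new Thread(connection::close);
            closers[i].start();
        }
        for (Thread closer : closers) {
            try {
                closer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // The count returns to 0 rather than going negative.
        System.out.println(closeFromManyThreads(8));
    }
}
```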

Rémy


Re: svn commit: r1660498 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

2015-02-18 Thread Mark Thomas
On 18/02/2015 09:19, Rémy Maucherat wrote:
 2015-02-17 22:52 GMT+01:00 Mark Thomas ma...@apache.org:
 
 On 17/02/2015 21:02, ma...@apache.org wrote:
 Author: markt
 Date: Tue Feb 17 21:02:09 2015
 New Revision: 1660498

 URL: http://svn.apache.org/r1660498
 Log:
 Possible fix for occasional NIO2 CI failures. Without the sync it is
 possible for a write registration to get lost.

 I still see the error but less frequently. So I think this patch is a
 step in the right direction. The logs still indicate that a write
 registration is being lost somewhere so my plan is to continue the code
 review.

 I'm a bit puzzled as to why blocking is related to that.

I was too. I'm beginning to think what looked like less frequent
occurrence of the error was just random effects. I'm leaning towards
reverting this patch.

 Anyway, last time
 this failure occurred with the same symptom, this was caused by
 SecureNio2Channel: r1586789. Since this wasn't changed during the
 refactoring, I don't think it needs to be suspected this time though.
 
 Following the NPE fix I made, there are signs of double closing of the
 socket. I doubt this can be avoided simply by adding a null check.
 17-Feb-2015 22:00:20.936 WARNING [https-nio2-127.0.0.1-auto-2-Acceptor-0]
 org.apache.tomcat.util.net.AbstractEndpoint.countDownConnection Incorrect
 connection count, multiple socket.close called on the same socket.

That might be a different issue. I'm not sure.

I'm fairly confident that the problem we are seeing with
TestWebSocketFrameClientSSL is related to a write registration not
happening / getting lost. The symptom is that the server just stops
writing, the client times out after 60s and the test fails.
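
To make the suspected failure mode concrete, here is a minimal sketch of how a write registration can get lost and how a shared lock prevents it. It is not the Nio2Endpoint code: WriteRegistrationSketch, send(), drain() and the single-threaded executor standing in for the I/O completion thread are all hypothetical. The writer checks for pending data and clears the "registered" flag; if a producer queues a frame between those two steps without a lock, it sees the flag still set, skips the registration, and the frame is never written.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Tomcat code: keep "queue a frame" and
// "decide whether a write is registered" atomic with respect to each other.
class WriteRegistrationSketch {
    private final Object lock = new Object();
    private final Queue<String> pending = new ArrayDeque<>();
    private boolean writeRegistered = false;
    // Stands in for the thread that runs write completion handlers.
    private final ExecutorService io = Executors.newSingleThreadExecutor();
    private final AtomicInteger written = new AtomicInteger();

    void send(String frame) {
        boolean register;
        synchronized (lock) {
            pending.add(frame);
            register = !writeRegistered; // register only if no write is outstanding
            writeRegistered = true;
        }
        if (register) {
            io.execute(this::drain);
        }
    }

    private void drain() {
        while (true) {
            synchronized (lock) {
                if (pending.poll() == null) {
                    // Check and reset under the same lock as send(). Without it,
                    // a frame queued between poll() and the reset would be stranded.
                    writeRegistered = false;
                    return;
                }
            }
            written.incrementAndGet(); // stands in for the actual socket write
        }
    }

    int shutdownAndCount() {
        io.shutdown(); // already queued drain tasks still run
        try {
            io.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written.get();
    }

    // Sends framesEach frames from each producer thread, returns frames written.
    static int runDemo(int producers, int framesEach) {
        WriteRegistrationSketch sketch = new WriteRegistrationSketch();
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < framesEach; j++) {
                    sketch.send("frame");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return sketch.shutdownAndCount();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // every queued frame is written: 4000
    }
}
```

In this sketch a lost registration would show up as a count below the number of frames sent, which is the analogue of the server silently ceasing to write.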

I found a few places where this might be going wrong but - much like the
commit above - I'm not convinced that the affected code path is used at
all - let alone used in this test. I have a few other ideas about where
it might be going wrong that I want to look at today. If those don't pan
out it will be back to adding debug log statements.

If folks want to follow along then I'll be using this branch in my fork
of the Tomcat trunk git mirror:
https://github.com/markt-asf/tomcat/tree/linux-debug

At the moment that branch is an exact copy of trunk. I am just running
the unit tests in a loop waiting for the first failure.

Mark


-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: svn commit: r1660498 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

2015-02-18 Thread Rémy Maucherat
2015-02-18 10:39 GMT+01:00 Mark Thomas ma...@apache.org:

 I'm fairly confident that the problem we are seeing with
 TestWebSocketFrameClientSSL is related to a write registration not
 happening / getting lost. The symptom is that the server just stops
 writing, the client times out after 60s and the test fails.


There's one failure on the last run, but it's on the non-SSL variant of the
test, and it doesn't time out.

Testcase: testConnectToServerEndpoint took 23.317 sec FAILED
expected:<10> but was:<81754> junit.framework.AssertionFailedError:
expected:<10> but was:<81754> at
org.apache.tomcat.websocket.TestWebSocketFrameClient.testConnectToServerEndpoint(TestWebSocketFrameClient.java:76)


Rémy


Re: svn commit: r1660498 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

2015-02-18 Thread Mark Thomas
On 18/02/2015 09:47, Rémy Maucherat wrote:
 2015-02-18 10:39 GMT+01:00 Mark Thomas ma...@apache.org:

 I'm fairly confident that the problem we are seeing with
 TestWebSocketFrameClientSSL is related to a write registration not
 happening / getting lost. The symptom is that the server just stops
 writing, the client times out after 60s and the test fails.

 
 There's one failure on the last run, but it's on the non-SSL variant of the
 test, and it doesn't time out.
 
 Testcase: testConnectToServerEndpoint took 23.317 sec FAILED
 expected:<10> but was:<81754> junit.framework.AssertionFailedError:
 expected:<10> but was:<81754> at
 org.apache.tomcat.websocket.TestWebSocketFrameClient.testConnectToServerEndpoint(TestWebSocketFrameClient.java:76)

Looks like we have multiple issues to track down then. Right now I'm
having difficulty reproducing the problem that results in a timeout.

Mark





Re: svn commit: r1660498 - /tomcat/trunk/java/org/apache/tomcat/util/net/Nio2Endpoint.java

2015-02-17 Thread Mark Thomas
On 17/02/2015 21:02, ma...@apache.org wrote:
 Author: markt
 Date: Tue Feb 17 21:02:09 2015
 New Revision: 1660498
 
 URL: http://svn.apache.org/r1660498
 Log:
 Possible fix for occasional NIO2 CI failures. Without the sync it is possible 
 for a write registration to get lost.

I still see the error but less frequently. So I think this patch is a
step in the right direction. The logs still indicate that a write
registration is being lost somewhere so my plan is to continue the code
review.

Mark
