[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907820#comment-15907820
 ] 

ASF subversion and git services commented on GEODE-1793:


Commit 064362e90e150ad4ef5b269fab78f0cf2d6e5f4f in geode's branch 
refs/heads/develop from [~bschuchardt]
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=064362e ]

GEODE-1793 LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

This was a product issue.  When the locator using plain-text sockets is
contacted by a TcpClient using SSL, the locator often just closes the socket.
On some platforms this causes an SSLHandshakeException, but on others it
just causes a "SocketException: connection reset".  Writing some text to
the socket forces the TcpClient to get an SSLException (which is the superclass
of SSLHandshakeException).

The test class is still marked as Flaky due to GEODE-2542.
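
The exception relationship the commit message relies on can be sketched in
Java as follows. This is only an illustration: the class and method names
(SslFailureCheck, isSslFailure) are hypothetical and are not part of Geode;
only the JDK exception types are real. It assumes the caller wants to treat
any SSLException, including SSLHandshakeException, as an SSL mismatch, while
a bare SocketException cannot be recognized as one.

```java
import java.io.IOException;
import java.net.SocketException;
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLHandshakeException;

// Hypothetical helper, not Geode code: classifies a connection failure.
final class SslFailureCheck {

  // True for SSLException and every subclass, such as SSLHandshakeException.
  static boolean isSslFailure(IOException e) {
    return e instanceof SSLException;
  }

  public static void main(String[] args) {
    // The failure seen on some platforms.
    System.out.println(isSslFailure(new SSLHandshakeException("handshake failed")));
    // The more general failure the fix is meant to provoke.
    System.out.println(isSslFailure(new SSLException("unrecognized message")));
    // The failure seen on other platforms: not identifiable as an SSL problem.
    System.out.println(isSslFailure(new SocketException("Connection reset")));
  }
}
```

Catching SSLException rather than SSLHandshakeException covers both SSL
cases, which is why forcing the client into an SSLException is enough.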


> Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL
> ---
>
> Key: GEODE-1793
> URL: https://issues.apache.org/jira/browse/GEODE-1793
> Project: Geode
> Issue Type: Bug
> Components: locator
> Reporter: Udo Kohlmeyer
> Assignee: Galen O'Sullivan
> Priority: Minor
>
> This test fails because something is not cleaning up after itself properly.
> The cause is undetermined: the test passes every time when run by itself,
> but fails when run as part of the test class.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907800#comment-15907800
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

Github user bschuchardt closed the pull request at:

https://github.com/apache/geode/pull/412




[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907799#comment-15907799
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

Github user bschuchardt commented on the issue:

https://github.com/apache/geode/pull/412
  
remote: geode git commit: GEODE_1793 spotless fixes and removal of dead code
remote: geode git commit: GEODE-1793 LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL
To https://git-wip-us.apache.org/repos/asf/geode.git
   c09a856..4112204  develop -> develop





[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895192#comment-15895192
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

Github user bschuchardt commented on a diff in the pull request:

https://github.com/apache/geode/pull/412#discussion_r104259153
  
--- Diff: geode-core/src/main/java/org/apache/geode/distributed/internal/tcpserver/TcpServer.java ---
@@ -77,9 +77,9 @@
* 
* This should be incremented if the gossip message structures change
* 
-   * 1000 - gemfire 5.5 - using java serialization 1001 - 5.7 - using DataSerializable and
--- End diff --

I didn't check, but that's my assumption.  Spotless seems to accept the
comment with the HTML breaks in place.




[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895189#comment-15895189
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

Github user galen-pivotal commented on a diff in the pull request:

https://github.com/apache/geode/pull/412#discussion_r104258974
  
--- Diff: geode-core/src/main/java/org/apache/geode/distributed/internal/tcpserver/TcpServer.java ---
@@ -77,9 +77,9 @@
* 
* This should be incremented if the gossip message structures change
* 
-   * 1000 - gemfire 5.5 - using java serialization 1001 - 5.7 - using DataSerializable and
--- End diff --

Did spotless break this?  




[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895181#comment-15895181
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

Github user galen-pivotal commented on a diff in the pull request:

https://github.com/apache/geode/pull/412#discussion_r104258182
  
--- Diff: geode-core/src/main/java/org/apache/geode/distributed/internal/tcpserver/TcpServer.java ---
@@ -360,6 +360,13 @@ private void processRequest(final Socket sock) {
   versionOrdinal = (short) GOSSIP_TO_GEMFIRE_VERSION_MAP.get(gossipVersion);
 } else {
   // Close the socket. We can not accept requests from a newer version
+  try {
+    sock.getOutputStream().write("unknown protocol version".getBytes());
--- End diff --

Is there any risk of this being interpreted as anything other than garbage 
on the other side?
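
One way to reason about this question, at least for a TLS peer, is to look at
what the first bytes of the reply would mean as a TLS record header. The
sketch below is an illustration only: TlsRecordCheck and looksLikeTlsRecord
are hypothetical names, not Geode code. It assumes the peer expects a
standard 5-byte TLS record header whose first byte is one of the core content
types (20 to 23) and whose second byte is the major version 3. It says
nothing about how a non-SSL Geode client would treat the same bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Geode code: checks whether bytes could start a TLS record.
final class TlsRecordCheck {

  // Core TLS content types: 20 change_cipher_spec, 21 alert,
  // 22 handshake, 23 application_data.
  static boolean looksLikeTlsRecord(byte[] data) {
    if (data.length < 5) {
      return false; // a TLS record header is 5 bytes long
    }
    int contentType = data[0] & 0xFF;
    int majorVersion = data[1] & 0xFF;
    return contentType >= 20 && contentType <= 23 && majorVersion == 3;
  }

  public static void main(String[] args) {
    byte[] reply = "unknown protocol version".getBytes(StandardCharsets.US_ASCII);
    // The first byte is 'u' (117), which is not a TLS content type.
    System.out.println(looksLikeTlsRecord(reply)); // prints "false"
  }
}
```

Under these assumptions the text cannot be mistaken for a valid TLS record,
which is consistent with the client reporting an SSLException.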




[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895123#comment-15895123
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

Github user bschuchardt commented on the issue:

https://github.com/apache/geode/pull/412
  
There are "spotless" problems I'm cleaning up, and I'm removing the 
commented out code from the test.




[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895108#comment-15895108
 ] 

ASF GitHub Bot commented on GEODE-1793:
---

GitHub user bschuchardt opened a pull request:

https://github.com/apache/geode/pull/412

GEODE-1793 LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOther…

This was a product issue.  When the locator using plain-text sockets is
contacted by a TcpClient using SSL, the locator often just closes the socket.
On some platforms this causes an SSLHandshakeException, but on others it
just causes a "SocketException: connection reset".  Writing some text to
the socket forces the TcpClient to get an SSLException (which is the superclass
of SSLHandshakeException).

The test class is still marked as Flaky due to GEODE-2542.

I deleted one of the tests in LocatorDUnitTest, as it wasn't doing any 
useful validation and served no purpose.

I also increased the joinTimeout in this test.  The original 1-second 
timeout was intended to make the tests run faster, but I think it's probably 
the source of some of the flakiness in this set of tests.  Some of the tests 
were also overriding the joinTimeout established by the DUnit framework, which 
was a bad thing to be doing.  The tests all run in a few seconds with the 
default joinTimeout setting anyway.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/geode feature/GEODE-1793

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/geode/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #412


commit 866dc5ca1583c5fab49ec96c48d261c0367427f3
Author: Bruce Schuchardt 
Date:   2017-03-03T21:47:42Z

GEODE-1793 LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

This was a product issue.  When the locator using plain-text sockets is
contacted by a TcpClient using SSL, the locator often just closes the socket.
On some platforms this causes an SSLHandshakeException, but on others it
just causes a "SocketException: connection reset".  Writing some text to
the socket forces the TcpClient to get an SSLException (which is the superclass
of SSLHandshakeException).

The test class is still marked as Flaky due to GEODE-2542.






[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895084#comment-15895084
 ] 

ASF subversion and git services commented on GEODE-1793:


Commit 866dc5ca1583c5fab49ec96c48d261c0367427f3 in geode's branch 
refs/heads/feature/GEODE-1793 from [~bschuchardt]
[ https://git-wip-us.apache.org/repos/asf?p=geode.git;h=866dc5c ]

GEODE-1793 LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

This was a product issue.  When the locator using plain-text sockets is
contacted by a TcpClient using SSL, the locator often just closes the socket.
On some platforms this causes an SSLHandshakeException, but on others it
just causes a "SocketException: connection reset".  Writing some text to
the socket forces the TcpClient to get an SSLException (which is the superclass
of SSLHandshakeException).

The test class is still marked as Flaky due to GEODE-2542.




[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-02 Thread Bruce Schuchardt (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893271#comment-15893271
 ] 

Bruce Schuchardt commented on GEODE-1793:
-

In another run with modified logging I can see that the SSL locator did not see 
an SSLHandshakeException as expected but merely an IOException

[vm2] [info 2017/03/02 15:40:00.701 PST  tid=0x13] Peer locator could not recover membership view from trout.gemstone.com/10.118.32.92:38254: Connection reset

So this appears to be an inconsistency in SSL implementations used by the JVM.



[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-03-02 Thread Bruce Schuchardt (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15893251#comment-15893251
 ] 

Bruce Schuchardt commented on GEODE-1793:
-

I managed to get this test to fail and saw that the second locator, in vm_2, 
was supposed to shut down but instead created its own cluster:

{noformat}
[vm2] [info 2017/03/02 15:19:56.969 PST  tid=0x13] Starting membership services

[vm2] [info 2017/03/02 15:19:56.979 PST  tid=0x13] JGroups channel created (took 9ms)

[vm2] [info 2017/03/02 15:19:56.980 PST  tid=0x13] GemFire P2P Listener started on /10.118.32.92:39824

[vm2] [info 2017/03/02 15:19:56.980 PST  tid=0x21c] Started failure detection server thread on /10.118.32.92:59150.

[vm2] [info 2017/03/02 15:19:56.980 PST  tid=0x13] Peer locator is connecting to local membership services with ID trout(2830:locator):32771

[vm2] [info 2017/03/02 15:19:57.164 PST  tid=0x13] This member is becoming the membership coordinator with address trout(2830:locator):32771
{noformat}

It was correctly configured with two locator addresses and its SSL 
configuration appeared to be okay.  The other locator, in vm_1, correctly had 
SSL disabled.  The SSL locator was unable to recover from the non-SSL locator:

{noformat}
[vm2] [info 2017/03/02 15:19:56.961 PST  tid=0x13] GemFire peer location service starting.  Other locators: trout.gemstone.com[46843]  Locators preferred as coordinators: false  Network partition detection enabled: false  View persistence file: locator0view.dat

[vm2] [info 2017/03/02 15:19:56.961 PST  tid=0x13] Peer locator attempting to recover from trout.gemstone.com/10.118.32.92:46843

[vm2] [info 2017/03/02 15:19:56.963 PST  tid=0x13] Peer locator was unable to recover state from this locator

[vm2] [info 2017/03/02 15:19:56.963 PST  tid=0x13] recovery file not found: /export/trout1/users/bschuchardt/devel/gfdev/open/geode-core/build/distributedTest/dunit/vm2/locator0view.dat

[vm2] [info 2017/03/02 15:19:56.963 PST  tid=0x13] Starting distributed system
{noformat}

So this appears to be a problem with Geode, not the test.  It is the Geode 
functionality that is "flaky", working most of the time but not 100%.



[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-01-18 Thread Galen O'Sullivan (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828477#comment-15828477
 ] 

Galen O'Sullivan commented on GEODE-1793:
-

Of note: the Locator makes two socket connections, one to get the header, and 
then one to get the rest of the connection. I feel that thi



[jira] [Commented] (GEODE-1793) Flaky: LocatorDUnitTest.testStartTwoLocatorsOneWithSSLAndTheOtherNonSSL

2017-01-18 Thread Galen O'Sullivan (JIRA)

[ 
https://issues.apache.org/jira/browse/GEODE-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828472#comment-15828472
 ] 

Galen O'Sullivan commented on GEODE-1793:
-

Sometimes an RMIException is thrown to the actual caller as a result of 
the connect failing, rather than the failure happening asynchronously. I 
haven't gone deep enough into the locator to find the reason for this, but a 
fix for the test is to catch the exception before it propagates as an 
RMIException.
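
The idea of catching the expected failure where it occurs, instead of letting
it travel back to the caller wrapped in another exception, might look roughly
like the Java sketch below. Everything here is hypothetical: ExpectedFailure
and failsWith are illustrative names, the Callable stands in for whatever the
test runs in the remote VM, and no DUnit or Geode API is used or implied.

```java
import java.util.concurrent.Callable;
import javax.net.ssl.SSLException;

// Hypothetical helper, not Geode or DUnit code.
final class ExpectedFailure {

  // Runs the action and reports whether it failed with the expected type.
  // The expected exception is consumed here, so it never reaches the caller;
  // any other exception is treated as a real error.
  static boolean failsWith(Callable<?> action, Class<? extends Exception> expected) {
    try {
      action.call();
      return false; // the action succeeded, so the expected failure did not occur
    } catch (Exception e) {
      if (expected.isInstance(e)) {
        return true;
      }
      throw new IllegalStateException("unexpected failure", e);
    }
  }

  public static void main(String[] args) {
    // Simulated connect attempt that fails the way the test expects.
    boolean sawSslFailure = failsWith(() -> {
      throw new SSLException("simulated SSL mismatch");
    }, SSLException.class);
    System.out.println(sawSslFailure);
  }
}
```

The code invoked remotely would then return a boolean for the test to assert
on, rather than relying on how the failure is wrapped on its way back.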
