[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2020-01-07 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009477#comment-17009477
 ] 

Dawid Weiss commented on SOLR-13778:


I think we can consider this issue closed (with a workaround). The folks at 
net-dev consider this platform-specific behavior [1] and I don't think we can 
count on consistent behavior across systems. Chris Hegarty suggested 
(independently) a similar workaround to what I implemented in my patch so I 
think it's fine to leave it in. Perhaps we should also backport it to 8x?

[1] https://mail.openjdk.java.net/pipermail/net-dev/2020-January/thread.html#13469

> Windows JDK SSL Test Failure trend: SSLException: Software caused connection 
> abort: recv failed
> ---
>
> Key: SOLR-13778
> URL: https://issues.apache.org/jira/browse/SOLR-13778
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Dawid Weiss
> Priority: Major
> Attachments: RecvFailedTest.java, RecvRepro.java, SOLR-13778.patch, 
> dumps-LegacyCloud.zip, logs-2019-12-12-1.zip, recv-multiple-2019-12-18.zip
>
>
> Now that Uwe's jenkins build has been correctly reporting its build results 
> for my [automated 
> reports|http://fucit.org/solr-jenkins-reports/failure-report.html] to pick 
> up, I've noticed a pattern of failures that indicate a definite problem with 
> using SSL on Windows (even with java 11.0.4).
> The symptomatic stack traces all contain...
> {noformat}
> ...
>    [junit4]    > Caused by: javax.net.ssl.SSLException: Software caused connection abort: recv failed
>    [junit4]    >    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
> ...
>    [junit4]    > Caused by: java.net.SocketException: Software caused connection abort: recv failed
>    [junit4]    >    at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> ...
> {noformat}
> I suspect this may be related to 
> [https://bugs.openjdk.java.net/browse/JDK-8209333] but I have no concrete 
> evidence to back this up.
> I'll post some details of my analysis in comments...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2020-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007487#comment-17007487
 ] 

ASF subversion and git services commented on SOLR-13778:


Commit 2b00d633a5805bf75eb594688e9dd0a4255b02be in lucene-solr's branch 
refs/heads/branch_8x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2b00d63 ]

SOLR-13778: Solrj client will retry requests on SSLException with a suppressed 
SocketException (very likely a hard-closed socket connection)





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2020-01-03 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007475#comment-17007475
 ] 

Dawid Weiss commented on SOLR-13778:


I committed a more generous version of the patch where SSLException with a 
suppressed SocketException would be retried by the client. I don't have any 
better solutions for this at the moment and the current situation with broken 
sockets makes testing on Windows pointless.
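
The condition described above can be sketched roughly as follows. This is not the committed patch: the class and method names are hypothetical, and the check simply looks for a SocketException among the suppressed exceptions of an SSLException, following the wording of the commit message.

```java
import java.net.SocketException;
import javax.net.ssl.SSLException;

public class SuppressedSocketCheck {
    /**
     * Hypothetical helper: returns true if e is an SSLException that carries
     * a suppressed SocketException (very likely a hard-closed socket).
     */
    static boolean isLikelyHardClose(Throwable e) {
        if (!(e instanceof SSLException)) {
            return false;
        }
        for (Throwable suppressed : e.getSuppressed()) {
            if (suppressed instanceof SocketException) {
                return true;
            }
        }
        return false;
    }
}
```

A retry handler could consult such a predicate before deciding to resend a request; an SSLException without a suppressed SocketException would still not be retried.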




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2020-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007473#comment-17007473
 ] 

ASF subversion and git services commented on SOLR-13778:


Commit 985af957324b8abcf06dfdfe0eb0fa8f1d4cb40b in lucene-solr's branch 
refs/heads/gradle-master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=985af95 ]

SOLR-13778: Solrj client will retry requests on SSLException with a suppressed 
SocketException (very likely a hard-closed socket connection)





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2020-01-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17007472#comment-17007472
 ] 

ASF subversion and git services commented on SOLR-13778:


Commit 985af957324b8abcf06dfdfe0eb0fa8f1d4cb40b in lucene-solr's branch 
refs/heads/master from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=985af95 ]

SOLR-13778: Solrj client will retry requests on SSLException with a suppressed 
SocketException (very likely a hard-closed socket connection)





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-29 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004885#comment-17004885
 ] 

Dawid Weiss commented on SOLR-13778:


Ouch. That makes it even worse than I thought! An additional element of 
difficulty is that the implementation in the JDK is different from JDK 13 
onwards and the error message is different (and localized). Uwe and I have been 
in touch with Alan Bateman -- I'll also post this issue to OpenJDK's net-dev 
mailing list.

These differences in how reads and writes on a closed socket are handled only 
reinforce my opinion that this isn't something we should really try to fix. If 
a socket/node goes down it should be up to a higher application layer to retry 
the request -- not even the http client but higher than that. I don't think 
it'll be possible to detect what's retriable and what isn't if the socket 
behavior varies between systems and everything sits on top of an SSL layer 
(which is also changing between different JDK versions).




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-28 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004586#comment-17004586
 ] 

Robert Muir commented on SOLR-13778:


With the attached RecvRepro.java I see the following: (all with jdk13):

Linux:
Exception in thread "main" java.lang.RuntimeException: Unreachable?
at RecvRepro.main(RecvRepro.java:46)

Mac OS X:
Received: java.net.SocketException: Connection reset

FreeBSD:
Received: java.net.SocketException: Connection reset

So besides the Linux vs Windows difference (where Windows gets SocketException: 
recv failed), we see on the BSDs (including Mac OS X) that you also get a 
SocketException. It is just that the JDK must be handling this one differently? 
I ran truss to trace the system calls, just so we are sure:
{noformat}
read(7,0x824416000,100)  ERR#54 'Connection reset by peer'
...
errno.h:#define ECONNRESET  54  /* Connection reset by peer */
{noformat}






[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-27 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17004288#comment-17004288
 ] 

Dawid Weiss commented on SOLR-13778:


I attached a suggested patch to SolrHttpRequestRetryHandler to deal with this 
issue. It is pretty selective (Windows-only, SSLException with a hardcoded 
message) and I don't think it should interfere with anything.
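
A selective check of the kind described above might look like the sketch below. It is not the attached SOLR-13778.patch: the class name, the method name, and the exact substring matched are assumptions made for illustration only.

```java
import javax.net.ssl.SSLException;

public class SelectiveRetryCheck {
    /**
     * Hypothetical predicate: retry only on Windows, only for an SSLException,
     * and only when its message contains the "recv failed" text seen in the
     * stack traces on this issue.
     */
    static boolean shouldRetry(Throwable e, String osName) {
        boolean windows = osName != null && osName.startsWith("Windows");
        return windows
            && e instanceof SSLException
            && e.getMessage() != null
            && e.getMessage().contains("recv failed");
    }
}
```

A caller would pass `System.getProperty("os.name")` as the second argument. Matching on a hardcoded message keeps the check narrow, but it also ties it to one particular wording of the error.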




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-25 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17003206#comment-17003206
 ] 

Dawid Weiss commented on SOLR-13778:


Thanks Uwe!




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-23 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17002197#comment-17002197
 ] 

Uwe Schindler commented on SOLR-13778:
--

Hi, about the JDK error handling I opened 
[https://bugs.openjdk.java.net/browse/JDK-8236498] on behalf of [~dweiss]. 
Thanks Dawid!




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-20 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001194#comment-17001194
 ] 

Robert Muir commented on SOLR-13778:


{quote}
Perhaps there are other reasons for an SSLException which should not cause a 
retry... I'm not really sure.
{quote}

Yes, even looking at the class hierarchy in the javadocs. For example, it makes 
no sense to retry on many of the subclasses: handshake exception (if you can't 
agree on ciphers, it's not gonna happen), key exception, unverified host; retry 
is not helpful for those either.
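
To make the point concrete, here is a small sketch of a filter over the javax.net.ssl exception subclasses mentioned above. The class and method names are hypothetical and the list of subclasses is only the one named in this comment, not an exhaustive policy.

```java
import javax.net.ssl.SSLException;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLKeyException;
import javax.net.ssl.SSLPeerUnverifiedException;

public class SslRetryFilter {
    /**
     * Hypothetical helper: true for SSLException subclasses where repeating
     * the same request cannot be expected to change the outcome.
     */
    static boolean isPointlessToRetry(SSLException e) {
        return e instanceof SSLHandshakeException      // e.g. no common cipher
            || e instanceof SSLKeyException            // bad key
            || e instanceof SSLPeerUnverifiedException; // unverified host
    }
}
```

A plain SSLException falls through this filter, so any retry decision for it would need some further signal, such as an underlying SocketException.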




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-20 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001156#comment-17001156
 ] 

Dawid Weiss commented on SOLR-13778:


bq. I believe we also see these exceptions when "test solr nodeA" talks to 
"test solr nodeB" (although i suspect you are correct that this is also only 
after "nodeB" has been stopped/started)

This will happen in any kind of situation when a hard-closed socket is reused 
from client side. I understand the winsock api description and the example I 
attached reproduces this behavior easily. I'm sure you could extract a simpler 
setup without jetty at all (just with java socket APIs).
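
A Jetty-free setup of the kind suggested above could be sketched with plain java.net sockets as below. This is not the attached RecvFailedTest.java or RecvRepro.java; the class name is hypothetical, and using SO_LINGER with a zero timeout to force a hard close is an assumption about how to provoke the reset. What the client observes is platform-specific, so the method only reports the outcome.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class HardCloseSketch {
    /** Opens a loopback connection, hard-closes the server side, then reads on the client. */
    static String probe() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(5000); // never block forever on the read below
            Socket accepted = server.accept();
            // SO_LINGER with timeout 0: close() aborts the connection (RST)
            // instead of performing a graceful shutdown.
            accepted.setSoLinger(true, 0);
            accepted.close();
            try {
                int b = client.getInputStream().read();
                return "read returned " + b;
            } catch (IOException e) {
                return "Received: " + e;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(probe());
    }
}
```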

bq. Which seems to raise the question: (How) Can we reliably ensure that 
SolrClients get re-instantiated (or have existing connections dropped) if the 
"remote" server is restarted?

Technically it's already done -- the only unhandled situation is SSLException 
that wraps socket exception with recv failed. This could be detected and 
handled like other socket exceptions in solr retry handler.

bq. Could/Should we make SolrHttpRequestRetryHandler close & re-open any 
existing connections (prior to retry) if there was a Socket/SSL Exception?

A better way would be to fix this in Jetty: if those socket connections were 
dropped gracefully, Windows wouldn't throw those odd exceptions and the SSL 
layer would (hopefully) handle them better. Perhaps there are other reasons for 
an SSLException which should *not* cause a retry... I'm not really sure.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-20 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001096#comment-17001096
 ] 

Dawid Weiss commented on SOLR-13778:


I attach a much simpler repro that results in:
{code}
java.net.SocketException: Software caused connection abort: recv failed
{code}
on Windows. This is still stupidly large because it starts Jetty via the jetty 
config builder, etc., but I can't pinpoint which socket/channel config Jetty 
uses that results in a hard socket close (which in turn results in 
WSAECONNABORTED).

[~hossman] - the tests pass with SSL disabled because if you disable SSL the 
http client is by default configured to retry failed requests. The "recv 
failed" is still thrown but since it's a simple SocketException the http client 
retries it and proceeds. With SSL enabled the exception is deeply nested in the 
SSL stack and rewrapped as an SSLException, resulting in the http client not 
retrying it (and finally causing test exceptions).
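
The rewrapping described above can be illustrated with a short sketch that walks the cause chain of an exception looking for the original SocketException, matching the "Caused by" nesting in the stack traces on this issue. The class and method names are hypothetical, and this is not how the http client or Solr's retry handler is implemented.

```java
import java.net.SocketException;

public class CauseChainSketch {
    /**
     * Hypothetical helper: returns the first SocketException found in the
     * cause chain of t (including t itself), or null if there is none.
     */
    static SocketException findSocketException(Throwable t) {
        int depth = 0;
        // The depth bound guards against pathological cyclic cause chains.
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c instanceof SocketException) {
                return (SocketException) c;
            }
        }
        return null;
    }
}
```

A handler that only tests the top-level exception type sees an SSLException here, which is why the plain-socket and SSL cases end up being treated differently.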





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-20 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001078#comment-17001078
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

{quote}Going back to failures in Solr tests: I think the reason is that we 
shutdown jetty in the middle of the test but then reuse the same client that 
was previously connected to an existing instance. If it's an SSL connection 
then there may be SSL comms flying around in addition to user messages and if 
they're issued on a closed socket connection they trigger this enigmatic recv 
failed error.

I think the client should be reinstantiated (or at least any existing 
connections dropped) for the tests to work reliably. ...
{quote}
Interesting ... but taking a step back, this isn't just about these tests and 
the "test clients" talking to the "test solr nodes", so we shouldn't just 
re-instantiate all "test clients" right after any call to {{jetty.stop()}} ... 
I believe we also see these exceptions when "test solr nodeA" talks to "test 
solr nodeB" (although i suspect you are correct that this is also only after 
"nodeB" has been stopped/started) ... and IIUC "real users" could see these 
errors on windows as well. (This seems like something that could happen to any 
solrj users running (Cloud|Http)SolrClient on a windows box, if it's talking 
to a remote solr node using SSL that gets restarted.)

Which seems to raise the question: (How) Can we reliably ensure that 
SolrClients get re-instantiated (or have existing connections dropped) if the 
"remote" server is restarted?

Could/Should we make SolrHttpRequestRetryHandler close & re-open any existing 
connections (prior to retry) if there was a Socket/SSL Exception?




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-19 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000434#comment-17000434
 ] 

Dawid Weiss commented on SOLR-13778:


I recompiled the jdk for Windows today to debug at native code level... I am 
not afraid of assassins ;)




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-19 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000418#comment-17000418
 ] 

Dawid Weiss commented on SOLR-13778:


I know what it is. I'll try to reproduce first to showcase why it's happening. 
:)




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-18 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999356#comment-16999356
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

{quote}
The context of these failures is not clear to me. Chris M. Hostetter - could 
you take a look at these logs (grep for "recv failed" and look at a few lines 
of prior context) and tell if it's expected that these sockets are closed? I 
mean – is the server really dropping those connections?
{quote}

I honestly have no idea.  I am in no way adept/fluent in the low level 
HTTP/connection management stuff in solr -- that's why i was hoping [~shalin], 
[~caomanhdat2], or [~markrmil...@gmail.com] could chime in.

(My level of involvement/knowledge on this issue is really just limited to 
noticing that the pattern existed on windows boxes when doing high level 
analysis of the jenkins failure rates & failure logs, and trying to raise 
awareness of it with people who: a) understand windows, b) understand the 
HTTP/SSL code involved.)




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997614#comment-16997614
 ] 

Dawid Weiss commented on SOLR-13778:


The current code in SolrHttpRequestRetryHandler is already sensitive to this 
(retryRequest method):
{code}
if (handleAsIdempotent(clientContext)) {
  log.debug("Retry, request should be idempotent");
  return true;
}
{code}

Unfortunately the SSLException is a showstopper before this check has a chance 
to run because it disables retries for the whole class of exceptions.
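
A rough sketch of the ordering problem described above, written against plain JDK types rather than the real handler. Everything here is an assumption for illustration: the class {{RetryOrderSketch}}, the {{retryRequest}} signature and the contents of the non-retriable list are made up and not copied from SolrHttpRequestRetryHandler. The point is only that an exception-class check placed ahead of the idempotency check rejects an SSLException before idempotency is ever considered.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLException;

public class RetryOrderSketch {

  // Assumed list, for illustration only; the real handler's list may differ.
  private static final List<Class<? extends IOException>> NON_RETRIABLE =
      Arrays.asList(InterruptedIOException.class, UnknownHostException.class, SSLException.class);

  public static boolean retryRequest(
      IOException e, int executionCount, int maxRetries, boolean idempotent) {
    if (executionCount > maxRetries) {
      return false; // retry budget exhausted
    }
    // The class check runs first: any SSLException stops here, whatever it wraps.
    for (Class<? extends IOException> c : NON_RETRIABLE) {
      if (c.isInstance(e)) {
        return false;
      }
    }
    // Only exceptions that survive the class check reach the idempotency check.
    return idempotent;
  }

  public static void main(String[] args) {
    IOException plain = new SocketException("recv failed");
    IOException wrapped = new SSLException("recv failed", plain);
    System.out.println(retryRequest(plain, 1, 3, true));   // prints true
    System.out.println(retryRequest(wrapped, 1, 3, true)); // prints false
  }
}
```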




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-16 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997590#comment-16997590
 ] 

Robert Muir commented on SOLR-13778:


Wouldn't retry only partially work around the issue? For example, the retry is 
only going to happen for idempotent HTTP verbs. If this "recv failed" problem 
happens while reading the response to a POST request, what can be done?




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997580#comment-16997580
 ] 

Dawid Weiss commented on SOLR-13778:


Given the complexity of the SSL code I peeked at I don't think I can reliably 
answer this. :) I mean -- check out this method in full glory (and it's not the 
only one!).

https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/381c817fa41d549420b1f3a173d9147aa7a679cd/src/java.base/share/classes/sun/security/ssl/TransportContext.java#L267




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-16 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997579#comment-16997579
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

bq. I'm half-convinced SSLException should be retriable... so many things 
can go wrong if the SSL layer is closed that I think it should be allowed to 
just try to re-establish SSL connection from scratch. 

only partially thought out question/counter-suggestion: would it make sense to 
retry on SSLException if and only if the SSLException wraps another exception 
which is already on the retry-able list?  ... so if the SSLException wraps 
something we wouldn't normally retry w/o using SSL (or is an "SSL broke on its 
own" SSLException) -- then we don't retry the request.  but if the SSLException 
wraps something we would normally retry if we'd caught it w/o using SSL, then 
we do retry ... ?
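
A minimal sketch of what that counter-suggestion could look like, using only JDK types. The class {{UnwrappingRetrySketch}}, its method names and the non-retriable list are hypothetical and chosen just for this example; this is not the patch attached to the issue. An SSLException is unwrapped one level, and the decision is delegated to whatever rule would apply to the wrapped exception without SSL.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLException;

public class UnwrappingRetrySketch {

  // Assumed list of exceptions that would not be retried even without SSL.
  private static final List<Class<? extends IOException>> NON_RETRIABLE =
      Arrays.asList(InterruptedIOException.class, UnknownHostException.class);

  /** The rule that would apply if the exception had been caught without SSL. */
  private static boolean retriableWithoutSsl(IOException e) {
    for (Class<? extends IOException> c : NON_RETRIABLE) {
      if (c.isInstance(e)) {
        return false;
      }
    }
    return true;
  }

  public static boolean isRetriable(IOException e) {
    if (!(e instanceof SSLException)) {
      return retriableWithoutSsl(e);
    }
    Throwable cause = e.getCause();
    // Retry only if the SSLException wraps a non-SSL I/O failure we would retry anyway.
    if (cause instanceof IOException && !(cause instanceof SSLException)) {
      return retriableWithoutSsl((IOException) cause);
    }
    // "SSL broke on its own" (no cause, or an SSL-level cause): do not retry.
    return false;
  }

  public static void main(String[] args) {
    IOException recvFailed =
        new SSLException("recv failed", new SocketException("recv failed"));
    System.out.println(isRetriable(recvFailed));                          // prints true
    System.out.println(isRetriable(new SSLException("handshake broke"))); // prints false
  }
}
```

This sketch only looks at the immediate cause; whether deeper nesting needs to be handled is left open here.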





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-16 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997572#comment-16997572
 ] 

Dawid Weiss commented on SOLR-13778:


I looked at classes involved in this complex stack trace. It's quite a 
nightmare. :) There are various checks and exception-handling routines trying 
to figure out what went wrong and replying exceptions or wrapping them. For 
example:

https://github.com/AdoptOpenJDK/openjdk-jdk11u/blob/381c817fa41d549420b1f3a173d9147aa7a679cd/src/java.base/share/classes/sun/security/ssl/TransportContext.java#L344-L360

{code}
// send fatal alert
//
// If we haven't even started handshaking yet, or we are the recipient
// of a fatal alert, no need to generate a fatal close alert.
if (!recvFatalAlert && !isOutboundClosed() && !isBroken &&
        (isNegotiated || handshakeContext != null)) {
    try {
        outputRecord.encodeAlert(Alert.Level.FATAL.level, alert.id);
    } catch (IOException ioe) {
        if (SSLLogger.isOn && SSLLogger.isOn("ssl")) {
            SSLLogger.warning(
                "Fatal: failed to send fatal alert " + alert, ioe);
        }

        closeReason.addSuppressed(ioe);
    }
}
{code}

So it looks like the SSL code is trying to send an alert message over a socket 
that's been closed and fails miserably to do both. And it also looks like it's 
really sensitive to timing and operating system since some of the "socket 
close" handlers are done in object finalizers so they're naturally asynchronous 
to the main code.

I'll try to reproduce this on a smaller piece of code - then it'll be easier to 
tell why this behaved differently previously. My guess is that it's probably 
some other refactoring in the JDK that triggered this... 

I'm half-convinced SSLException should be retriable... so many things can go 
wrong if the SSL layer is closed that I think it should be allowed to just try 
to re-establish SSL connection from scratch. 

But I'll try to provide an example of this happening on a smaller piece of 
code. Maybe we'll have a better understanding of what interaction can lead to 
this.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-16 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997415#comment-16997415
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

bq. I don't think my analysis includes the answers to questions you mentioned – 
I'd be interested in these myself. ... I'll try to take a look but it consumes 
tons of time to debug this stuff and I'm not an expert on SSL which would be 
helpful (diagnostic messages).

Right ... sorry if i wasn't clear before, but to clarify: I wasn't saying i 
expected you to provide those answers, just that based on the info we had the 
answers to those questions (which seemed important to making any decisions on 
code changes) weren't clear to me, and i wasn't sure if that's because "we" 
didn't have those answers, or if it's just because i wasn't recognizing them 
in your data.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16996212#comment-16996212
 ] 

Dawid Weiss commented on SOLR-13778:


I don't think my analysis includes the answers to questions you mentioned -- 
I'd be interested in these myself. I said "Neither can I explain the difference 
between the two (seems like a different underlying close operation on 
sockets)." and I suspect this holds true: somehow the socket operations are 
propagated differently in Linux and Windows, resulting in different behavior on 
both systems. As for JDK differences - the code around this evolves from JDK to 
JDK; could be also a different protocol used. 

I'll try to take a look but it consumes tons of time to debug this stuff and 
I'm not an expert on SSL which would be helpful (diagnostic messages).




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-13 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995775#comment-16995775
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

{quote}What do you think - should we just add SSLException to the retry list?
{quote}
I have no idea ... why/what exceptions are considered retry-able isn't 
something i really understand.

All of the logic in SolrHttpRequestRetryHandler appears to have been added new 
(not refactored from anywhere else) by [~markrmil...@gmail.com] in SOLR-8450 - 
but i don't really see a discussion of why that specific list of exceptions was 
chosen.

/cc [~shalin]  & [~caomanhdat]  as well

 

TBH: I've re-read Dawid's analysis twice and I'm still not really understanding:
 * why "windows>=jdk11" is throwing a different (or differently wrapped?) 
exception than "windows ...



[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995664#comment-16995664
 ] 

Dawid Weiss commented on SOLR-13778:


{quote}bq. But here SocketException gets boxed in an SSLException so it doesn't 
happen?

 
{quote}




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-13 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995548#comment-16995548
 ] 

Dawid Weiss commented on SOLR-13778:


[~hossman] What do you think - should we just add SSLException to the retry 
list?




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-13 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995524#comment-16995524
 ] 

Robert Muir commented on SOLR-13778:


{quote}
Adding javax.net.ssl.SSLException to retriable exceptions makes the test pass. 
But I don't know if it's a good fix; I don't know who came up with "retriable" 
classes or why they're there in the first place. Neither can I explain the 
difference between the two (seems like a different underlying close operation 
on sockets).
{quote}

The list comes from HttpClient. Some retries are described in the HTTP/1.1 specs, 
because one underlying connection is reused for multiple requests and so on 
(e.g. keep-alive can time out). But here SocketException gets boxed in an 
SSLException so it doesn't happen?
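
To make the boxing concrete: a retry list that matches on exception class will not see a java.net.SocketException once it has been wrapped in a javax.net.ssl.SSLException. The sketch below is only an illustration under assumptions. {{RetryListSketch}}, {{RETRIABLE}} and both helper methods are hypothetical names, not HttpClient's or Solr's actual retry handler, and walking the cause chain is just one possible alternative to adding SSLException to the list wholesale.

```java
import java.net.SocketException;
import java.util.List;
import javax.net.ssl.SSLException;

// Hypothetical illustration only; not the HttpClient or Solr retry handler.
final class RetryListSketch {
  // Assumed retry list: exception classes considered safe to retry.
  static final List<Class<? extends Throwable>> RETRIABLE = List.of(SocketException.class);

  // Matches only the top-level exception, as a plain class-based list would.
  static boolean isRetriableTopLevel(Throwable t) {
    return RETRIABLE.stream().anyMatch(c -> c.isInstance(t));
  }

  // Also looks through wrappers (e.g. SSLException) by walking the cause chain.
  // The depth limit guards against pathological or cyclic cause chains.
  static boolean isRetriableWithCauses(Throwable t) {
    for (int depth = 0; t != null && depth < 16; depth++, t = t.getCause()) {
      if (isRetriableTopLevel(t)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Rebuild the shape of the failure from the stack traces: SSLException -> SocketException.
    Throwable root = new SocketException("Software caused connection abort: recv failed");
    Throwable boxed = new SSLException(root.getMessage(), root);
    System.out.println("top-level match: " + isRetriableTopLevel(boxed));
    System.out.println("cause-chain match: " + isRetriableWithCauses(boxed));
  }
}
```

Run as is, the top-level check does not match the boxed exception while the cause-chain check does, which is the gap described above. Whether retrying is actually safe still depends on the request being idempotent.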





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-12 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994611#comment-16994611
 ] 

Dawid Weiss commented on SOLR-13778:


So, this one was fun. Please read if you like good suspense...

The bug reproduces for me on Windows without any problems on the newest JDKs 
(openjdk-11+28, openjdk-13+33). 
I used an older JDK (11.0.4+11) to dump parallel logs from Linux and Windows. 
Linux passes, Windows doesn't.

Disabling TLS1.3 at JVM level doesn't help (although you can see in the logs 
that TLS13 is no longer used in the communication) - see the 
"adoptopenjdk-11.0.4+11-no-tls13" folder in the attached zipped logs.

I see a bunch of odd error messages in the log but I don't know the SSL layer 
well enough to tell what they're actually doing and why. The oddest one to me 
is the perceived asymmetry in inbound/outbound sockets:
{code:java}
  2> javax.net.ssl|ERROR|6B|closeThreadPool-31-thread-2|2019-12-12 10:49:45.234 
WAT|TransportContext.java:312|Fatal (INTERNAL_ERROR): closing inbound before 
receiving peer's close_notify 
{code}
The stack trace of the recv exception contains so many frames that I decided to 
dump full logs (so that Apache's httpclient is included).

The "adoptopenjdk-11.0.4+11-full-debug" folder contains these logs (with full 
debug from javax.net and jetty/httpclient loggers). The problem/difference 
starts around this moment:
{code:java}
o.a.h.i.e.MainClientExec Executing request GET 
/solr/admin/collections?action=CLUSTERPROP&name=legacyCloud&val=false&wt=javabin&version=2
{code}
While the Linux run receives a close_notify at the SSL level and then proceeds to 
write to a broken pipe:
{code:java}
   [junit4]   2> 
javax.net.ssl|WARNING|10|TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF]|2019-12-12
 12:44:56.450 WAT|TransportContext.java:245|Warning: failed to send warning 
alert CLOSE_NOTIFY (
   [junit4]   2> "throwable" : {
   [junit4]   2>   java.net.SocketException: Broken pipe (Write failed)
   [junit4]   2>at 
java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
   [junit4]   2>at 
java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
{code}
...and httpclient retries:
{code:java}
   [junit4]   2> 
javax.net.ssl|DEBUG|10|TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF]|2019-12-12
 12:44:56.450 WAT|SSLSocketImpl.java:636|close inbound of SSLSocket
   [junit4]   2> 
javax.net.ssl|DEBUG|10|TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF]|2019-12-12
 12:44:56.450 WAT|SSLSocketImpl.java:473|duplex close of SSLSocket
   [junit4]   2> 
javax.net.ssl|DEBUG|10|TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF]|2019-12-12
 12:44:56.450 WAT|SSLSocketImpl.java:1381|close the SSL connection (passive)
   [junit4]   2> ## 7753 DEBUG 
(TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF])
 [ ] o.a.h.i.c.DefaultManagedHttpClientConnection http-outgoing-0: Shutdown 
connection
   [junit4]   2> ## 7753 DEBUG 
(TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF])
 [ ] o.a.h.i.e.MainClientExec Connection discarded
   [junit4]   2> ## 7753 DEBUG 
(TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF])
 [ ] o.a.h.i.c.PoolingHttpClientConnectionManager Connection released: [id: 
0][route: {s}->https://127.0.0.1:36121][state: class 
org.apache.solr.client.solrj.impl.HttpSolrClient][total kept alive: 0; route 
allocated: 0 of 1; total allocated: 0 of 1]
   [junit4]   2> ## 7753 DEBUG 
(TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF])
 [ ] o.a.s.c.s.i.SolrHttpRequestRetryHandler Retry http request 1 out of 1
   [junit4]   2> ## 7753 DEBUG 
(TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF])
 [ ] o.a.s.c.s.i.SolrHttpRequestRetryHandler Retry, request should be 
idempotent
   [junit4]   2> ## 7753 INFO  
(TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF])
 [ ] o.a.h.i.e.RetryExec I/O exception 
(org.apache.http.NoHttpResponseException) caught when processing request to 
{s}->https://127.0.0.1:36121: The target server failed to respond
{code}
the Windows counterpart fails on recv (?)...
{code:java}
   [junit4]   2> 
javax.net.ssl|WARNING|11|TEST-LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud-seed#[DEADBEEF]|2019-12-12
 12:35:49.820 WAT|SSLSocketImpl.java:1289|handling exception (
   [junit4]   2> "throwable" : {
   [junit4]   2>   java.net.SocketException: Software caused connection abort: 
recv failed
   [junit4]   2>at 
java.base/java.net.SocketInputStream.socketRead0(Native Method)
   [junit4]   2>at 

[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993896#comment-16993896
 ] 

Dawid Weiss commented on SOLR-13778:


Thanks. I'll try all the suggestions, but not until tomorrow. Will get back to 
you with results.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993859#comment-16993859
 ] 

Robert Muir commented on SOLR-13778:


Jetty blog about TLS 1.3 struggles: 
https://webtide.com/openjdk-11-and-tls-1-3-issues/

It includes snippets of how to disable it in code or jetty.xml, so that users 
don't run into these issues.
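
For readers who cannot follow the link, here is a minimal sketch of the general idea using only plain JSSE calls. It is not the snippet from the Jetty post (which configures Jetty itself); {{ExcludeTls13Sketch}} and {{without}} are hypothetical names, and the approach assumes you control the SSLEngine (or socket) being configured.

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical illustration: drop TLSv1.3 from one engine's enabled protocols.
final class ExcludeTls13Sketch {
  // Returns a copy of the protocol list with the excluded protocol removed.
  static String[] without(String[] protocols, String excluded) {
    return Arrays.stream(protocols)
        .filter(p -> !p.equals(excluded))
        .toArray(String[]::new);
  }

  public static void main(String[] args) throws Exception {
    // The default context stands in for whatever context the application really uses.
    SSLEngine engine = SSLContext.getDefault().createSSLEngine();
    engine.setEnabledProtocols(without(engine.getEnabledProtocols(), "TLSv1.3"));
    System.out.println(Arrays.toString(engine.getEnabledProtocols()));
  }
}
```

This only affects the one engine it is applied to; a server or client library that creates its own engines has to be configured through its own settings.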




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993845#comment-16993845
 ] 

Robert Muir commented on SOLR-13778:


There is also a related bug (fixed only in 14, it seems), again with TLSv1.3: 
https://bugs.openjdk.java.net/browse/JDK-8224984
So it may explain some of the other related issues you see.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993840#comment-16993840
 ] 

Robert Muir commented on SOLR-13778:


As a quick hack you can edit {{$JAVA_HOME/conf/security/java.security}} and add 
{{TLSv1.3}} to the list of {{jdk.tls.disabledAlgorithms}}. Then run the test 
and see if it passes.

The default for this security property on my system looks like this:
{noformat}
jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, \
EC keySize < 224, 3DES_EDE_CBC, anon, NULL
{noformat}

Add TLSv1.3 to the end of the list like this:
{noformat}
jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, \
EC keySize < 224, 3DES_EDE_CBC, anon, NULL, TLSv1.3
{noformat}

I confirmed that it works by fetching https://suche.org/SslHandshakeInfo
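
The same change can, as far as I know, also be made programmatically through {{java.security.Security.setProperty}}, as long as it runs before any JSSE classes are initialized (the property is typically read once). A minimal sketch, with {{DisableTls13Sketch}} and {{withTls13Disabled}} as hypothetical names:

```java
import java.security.Security;

// Hypothetical illustration: append TLSv1.3 to jdk.tls.disabledAlgorithms at startup.
final class DisableTls13Sketch {
  static final String PROP = "jdk.tls.disabledAlgorithms";

  // Returns the property value with TLSv1.3 appended, unless it is already listed.
  static String withTls13Disabled(String current) {
    if (current == null || current.trim().isEmpty()) {
      return "TLSv1.3";
    }
    for (String entry : current.split(",")) {
      if (entry.trim().equals("TLSv1.3")) {
        return current; // already disabled, leave the value untouched
      }
    }
    return current + ", TLSv1.3";
  }

  public static void main(String[] args) {
    // Assumption: this runs before the first SSL/TLS connection is created.
    Security.setProperty(PROP, withTls13Disabled(Security.getProperty(PROP)));
    System.out.println(PROP + "=" + Security.getProperty(PROP));
  }
}
```

Unlike editing {{java.security}}, this only affects the JVM that runs it, which may be preferable for a one-off test run.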





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993835#comment-16993835
 ] 

Robert Muir commented on SOLR-13778:


I agree with the initial theory that it looks like JDK-8209333. Since it is 
specific to TLS 1.3, would it be difficult to try just disabling TLS 1.3 to 
work around the issue? That would probably explain why you don't see it on 8 or 
older versions, since they didn't support TLS 1.3 at all.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993806#comment-16993806
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

FWIW: I think what would be most helpful for moving forward with this is 
if someone who has access to a Windows VM and some time to dig into it could:
 * try to find an existing test that seems to tickle this bug often
 ** ideally on branch 8x, where we can show it fails with java>=11 but passes 
with java==8
 * if possible, modify the test to:
 ** _force_ SSL on every seed ({{@RandomizeSSL}} can do this)
 ** slim down the test to remove anything that doesn't seem to contribute to 
the failure
 ** see if anything pops out that suggests what the root cause is, even if 
not...
 * run the test with {{javax.net.debug=all}} enabled on Windows
 ** using both java11 and java8 (and maybe java13)

...then we can take those javax.net.debug logs to the Jetty discussion list, or 
to the OpenJDK SSL list, and ask "WTF is happening here on these different 
versions of java?"

/cc [~dweiss] following up on discussion in SOLR-14033




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-12-11 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993804#comment-16993804
 ] 

Dawid Weiss commented on SOLR-13778:


The LegacyCloudClusterPropTest fails for me reliably on Windows. I tried:
{code}
ant test -Dtestcase=LegacyCloudClusterPropTest -Dtests.seed=deadbeef 
-Dargs="-Djavax.net.debug=all"
{code}
I attach three dumps -- two runs without net dumps (you can see it's 
consistent) and one with (didn't have time to analyze).





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-10-17 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954128#comment-16954128
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

{quote}Should I reopen the JDK bug report?
{quote}

I don't know for certain that JDK-8209333 is the problem we are seeing – 
particularly since (AFAICT) we're seeing the same basic problem (with slightly 
differently worded error messages) on JDK13.  It was just my best guess at the 
time.

Since I can't reproduce it locally, I have no clue how to go about trying to 
create a test case to file a new JDK jira.

The only things I know for certain are:
* the _*only*_ times we see either the SSLException, or the root cause of the 
reported SSLException, in any jenkins failure log is from your Windows VM.
* these SSLExceptions account for ~25% of all of our jenkins build failures, and 
~67% of all Windows jenkins build failures
** (they occur in the logs for 14/55 failed builds, 14/21 failed Windows builds 
-- but sometimes we might be seeing multiple test failures from these 
exceptions in a single build, and other times we might be seeing unrelated 
failures from other tests in the same builds as a test that failed for this 
reason)





[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-10-17 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953905#comment-16953905
 ] 

Uwe Schindler commented on SOLR-13778:
--

Ah, sorry, you also posted a different stack trace with JDK 13.




[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-10-17 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953897#comment-16953897
 ] 

Uwe Schindler commented on SOLR-13778:
--

No idea how to handle that. Should I reopen the JDK bug report? I can do this, 
as I have issue tracker access.
Does it happen with later versions, too?







[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-10-17 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953886#comment-16953886
 ] 

Chris M. Hostetter commented on SOLR-13778:
---

Apparently I keep forgetting to actually tag [~uschindler] on this Jira so 
he'll see it.







[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-10-11 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949681#comment-16949681
 ] 

Chris M. Hostetter commented on SOLR-13778:
---


I just realized we're seeing a slightly _different_ SSLException from Uwe's 
java13 windows VMs...

{noformat}
   [junit4]> Throwable #1: 
org.apache.solr.client.solrj.SolrServerException: IOException occurred when 
talking to server at: https://127.0.0.1:55121/solr
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([E2C1EFE3F69FB5C6:35E9A23BE77FFC28]:0)
   [junit4]>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:679)
   [junit4]>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265)
   [junit4]>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
   [junit4]>at 
org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368)
   [junit4]>at 
org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296)
   [junit4]>at 
org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1128)
   [junit4]>at 
org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:897)
   [junit4]>at 
org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:829)
   [junit4]>at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
   [junit4]>at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:228)
   [junit4]>at 
org.apache.solr.cloud.MiniSolrCloudCluster.deleteAllCollections(MiniSolrCloudCluster.java:549)
   [junit4]>at 
org.apache.solr.cloud.TestCloudSearcherWarming.tearDown(TestCloudSearcherWarming.java:79)
   [junit4]>at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]>at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]>at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]>at 
java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]>at java.base/java.lang.Thread.run(Thread.java:830)
   [junit4]> Caused by: javax.net.ssl.SSLException: An established 
connection was aborted by the software in your host machine
   [junit4]>at 
java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1652)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1038)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
   [junit4]>at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
   [junit4]>at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
   [junit4]>at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
   [junit4]>at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
   [junit4]>at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
   [junit4]>at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
   [junit4]>at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
   [junit4]>at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
   [junit4]>at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
   [junit4]>at 
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
   [junit4]>at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
   [junit4]>at 

[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed

2019-09-18 Thread Hoss Man (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932942#comment-16932942
 ] 

Hoss Man commented on SOLR-13778:
-

Here's a full example of what one of these stack traces tends to look like...
{noformat}
...
   [junit4]> Caused by: javax.net.ssl.SSLException: Software caused 
connection abort: recv failed
   [junit4]>at 
java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:320)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:263)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:258)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1342)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:844)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
   [junit4]>at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
   [junit4]>at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
   [junit4]>at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
   [junit4]>at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
   [junit4]>at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
   [junit4]>at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
   [junit4]>at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
   [junit4]>at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
   [junit4]>at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
   [junit4]>at 
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
   [junit4]>at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
   [junit4]>at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
   [junit4]>at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
   [junit4]>at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
   [junit4]>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:564)
   [junit4]>... 46 more
   [junit4]>Suppressed: java.net.SocketException: Software caused 
connection abort: socket write error
   [junit4]>at 
java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
   [junit4]>at 
java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
   [junit4]>at 
java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81)
   [junit4]>at 
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:351)
   [junit4]>... 68 more
   [junit4]> Caused by: java.net.SocketException: Software caused 
connection abort: recv failed
   [junit4]>at 
java.base/java.net.SocketInputStream.socketRead0(Native Method)
   [junit4]>at 
java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
   [junit4]>at 
java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
   [junit4]>at 
java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:448)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1132)
   [junit4]>at 
java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:828)
   [junit4]>... 64 more
{noformat}
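
For anyone who wants to spot this failure programmatically rather than by eye, here is a minimal sketch of walking the cause and suppressed chain shown in the trace above. It is not part of Solr or of any patch on this issue: the class name AbortedConnectionCheck and both method names are hypothetical. The two message fragments are copied from the traces in this thread, and the wording is platform-specific, so treat a match as a hint only.

```java
import java.net.SocketException;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;
import javax.net.ssl.SSLException;

// Hypothetical helper, for illustration only.
final class AbortedConnectionCheck {

  // Message fragments copied from the stack traces in this thread.
  private static final String[] ABORT_MESSAGES = {
    "Software caused connection abort",
    "An established connection was aborted by the software in your host machine"
  };

  private AbortedConnectionCheck() {}

  // True if the message contains one of the known fragments.
  public static boolean isAbortMessage(String message) {
    if (message == null) {
      return false;
    }
    for (String fragment : ABORT_MESSAGES) {
      if (message.contains(fragment)) {
        return true;
      }
    }
    return false;
  }

  // Visits the throwable, its causes and its suppressed exceptions, and
  // reports whether any SSLException or SocketException carries an abort message.
  public static boolean looksLikeConnectionAbort(Throwable top) {
    if (top == null) {
      return false;
    }
    // Identity-based set guards against cycles in the cause chain.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    Deque<Throwable> todo = new ArrayDeque<>();
    todo.push(top);
    while (!todo.isEmpty()) {
      Throwable current = todo.pop();
      if (!seen.add(current)) {
        continue;
      }
      boolean relevantType =
          current instanceof SSLException || current instanceof SocketException;
      if (relevantType && isAbortMessage(current.getMessage())) {
        return true;
      }
      if (current.getCause() != null) {
        todo.push(current.getCause());
      }
      for (Throwable suppressed : current.getSuppressed()) {
        todo.push(suppressed);
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Rebuild the shape of the trace above: wrapper -> SSLException -> SocketException.
    SocketException root =
        new SocketException("Software caused connection abort: recv failed");
    SSLException ssl =
        new SSLException("Software caused connection abort: recv failed", root);
    ssl.addSuppressed(
        new SocketException("Software caused connection abort: socket write error"));
    Exception wrapper = new RuntimeException("IOException occurred when talking to server", ssl);
    System.out.println(looksLikeConnectionAbort(wrapper));
  }
}
```

Matching on message text is fragile, but the traces above give no more specific exception type to key on, which is why the sketch checks the type and the message together.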
Although it's not obvious from the public view of my reports, grepping all the 
available logs (available