Re: Cluster freeze with SSL enabled and JDK 11

2019-02-13 Thread Loredana Radulescu Ivanoff
Perfect, thank you for the update!

On Wed, Feb 13, 2019 at 5:58 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> For TLSv1.2 on Windows the fix is ready and tests are running for it. Hope
> that it will be integrated soon.
>
> Regards,
> --
> Ilya Kasnacheev

Re: Cluster freeze with SSL enabled and JDK 11

2019-02-13 Thread Ilya Kasnacheev
Hello!

For TLSv1.2 on Windows the fix is ready and tests are running for it. Hope
that it will be integrated soon.

Regards,
-- 
Ilya Kasnacheev


Tue, Feb 12, 2019 at 20:46, Loredana Radulescu Ivanoff:

> Thank you very much for the info, it was very helpful.
>
> I assume it worked on Linux because I specifically set TLS v1.2 as a JVM
> argument, by specifying -Djdk.tls.server.protocols="TLSv1.2"
> -Djdk.tls.client.protocols="TLSv1.2"
>
> Would you be able to provide a (very) loose estimate for the fix? Is it
> likely to go into 2.8?
>
> Thank you again!

Re: Cluster freeze with SSL enabled and JDK 11

2019-02-12 Thread Loredana Radulescu Ivanoff
Thank you very much for the info, it was very helpful.

I assume it worked on Linux because I specifically set TLS v1.2 as a JVM
argument, by specifying -Djdk.tls.server.protocols="TLSv1.2"
-Djdk.tls.client.protocols="TLSv1.2"

Would you be able to provide a (very) loose estimate for the fix? Is it
likely to go into 2.8?

Thank you again!

On Tue, Feb 12, 2019 at 7:10 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> It seems that you have problems due to not just one but two issues:
>
> 1) Java 11 has TLSv1.3 by default and Ignite does not support that -
> https://issues.apache.org/jira/browse/IGNITE-11298
> Why it worked for you on CentOS is a mystery. For some reason my Ubuntu
> has Java 10 in the openjdk-11-jdk package, and it worked there. When I
> manually installed a proper Java 11, it failed on Linux just the same as
> on Windows. Falling back to TLSv1.2 could help, but,
>
> 2) on Windows, SSL fails to work on Java 11 due to a mistake in Ignite's
> NIO code. I have also created a ticket and am currently devising a patch:
> https://issues.apache.org/jira/browse/IGNITE-11299
> More details in JIRA.
>
> I'm afraid your options are limited on Windows - use older Java or move to
> Linux.
>
> Regards,
> --
> Ilya Kasnacheev


Re: Cluster freeze with SSL enabled and JDK 11

2019-02-12 Thread Ilya Kasnacheev
Hello!

It seems that you have problems due to not just one but two issues:

1) Java 11 has TLSv1.3 by default and Ignite does not support that -
https://issues.apache.org/jira/browse/IGNITE-11298
Why it worked for you on CentOS is a mystery. For some reason my Ubuntu has
Java 10 in the openjdk-11-jdk package, and it worked there. When I manually
installed a proper Java 11, it failed on Linux just the same as on Windows.
Falling back to TLSv1.2 could help, but,

2) on Windows, SSL fails to work on Java 11 due to a mistake in Ignite's NIO
code. I have also created a ticket and am currently devising a patch:
https://issues.apache.org/jira/browse/IGNITE-11299
More details in JIRA.

I'm afraid your options are limited on Windows - use older Java or move to
Linux.
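
As a rough illustration of the TLSv1.2 fallback from point 1 (a plain-JDK
sketch, not Ignite code): the snippet below prints the protocols that a
default SSLEngine enables and then restricts an engine to TLSv1.2 only. The
class name TlsProtocolCheck and its helper methods are made up for this
example. On Java 11 the default list would be expected to include TLSv1.3.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class, for illustration only (not part of Ignite).
public class TlsProtocolCheck {
    /** Creates an engine from the JVM's default SSLContext. */
    static SSLEngine newDefaultEngine() {
        try {
            return SSLContext.getDefault().createSSLEngine();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Restricts a fresh engine to TLSv1.2 and returns what is enabled afterwards. */
    public static String[] pinnedProtocols() {
        SSLEngine engine = newDefaultEngine();
        engine.setEnabledProtocols(new String[] {"TLSv1.2"});
        return engine.getEnabledProtocols();
    }

    public static void main(String[] args) {
        // What the JVM would offer if nothing is restricted.
        System.out.println("Enabled by default: "
            + Arrays.toString(newDefaultEngine().getEnabledProtocols()));
        // What remains after pinning the engine to TLSv1.2.
        System.out.println("After pinning:      "
            + Arrays.toString(pinnedProtocols()));
    }
}
```

Running this once on each JVM in question is a quick way to see whether
TLSv1.3 is in play before looking at Ignite itself.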

Regards,
-- 
Ilya Kasnacheev


Fri, Feb 8, 2019 at 02:31, Loredana Radulescu Ivanoff:

> Hello,
>
> I would like to restart this topic because I can get a repro on Windows 10
> with Java 11 and SSL enabled by starting two nodes using just the 2.7
> Ignite distribution. I'm starting the Ignite nodes via ignite.bat, and I've
> only added a few extra JVM options to allow Ignite to start with Java 11,
> as follows:
>
> --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
> --add-exports=java.base/sun.nio.ch=ALL-UNNAMED
> -Djdk.tls.server.protocols="TLSv1.2" -Djdk.tls.client.protocols="TLSv1.2"
> -Djdk.tls.acknowledgeCloseNotify=true -DIGNITE_QUIET=false
> -DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=6
>
> I'm attaching the logs from work/log and the configuration I've used.
> Could you please take a look and let me know if you see something wrong in
> the configuration, or a possible explanation?
>
> What is also interesting is that I used the same setup on two CentOS
> machines, and the same type of configuration, and the nodes do connect
> (with SSL and Java 11), without any errors. Could there be a platform issue
> here?
>
> Additionally, I confirmed that the nodes are able to connect as expected
> on both Windows and CentOS when SSL is disabled (I used the same
> configuration, but with the sslContextFactory bean commented out).
>
> Any help on the issue would be greatly appreciated. Thank you!


Re: Cluster freeze with SSL enabled and JDK 11

2019-02-07 Thread Loredana Radulescu Ivanoff
Hello,

I would like to restart this topic because I can get a repro on Windows 10
with Java 11 and SSL enabled by starting two nodes using just the 2.7
Ignite distribution. I'm starting the Ignite nodes via ignite.bat, and I've
only added a few extra JVM options to allow Ignite to start with Java 11,
as follows:

--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
--add-exports=java.base/sun.nio.ch=ALL-UNNAMED
-Djdk.tls.server.protocols="TLSv1.2" -Djdk.tls.client.protocols="TLSv1.2"
-Djdk.tls.acknowledgeCloseNotify=true -DIGNITE_QUIET=false
-DIGNITE_SYSTEM_WORKER_BLOCKED_TIMEOUT=6

I'm attaching the logs from work/log and the configuration I've used. Could
you please take a look and let me know if you see something wrong in the
configuration, or a possible explanation?

What is also interesting is that I used the same setup on two CentOS
machines, and the same type of configuration, and the nodes do connect
(with SSL and Java 11), without any errors. Could there be a platform issue
here?

Additionally, I confirmed that the nodes are able to connect as expected on
both Windows and CentOS when SSL is disabled (I used the same
configuration, but with the sslContextFactory bean commented out).

Any help on the issue would be greatly appreciated. Thank you!



On Thu, Oct 18, 2018 at 2:56 PM Loredana Radulescu Ivanoff <
lradu...@tibco.com> wrote:

> Hello,
>
> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and SSL
> enabled:
>
>
>    - the second node that I bring up joins, and then shortly after
>      freezes and prints this message every minute:
>
> "WARN ...[*Initialization*]
> processors.cache.GridCachePartitionExchangeManager: Still waiting for
> initial partition map exchange"
>
>
>    - once the second node joins, the first node starts experiencing very
>      frequent 100% CPU spikes; these are the messages I see:
>
> WARN 2018-10-18T13:50:52,728-0700 []
> communication.tcp.TcpCommunicationSpi: Communication SPI session write
> timed out (consider increasing 'socketWriteTimeout' configuration property)
> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
> WARN 2018-10-18T13:50:52,737-0700 []
> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
> engine status after closeOutbound call [status=OK,
> handshakeStatus=NEED_WRAP,
> WARN 2018-10-18T13:51:01,441-0700 []
> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
> release latch within timeout: ServerLatch [permits=1,
> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>
> Other observations:
>
> I can reproduce this every time I start the nodes, and it doesn't matter
> which node comes up first.
>
>
> The issue goes away if I disable SSL.
>
>
> Increasing the socketWriteTimeout, networkTimeout or the
> failureDetectionTimeout does not help.
>
> It seems to be happening only with JDK 11, and not with JDK 8.
>
>
> Do you have any suggestions/known issues about this?
>
> Thank you,
>
> Loredana


Re: Cluster freeze with SSL enabled and JDK 11

2018-11-28 Thread Ilya Kasnacheev
Hello!

I have tried running SSL tests again and they seem to pass (only one test
fails for some different reason)

Can you try running those 2 nodes in stand-alone processes and see if the
problem persists? I can see that you have SSL-enabled Tomcat running in the
same VM, which I imagine could interfere with Ignite's SSL.

Note that you will need to generate some load (such as REST cache operations)
to see if communication indeed works (or doesn't).

Regards,
-- 
Ilya Kasnacheev


Wed, Nov 28, 2018 at 16:01, Loredana Radulescu Ivanoff:

> Hello again,
>
> I haven't been able to solve this issue on my own, so I'm hoping you'd be
> able to take another look.
>
> To recap: only with Java 11 and TLS enabled, the second node I bring into
> the cluster never starts up, and remains stuck at "Still waiting for
> initial partition map exchange". The first node keeps logging "Unable to
> await partitions release latch within timeout". To me, this looks like an
> Ignite issue, and no matter what causes the situation (arguably in this
> case an SSL error), there should be a more elegant exit out of it, i.e. the
> second node should give up after a while, if there isn't a better way to
> retry and achieve successful communication. The two nodes are able to
> communicate, and increasing various network timeouts/failure detection
> timeout does not help.
>
> Previously it was mentioned that the Ignite unit test did not show a
> repro. I suggest running a test that uses two different machines, because
> when I run the nodes on the same machine, I do not get a repro either.
>
> Attaching here logging from the two nodes including SSL messages.
>
> Is Ignite support for Java 11 going to be available before Oracle ends
> free commercial support for Java 1.8 in Jan 2019?
>
> Thank you.

Re: Cluster freeze with SSL enabled and JDK 11

2018-10-25 Thread Ilya Kasnacheev
Hello!

I have tried to run the test with protocol "TLSv1.2", didn't see any
difference.

Regards,
-- 
Ilya Kasnacheev


Wed, Oct 24, 2018 at 20:23, Loredana Radulescu Ivanoff:

> Hello again,
>
> I am working on getting the full SSL logs over to you, but I have one more
> question in between: TLS v1.3 is enabled by default in JDK 11, and my app
> is using TLS v1.2 specifically. There's a known issue, recently addressed
> in the JDK, related to the TLS v1.3 half-close policy; details here:
> https://bugs.java.com/view_bug.do?bug_id=8207009
>
> Would you be able to confirm whether your SSL test runs successfully when
> the connecting client/server use TLS v1.2 specifically?
>
> FYI, I have tried specifically disabling TLS v1.3 using both the
> "jdk.tls.client.protocols" and "jdk.tls.server.protocols" system
> properties, and also set "jdk.tls.acknowledgeCloseNotify" to true on both
> sides as indicated here - https://bugs.java.com/view_bug.do?bug_id=8208526
>
> Based on my explorations so far, I think this may be a JDK issue
> (specifically in the JSSE provider) that has not been addressed yet. Either
> way, do you think there is anything that could be done in Ignite to
> explicitly close the connection on both sides in a scenario like this?
>
> What I can safely share on the SSL logs so far is this (both nodes get
> stuck, node 1 in failing to close the SSL connection, node 2 in waiting for
> partition exchange)
>
> Node 1:
>
> "2018-10-23 14:18:40.981 PDT|SSLEngineImpl.java:715|Closing inbound of
> SSLEngine
> javax.net.ssl|ERROR|3F|grid-nio-worker-tcp-comm-1-#26%%|2018-10-23
> 14:18:40.981 PDT|TransportContext.java:313|Fatal (INTERNAL_ERROR): closing
> inbound before receiving peer's close_notify (
> "throwable" : {
>   javax.net.ssl.SSLException: closing inbound before receiving peer's
> close_notify
>   at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:129)
>   at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
>   at
> java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:308)
>   at
> java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:264)
>   at
> java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:255)
>   at
> java.base/sun.security.ssl.SSLEngineImpl.closeInbound(SSLEngineImpl.java:724)
>   at
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.shutdown(GridNioSslHandler.java:185)
>   at
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.onSessionClosed(GridNioSslFilter.java:223)
>   at
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:95)
>   at
> org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3447)
>   at
> org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:149)
>   at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2689)
>   at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2002)
>   at
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
>   at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.base/java.lang.Thread.run(Thread.java:834)}
>
> )
> 2018-10-23 14:18:40.981 PDT|SSLSessionImpl.java:753|Invalidated session:
> Session(1540329505125|TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
> DEBUG 14:18:40,981-0700 [] communication.tcp.TcpCommunicationSpi: Unable
> to correctly close inbound data stream (will ignore) "
>
>
> Server 2
>
> "javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
> PDT|SSLSocketOutputRecord.java:310|WRITE: TLS12 application_data, length = 1
> javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
> PDT|SSLCipher.java:1727|Plaintext before ENCRYPTION (
>   0000: 01                                                 .
> )
> javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
> PDT|SSLSocketOutputRecord.java:324|Raw write (
>   0000: 17 03 03 00 19 00 00 00   00 00 00 00 A1 45 E5 C8  .............E..
>   0010: 27 3D EE E2 89 4A 0F DD   B6 9A 2A A8 5F 6B        '=...J....*._k
> )
> WARN 14:18:10,585-0700 [*Initialization*]
> processors.cache.GridCachePartitionExchangeManager: Still waiting for
> initial partition map exchange "
>
> On Mon, Oct 22, 2018 at 12:31 AM Ilya Kasnacheev <
> ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> I would suggest regular (or DEBUG) Ignite logs + SSL debug logs.
>>
>>
>> https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Fri, Oct 19, 2018 at 20:58, Loredana Radulescu Ivanoff <
>> lradu...@tibco.com>:
>>
>>> Definitely - do you want debug logs, and if yes, should I enable them on
>>> all Ignite packages, or just for certain ones? Any other diagn

Re: Cluster freeze with SSL enabled and JDK 11

2018-10-24 Thread Loredana Radulescu Ivanoff
Hello again,

I am working on getting the full SSL logs over to you, but I have one more
question in the meantime: TLS v1.3 is enabled by default in JDK 11, and my app
is using TLS v1.2 specifically. There's a known issue, recently addressed in
the JDK, related to the TLS v1.3 half-close policy; details here:
https://bugs.java.com/view_bug.do?bug_id=8207009

Would you be able to confirm whether your SSL test runs successfully when
the connecting client/server use TLS v1.2 specifically?

FYI, I have tried specifically disabling TLS v1.3 using both the
"jdk.tls.client.protocols" and "jdk.tls.server.protocols" system
properties, and also set "jdk.tls.acknowledgeCloseNotify" to true on both
sides as indicated here - https://bugs.java.com/view_bug.do?bug_id=8208526
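
For reference, a minimal sketch of pinning TLS v1.2 in code instead of through the JVM arguments. It uses only the JDK's javax.net.ssl API with the default key and trust material; the class and method names are hypothetical, and this is not how Ignite itself builds its SSL context.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical helper, not part of Ignite: builds an SSLEngine that only speaks TLSv1.2.
class Tls12Only {
    static SSLEngine newTls12Engine() {
        try {
            // Ask the JSSE provider for a TLSv1.2 context and use the JDK default key/trust stores.
            SSLContext ctx = SSLContext.getInstance("TLSv1.2");
            ctx.init(null, null, null);

            SSLEngine engine = ctx.createSSLEngine();
            // Explicitly restrict the engine so that TLSv1.3 cannot be negotiated.
            engine.setEnabledProtocols(new String[] {"TLSv1.2"});
            return engine;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create a TLSv1.2 engine", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(newTls12Engine().getEnabledProtocols()));
    }
}
```

The same restriction can be requested for the whole JVM with the "jdk.tls.client.protocols" and "jdk.tls.server.protocols" properties mentioned above; the per-engine call only affects engines created this way.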

Based on my explorations so far, I think this may be a JDK issue
(specifically in the JSSE provider) that has not been addressed yet. Either
way, do you think there is anything that could be done in Ignite to
explicitly close the connection on both sides in a scenario like this?
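
To make the question concrete, here is a rough sketch of the usual SSLEngine outbound-close sequence: call closeOutbound(), then keep calling wrap() until isOutboundDone() reports true, sending whatever bytes wrap() produces. It is written against the plain JDK API; the class and method names are hypothetical, the engine in main() has no live peer, and this is not Ignite's GridNioSslHandler code.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLException;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;

// Hypothetical sketch of a graceful outbound close; not taken from Ignite.
class GracefulCloseSketch {
    /** Closes the outbound side and drains pending close data; returns true once the engine reports it is done. */
    static boolean closeOutboundAndDrain(SSLEngine engine) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        ByteBuffer net = ByteBuffer.allocate(engine.getSession().getPacketBufferSize());

        engine.closeOutbound();
        try {
            // Bounded loop so that a misbehaving engine cannot spin forever.
            for (int i = 0; i < 16 && !engine.isOutboundDone(); i++) {
                net.clear();
                engine.wrap(empty, net);
                net.flip();
                // A real transport would write the contents of `net` to the socket here.
            }
        } catch (SSLException e) {
            throw new IllegalStateException("Failed while wrapping close data", e);
        }
        return engine.isOutboundDone();
    }

    /** Creates a client-mode engine with JDK default settings (assumption: no real peer is attached). */
    static SSLEngine newEngine() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLSv1.2");
            ctx.init(null, null, null);
            SSLEngine engine = ctx.createSSLEngine();
            engine.setUseClientMode(true);
            return engine;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("outbound done: " + closeOutboundAndDrain(newEngine()));
    }
}
```

The loop keys off isOutboundDone() rather than the status returned by a single wrap() call, so it does not depend on which status a particular JDK version reports after closeOutbound().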

What I can safely share on the SSL logs so far is this (both nodes get
stuck, node 1 in failing to close the SSL connection, node 2 in waiting for
partition exchange)

Node 1:

"2018-10-23 14:18:40.981 PDT|SSLEngineImpl.java:715|Closing inbound of
SSLEngine
javax.net.ssl|ERROR|3F|grid-nio-worker-tcp-comm-1-#26%%|2018-10-23
14:18:40.981 PDT|TransportContext.java:313|Fatal (INTERNAL_ERROR): closing
inbound before receiving peer's close_notify (
"throwable" : {
  javax.net.ssl.SSLException: closing inbound before receiving peer's
close_notify
  at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:129)
  at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:117)
  at
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:308)
  at
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:264)
  at
java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:255)
  at
java.base/sun.security.ssl.SSLEngineImpl.closeInbound(SSLEngineImpl.java:724)
  at
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.shutdown(GridNioSslHandler.java:185)
  at
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.onSessionClosed(GridNioSslFilter.java:223)
  at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:95)
  at
org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3447)
  at
org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:149)
  at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2689)
  at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2002)
  at
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1764)
  at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
  at java.base/java.lang.Thread.run(Thread.java:834)}

)
2018-10-23 14:18:40.981 PDT|SSLSessionImpl.java:753|Invalidated session:
Session(1540329505125|TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384)
DEBUG 14:18:40,981-0700 [] communication.tcp.TcpCommunicationSpi: Unable to
correctly close inbound data stream (will ignore) "


Node 2:

"javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
PDT|SSLSocketOutputRecord.java:310|WRITE: TLS12 application_data, length = 1
javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
PDT|SSLCipher.java:1727|Plaintext before ENCRYPTION (
  0000: 01                                                 .
)
javax.net.ssl|DEBUG|46|tcp-disco-sock-reader-#4%%|2018-10-23 14:18:09.194
PDT|SSLSocketOutputRecord.java:324|Raw write (
  0000: 17 03 03 00 19 00 00 00   00 00 00 00 A1 45 E5 C8  .............E..
  0010: 27 3D EE E2 89 4A 0F DD   B6 9A 2A A8 5F 6B        '=...J....*._k
)
WARN 14:18:10,585-0700 [*Initialization*]
processors.cache.GridCachePartitionExchangeManager: Still waiting for
initial partition map exchange "

On Mon, Oct 22, 2018 at 12:31 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> I would suggest regular (or DEBUG) Ignite logs + SSL debug logs.
>
>
> https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Fri, Oct 19, 2018 at 20:58, Loredana Radulescu Ivanoff <
> lradu...@tibco.com>:
>
>> Definitely - do you want debug logs, and if yes, should I enable them on
>> all Ignite packages, or just for certain ones? Any other diagnostic flags
>> that I should set?
>>
>> On Fri, Oct 19, 2018 at 7:48 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Apache Ignite does not claim to support JDK 11 at this moment. However, I
>>> was able to run the SSL test (TcpDiscoverySslTrustedUntrustedTest), and it
>>> turned out mostly fine.
>>>
>>> More info is needed from your side, such as full instance logs.
>

Re: Cluster freeze with SSL enabled and JDK 11

2018-10-22 Thread Ilya Kasnacheev
Hello!

I would suggest regular (or DEBUG) Ignite logs + SSL debug logs.

https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/ReadDebug.html
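
As a rough illustration of turning on the SSL debug logs, the sketch below sets the standard "javax.net.debug" property before any JSSE class is used; passing -Djavax.net.debug=ssl,handshake on the command line has the same effect. The class and method names are hypothetical, and the "ssl,handshake" value is just one reasonable choice ("all" is more verbose).

```java
import javax.net.ssl.SSLContext;

// Hypothetical helper: enables JSSE debug output for the current JVM.
class SslDebugSketch {
    /** Must be called before the first use of JSSE, because the property is read when JSSE initializes. */
    static void enableSslDebug() {
        System.setProperty("javax.net.debug", "ssl,handshake");
    }

    public static void main(String[] args) throws Exception {
        enableSslDebug();

        // First JSSE use happens after the property is set, so the debug setting is picked up.
        SSLContext ctx = SSLContext.getInstance("TLSv1.2");
        ctx.init(null, null, null);
        System.out.println("javax.net.debug = " + System.getProperty("javax.net.debug"));
    }
}
```

For an Ignite node, the command-line form is usually simpler, since it guarantees the property is in place before any node code runs.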

Regards,
-- 
Ilya Kasnacheev


Fri, Oct 19, 2018 at 20:58, Loredana Radulescu Ivanoff:

> Definitely - do you want debug logs, and if yes, should I enable them on
> all Ignite packages, or just for certain ones? Any other diagnostic flags
> that I should set?
>
> On Fri, Oct 19, 2018 at 7:48 AM Ilya Kasnacheev 
> wrote:
>
>> Hello!
>>
>> Apache Ignite does not claim to support JDK 11 at this moment. However, I
>> was able to run the SSL test (TcpDiscoverySslTrustedUntrustedTest), and it
>> turned out mostly fine.
>>
>> More info is needed from your side, such as full instance logs.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Fri, Oct 19, 2018 at 0:56, Loredana Radulescu Ivanoff <
>> lradu...@tibco.com>:
>>
>>> Hello,
>>>
>>> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and
>>> SSL enabled:
>>>
>>>
>>>- the second node that I bring up joins, and then shortly after
>>>freezes and prints this message every minute:
>>>
>>> "WARN ...[*Initialization*]
>>> processors.cache.GridCachePartitionExchangeManager: Still waiting for
>>> initial partition map exchange"
>>>
>>>
>>>- once the second node joins, the first node starts experiencing
>>>very frequent 100% CPU spikes; these are the messages I see:
>>>
>>> WARN 2018-10-18T13:50:52,728-0700 []
>>> communication.tcp.TcpCommunicationSpi: Communication SPI session write
>>> timed out (consider increasing 'socketWriteTimeout' configuration property)
>>> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
>>> WARN 2018-10-18T13:50:52,737-0700 []
>>> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
>>> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
>>> engine status after closeOutbound call [status=OK,
>>> handshakeStatus=NEED_WRAP,
>>> WARN 2018-10-18T13:51:01,441-0700 []
>>> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
>>> release latch within timeout: ServerLatch [permits=1,
>>> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
>>> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>>>
>>> Other observations:
>>>
>>> I can reproduce this every time I start the nodes, and it doesn't matter
>>> which node comes up first.
>>>
>>>
>>> The issue goes away if I disable SSL.
>>>
>>>
>>> Increasing the socketWriteTimeout, networkTimeout or the
>>> failureDetectionTimeout does not help.
>>>
>>> It seems to be happening only with JDK 11, and not with JDK 8.
>>>
>>>
>>> Do you have any suggestions/known issues about this?
>>>
>>> Thank you,
>>>
>>> Loredana
>>>
>>>
>>>
>>>
>>>


Re: Cluster freeze with SSL enabled and JDK 11

2018-10-19 Thread Loredana Radulescu Ivanoff
Definitely - do you want debug logs, and if yes, should I enable them on
all Ignite packages, or just for certain ones? Any other diagnostic flags
that I should set?

On Fri, Oct 19, 2018 at 7:48 AM Ilya Kasnacheev 
wrote:

> Hello!
>
> Apache Ignite does not claim to support JDK 11 at this moment. However, I was
> able to run the SSL test (TcpDiscoverySslTrustedUntrustedTest), and it turned
> out mostly fine.
>
> More info is needed from your side, such as full instance logs.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Fri, Oct 19, 2018 at 0:56, Loredana Radulescu Ivanoff:
>
>> Hello,
>>
>> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and SSL
>> enabled:
>>
>>
>>- the second node that I bring up joins, and then shortly after
>>freezes and prints this message every minute:
>>
>> "WARN ...[*Initialization*]
>> processors.cache.GridCachePartitionExchangeManager: Still waiting for
>> initial partition map exchange"
>>
>>
>>- once the second node joins, the first node starts experiencing very
>>frequent 100% CPU spikes; these are the messages I see:
>>
>> WARN 2018-10-18T13:50:52,728-0700 []
>> communication.tcp.TcpCommunicationSpi: Communication SPI session write
>> timed out (consider increasing 'socketWriteTimeout' configuration property)
>> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
>> WARN 2018-10-18T13:50:52,737-0700 []
>> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
>> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
>> engine status after closeOutbound call [status=OK,
>> handshakeStatus=NEED_WRAP,
>> WARN 2018-10-18T13:51:01,441-0700 []
>> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
>> release latch within timeout: ServerLatch [permits=1,
>> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
>> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>>
>> Other observations:
>>
>> I can reproduce this every time I start the nodes, and it doesn't matter
>> which node comes up first.
>>
>>
>> The issue goes away if I disable SSL.
>>
>>
>> Increasing the socketWriteTimeout, networkTimeout or the
>> failureDetectionTimeout does not help.
>>
>> It seems to be happening only with JDK 11, and not with JDK 8.
>>
>>
>> Do you have any suggestions/known issues about this?
>>
>> Thank you,
>>
>> Loredana
>>
>>
>>
>>
>>


Re: Cluster freeze with SSL enabled and JDK 11

2018-10-19 Thread Ilya Kasnacheev
Hello!

Apache Ignite does not claim to support JDK 11 at this moment. However, I was
able to run the SSL test (TcpDiscoverySslTrustedUntrustedTest), and it turned
out mostly fine.

More info is needed from your side, such as full instance logs.

Regards,
-- 
Ilya Kasnacheev


Fri, Oct 19, 2018 at 0:56, Loredana Radulescu Ivanoff:

> Hello,
>
> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and SSL
> enabled:
>
>
>- the second node that I bring up joins, and then shortly after
>freezes and prints this message every minute:
>
> "WARN ...[*Initialization*]
> processors.cache.GridCachePartitionExchangeManager: Still waiting for
> initial partition map exchange"
>
>
>- once the second node joins, the first node starts experiencing very
>frequent 100% CPU spikes; these are the messages I see:
>
> WARN 2018-10-18T13:50:52,728-0700 []
> communication.tcp.TcpCommunicationSpi: Communication SPI session write
> timed out (consider increasing 'socketWriteTimeout' configuration property)
> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
> WARN 2018-10-18T13:50:52,737-0700 []
> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
> engine status after closeOutbound call [status=OK,
> handshakeStatus=NEED_WRAP,
> WARN 2018-10-18T13:51:01,441-0700 []
> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
> release latch within timeout: ServerLatch [permits=1,
> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>
> Other observations:
>
> I can reproduce this every time I start the nodes, and it doesn't matter
> which node comes up first.
>
>
> The issue goes away if I disable SSL.
>
>
> Increasing the socketWriteTimeout, networkTimeout or the
> failureDetectionTimeout does not help.
>
> It seems to be happening only with JDK 11, and not with JDK 8.
>
>
> Do you have any suggestions/known issues about this?
>
> Thank you,
>
> Loredana
>
>
>
>
>


Re: Cluster freeze with SSL enabled and JDK 11

2018-10-19 Thread Gianluca Bonetti
Hello

I think that the only supported version is Java 8 right now, and Java 11
support should come with the 2.7 or 2.8 release.
https://issues.apache.org/jira/browse/IGNITE-8441

Cheers
Gianluca

On Thu, Oct 18, 2018 at 23:56, Loredana Radulescu Ivanoff <
lradu...@tibco.com> wrote:

> Hello,
>
> I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and SSL
> enabled:
>
>
>- the second node that I bring up joins, and then shortly after
>freezes and prints this message every minute:
>
> "WARN ...[*Initialization*]
> processors.cache.GridCachePartitionExchangeManager: Still waiting for
> initial partition map exchange"
>
>
>- once the second node joins, the first node starts experiencing very
>frequent 100% CPU spikes; these are the messages I see:
>
> WARN 2018-10-18T13:50:52,728-0700 []
> communication.tcp.TcpCommunicationSpi: Communication SPI session write
> timed out (consider increasing 'socketWriteTimeout' configuration property)
> [remoteAddr=/10.100.36.82:51620, writeTimeout=15000]
> WARN 2018-10-18T13:50:52,737-0700 []
> communication.tcp.TcpCommunicationSpi: Failed to shutdown SSL session
> gracefully (will force close) [ex=javax.net.ssl.SSLException: Incorrect SSL
> engine status after closeOutbound call [status=OK,
> handshakeStatus=NEED_WRAP,
> WARN 2018-10-18T13:51:01,441-0700 []
> dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
> release latch within timeout: ServerLatch [permits=1,
> pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
> [id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]
>
> Other observations:
>
> I can reproduce this every time I start the nodes, and it doesn't matter
> which node comes up first.
>
>
> The issue goes away if I disable SSL.
>
>
> Increasing the socketWriteTimeout, networkTimeout or the
> failureDetectionTimeout does not help.
>
> It seems to be happening only with JDK 11, and not with JDK 8.
>
>
> Do you have any suggestions/known issues about this?
>
> Thank you,
>
> Loredana
>
>
>
>
>


Cluster freeze with SSL enabled and JDK 11

2018-10-18 Thread Loredana Radulescu Ivanoff
Hello,

I can consistently reproduce this issue with Ignite 2.6.0, JDK 11 and SSL
enabled:


   - the second node that I bring up joins, and then shortly after freezes
   and prints this message every minute:

"WARN ...[*Initialization*]
processors.cache.GridCachePartitionExchangeManager: Still waiting for
initial partition map exchange"


   - once the second node joins, the first node starts experiencing very
   frequent 100% CPU spikes; these are the messages I see:

WARN 2018-10-18T13:50:52,728-0700 [] communication.tcp.TcpCommunicationSpi:
Communication SPI session write timed out (consider increasing
'socketWriteTimeout' configuration property) [remoteAddr=/10.100.36.82:51620,
writeTimeout=15000]
WARN 2018-10-18T13:50:52,737-0700 [] communication.tcp.TcpCommunicationSpi:
Failed to shutdown SSL session gracefully (will force close)
[ex=javax.net.ssl.SSLException: Incorrect SSL engine status after
closeOutbound call [status=OK, handshakeStatus=NEED_WRAP,
WARN 2018-10-18T13:51:01,441-0700 []
dht.preloader.GridDhtPartitionsExchangeFuture: Unable to await partitions
release latch within timeout: ServerLatch [permits=1,
pendingAcks=[aeba8bb7-c9b8-4d46-be8a-df361eaa8fc5], super=CompletableLatch
[id=exchange, topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0]]]

Other observations:

I can reproduce this every time I start the nodes, and it doesn't matter
which node comes up first.


The issue goes away if I disable SSL.


Increasing the socketWriteTimeout, networkTimeout or the
failureDetectionTimeout does not help.

It seems to be happening only with JDK 11, and not with JDK 8.
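
When comparing JDK 8 and JDK 11 runs, it can help to confirm which Java version a node actually runs on and which TLS protocols that JVM enables by default. The sketch below is a small stand-alone check using only JDK APIs; the class and method names are hypothetical.

```java
import javax.net.ssl.SSLContext;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic: reports the JVM version and its default TLS protocols.
class JvmTlsReport {
    /** Returns the protocols enabled by default for connections made through the default SSLContext. */
    static List<String> defaultProtocols() {
        try {
            return Arrays.asList(SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("default TLS protocols = " + defaultProtocols());
    }
}
```

If TLSv1.3 shows up in the list on one machine but not on another, the two nodes are not running comparable JVMs, which is worth ruling out before comparing logs.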


Do you have any suggestions/known issues about this?

Thank you,

Loredana