[jira] [Commented] (IGNITE-11262) Compression on Discovery data bag

2019-02-21 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774047#comment-16774047
 ] 

Pavel Voronkin commented on IGNITE-11262:
-

Thanks [~v.pyatkov], the changes look good to me.

> Compression on Discovery data bag
> -
>
> Key: IGNITE-11262
> URL: https://issues.apache.org/jira/browse/IGNITE-11262
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The size of GridComponents data may increase significantly in large deployments.
> Examples:
> 1) With more than 3K caches that have QueryEntity configured, the 
> {{GridCacheProcessor}} part of the {{DiscoveryDataBag}} consumes more than 20 MB.
> 2) If the cluster contains more than 13K objects, the 
> {{GridMarshallerMappingProcessor}} data exceeds 1 MB.
> 3) In a cluster with more than 3K types in binary format, the 
> {{CacheObjectBinaryProcessorImpl}} data can grow to 10 MB.
> In most cases this data contains duplicated structures, so simple zip 
> compression can reduce its size considerably.
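
A minimal sketch of the kind of compression proposed above, using plain java.util.zip on an 
arbitrary serialized byte array (an illustration only, not the actual DiscoveryDataBag patch):

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class DiscoveryDataZipSketch {
    /** Compresses serialized discovery data with plain DEFLATE before it travels around the ring. */
    static byte[] zip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (DeflaterOutputStream out = new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_SPEED))) {
            out.write(raw);
        }

        return bos.toByteArray();
    }

    /** Restores the original bytes on the receiving node. */
    static byte[] unzip(byte[] zipped) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();

        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(zipped))) {
            byte[] buf = new byte[8192];

            for (int n; (n = in.read(buf)) > 0; )
                bos.write(buf, 0, n);
        }

        return bos.toByteArray();
    }
}
{code}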



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (IGNITE-11358) Bug in ZK tests occurs periodically

2019-02-20 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin closed IGNITE-11358.
---

Will be fixed in IGNITE-11255.

> Bug in ZK tests occurs periodically
> ---
>
> Key: IGNITE-11358
> URL: https://issues.apache.org/jira/browse/IGNITE-11358
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> java.lang.NullPointerException
>   at 
> org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.allNodesSupport(ZookeeperDiscoverySpi.java:342)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.isHandshakeWaitSupported(TcpCommunicationSpi.java:4109)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$400(TcpCommunicationSpi.java:277)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:430)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>   at 
> org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:66)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>   at 
> org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3525)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2639)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1818)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11358) Bug in ZK tests occurs periodically

2019-02-20 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin resolved IGNITE-11358.
-
Resolution: Duplicate

> Bug in ZK tests occurs periodically
> ---
>
> Key: IGNITE-11358
> URL: https://issues.apache.org/jira/browse/IGNITE-11358
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> java.lang.NullPointerException
>   at 
> org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.allNodesSupport(ZookeeperDiscoverySpi.java:342)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.isHandshakeWaitSupported(TcpCommunicationSpi.java:4109)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$400(TcpCommunicationSpi.java:277)
>   at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:430)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>   at 
> org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:66)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>   at 
> org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3525)
>   at 
> org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2639)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
>   at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1818)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11262) Compression on Discovery data bag

2019-02-20 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772987#comment-16772987
 ] 

Pavel Voronkin commented on IGNITE-11262:
-

Hi [~v.pyatkov], I've put comments, please have a look.

> Compression on Discovery data bag
> -
>
> Key: IGNITE-11262
> URL: https://issues.apache.org/jira/browse/IGNITE-11262
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Vladislav Pyatkov
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The size of GridComponents data may increase significantly in large deployments.
> Examples:
> 1) With more than 3K caches that have QueryEntity configured, the 
> {{GridCacheProcessor}} part of the {{DiscoveryDataBag}} consumes more than 20 MB.
> 2) If the cluster contains more than 13K objects, the 
> {{GridMarshallerMappingProcessor}} data exceeds 1 MB.
> 3) In a cluster with more than 3K types in binary format, the 
> {{CacheObjectBinaryProcessorImpl}} data can grow to 10 MB.
> In most cases this data contains duplicated structures, so simple zip 
> compression can reduce its size considerably.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.

2019-02-20 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Description: 
We need to fix:
 * 
CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
 * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
 * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

 

ZookeeperDiscovery1:

[https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]

Platform NET:

[https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

 

 

 

  was:
We need to fix:

 
 * 
CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
 * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
 * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

 

ZookeeperDiscovery1:

[https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]

Platform NET:

[https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

 

 

 


> Fix test failure after IGNITE-7648.
> ---
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  
> ZookeeperDiscovery1:
> [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
> Platform NET:
> [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.

2019-02-19 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Description: 
We need to fix:

 
 * 
CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
 * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
 * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

 

ZookeeperDiscovery1:

[https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]

Platform NET:

[https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

 

 

 

  was:
We need to fix:

 
 * 
CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
 * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
 * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

 


> Fix test failure after IGNITE-7648.
> ---
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  
> ZookeeperDiscovery1:
> [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
> Platform NET:
> [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11255) Fix test failure after IGNITE-7648.

2019-02-19 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771928#comment-16771928
 ] 

Pavel Voronkin commented on IGNITE-11255:
-

While running the tests I've found another bug: allNodesSupport() is called while the 
SPI is not yet initialized.
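
A minimal defensive sketch of the guard this implies; the field and delegate names below are 
assumptions for illustration, not the actual ZookeeperDiscoverySpi internals:

{code:java}
public class SpiInitGuardSketch {
    /** Hypothetical delegate that is only available after the SPI has started. */
    interface DiscoveryDelegate {
        boolean allNodesSupport(int feature);
    }

    /** Assumed to be null until spiStart() completes. */
    private volatile DiscoveryDelegate impl;

    public boolean allNodesSupport(int feature) {
        DiscoveryDelegate impl0 = impl;

        // Before the SPI is initialized the delegate is null: report the feature as
        // unsupported instead of failing with a NullPointerException.
        return impl0 != null && impl0.allNodesSupport(feature);
    }
}
{code}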

> Fix test failure after IGNITE-7648.
> ---
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11358) Bug in ZK tests occurs periodically

2019-02-19 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11358:
---

 Summary: Bug in ZK tests occurs periodically
 Key: IGNITE-11358
 URL: https://issues.apache.org/jira/browse/IGNITE-11358
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin


java.lang.NullPointerException
at 
org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.allNodesSupport(ZookeeperDiscoverySpi.java:342)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.isHandshakeWaitSupported(TcpCommunicationSpi.java:4109)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$400(TcpCommunicationSpi.java:277)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:430)
at 
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
at 
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
at 
org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:66)
at 
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
at 
org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
at 
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
at 
org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3525)
at 
org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2639)
at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1818)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10928) After huge load on cluster and restart with walCompactionEnabled=True errors on log

2019-02-19 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771753#comment-16771753
 ] 

Pavel Voronkin commented on IGNITE-10928:
-

Looks good to me

> After huge load on cluster and restart with walCompactionEnabled=True errors 
> on log
> ---
>
> Key: IGNITE-10928
> URL: https://issues.apache.org/jira/browse/IGNITE-10928
> Project: Ignite
>  Issue Type: Bug
>  Components: data structures
>Affects Versions: 2.5
>Reporter: ARomantsov
>Assignee: Sergey Antonov
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
> <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> </bean>
> {code}
> {code:java}
> [15:30:56,809][INFO][wal-file-compressor-%null%-1-#68][FileWriteAheadLogManager]
>  Stopping WAL iteration due to an exception: Failed to read WAL record at 
> position: 28310114 size: -1, ptr=FileWALPointer [idx=35, fileOff=28310114, 
> len=0]
> [15:30:56,811][INFO][wal-file-compressor-%null%-3-#70][FileWriteAheadLogManager]
>  Stopping WAL iteration due to an exception: Failed to read WAL record at 
> position: 28303753 size: -1, ptr=FileWALPointer [idx=36, fileOff=28303753, 
> len=0]
> [15:30:56,811][SEVERE][wal-file-compressor-%null%-1-#68][FileWriteAheadLogManager]
>  Compression of WAL segment [idx=35] was skipped due to unexpected error
> class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at 
> position: 28310114 size: -1
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRecordsIterator.java:292)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:258)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:154)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.SingleSegmentLogicalRecordsIterator.advance(SingleSegmentLogicalRecordsIterator.java:119)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:123)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:52)
> at 
> org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:41)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.compressSegmentToFile(FileWriteAheadLogManager.java:2039)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:1974)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:1950)
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL 
> record at position: 28310114 size: -1
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:394)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer.readRecord(RecordV2Serializer.java:235)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:243)
> ... 10 more
> Caused by: java.nio.channels.ClosedByInterruptException
> at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.read(RandomAccessFileIO.java:58)
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.read(FileIODecorator.java:51)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.io.SimpleFileInput.ensure(SimpleFileInput.java:119)
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.io.FileInput$Crc32CheckingFileInput.ensure(FileInput.java:89)
> at 
> 

[jira] [Created] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.

2019-02-18 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11350:
---

 Summary: doInParallel interruption is not properly handled in 
ExchangeFuture.
 Key: IGNITE-11350
 URL: https://issues.apache.org/jira/browse/IGNITE-11350
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.

2019-02-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11350:

Attachment: GridGain_Tests_8.4_Java_8_Binary_Objects_DR_7222.log

> doInParallel interruption is not properly handled in ExchangeFuture.
> 
>
> Key: IGNITE-11350
> URL: https://issues.apache.org/jira/browse/IGNITE-11350
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> When system pool tasks are interrupted on stop,
> detectLostPartitions() and resetLostPartitions() might end up with an 
> IgniteCheckedInterruptedException being thrown, which causes the node to hang on stop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.

2019-02-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11350:

Description: 
When system pool tasks are interrupted on stop,
detectLostPartitions() and resetLostPartitions() might end up with an 
IgniteCheckedInterruptedException being thrown, which causes the node to hang on stop.
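
A minimal sketch of the interruption-friendly pattern in question, written against plain 
java.util.concurrent rather than the actual Ignite doInParallel/ExchangeFuture code (names and 
structure are illustrative assumptions):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class ParallelStopSketch {
    /** Runs tasks in parallel; on interruption cancels the rest and exits cleanly instead of hanging. */
    static void runAll(ExecutorService sysPool, List<Runnable> tasks) throws InterruptedException {
        List<Future<?>> futs = new ArrayList<>();

        for (Runnable task : tasks)
            futs.add(sysPool.submit(task));

        try {
            for (Future<?> fut : futs)
                fut.get(); // propagate task failures
        }
        catch (InterruptedException e) {
            // The node is stopping: cancel the remaining work and restore the interrupt flag
            // so the caller's stop routine can finish instead of blocking forever.
            for (Future<?> fut : futs)
                fut.cancel(true);

            Thread.currentThread().interrupt();

            throw e;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
{code}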

> doInParallel interruption is not properly handled in ExchangeFuture.
> 
>
> Key: IGNITE-11350
> URL: https://issues.apache.org/jira/browse/IGNITE-11350
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> When system pool tasks are interrupted on stop,
> detectLostPartitions() and resetLostPartitions() might end up with an 
> IgniteCheckedInterruptedException being thrown, which causes the node to hang on stop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.

2019-02-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11350:

Attachment: (was: GridGain_Tests_8.4_Java_8_Binary_Objects_DR_7222.log)

> doInParallel interruption is not properly handled in ExchangeFuture.
> 
>
> Key: IGNITE-11350
> URL: https://issues.apache.org/jira/browse/IGNITE-11350
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> When system pool tasks are interrupted on stop,
> detectLostPartitions() and resetLostPartitions() might end up with an 
> IgniteCheckedInterruptedException being thrown, which causes the node to hang on stop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-9113) Allocate memory for a data region when first cache assigned to this region is created

2019-02-14 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768268#comment-16768268
 ] 

Pavel Voronkin commented on IGNITE-9113:


[~NIzhikov] I also observe that persistent(true) is ignored and the region is always 
created as non-persistent. This seems like a bug: we should either fail on start or 
support persistent regions.
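
For reference, a hedged sketch of the configuration shape being discussed, built on the public 
DataRegionConfiguration API (the region name and the client-mode flag are illustrative, not taken 
from this ticket):

{code:java}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistentRegionConfigSketch {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("persistentRegion")  // illustrative name
            .setPersistenceEnabled(true); // the flag observed to be ignored on client nodes

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientMode(true)          // same config file shared by servers and clients
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDataRegionConfigurations(region));

        Ignition.start(cfg);
    }
}
{code}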

> Allocate memory for a data region when first cache assigned to this region is 
> created
> -
>
> Key: IGNITE-9113
> URL: https://issues.apache.org/jira/browse/IGNITE-9113
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.6
>Reporter: Valentin Kulichenko
>Assignee: Nikolay Izhikov
>Priority: Major
> Fix For: 2.8
>
>
> Currently we do not create any regions or allocate any offheap memory on 
> client nodes unless it's explicitly configured. This is good behavior, 
> however there is a usability issue caused by the fact that many users have 
> the same config file for both server and clients. This can lead to unexpected 
> excessive memory usage on client side and forces users to maintain two config 
> files in most cases.
> Same issue is applied to server nodes that do not store any data (e.g. nodes 
> running only services).
> It's better to allocate memory dynamically, when first cache assigned to a 
> data region is created.
> More detailed discussion here: 
> http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-9113) Allocate memory for a data region when first cache assigned to this region is created

2019-02-14 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768268#comment-16768268
 ] 

Pavel Voronkin edited comment on IGNITE-9113 at 2/14/19 1:51 PM:
-

[~NIzhikov] I also observe that persistent(true) is ignored and the region is always 
created as non-persistent on the client. This seems like a bug: we should either fail on 
start or support persistent regions.


was (Author: voropava):
[~NIzhikov] I also observe that persistent(true) is ignored and the region is always 
created as non-persistent. This seems like a bug: we should either fail on start or 
support persistent regions.

> Allocate memory for a data region when first cache assigned to this region is 
> created
> -
>
> Key: IGNITE-9113
> URL: https://issues.apache.org/jira/browse/IGNITE-9113
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Affects Versions: 2.6
>Reporter: Valentin Kulichenko
>Assignee: Nikolay Izhikov
>Priority: Major
> Fix For: 2.8
>
>
> Currently we do not create any regions or allocate any offheap memory on 
> client nodes unless it's explicitly configured. This is good behavior, 
> however there is a usability issue caused by the fact that many users have 
> the same config file for both server and clients. This can lead to unexpected 
> excessive memory usage on client side and forces users to maintain two config 
> files in most cases.
> Same issue is applied to server nodes that do not store any data (e.g. nodes 
> running only services).
> It's better to allocate memory dynamically, when first cache assigned to a 
> data region is created.
> More detailed discussion here: 
> http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11288) TcpDiscovery locks forever on SSLSocket.close().

2019-02-13 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767054#comment-16767054
 ] 

Pavel Voronkin commented on IGNITE-11288:
-

Thanks

> TcpDiscovery locks forever on SSLSocket.close().
> 
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock.
> // We create the socket with soTimeout(0) here, but setting it here won't help anyway.
>  RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> // After the timeout, grid-timeout-worker blocks forever in SSLSocketImpl.close().
> According to the Java 8 SSLSocketImpl sources:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
> boolean var3 = Thread.interrupted();
> try {
> if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
> try
> { this.writeRecordInternal(var1, var2); }
> finally 
> { this.writeLock.unlock(); }
> } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) 
> { System.out.println(Thread.currentThread().getName() + ", received 
> Exception: " + var4); }
> this.sess.invalidate();
> }
> } catch (InterruptedException var14) 
> { var3 = true; }
> if (var3) 
> { Thread.currentThread().interrupt(); }
> } else
> { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }
> finally
> { this.writeLock.unlock(); }
> }{code}
> If soLinger is not set, we fall back to this.writeLock.lock(), which waits forever, 
> because RingMessageWorker is writing the message with SO_TIMEOUT zero.
> Solution:
> 1) Set a proper SO_TIMEOUT // that didn't help on Linux when we drop packets 
> using iptables.
> 2) Set SO_LINGER to some reasonable positive value.
> Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261].
> They ended up setting SO_LINGER there as well.
>  
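
A hedged sketch of the proposed mitigation on the discovery socket, using only standard 
java.net.Socket options (the concrete timeout and linger values below are placeholders, not the 
values chosen in the patch):

{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;

public class DiscoverySocketOptionsSketch {
    /** Opens a socket with a bounded read timeout and a positive SO_LINGER, so close() cannot block forever. */
    static Socket openSocket(InetSocketAddress addr, int connTimeoutMs) throws Exception {
        Socket sock = new Socket();

        sock.setSoTimeout(5_000);  // placeholder read timeout instead of 0 (infinite)
        sock.setSoLinger(true, 5); // placeholder: bounds how long close() may wait to send close_notify

        sock.connect(addr, connTimeoutMs);

        return sock;
    }
}
{code}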



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11308) Add soLinger parameter support in TcpDiscoverySpi .NET configuration.

2019-02-13 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11308:

Description: The .NET client should support the TcpDiscoverySpi.soLinger parameter.

> Add soLinger parameter support in TcpDiscoverySpi .NET configuration.
> -
>
> Key: IGNITE-11308
> URL: https://issues.apache.org/jira/browse/IGNITE-11308
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>
> The .NET client should support the TcpDiscoverySpi.soLinger parameter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11308) Add soLinger parameter support in TcpDiscoverySpi .NET configuration.

2019-02-13 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11308:
---

 Summary: Add soLinger parameter support in TcpDiscoverySpi .NET 
configuration.
 Key: IGNITE-11308
 URL: https://issues.apache.org/jira/browse/IGNITE-11308
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11288) TcpDiscovery locks forever on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock.

// We create the socket with soTimeout(0) here, but setting it here won't help anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

// After the timeout, grid-timeout-worker blocks forever in SSLSocketImpl.close().

According to the Java 8 SSLSocketImpl sources:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            }
            finally {
                this.writeLock.unlock();
            }
        }
        else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            }
            else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    }
    catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
}
else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    }
    finally {
        this.writeLock.unlock();
    }
}
{code}
If soLinger is not set, we fall back to this.writeLock.lock(), which waits forever, 
because RingMessageWorker is writing the message with SO_TIMEOUT zero.

Solution:

1) Set a proper SO_TIMEOUT // that didn't help on Linux when we drop packets using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261].

They ended up setting SO_LINGER there as well.

 

  was:
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
boolean var3 = Thread.interrupted();

try {
if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
try

{ this.writeRecordInternal(var1, var2); }

finally 
{ this.writeLock.unlock(); }
} else

{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent."); if (this.isLayered() && !this.autoClose) { 
this.fatal((byte)-1, (Throwable)var4); }

else if (debug != null && Debug.isOn("ssl")) 
{ System.out.println(Thread.currentThread().getName() + ", received Exception: 
" + var4); }

this.sess.invalidate();
}
} catch (InterruptedException var14) 
{ var3 = true; }

if (var3) 
{ Thread.currentThread().interrupt(); }
} else

{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }

finally

{ this.writeLock.unlock(); }

}{code}
In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

Solution:

1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

Guys end up setting SO_LINGER.

 


> TcpDiscovery locks forever on SSLSocket.close().
> 
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is java bug locking on SSLSocketImpl.close() on write lock:
> //we create socket with soTimeout(0) here, but setting it here won't help 
> anyway.
>  RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> //After timeout grid-timeout-worker blocks forever on SSLSOcketImpl.close(). 
> According to java8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
> boolean var3 = Thread.interrupted();
> try {
> if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
> try
> { this.writeRecordInternal(var1, var2); }
> finally 
> { this.writeLock.unlock(); }
> } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) 
> { System.out.println(Thread.currentThread().getName() + ", received 
> Exception: " + var4); }
> this.sess.invalidate();
> }
> } catch (InterruptedException var14) 
> { var3 = 

[jira] [Updated] (IGNITE-11288) TcpDiscovery locks forever on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Summary: TcpDiscovery locks forever on SSLSocket.close().  (was: 
TcpDiscovery deadlock on SSLSocket.close().)

> TcpDiscovery locks forever on SSLSocket.close().
> 
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is java bug locking on SSLSocketImpl.close() on write lock:
> //we create socket with soTimeout(0) here, but setting it here won't help 
> anyway.
>  RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
> onTimeout hangs on writeLock. 
> According to java8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
> boolean var3 = Thread.interrupted();
> try {
> if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
> try
> { this.writeRecordInternal(var1, var2); }
> finally 
> { this.writeLock.unlock(); }
> } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) 
> { System.out.println(Thread.currentThread().getName() + ", received 
> Exception: " + var4); }
> this.sess.invalidate();
> }
> } catch (InterruptedException var14) 
> { var3 = true; }
> if (var3) 
> { Thread.currentThread().interrupt(); }
> } else
> { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }
> finally
> { this.writeLock.unlock(); }
> }{code}
> In case of soLinger is not set we fallback to this.writeLock.lock(); which 
> wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.
> Solution:
> 1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets 
> using iptables.
> 2) Set SO_LINGER to some reasonable positive value.
> Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].
> Guys end up setting SO_LINGER.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
boolean var3 = Thread.interrupted();

try {
if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
try

{ this.writeRecordInternal(var1, var2); }

finally 
{ this.writeLock.unlock(); }
} else

{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent."); if (this.isLayered() && !this.autoClose) { 
this.fatal((byte)-1, (Throwable)var4); }

else if (debug != null && Debug.isOn("ssl")) 
{ System.out.println(Thread.currentThread().getName() + ", received Exception: 
" + var4); }

this.sess.invalidate();
}
} catch (InterruptedException var14) 
{ var3 = true; }

if (var3) 
{ Thread.currentThread().interrupt(); }
} else

{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }

finally

{ this.writeLock.unlock(); }

}{code}
In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

Solution:

1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

Guys end up setting SO_LINGER.

 

  was:
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
boolean var3 = Thread.interrupted();

try {
if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
try

{ this.writeRecordInternal(var1, var2); }

finally 
{ this.writeLock.unlock(); }
} else

{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent."); if (this.isLayered() && !this.autoClose) { 
this.fatal((byte)-1, (Throwable)var4); }

else if (debug != null && Debug.isOn("ssl")) 
{ System.out.println(Thread.currentThread().getName() + ", received Exception: 
" + var4); }

this.sess.invalidate();
}
} catch (InterruptedException var14) 
{ var3 = true; }

if (var3) 
{ Thread.currentThread().interrupt(); }
} else

{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }

finally

{ this.writeLock.unlock(); }

}{code}
In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

Solution:

1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

Guys end up setting SO_LINGER.

 


> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is java bug locking on SSLSocketImpl.close() on write lock:
> //we create socket with soTimeout(0) here, but setting it here won't help 
> anyway.
>  RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
> onTimeout hangs on writeLock. 
> According to java8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
> boolean var3 = Thread.interrupted();
> try {
> if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
> try
> { this.writeRecordInternal(var1, var2); }
> finally 
> { this.writeLock.unlock(); }
> } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) 
> { System.out.println(Thread.currentThread().getName() + ", received 
> Exception: " + var4); }
> this.sess.invalidate();

[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
boolean var3 = Thread.interrupted();

try {
if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
try

{ this.writeRecordInternal(var1, var2); }

finally 
{ this.writeLock.unlock(); }
} else

{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent."); if (this.isLayered() && !this.autoClose) { 
this.fatal((byte)-1, (Throwable)var4); }

else if (debug != null && Debug.isOn("ssl")) 
{ System.out.println(Thread.currentThread().getName() + ", received Exception: 
" + var4); }

this.sess.invalidate();
}
} catch (InterruptedException var14) 
{ var3 = true; }

if (var3) 
{ Thread.currentThread().interrupt(); }
} else

{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }

finally

{ this.writeLock.unlock(); }

}{code}
In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

Solution:

1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

Guys end up setting SO_LINGER.

 

  was:
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
boolean var3 = Thread.interrupted();

try {
if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
try

{ this.writeRecordInternal(var1, var2); }

finally 
{ this.writeLock.unlock(); }
} else

{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent."); if (this.isLayered() && !this.autoClose) { 
this.fatal((byte)-1, (Throwable)var4); }

else if (debug != null && Debug.isOn("ssl")) 
{ System.out.println(Thread.currentThread().getName() + ", received Exception: 
" + var4); }

this.sess.invalidate();
}
} catch (InterruptedException var14) 
{ var3 = true; }

if (var3) 
{ Thread.currentThread().interrupt(); }
} else

{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }

finally

{ this.writeLock.unlock(); }

}{code}
In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

Solution:

1) Set a proper SO_TIMEOUT // we checked that it didn't help on Linux if we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 


> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is java bug locking on SSLSocketImpl.close() on write lock:
> //we create socket with soTimeout(0) here, but setting it here won't help 
> anyway.
>  RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
> onTimeout hangs on writeLock. 
> According to java8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
> boolean var3 = Thread.interrupted();
> try {
> if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
> try
> { this.writeRecordInternal(var1, var2); }
> finally 
> { this.writeLock.unlock(); }
> } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) 
> { System.out.println(Thread.currentThread().getName() + ", received 
> Exception: " + var4); }
> this.sess.invalidate();
> }
> } catch 

[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            }
            finally {
                this.writeLock.unlock();
            }
        }
        else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            }
            else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    }
    catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
}
else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    }
    finally {
        this.writeLock.unlock();
    }
}
{code}

In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.


Solution:

1) Set a proper SO_TIMEOUT // we checked that it didn't help on Linux if we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 

  was:
Rootcause is we not set SO_TIMEOUT on discovery socket on retry:

RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

So ring message worker blocks forever but SSLSOcketImpl.close() onTimeout hangs 
on writeLock. 

 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            }
            finally {
                this.writeLock.unlock();
            }
        }
        else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            }
            else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    }
    catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
}
else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    }
    finally {
        this.writeLock.unlock();
    }
}
{code}

In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

U.closeQuiet(socket) if SSL is on will hang if soLinger() is negative.

 

Solution:

1) Set proper SO_TIMEOUT

2) Possibly add ability to override SO_LINGER to some reasonable value.

 

Similar bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 


> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is java bug locking on SSLSocketImpl.close() on write lock:
> //we create socket with soTimeout(0) here, but setting it here won't help 
> anyway.
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
> onTimeout hangs on writeLock. 
> According to java8 SSLSocketImpl:
> {code}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>  boolean var3 = Thread.interrupted();
> try {
>  if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>  try
> { this.writeRecordInternal(var1, var2); }
>  
>  finally \\{ this.writeLock.unlock(); }
>  } else
>  
>  \\{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
>  
>  else if (debug != null && Debug.isOn("ssl")) \\{ 
> System.out.println(Thread.currentThread().getName() + ", received Exception: 
> " + var4); }
>  
>  this.sess.invalidate();
>  }
>  } catch (InterruptedException var14) \\{ var3 = true; }
> 

[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
 RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
boolean var3 = Thread.interrupted();

try {
if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
try

{ this.writeRecordInternal(var1, var2); }

finally 
{ this.writeLock.unlock(); }
} else

{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent."); if (this.isLayered() && !this.autoClose) { 
this.fatal((byte)-1, (Throwable)var4); }

else if (debug != null && Debug.isOn("ssl")) 
{ System.out.println(Thread.currentThread().getName() + ", received Exception: 
" + var4); }

this.sess.invalidate();
}
} catch (InterruptedException var14) 
{ var3 = true; }

if (var3) 
{ Thread.currentThread().interrupt(); }
} else

{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }

finally

{ this.writeLock.unlock(); }

}{code}
In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.

Solution:

1) Set a proper SO_TIMEOUT // we checked that it didn't help on Linux if we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 

  was:
Rootcause is java bug locking on SSLSocketImpl.close() on write lock:

//we create socket with soTimeout(0) here, but setting it here won't help 
anyway.
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

//After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
onTimeout hangs on writeLock. 

According to java8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            }
            finally {
                this.writeLock.unlock();
            }
        }
        else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            }
            else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    }
    catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
}
else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    }
    finally {
        this.writeLock.unlock();
    }
}
{code}

In case of soLinger is not set we fallback to this.writeLock.lock(); which wait 
forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.


Solution:

1) Set a proper SO_TIMEOUT // we checked that it didn't help on Linux if we drop packets 
using iptables.

2) Set SO_LINGER to some reasonable positive value.

Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 


> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is java bug locking on SSLSocketImpl.close() on write lock:
> //we create socket with soTimeout(0) here, but setting it here won't help 
> anyway.
>  RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() 
> onTimeout hangs on writeLock. 
> According to java8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
> boolean var3 = Thread.interrupted();
> try {
> if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
> try
> { this.writeRecordInternal(var1, var2); }
> finally 
> { this.writeLock.unlock(); }
> } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) 
> { System.out.println(Thread.currentThread().getName() + ", received 
> Exception: " + var4); }
> this.sess.invalidate();
> }
> } catch 

[jira] [Assigned] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-11288:
---

Assignee: Pavel Voronkin

> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is we not set SO_TIMEOUT on discovery socket on retry:
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> So ring message worker blocks forever but SSLSOcketImpl.close() onTimeout 
> hangs on writeLock. 
>  
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
>  
> If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, 
> because RingMessageWorker is writing the message with SO_TIMEOUT of zero.
> U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.
>  
> Solution:
> 1) Set a proper SO_TIMEOUT.
> 2) Possibly add the ability to override SO_LINGER to some reasonable value.
>  
> Similar bug [https://bugs.openjdk.java.net/browse/JDK-6668261].
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry:

RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

So the ring message worker blocks forever, and SSLSocketImpl.close() called from 
onTimeout hangs on writeLock.

 

According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, 
because RingMessageWorker is writing the message with SO_TIMEOUT of zero.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

 

Solution:

1) Set a proper SO_TIMEOUT.

2) Possibly add the ability to override SO_LINGER to some reasonable value.

 

Similar bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 

  was:
Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry:

RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

So the ring message worker blocks forever, and SSLSocketImpl.close() called from 
onTimeout hangs on writeLock.

 

According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, 
because RingMessageWorker is writing the message with SO_TIMEOUT of zero.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

 

Solution:

1) Set a proper SO_TIMEOUT.

2) Possibly add the ability to override SO_LINGER to some reasonable value.

 

 

 

Similar bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 


> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is we not set SO_TIMEOUT on discovery socket on retry:
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> So ring message worker blocks forever but SSLSOcketImpl.close() onTimeout 
> hangs on writeLock. 
>  
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>  boolean var3 = Thread.interrupted();
> try {
>  if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>  try
> { this.writeRecordInternal(var1, var2); }
>  
>  finally \{ this.writeLock.unlock(); }
>  } else
>  
>  \{ SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) { 
> this.fatal((byte)-1, (Throwable)var4); }
>  
>  else if (debug != null && Debug.isOn("ssl")) \{ 
> System.out.println(Thread.currentThread().getName() + ", received Exception: 
> " + var4); }
>  
>  this.sess.invalidate();
>  }
>  } catch (InterruptedException var14) \{ var3 = true; }
>  
>  if (var3) \{ Thread.currentThread().interrupt(); }
>  } else
>  
>  \{ this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); }
> finally
> { this.writeLock.unlock(); }
> }
>  
> In case of soLinger is not set we fallback to 

[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry:

RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);

So the ring message worker blocks forever, and SSLSocketImpl.close() called from 
onTimeout hangs on writeLock.

 

According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, 
because RingMessageWorker is writing the message with SO_TIMEOUT of zero.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

 

Solution:

1) Set a proper SO_TIMEOUT.

2) Possibly add the ability to override SO_LINGER to some reasonable value.

 

 

 

Similar bug [https://bugs.openjdk.java.net/browse/JDK-6668261].

 

  was:
According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

We need to make it configurable for TcpCommunicationSpi and TcpDiscoverySpi. I suggest 
a default value of 0.

 

Similar bug https://bugs.openjdk.java.net/browse/JDK-6668261.

 


> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rootcause is we not set SO_TIMEOUT on discovery socket on retry:
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> So ring message worker blocks forever but SSLSOcketImpl.close() onTimeout 
> hangs on writeLock. 
>  
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>  boolean var3 = Thread.interrupted();
> try {
>  if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>  try
> { this.writeRecordInternal(var1, var2); }
>  
>  finally \{ this.writeLock.unlock(); }
>  } else
>  
>  { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) 
> { this.fatal((byte)-1, (Throwable)var4); }
>  
>  else if (debug != null && Debug.isOn("ssl")) \{ 
> System.out.println(Thread.currentThread().getName() + ", received Exception: 
> " + var4); }
>  
>  this.sess.invalidate();
>  }
>  } catch (InterruptedException var14) \{ var3 = true; }
>  
>  if (var3) \{ Thread.currentThread().interrupt(); }
>  } else
>  
>  { this.writeLock.lock(); try 
> { this.writeRecordInternal(var1, var2); }
> finally
> { this.writeLock.unlock(); }
> }
>  
> In case of soLinger is not set we fallback to this.writeLock.lock(); which 
> wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero.
> U.closeQuiet(socket) if SSL is on will hang if soLinger() is negative.
>  
> Solution:
> 1) Set proper SO_TIMEOUT
> 2) Possibly add ability to override SO_LINGER to some reasonable value.
>  
>  
>  
> Similar bug 

[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().

2019-02-12 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Summary: TcpDiscovery deadlock on SSLSocket.close().  (was: Missing 
SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() 
deadlock.)

> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
>  
> If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.
> U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.
> We need to make it configurable for TcpCommunicationSpi and TcpDiscoverySpi. I suggest 
> a default value of 0.
>  
> Similar bug https://bugs.openjdk.java.net/browse/JDK-6668261.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.

2019-02-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

We need to make it configurable for TcpCommunicationSpi and TcpDiscoverySpi. I suggest 
a default value of 0.

 

Similar bug https://bugs.openjdk.java.net/browse/JDK-6668261.

 

  was:
According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

We need to make it configurable for TcpCommunicationSpi and TcpDiscoverySpi. I suggest 
a default value of 0.

 


> Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing 
> SSLSocket.close() deadlock.
> -
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>  boolean var3 = Thread.interrupted();
> try {
>  if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>  try
> { this.writeRecordInternal(var1, var2); }
> finally \{ this.writeLock.unlock(); }
>  } else
> { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent."); if (this.isLayered() && !this.autoClose) \\{ 
> this.fatal((byte)-1, (Throwable)var4); }
> else if (debug != null && Debug.isOn("ssl")) \{ 
> System.out.println(Thread.currentThread().getName() + ", received Exception: 
> " + var4); }
> this.sess.invalidate();
>  }
>  } catch (InterruptedException var14) \{ var3 = true; }
> if (var3) \{ Thread.currentThread().interrupt(); }
>  } else
> { this.writeLock.lock(); try \\{ this.writeRecordInternal(var1, var2); }
> finally
> { this.writeLock.unlock(); }
> }
>  
> In case of soLinger is not set we fallback to this.writeLock.lock(); which 
> wait forever.
> U.closeQuiet(socket) if SSL is on will hang if soLinger() is negative.
> We need to make it configurable for TcpCommSpi and TcpDisco. I suggest 
> default value 0.
>  
> Similar bug https://bugs.openjdk.java.net/browse/JDK-6668261.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.

2019-02-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
According to java8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();

    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");

            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }

            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }

    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();

    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.

U.closeQuiet(socket) will hang if SSL is on and soLinger() is negative.

We need to make it configurable for TcpCommunicationSpi and TcpDiscoverySpi. I suggest 
a default value of 0.

 

  was:
According to java8 SSLSocketImpl:



if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
 boolean var3 = Thread.interrupted();

 try {
 if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
 try {
 this.writeRecordInternal(var1, var2);
 } finally {
 this.writeLock.unlock();
 }
 } else {
 SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent.");
 if (this.isLayered() && !this.autoClose) {
 this.fatal((byte)-1, (Throwable)var4);
 } else if (debug != null && Debug.isOn("ssl")) {
 System.out.println(Thread.currentThread().getName() + ", received Exception: " 
+ var4);
 }

 this.sess.invalidate();
 }
 } catch (InterruptedException var14) {
 var3 = true;
 }

 if (var3) {
 Thread.currentThread().interrupt();
 }
} else {
 this.writeLock.lock();

 try {
 this.writeRecordInternal(var1, var2);
 } finally {
 this.writeLock.unlock();
 }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which might block 
forever.


> Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing 
> SSLSocket.close() deadlock.
> -
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>  boolean var3 = Thread.interrupted();
> try {
>  if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>  try
> { this.writeRecordInternal(var1, var2); } finally \{ this.writeLock.unlock(); 
> }
>  } else {
>  SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent.");
>  if (this.isLayered() && !this.autoClose) \{ this.fatal((byte)-1, 
> (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) \{ 
> System.out.println(Thread.currentThread().getName() + ", received Exception: 
> " + var4); }
>  
>  this.sess.invalidate();
>  }
>  } catch (InterruptedException var14) \{ var3 = true; }
>  
>  if (var3) \{ Thread.currentThread().interrupt(); }
>  } else {
>  this.writeLock.lock();
>  
>  try \{ this.writeRecordInternal(var1, var2); }
> finally
> { this.writeLock.unlock(); }
> }
>  
> In case of soLinger is not set we fallback to this.writeLock.lock(); which 
> wait forever.
> U.closeQuiet(socket) if SSL is on will hang if soLinger() is negative.
> We need to make it configurable for TcpCommSpi and TcpDisco. I suggest 
> default value 0.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.

2019-02-11 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11288:

Description: 
According to java8 SSLSocketImpl:



if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
 boolean var3 = Thread.interrupted();

 try {
 if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
 try {
 this.writeRecordInternal(var1, var2);
 } finally {
 this.writeLock.unlock();
 }
 } else {
 SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message 
cannot be sent.");
 if (this.isLayered() && !this.autoClose) {
 this.fatal((byte)-1, (Throwable)var4);
 } else if (debug != null && Debug.isOn("ssl")) {
 System.out.println(Thread.currentThread().getName() + ", received Exception: " 
+ var4);
 }

 this.sess.invalidate();
 }
 } catch (InterruptedException var14) {
 var3 = true;
 }

 if (var3) {
 Thread.currentThread().interrupt();
 }
} else {
 this.writeLock.lock();

 try {
 this.writeRecordInternal(var1, var2);
 } finally {
 this.writeLock.unlock();
 }
}

 

If SO_LINGER is not set, we fall back to this.writeLock.lock(), which might block 
forever.

> Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing 
> SSLSocket.close() deadlock.
> -
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>
> According to java8 SSLSocketImpl:
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>  boolean var3 = Thread.interrupted();
>  try {
>  if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>  try {
>  this.writeRecordInternal(var1, var2);
>  } finally {
>  this.writeLock.unlock();
>  }
>  } else {
>  SSLException var4 = new SSLException("SO_LINGER timeout, close_notify 
> message cannot be sent.");
>  if (this.isLayered() && !this.autoClose) {
>  this.fatal((byte)-1, (Throwable)var4);
>  } else if (debug != null && Debug.isOn("ssl")) {
>  System.out.println(Thread.currentThread().getName() + ", received Exception: 
> " + var4);
>  }
>  this.sess.invalidate();
>  }
>  } catch (InterruptedException var14) {
>  var3 = true;
>  }
>  if (var3) {
>  Thread.currentThread().interrupt();
>  }
> } else {
>  this.writeLock.lock();
>  try {
>  this.writeRecordInternal(var1, var2);
>  } finally {
>  this.writeLock.unlock();
>  }
> }
>  
> In case of soLinger is not set we fallback to this.writeLock.lock(); which 
> might fail forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.

2019-02-11 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11288:
---

 Summary: Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi 
causing SSLSocket.close() deadlock.
 Key: IGNITE-11288
 URL: https://issues.apache.org/jira/browse/IGNITE-11288
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Summary: Fix test failure after IGNITE-7648.  (was: Fix test failures after 
IGNITE-7648.)

> Fix test failure after IGNITE-7648.
> ---
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-7648:
---
Description: 
IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in IGNITE-5718 
as a way to prevent unnecessary node drops in case of short network problems.

I suppose it was the wrong decision to fix it in such a way.

We had faced some issues in our production due to the lack of automatic kicking of 
ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we 
realised the necessity of changing the default behavior via the property.

The right solution is to kick nodes only if the failure threshold is reached. Such 
behavior should always be enabled.

UPDATE: During a discussion it was decided that the property will remain 
disabled by default.

We decided to change the timeout logic for the case when failure detection is 
enabled: connect and handshake start from a 500 ms timeout and increase using an 
exponential backoff strategy.

 

 

  was:
IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in IGNITE-5718 
as a way to prevent unnecessary node drops in case of short network problems.

I suppose it's wrong decision to fix it in such way.

We had faced some issues in our production due to lack of automatic kicking of 
ill-behaving nodes (on example, hanging due to long GC pauses) until we 
realised the necessity of changing default behavior via property.

Right solution is to kick nodes only if failure threshold is reached. Such 
behavior should be always enabled.

UPDATE: During a discussion it was decided what the property will remain 
disabled by default.
 


> Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> -
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.3
>Reporter: Alexei Scherbakov
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's wrong decision to fix it in such way.
> We had faced some issues in our production due to lack of automatic kicking 
> of ill-behaving nodes (on example, hanging due to long GC pauses) until we 
> realised the necessity of changing default behavior via property.
> Right solution is to kick nodes only if failure threshold is reached. Such 
> behavior should be always enabled.
> UPDATE: During a discussion it was decided what the property will remain 
> disabled by default.
> We decided to change timeout logic in case of failure detection enabled. We 
> start performing connect and handshake from 500ms increasing using 
> exponential backoff strategy.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-7648:
---
Comment: was deleted

(was: We changed behaviour of ENABLED_FORCIBLE_NODE_KILL=true in this ticket.

Only server node can kill client node in case if property is enabled.

client can't kill server, server can't kill server.

Timeout logic changed in case of failure detection enabled scenario

We start connect and hanshake from timeout 500ms. If failed we increase timeout 
using exponential backoff strategy

timeout = Math.min(Math.min(timeout * 2, maxTimeout), 
remainingTiimeTillFailureDetection)

 

 

 

 

 

 

 )

> Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> -
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.3
>Reporter: Alexei Scherbakov
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's wrong decision to fix it in such way.
> We had faced some issues in our production due to lack of automatic kicking 
> of ill-behaving nodes (on example, hanging due to long GC pauses) until we 
> realised the necessity of changing default behavior via property.
> Right solution is to kick nodes only if failure threshold is reached. Such 
> behavior should be always enabled.
> UPDATE: During a discussion it was decided what the property will remain 
> disabled by default.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2019-02-08 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763557#comment-16763557
 ] 

Pavel Voronkin commented on IGNITE-7648:


We changed the behaviour of IGNITE_ENABLE_FORCIBLE_NODE_KILL=true in this ticket.

Only a server node can kill a client node when the property is enabled.

A client can't kill a server, and a server can't kill a server.

The timeout logic changed for the scenario where failure detection is enabled.

We start connect and handshake with a timeout of 500 ms. If an attempt fails, we 
increase the timeout using an exponential backoff strategy (see the sketch below):

timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection)
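
A minimal sketch of that calculation (illustrative only; the class name 
ExponentialBackoffTimeoutSketch and its fields are assumptions, not the actual Ignite API):

{code:java}
/**
 * Illustrative sketch of the backoff formula above; not the actual Ignite implementation.
 * Names and the 500 ms starting value mirror the comment, everything else is assumed.
 */
final class ExponentialBackoffTimeoutSketch {
    private final long maxTimeout;   // cap for a single attempt, in milliseconds
    private long timeout = 500;      // first connect/handshake attempt starts at 500 ms

    ExponentialBackoffTimeoutSketch(long maxTimeout) {
        this.maxTimeout = maxTimeout;
    }

    /** Doubles the timeout for the next retry, never exceeding maxTimeout or the failure detection budget. */
    long nextTimeout(long remainingTimeTillFailureDetection) {
        timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection);

        return timeout;
    }
}
{code}

On every failed attempt the caller would pass the time left until the failure detection threshold, so retries can never outlive it.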

 

 

 

 

 

 

 

> Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> -
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.3
>Reporter: Alexei Scherbakov
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's wrong decision to fix it in such way.
> We had faced some issues in our production due to lack of automatic kicking 
> of ill-behaving nodes (on example, hanging due to long GC pauses) until we 
> realised the necessity of changing default behavior via property.
> Right solution is to kick nodes only if failure threshold is reached. Such 
> behavior should be always enabled.
> UPDATE: During a discussion it was decided what the property will remain 
> disabled by default.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2019-02-08 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763556#comment-16763556
 ] 

Pavel Voronkin edited comment on IGNITE-7648 at 2/8/19 12:48 PM:
-

We changed behaviour of ENABLED_FORCIBLE_NODE_KILL=true in this ticket.

Only server node can kill client node in case if property is enabled.

client can't kill server, server can't kill server.

Timeout logic changed in case of failure detection enabled scenario

We start connect and hanshake from timeout 500ms. If failed we increase timeout 
using exponential backoff strategy

timeout = Math.min(Math.min(timeout * 2, maxTimeout), 
remainingTiimeTillFailureDetection)

 


was (Author: voropava):
We changed behaviour of ENABLED_FORCIBLE_NODE_KILL=true in this ticket.

Only server node can kill client node in case if property is enabled.

client can't kill server, server can't kill server.

Timeout logic changed in case of failure detection enabled scenario

We start connect and hanshake from timeout 500ms. If failed we increase timeout 
using exponential backoff strategy

timeout = Math.min(Math.min(timeout * 2, maxTimeout), 
remainingTiimeTillFailureDetection)

 

 

 

 

 

 

> Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> -
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.3
>Reporter: Alexei Scherbakov
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's wrong decision to fix it in such way.
> We had faced some issues in our production due to lack of automatic kicking 
> of ill-behaving nodes (on example, hanging due to long GC pauses) until we 
> realised the necessity of changing default behavior via property.
> Right solution is to kick nodes only if failure threshold is reached. Such 
> behavior should be always enabled.
> UPDATE: During a discussion it was decided what the property will remain 
> disabled by default.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2019-02-08 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763556#comment-16763556
 ] 

Pavel Voronkin commented on IGNITE-7648:


We changed behaviour of ENABLED_FORCIBLE_NODE_KILL=true in this ticket.

Only server node can kill client node in case if property is enabled.

client can't kill server, server can't kill server.

Timeout logic changed in case of failure detection enabled scenario

We start connect and hanshake from timeout 500ms. If failed we increase timeout 
using exponential backoff strategy

timeout = Math.min(Math.min(timeout * 2, maxTimeout), 
remainingTiimeTillFailureDetection)

 

 

 

 

 

 

> Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> -
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.3
>Reporter: Alexei Scherbakov
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's wrong decision to fix it in such way.
> We had faced some issues in our production due to lack of automatic kicking 
> of ill-behaving nodes (on example, hanging due to long GC pauses) until we 
> realised the necessity of changing default behavior via property.
> Right solution is to kick nodes only if failure threshold is reached. Such 
> behavior should be always enabled.
> UPDATE: During a discussion it was decided what the property will remain 
> disabled by default.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failures after IGNITE-7648.

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Summary: Fix test failures after IGNITE-7648.  (was: Fix test failure after 
IGNITE-7648.)

> Fix test failures after IGNITE-7648.
> 
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Summary: Fix test failure after IGNITE-7648.  (was: Fix test failure after 
IGNITE-7648)

> Fix test failure after IGNITE-7648.
> ---
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-11255) Fix test failure after IGNITE-7648

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-11255:
---

Assignee: Pavel Voronkin

> Fix test failure after IGNITE-7648
> --
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Description: 
We need to fix:

 
 * 
CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
 * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
 * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

 

> Fix test failure after IGNITE-7648
> --
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
>  
>  * 
> CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
>  * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
>  * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648

2019-02-08 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11255:

Labels: MakeTeamcityGreenAgain  (was: )

> Fix test failure after IGNITE-7648
> --
>
> Key: IGNITE-11255
> URL: https://issues.apache.org/jira/browse/IGNITE-11255
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11255) Fix test failure after IGNITE-7648

2019-02-08 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11255:
---

 Summary: Fix test failure after IGNITE-7648
 Key: IGNITE-11255
 URL: https://issues.apache.org/jira/browse/IGNITE-11255
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-11221) Refactor timeout logic in TcpDiscovery

2019-02-06 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-11221:
---

Assignee: Stanilovsky Evgeny

> Refactor timeout logic in TcpDiscovery
> --
>
> Key: IGNITE-11221
> URL: https://issues.apache.org/jira/browse/IGNITE-11221
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Stanilovsky Evgeny
>Priority: Major
>
> We need to reimplement IgniteSpiOperationTimeoutHelper, cause it's mixing 
> exception handling and timeout calculation.
> We need to reuse ExponentialBackoffTimeout to encapsulate logic of 
> calculating different sets of timeout separately and get rid of many local 
> variables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11221) Refactor timeout logic in TcpDiscovery

2019-02-06 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11221:

Description: 
We need to replace IgniteSpiOperationTimeoutHelper with the TimeoutStrategy 
introduced in IGNITE-7648, because it mixes exception handling with timeout 
calculation.

We need to reuse ExponentialBackoffTimeout to encapsulate the logic of calculating 
different sets of timeouts separately and get rid of many local variables (see the sketch below).
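
For illustration, one possible shape of such an encapsulation (a sketch under assumptions; 
TimeoutStrategySketch and its method names are made up and are not the actual TimeoutStrategy API from IGNITE-7648):

{code:java}
/** Sketch of a timeout-strategy abstraction; names are illustrative, not the real Ignite API. */
interface TimeoutStrategySketch {
    /** Timeout to use for the next network operation, in milliseconds. */
    long nextTimeout();

    /** Whether the overall time budget (e.g. the failure detection timeout) is exhausted. */
    boolean checkTimeout();
}
{code}

A connect/handshake loop would then ask the strategy for its next timeout instead of mixing exception handling and timeout bookkeeping in one helper.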

  was:
We need to reimplement IgniteSpiOperationTimeoutHelper, cause it's mixing 
exception handling and timeout calculation.

We need to reuse ExponentialBackoffTimeout to encapsulate logic of calculating 
different sets of timeout separately and get rid of many local variables.


> Refactor timeout logic in TcpDiscovery
> --
>
> Key: IGNITE-11221
> URL: https://issues.apache.org/jira/browse/IGNITE-11221
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Stanilovsky Evgeny
>Priority: Major
>
> We need to replace IgniteSpiOperationTimeoutHelper with TimeoutStrategy 
> introduced in IGNITE-7648, cause it's mixing exception handling and timeout 
> calculation.
> We need to reuse ExponentialBackoffTimeout to encapsulate logic of 
> calculating different sets of timeout separately and get rid of many local 
> variables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-6324) Transactional cache data partially available after crash.

2019-02-05 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-6324:
---
Summary: Transactional cache data partially available after crash.  (was: 
Transactional cache data partially available after crash)

> Transactional cache data partially available after crash.
> -
>
> Key: IGNITE-6324
> URL: https://issues.apache.org/jira/browse/IGNITE-6324
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 1.9, 2.1
>Reporter: Stanilovsky Evgeny
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
> Attachments: InterruptCommitedThreadTest.java
>
>
> If an InterruptedException is raised in client code during PDS store operations, we 
> can obtain an inconsistent cache after restart. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11221) Refactor timeout logic in TcpDiscovery

2019-02-05 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11221:
---

 Summary: Refactor timeout logic in TcpDiscovery
 Key: IGNITE-11221
 URL: https://issues.apache.org/jira/browse/IGNITE-11221
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin


We need to reimplement IgniteSpiOperationTimeoutHelper, cause it's mixing 
exception handling and timeout calculation.

We need to reuse ExponentialBackoffTimeout to encapsulate logic of calculating 
different sets of timeout separately and get rid of many local variables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.

2019-02-05 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-7648:
--

Assignee: Pavel Voronkin  (was: Alexei Scherbakov)

> Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
> -
>
> Key: IGNITE-7648
> URL: https://issues.apache.org/jira/browse/IGNITE-7648
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.3
>Reporter: Alexei Scherbakov
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
>
> IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in 
> IGNITE-5718 as a way to prevent unnecessary node drops in case of short 
> network problems.
> I suppose it's wrong decision to fix it in such way.
> We had faced some issues in our production due to lack of automatic kicking 
> of ill-behaving nodes (on example, hanging due to long GC pauses) until we 
> realised the necessity of changing default behavior via property.
> Right solution is to kick nodes only if failure threshold is reached. Such 
> behavior should be always enabled.
> UPDATE: During a discussion it was decided what the property will remain 
> disabled by default.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented.

2019-02-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin resolved IGNITE-11201.
-
Resolution: Duplicate

> ConnectorConfiguration and TransactionConfiguration toString is not properly 
> implemented.
> -
>
> Key: IGNITE-11201
> URL: https://issues.apache.org/jira/browse/IGNITE-11201
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> Ignite configuration prints on startup, but ConnectorConfiguration and 
> TransactionConfiguration are not properly printed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented.

2019-02-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11201:

Description: The Ignite configuration is printed on startup, but 
ConnectorConfiguration and TransactionConfiguration are not properly printed.

> ConnectorConfiguration and TransactionConfiguration toString is not properly 
> implemented.
> -
>
> Key: IGNITE-11201
> URL: https://issues.apache.org/jira/browse/IGNITE-11201
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> Ignite configuration prints on startup, but ConnectorConfiguration and 
> TransactionConfiguration are not properly printed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented

2019-02-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11201:

Summary: ConnectorConfiguration and TransactionConfiguration toString is 
not properly implemented  (was: ConnectorConfdiguration and 
TransactionConfiguration toString is not properly implemented.)

> ConnectorConfiguration and TransactionConfiguration toString is not properly 
> implemented
> 
>
> Key: IGNITE-11201
> URL: https://issues.apache.org/jira/browse/IGNITE-11201
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented.

2019-02-04 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11201:

Summary: ConnectorConfiguration and TransactionConfiguration toString is 
not properly implemented.  (was: ConnectorConfiguration and 
TransactionConfiguration toString is not properly implemented)

> ConnectorConfiguration and TransactionConfiguration toString is not properly 
> implemented.
> -
>
> Key: IGNITE-11201
> URL: https://issues.apache.org/jira/browse/IGNITE-11201
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11201) ConnectorConfdiguration and TransactionConfiguration toString is not properly implemented.

2019-02-04 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11201:
---

 Summary: ConnectorConfdiguration and TransactionConfiguration 
toString is not properly implemented.
 Key: IGNITE-11201
 URL: https://issues.apache.org/jira/browse/IGNITE-11201
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11172) While handling duplicated connections we got exception on writing message to stale connection.

2019-02-01 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11172:

Summary: While handling duplicated connections we got exception on writing 
message to stale connection.  (was: On receiving duplicated connections we got 
exception on writing message on stale connections.)

> While handling duplicated connections we got exception on writing message to 
> stale connection.
> --
>
> Key: IGNITE-11172
> URL: https://issues.apache.org/jira/browse/IGNITE-11172
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> [2019-01-31 16:10:19,072][INFO 
> ][grid-nio-worker-tcp-comm-5-#45][TcpCommunicationSpi] Received incoming 
> connection from remote node while connecting to this node, rejecting 
> [locNode=e0668107-3c19-41ba-b9f5-9f073711d3ce, locNodeOrder=1, 
> rmtNode=848095e3-29bf-4d67-a5d7-117f44001b70, rmtNodeOrder=2]
> [2019-01-31 
> 16:10:20,310][ERROR][grid-nio-worker-tcp-comm-6-#46][TcpCommunicationSpi] 
> Failed to process selector key [ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, 
> finished=false, hashCode=848731852, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, 
> bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, 
> super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, 
> rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, 
> closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, 
> bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, 
> lastRcvTime=1548940219115, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=false, markedForClose=true]]]
> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, 
> finished=false, hashCode=848731852, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, 
> bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, 
> super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, 

[jira] [Created] (IGNITE-11172) On receiving duplicated connections we got exception.

2019-02-01 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11172:
---

 Summary: On receiving duplicated connections we got exception.
 Key: IGNITE-11172
 URL: https://issues.apache.org/jira/browse/IGNITE-11172
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin


[2019-01-31 16:10:19,072][INFO 
][grid-nio-worker-tcp-comm-5-#45][TcpCommunicationSpi] Received incoming 
connection from remote node while connecting to this node, rejecting 
[locNode=e0668107-3c19-41ba-b9f5-9f073711d3ce, locNodeOrder=1, 
rmtNode=848095e3-29bf-4d67-a5d7-117f44001b70, rmtNodeOrder=2]
[2019-01-31 
16:10:20,310][ERROR][grid-nio-worker-tcp-comm-6-#46][TcpCommunicationSpi] 
Failed to process selector key [ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, 
finished=false, hashCode=848731852, interrupted=false, 
runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, 
bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, 
super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 
lim=32511 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 
cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList 
[172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], 
discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, 
ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList 
[172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], 
discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, 
ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, 
rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, 
closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, 
bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, 
lastRcvTime=1548940219115, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=false, markedForClose=true]]]
javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, 
finished=false, hashCode=848731852, interrupted=false, 
runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, 
bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, 
super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 
lim=32511 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 
cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList 
[172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], 
discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, 
ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList 
[172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], 
discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, 
ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, 
rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, 
closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, 
bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, 
lastRcvTime=1548940219115, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=false, markedForClose=true]]]
 at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
 at 

[jira] [Updated] (IGNITE-11172) On receiving duplicated connections we got exception on writing message on stale connections.

2019-02-01 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11172:

Summary: On receiving duplicated connections we got exception on writing 
message on stale connections.  (was: On receiving duplicated connections we got 
exception.)

> On receiving duplicated connections we got exception on writing message on 
> stale connections.
> -
>
> Key: IGNITE-11172
> URL: https://issues.apache.org/jira/browse/IGNITE-11172
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> [2019-01-31 16:10:19,072][INFO 
> ][grid-nio-worker-tcp-comm-5-#45][TcpCommunicationSpi] Received incoming 
> connection from remote node while connecting to this node, rejecting 
> [locNode=e0668107-3c19-41ba-b9f5-9f073711d3ce, locNodeOrder=1, 
> rmtNode=848095e3-29bf-4d67-a5d7-117f44001b70, rmtNodeOrder=2]
> [2019-01-31 
> 16:10:20,310][ERROR][grid-nio-worker-tcp-comm-6-#46][TcpCommunicationSpi] 
> Failed to process selector key [ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, 
> finished=false, hashCode=848731852, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, 
> bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, 
> super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, 
> rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, 
> closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, 
> bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, 
> lastRcvTime=1548940219115, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=false, markedForClose=true]]]
> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, 
> finished=false, hashCode=848731852, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, 
> bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, 
> super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, 
> rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, 
> node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, 
> addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
> [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, 
> intOrder=2, lastExchangeTime=1548940115834, loc=false, 
> ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, 
> connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], 
> 

[jira] [Updated] (IGNITE-11126) Rework TcpCommunicationSpi.createShmemClient failure detection logic.

2019-01-29 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11126:

Description: 
We need to rework the createShmemClient() logic to support the failure 
detection/exponential backoff timeout logic introduced in IGNITE-7648.

Also, the isRecoverableError() sleep loop needs to be implemented in case of 
exception.

  was:We need to rework 


> Rework TcpCommunicationSpi.createShmemClient failure detection logic.
> -
>
> Key: IGNITE-11126
> URL: https://issues.apache.org/jira/browse/IGNITE-11126
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>
> We need to rework the createShmemClient() logic to support the failure 
> detection/exponential backoff timeout logic introduced in IGNITE-7648.
> Also, the isRecoverableError() sleep loop needs to be implemented in case of 
> exception.
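
For illustration, a minimal sketch of the bounded exponential backoff shape this 
asks for, in plain Java; the class, method and parameter names are assumptions, 
not Ignite's actual createShmemClient() code:

final class BackoffRetry {
    /** Retries 'op' with exponentially growing delays until the timeout budget runs out. */
    static <T> T withBackoff(java.util.concurrent.Callable<T> op,
                             long initialDelayMs,
                             long maxDelayMs,
                             long failureDetectionTimeoutMs) throws Exception {
        long deadline = System.currentTimeMillis() + failureDetectionTimeoutMs;
        long delay = initialDelayMs;

        while (true) {
            try {
                return op.call();
            }
            catch (Exception e) {
                // Stop retrying once the failure detection timeout is exhausted.
                if (System.currentTimeMillis() + delay > deadline)
                    throw e;

                Thread.sleep(delay);

                // Exponential growth, capped at maxDelayMs.
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }
}

A real implementation would additionally bail out immediately when 
isRecoverableError() reports a non-recoverable failure.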



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11126) Rework TcpCommunicationSpi.createShmemClient failure detection logic.

2019-01-29 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11126:

Description: We need to rework 

> Rework TcpCommunicationSpi.createShmemClient failure detection logic.
> -
>
> Key: IGNITE-11126
> URL: https://issues.apache.org/jira/browse/IGNITE-11126
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>
> We need to rework 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11126) Rework TcpCommunicationSpi.createShmemClient failure detection logic.

2019-01-29 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11126:
---

 Summary: Rework TcpCommunicationSpi.createShmemClient failure 
detection logic.
 Key: IGNITE-11126
 URL: https://issues.apache.org/jira/browse/IGNITE-11126
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-8894) Provide information about coordinator in control.sh output

2019-01-28 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753999#comment-16753999
 ] 

Pavel Voronkin commented on IGNITE-8894:


Looks good to me.

> Provide information about coordinator in control.sh output
> --
>
> Key: IGNITE-8894
> URL: https://issues.apache.org/jira/browse/IGNITE-8894
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.5
>Reporter: Sergey Kosarev
>Assignee: Sergey Kosarev
>Priority: Minor
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Information about the coordinator can be added to an existing command (e.g. 
> --state, --baseline), or a new command can be introduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-10876) "Affinity changes (coordinator) applied" can be executed in parallel

2019-01-28 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-10876:
---

Assignee: Pavel Voronkin

> "Affinity changes (coordinator) applied" can be executed in parallel
> 
>
> Key: IGNITE-10876
> URL: https://issues.apache.org/jira/browse/IGNITE-10876
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
>
> There is a for loop over all cache groups which executes N*P operations in 
> the exchange worker, where N is the number of cache groups and P is the 
> number of partitions.
> We spend 80% of the time in this loop:
> for (CacheGroupContext grp : cctx.cache().cacheGroups()) {
>     GridDhtPartitionTopology top = grp != null ? grp.topology() : 
>         cctx.exchange().clientTopology(grp.groupId(), events().discoveryCache());
>     top.beforeExchange(this, true, true);
> }
> I believe we can execute it in parallel.
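
A minimal sketch of the fan-out/join part only, using plain java.util.concurrent 
with made-up names; whether beforeExchange() is actually safe to call 
concurrently still has to be verified:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

final class ParallelForEach {
    /** Runs 'work' for every item on the given pool and waits for all of them. */
    static <T> void run(ExecutorService pool, List<T> items, Consumer<T> work) {
        List<CompletableFuture<Void>> futs = new ArrayList<>();

        for (T item : items)
            futs.add(CompletableFuture.runAsync(() -> work.accept(item), pool));

        // join() rethrows the first failure, like the sequential loop would.
        CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
    }
}

The body of the loop quoted above would then become the Consumer passed to 
run(), executed on a dedicated pool sized for the exchange work.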



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11017) Visor doesn't show cacheSize metrics.

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11017:

Description: 
VisorCache doesn't show cacheSize(), even though the VisorCache object contains 
cacheSize.

 

  was:GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over not 
EVICTED partitions on calculating entries size.


> Visor doesn't show cacheSize metrics.
> -
>
> Key: IGNITE-11017
> URL: https://issues.apache.org/jira/browse/IGNITE-11017
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> VisorCache doesn't show cacheSize(), even though the VisorCache object 
> contains cacheSize.
>  
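
Unrelated to the Visor fix itself, the value the screen should display can be 
cross-checked from the public API; a minimal sketch, assuming a started Ignite 
instance and a made-up cache name "foo":

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;

final class CacheSizeCrossCheck {
    /** Prints cluster-wide entry counts for the (hypothetical) cache "foo". */
    static void print(Ignite ignite) {
        IgniteCache<Object, Object> cache = ignite.cache("foo");

        // Primary copies only vs. primaries plus backups.
        System.out.println("primary = " + cache.sizeLong(CachePeekMode.PRIMARY));
        System.out.println("all     = " + cache.sizeLong(CachePeekMode.ALL));
    }
}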



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11017) Visor doesn't show cacheSize metrics.

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11017:

Description: 
VisorCache doesn't show cacheSize() on the caches screen, even though the 
VisorCache object contains cacheSize.

 

  was:
VisorCache doesn't show cacheSize() nevertheless VisorCache object contains 
cacheSize.

 


> Visor doesn't show cacheSize metrics.
> -
>
> Key: IGNITE-11017
> URL: https://issues.apache.org/jira/browse/IGNITE-11017
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> VisorCache doesn't show cacheSize() on the caches screen, even though the 
> VisorCache object contains cacheSize.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11017) Visor doesn't show cacheSize metrics.

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11017:

Summary: Visor doesn't show cacheSize metrics.  (was: OffheapEntriesCount 
metrics calculate size on all not EVICTED partitions)

> Visor doesn't show cacheSize metrics.
> -
>
> Key: IGNITE-11017
> URL: https://issues.apache.org/jira/browse/IGNITE-11017
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over not 
> EVICTED partitions on calculating entries size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (IGNITE-11061) Сopyright still points out 2018

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin closed IGNITE-11061.
---
Ignite Flags:   (was: Docs Required)

Created ticket by mistake

> Сopyright still points out 2018
> ---
>
> Key: IGNITE-11061
> URL: https://issues.apache.org/jira/browse/IGNITE-11061
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11061) Сopyright still points out 2018

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin resolved IGNITE-11061.
-
Resolution: Invalid

> Сopyright still points out 2018
> ---
>
> Key: IGNITE-11061
> URL: https://issues.apache.org/jira/browse/IGNITE-11061
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-11061) Сopyright still points out 2018

2019-01-24 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751194#comment-16751194
 ] 

Pavel Voronkin edited comment on IGNITE-11061 at 1/24/19 2:38 PM:
--

Created ticket by mistake.


was (Author: voropava):
Created ticket by mistake

> Сopyright still points out 2018
> ---
>
> Key: IGNITE-11061
> URL: https://issues.apache.org/jira/browse/IGNITE-11061
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11061) Сopyright still points out 2018

2019-01-24 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11061:
---

 Summary: Сopyright still points out 2018
 Key: IGNITE-11061
 URL: https://issues.apache.org/jira/browse/IGNITE-11061
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Fix Version/s: 2.8

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: IgniteClientConnectSslTest.java
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Problem: 
> In case of initiator node haven't joined topology yet (doesn't exist in 
> DiscoCache, but exists in TcpDsicovery ring)
> we are writing back new RecoveryLastReceivedMessage(NEED_WAIT)) in the below 
> else clause:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> In case of SSL such code do encrypt and send concurrently with 
> session.close() which results in exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So initiator receive closed exception instead of NEED_WAIT message which 
> leads to exception scenario.
> As result instead of NEED_WAIT loop we retry with exception N times and fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11054) GridNioServer.processWrite() reordered socket.write and onMessageWritten callback.

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11054:

Description: 
We have a bug in processWrite().

 

 

> GridNioServer.processWrite() reordered socket.write and onMessageWritten 
> callback.
> --
>
> Key: IGNITE-11054
> URL: https://issues.apache.org/jira/browse/IGNITE-11054
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> We have a bug in processWrite().
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Ignite Flags:   (was: Docs Required)

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: IgniteClientConnectSslTest.java
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Problem: 
> In case of initiator node haven't joined topology yet (doesn't exist in 
> DiscoCache, but exists in TcpDsicovery ring)
> we are writing back new RecoveryLastReceivedMessage(NEED_WAIT)) in the below 
> else clause:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> In case of SSL such code do encrypt and send concurrently with 
> session.close() which results in exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So initiator receive closed exception instead of NEED_WAIT message which 
> leads to exception scenario.
> As result instead of NEED_WAIT loop we retry with exception N times and fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11054) GridNioServer.processWrite() reordered socket.write and onMessageWritten callback.

2019-01-24 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11054:

Description: 
We have a bug in processWrite():

SessionWriteRequest.onMessageWritten() is invoked before the actual write to the 
socket.

 

 

  was:
We have bug in processWrite()

 

 


> GridNioServer.processWrite() reordered socket.write and onMessageWritten 
> callback.
> --
>
> Key: IGNITE-11054
> URL: https://issues.apache.org/jira/browse/IGNITE-11054
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> We have a bug in processWrite():
> SessionWriteRequest.onMessageWritten() is invoked before the actual write to 
> the socket.
>  
>  
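
A toy sketch of the intended ordering with made-up types (this is not Ignite's 
GridNioServer code): the written-callback must fire only after the bytes have 
actually been handed to the socket.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

interface WriteRequest {
    ByteBuffer payload();

    void onMessageWritten(); // analogous to SessionWriteRequest.onMessageWritten()
}

final class WriteLoop {
    static void processWrite(WritableByteChannel ch, WriteRequest req) throws IOException {
        ByteBuffer buf = req.payload();

        // 1. Push the bytes to the socket first...
        while (buf.hasRemaining())
            ch.write(buf);

        // 2. ...and only then notify listeners; reversing these two steps is the
        //    reordering this ticket describes.
        req.onMessageWritten();
    }
}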



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11054) GridNioServer.processWrite() reordered socket.write and onMessageWritten callback.

2019-01-23 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11054:
---

 Summary: GridNioServer.processWrite() reordered socket.write and 
onMessageWritten callback.
 Key: IGNITE-11054
 URL: https://issues.apache.org/jira/browse/IGNITE-11054
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay in .NET.

2019-01-23 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin resolved IGNITE-11026.
-
Resolution: Won't Fix

We decided not to introduce new parameters.

> Support TcpCommunicationSpi.NeedWaitDelay, 
> TcpCommunicationSpi.MaxNeedWaitDelay in .NET.
> 
>
> Key: IGNITE-11026
> URL: https://issues.apache.org/jira/browse/IGNITE-11026
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-23 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750679#comment-16750679
 ] 

Pavel Voronkin commented on IGNITE-11016:
-

I agree that we need to add failure detection logic here in another JIRA.

Maybe it would be IGNITE-7648.

I will change the odd initial delay of 1 ms and will implement the failure 
detection logic in another JIRA.

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Problem: 
> In case of initiator node haven't joined topology yet (doesn't exist in 
> DiscoCache, but exists in TcpDsicovery ring)
> we are writing back new RecoveryLastReceivedMessage(NEED_WAIT)) in the below 
> else clause:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> In case of SSL such code do encrypt and send concurrently with 
> session.close() which results in exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So initiator receive closed exception instead of NEED_WAIT message which 
> leads to exception scenario.
> As result instead of NEED_WAIT loop we retry with exception N times and fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-22 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749517#comment-16749517
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

Thanks for your feedback [~ascherbakov], I've resolved them.

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets.
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  
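
A minimal sketch of the proposed switch, assuming partition ids are small 
non-negative ints so a BitSet can stand in for a HashSet of boxed ids:

import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class PartitionSetDemo {
    public static void main(String[] args) {
        int parts = 1024;

        // Current shape: a boxed Integer and a HashMap.Node per partition id.
        Set<Integer> asSet = new HashSet<>();
        for (int p = 0; p < parts; p++)
            asSet.add(p);

        // Proposed shape: one long word per 64 partitions, no per-entry objects.
        BitSet asBits = new BitSet(parts);
        for (int p = 0; p < parts; p++)
            asBits.set(p);

        // Same membership answers, far less allocation pressure.
        System.out.println(asSet.contains(100) && asBits.get(100)); // true
    }
}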



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11031) Improve test coverage on ssl and fix existing ssl tcp communication spi tests.

2019-01-22 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11031:
---

 Summary: Improve test coverage on ssl and fix existing ssl tcp 
communication spi tests.
 Key: IGNITE-11031
 URL: https://issues.apache.org/jira/browse/IGNITE-11031
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay.

2019-01-22 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11026:
---

 Summary: Support TcpCommunicationSpi.NeedWaitDelay, 
TcpCommunicationSpi.MaxNeedWaitDelay.
 Key: IGNITE-11026
 URL: https://issues.apache.org/jira/browse/IGNITE-11026
 Project: Ignite
  Issue Type: Improvement
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay in .NET.

2019-01-22 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11026:

Summary: Support TcpCommunicationSpi.NeedWaitDelay, 
TcpCommunicationSpi.MaxNeedWaitDelay in .NET.  (was: Support 
TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay.)

> Support TcpCommunicationSpi.NeedWaitDelay, 
> TcpCommunicationSpi.MaxNeedWaitDelay in .NET.
> 
>
> Key: IGNITE-11026
> URL: https://issues.apache.org/jira/browse/IGNITE-11026
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-22 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin reassigned IGNITE-11016:
---

Assignee: Pavel Voronkin

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>
> Problem: 
> In case of initiator node haven't joined topology yet (doesn't exist in 
> DiscoCache, but exists in TcpDsicovery ring)
> we are writing back new RecoveryLastReceivedMessage(NEED_WAIT)) in the below 
> else clause:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> In case of SSL such code do encrypt and send concurrently with 
> session.close() which results in exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So initiator receive closed exception instead of NEED_WAIT message which 
> leads to exception scenario.
> As result instead of NEED_WAIT loop we retry with exception N times and fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11017) OffheapEntriesCount metrics calculate size on all not EVICTED partitions

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11017:

Description: GridDhtPartitionTopologyImpl.CurrentPartitionsIterator 
iterates over non-EVICTED partitions when calculating the entries size.

> OffheapEntriesCount metrics calculate size on all not EVICTED partitions
> 
>
> Key: IGNITE-11017
> URL: https://issues.apache.org/jira/browse/IGNITE-11017
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over 
> non-EVICTED partitions when calculating the entries size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11017) OffheapEntriesCount metrics calculate size on all not EVICTED partitions

2019-01-21 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11017:
---

 Summary: OffheapEntriesCount metrics calculate size on all not 
EVICTED partitions
 Key: IGNITE-11017
 URL: https://issues.apache.org/jira/browse/IGNITE-11017
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748456#comment-16748456
 ] 

Pavel Voronkin commented on IGNITE-11016:
-

Reproducer attached.

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>
> Problem: 
> In case of initiator node haven't joined topology yet (doesn't exist in 
> DiscoCache, but exists in TcpDsicovery ring)
> we are writing back new RecoveryLastReceivedMessage(NEED_WAIT)) in the below 
> else clause:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> In case of SSL such code do encrypt and send concurrently with 
> session.close() which results in exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So initiator receive closed exception instead of NEED_WAIT message which 
> leads to exception scenario.
> As result instead of NEED_WAIT loop we retry with exception N times and fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Attachment: IgniteClientConnectSslTest.java

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
> Attachments: IgniteClientConnectSslTest.java
>
>
> Problem: 
> In case of initiator node haven't joined topology yet (doesn't exist in 
> DiscoCache, but exists in TcpDsicovery ring)
> we are writing back new RecoveryLastReceivedMessage(NEED_WAIT)) in the below 
> else clause:
> if (unknownNode)
> { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']'); ses.close(); }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1>() {
>  @Override public void apply(IgniteInternalFuture fut)
> { ses.close(); }
> });
>  }
> In case of SSL such code do encrypt and send concurrently with 
> session.close() which results in exception:
>  javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                  at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                  at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                  at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                  at java.lang.Thread.run(Thread.java:745)
>   
> So initiator receive closed exception instead of NEED_WAIT message which 
> leads to exception scenario.
> As result instead of NEED_WAIT loop we retry with exception N times and fail.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Summary: RecoveryLastReceivedMessage(NEED_WAIT) write message failed in 
case of SSL  (was: NEED_WAIT write message failed in case of SSL)

> RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Description: 
Problem: 

In case the initiator node hasn't joined topology yet (it doesn't exist in 
DiscoCache, but exists in the TcpDiscovery ring),

we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the below 
else clause:

if (unknownNode) {
    U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId +
        ", ses=" + ses + ']');

    ses.close();
}
else {
    ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new CI1<IgniteInternalFuture<?>>() {
        @Override public void apply(IgniteInternalFuture<?> fut) {
            ses.close();
        }
    });
}

In case of SSL, such code does the encrypt-and-send concurrently with 
session.close(), which results in an exception:


 javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
hashCode=1324367867, interrupted=false, 
runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
 [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
select=true, super=]DirectNioClientWorker [super=], 
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, 
outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, 
rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, 
bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, 
sndSchedTime=1544502852522, lastSndTime=1544502852522, 
lastRcvTime=1544502852522, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=true, markedForClose=true]]]
                 at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
                 at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
                 at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
                 at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
                 at java.lang.Thread.run(Thread.java:745)
  
So the initiator receives a "closed" exception instead of the NEED_WAIT message, 
which leads to the exception scenario.

As a result, instead of the NEED_WAIT loop we retry with an exception N times 
and fail.
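
For illustration only, a sketch of the ordering that avoids the race, using 
plain java.util.concurrent instead of Ignite's NIO API (the Session interface 
below is made up): the close is chained strictly after the NEED_WAIT write has 
completed, so nothing is left to encrypt on a session that is already closing.

import java.util.concurrent.CompletableFuture;

final class NeedWaitReply {
    /** Stand-in for a NIO session; send() completes only after the bytes are flushed. */
    interface Session {
        CompletableFuture<Void> send(Object msg);

        void close();
    }

    static void replyNeedWait(Session ses, Object needWaitMsg) {
        // Close after the write has completed (success or failure), never concurrently
        // with the SSL encrypt of the pending message.
        ses.send(needWaitMsg).whenComplete((res, err) -> ses.close());
    }
}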

 

  was:
The problem is that the initiator node doesn't exist in DiscoCache, but exists in 
the TcpDiscovery ring,

so we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else branch 
below:

 

if (unknownNode) {
 U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
ses=" + ses + ']');

 ses.close();
}
else {
 ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
CI1<IgniteInternalFuture>() {
 @Override public void apply(IgniteInternalFuture fut) {
 ses.close();
 }
 });
}

In case of SSL such code encrypts and sends concurrently with close, which 
results in:

 
javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
hashCode=1324367867, interrupted=false, 
runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
 [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
select=true, super=]DirectNioClientWorker [super=], 
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, 
outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, 
rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, 
bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, 
sndSchedTime=1544502852522, lastSndTime=1544502852522, 
lastRcvTime=1544502852522, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=true, markedForClose=true]]]
                at 

[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Description: 
The problem is that the initiator node doesn't exist in DiscoCache, but exists in 
the TcpDiscovery ring,

so we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else branch 
below:

 

if (unknownNode) {
 U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
ses=" + ses + ']');

 ses.close();
}
else {
 ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
CI1<IgniteInternalFuture>() {
 @Override public void apply(IgniteInternalFuture fut) {
 ses.close();
 }
 });
}

In case of SSL such code encrypts and sends concurrently with close, which 
results in:

 
javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
[status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
[worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
hashCode=1324367867, interrupted=false, 
runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
 [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
select=true, super=]DirectNioClientWorker [super=], 
writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, 
outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, 
rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, 
bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, 
sndSchedTime=1544502852522, lastSndTime=1544502852522, 
lastRcvTime=1544502852522, readsPaused=false, 
filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
filter], accepted=true, markedForClose=true]]]
                at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
                at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
                at 
org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
                at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
                at java.lang.Thread.run(Thread.java:745)
 
 
 

 

 

 

 

> RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> The problem is that the initiator node doesn't exist in DiscoCache, 
> but exists in the TcpDiscovery ring,
> so we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else 
> branch below:
>  
> if (unknownNode) {
>  U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']');
>  ses.close();
> }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1<IgniteInternalFuture>() {
>  @Override public void apply(IgniteInternalFuture fut) {
>  ses.close();
>  }
>  });
> }
> In case of SSL such code encrypts and sends concurrently with close, which 
> results in:
>  
> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 

[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".

2019-01-21 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-11016:

Summary: RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to 
encrypt data (SSL engine error)".  (was: RecoveryLastReceivedMessage(NEED_WAIT) 
write message failed in case of SSL)

> RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data 
> (SSL engine error)".
> --
>
> Key: IGNITE-11016
> URL: https://issues.apache.org/jira/browse/IGNITE-11016
> Project: Ignite
>  Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Major
>
> The problem is that the initiator node doesn't exist in DiscoCache, 
> but exists in the TcpDiscovery ring,
> so we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else 
> branch below:
>  
> if (unknownNode) {
>  U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", 
> ses=" + ses + ']');
>  ses.close();
> }
> else {
>  ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new 
> CI1<IgniteInternalFuture>() {
>  @Override public void apply(IgniteInternalFuture fut) {
>  ses.close();
>  }
>  });
> }
> In case of SSL such code encrypts and sends concurrently with close, which 
> results in:
>  
> javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) 
> [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl 
> [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, 
> igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, 
> hashCode=1324367867, interrupted=false, 
> runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker
>  [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, 
> select=true, super=]DirectNioClientWorker [super=], 
> writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], 
> readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
> inRecovery=null, outRecovery=null, super=GridNioSessionImpl 
> [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, 
> createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, 
> bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, 
> lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, 
> filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL 
> filter], accepted=true, markedForClose=true]]]
>                 at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380)
>                 at 
> org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138)
>                 at 
> org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792)
>                 at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>                 at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11016) NEED_WAIT write message failed in case of SSL

2019-01-21 Thread Pavel Voronkin (JIRA)
Pavel Voronkin created IGNITE-11016:
---

 Summary: NEED_WAIT write message failed in case of SSL
 Key: IGNITE-11016
 URL: https://issues.apache.org/jira/browse/IGNITE-11016
 Project: Ignite
  Issue Type: Bug
Reporter: Pavel Voronkin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-20 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747703#comment-16747703
 ] 

Pavel Voronkin edited comment on IGNITE-10877 at 1/21/19 6:59 AM:
--

I don't think it breaks compatibility, because we have an Ignite property to roll back 
to the original behaviour for mixed environments.

Moreover, GridAffinityAssignment serialization is broken right now. See 
IGNITE-10925; we need to fix all the issues there.

 


was (Author: voropava):
I don't think it breaks compatibility, because we have an Ignite property to roll back 
to the original behaviour for mixed environments.

Moreover, GridAffinityAssignment serialization is broken right now. See 
IGNITE-10925; we need to fix the issue there.

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-20 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747703#comment-16747703
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

I don't think it breaks compatibility, because we have an Ignite property to roll back 
to the original behaviour for mixed environments.

Moreover, GridAffinityAssignment serialization is broken right now. See 
IGNITE-10925; we need to fix the issue there.

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Fix For: 2.8
>
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746083#comment-16746083
 ] 

Pavel Voronkin edited comment on IGNITE-10877 at 1/18/19 9:34 AM:
--

16part, 16 nodes

HashSet

!image-2019-01-18-12-09-32-876.png!

BitSet

!image-2019-01-18-12-09-04-835.png!

 

In total we have:

N - number of nodes

P - number of parts

low P, low N  - BitSet better

high P, low N - BitSet better

low P, high N - BitSet slightly better

high P, high N - HashSet is better

For more than 500 nodes we need a compacted BitSet, see 
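
A rough, hypothetical illustration of the trade-off measured above (plain Java, 
not the GridAffinityAssignment code): partition ids are small dense ints, so a 
BitSet keeps a node's partition set in roughly parts/8 bytes in a single long[] 
array, while a HashSet<Integer> pays for a HashMap.Node (~32 bytes) plus a boxed 
Integer per id plus the bucket array. Which structure wins for very large 
topologies, and where exactly the ~500-node threshold should sit, remains a 
measurement question.

import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class PartitionSetSketch {
    public static void main(String[] args) {
        int parts = 1024; // example partition count

        // HashSet<Integer>: per-element HashMap.Node + boxed Integer + bucket array.
        Set<Integer> hashSet = new HashSet<>(parts);
        for (int p = 0; p < parts; p++)
            hashSet.add(p);

        // BitSet: a single word array of parts/64 longs, no per-element allocation.
        BitSet bitSet = new BitSet(parts);
        for (int p = 0; p < parts; p++)
            bitSet.set(p);

        // Both answer the same membership queries used for primary/backup lookups.
        System.out.println(hashSet.contains(42) + " " + bitSet.get(42));
    }
}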

 

 


was (Author: voropava):
16part, 16 nodes

HashSet

!image-2019-01-18-12-09-32-876.png!

BitSet

!image-2019-01-18-12-09-04-835.png!

 

In total we have:

N - number of nodes

P - number of parts

low P, low N  - BitSet better

high P, low N - BitSet better

low P, high N - BitSet slightly better

high P, high N - HashSet is better

I suggest using a threshold of 500.

 

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746083#comment-16746083
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

16part, 16 nodes

HashSet

!image-2019-01-18-12-09-32-876.png!

BitSet

!image-2019-01-18-12-09-04-835.png!

 

In total we have:

N - number of nodes

P - number of parts

low P, low N  - BitSet better

high P, low N - BitSet better

low P, high N - BitSet slightly better

high P, high N - HashSet is better

I suggest using a threshold of 500.

 

 

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-12-09-32-876.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-12-09-04-835.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, 
> image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745931#comment-16745931
 ] 

Pavel Voronkin commented on IGNITE-10877:
-

1024part 4k nodes

HashSet

!image-2019-01-18-11-56-10-339.png!

BitSet

!image-2019-01-18-11-56-18-040.png!

 With such a small number of partitions, BitSet is roughly the same as HashSet.

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-56-18-040.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-56-10-339.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, 
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure

2019-01-18 Thread Pavel Voronkin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Voronkin updated IGNITE-10877:

Attachment: image-2019-01-18-11-55-39-496.png

> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
> 
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Pavel Voronkin
>Assignee: Pavel Voronkin
>Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, 
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, 
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, 
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, 
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> 1) While running tests with JFR we observe huge memory allocation pressure 
> produced by:
> *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)*
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 
> 100
>  java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 
> 784 100
>  java.util.HashMap.put(Object, Object) 481 298 044 784 100
>  java.util.HashSet.add(Object) 480 297 221 040 99,724
>  
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps()
>  1 823 744 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion,
>  List, List) 1 823 744 0,276
> *Allocation stats*
> Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB 
> Size(bytes) Total TLAB Size(bytes) Pressure(%)
>  java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876
>  java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687
>  java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 
> 11,341
>  java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654
>  java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198
> 2) Also another hot place found
> Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)
>  java.util.ArrayList.grow(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554
>  java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554
>  java.util.ArrayList.add(Object) 7 5 766 448 9,554
>  
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int,
>  AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 
> 766 448 9,554
> The reason of that is defail
> I think we need to improve memory efficiency by switching from Sets to 
> BitSets
>  
> JFR attached, see Allocations in 12:50:28 - 12:50:29
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

