[jira] [Commented] (IGNITE-11262) Compression on Discovery data bag
[ https://issues.apache.org/jira/browse/IGNITE-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774047#comment-16774047 ]

Pavel Voronkin commented on IGNITE-11262:
-----------------------------------------

Thanks [~v.pyatkov], the changes look good to me.

> Compression on Discovery data bag
> ---------------------------------
>
>                 Key: IGNITE-11262
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11262
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The size of the GridComponents discovery data may increase significantly in large deployments. Examples:
> 1) With more than 3K caches that have QueryEntity configured, the {{GridCacheProcessor}} part of the {{DiscoveryDataBag}} consumes more than 20 MB.
> 2) If the cluster contains more than 13K objects, the {{GridMarshallerMappingProcessor}} data exceeds 1 MB.
> 3) In a cluster with more than 3K types in binary format, the {{CacheObjectBinaryProcessorImpl}} data can grow to 10 MB.
> In most cases this data contains duplicated structures, so simple zip compression can reduce its size considerably.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
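The "duplicated structures compress well" claim above is easy to check with plain Deflate ("zip") compression from the JDK. A minimal sketch; the class name and the simulated payload are illustrative, not Ignite APIs:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

public class DiscoveryBagZip {
    /** Compresses raw serialized bytes with plain Deflate compression. */
    static byte[] zip(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream def = new DeflaterOutputStream(buf)) {
            def.write(raw);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Simulate a data bag full of near-identical cache/QueryEntity descriptors.
        StringBuilder bag = new StringBuilder();
        for (int i = 0; i < 3_000; i++)
            bag.append("CacheConfiguration{name=cache").append(i)
               .append(", queryEntity={keyType=java.lang.Long, valType=Person}}");

        byte[] raw = bag.toString().getBytes();
        byte[] zipped = zip(raw);

        // Highly repetitive input deflates to a small fraction of its raw size.
        System.out.println("raw=" + raw.length + " bytes, zipped=" + zipped.length + " bytes");
    }
}
```

On repetitive payloads like the ones described in the issue, the compressed form is typically an order of magnitude smaller, at the cost of a deflate/inflate pass on join.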
[jira] [Closed] (IGNITE-11358) Bug in ZK tests occurs periodically
[ https://issues.apache.org/jira/browse/IGNITE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin closed IGNITE-11358.
-----------------------------------

Will be fixed in IGNITE-11255.

> Bug in ZK tests occurs periodically
> -----------------------------------
>
>                 Key: IGNITE-11358
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11358
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Priority: Major
>
> java.lang.NullPointerException
>     at org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.allNodesSupport(ZookeeperDiscoverySpi.java:342)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.isHandshakeWaitSupported(TcpCommunicationSpi.java:4109)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$400(TcpCommunicationSpi.java:277)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:430)
>     at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
>     at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>     at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:66)
>     at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>     at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
>     at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>     at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3525)
>     at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
>     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2639)
>     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
>     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1818)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at java.lang.Thread.run(Thread.java:748)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-11358) Bug in ZK tests occurs periodically
[ https://issues.apache.org/jira/browse/IGNITE-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin resolved IGNITE-11358.
-------------------------------------
    Resolution: Duplicate

> Bug in ZK tests occurs periodically
> -----------------------------------
>
>                 Key: IGNITE-11358
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11358
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Priority: Major
>
> java.lang.NullPointerException
>     at org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.allNodesSupport(ZookeeperDiscoverySpi.java:342)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.isHandshakeWaitSupported(TcpCommunicationSpi.java:4109)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$400(TcpCommunicationSpi.java:277)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:430)
>     at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
>     at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>     at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:66)
>     at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>     at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
>     at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
>     at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3525)
>     at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
>     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2639)
>     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
>     at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1818)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>     at java.lang.Thread.run(Thread.java:748)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11262) Compression on Discovery data bag
[ https://issues.apache.org/jira/browse/IGNITE-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772987#comment-16772987 ]

Pavel Voronkin commented on IGNITE-11262:
-----------------------------------------

Hi [~v.pyatkov], I've put comments, please have a look.

> Compression on Discovery data bag
> ---------------------------------
>
>                 Key: IGNITE-11262
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11262
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The size of the GridComponents discovery data may increase significantly in large deployments. Examples:
> 1) With more than 3K caches that have QueryEntity configured, the {{GridCacheProcessor}} part of the {{DiscoveryDataBag}} consumes more than 20 MB.
> 2) If the cluster contains more than 13K objects, the {{GridMarshallerMappingProcessor}} data exceeds 1 MB.
> 3) In a cluster with more than 3K types in binary format, the {{CacheObjectBinaryProcessorImpl}} data can grow to 10 MB.
> In most cases this data contains duplicated structures, so simple zip compression can reduce its size considerably.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11255:
------------------------------------
    Description:
We need to fix:
* CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
* ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
* IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

ZookeeperDiscovery1: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
Platform NET: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

  was:
We need to fix:
* CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
* ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
* IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

ZookeeperDiscovery1: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
Platform NET: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

> Fix test failure after IGNITE-7648.
> -----------------------------------
>
>                 Key: IGNITE-11255
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11255
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Assignee: Pavel Voronkin
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
> * CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
> * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
> * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>
> ZookeeperDiscovery1: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
> Platform NET: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11255:
------------------------------------
    Description:
We need to fix:
* CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
* ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
* IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

ZookeeperDiscovery1: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
Platform NET: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

  was:
We need to fix:
* CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
* ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
* IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

> Fix test failure after IGNITE-7648.
> -----------------------------------
>
>                 Key: IGNITE-11255
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11255
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Assignee: Pavel Voronkin
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
> * CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
> * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
> * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes
>
> ZookeeperDiscovery1: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_ZooKeeperDiscovery1=pull%2F6062%2Fhead=buildTypeStatusDiv]
> Platform NET: [https://ci.ignite.apache.org/viewType.html?buildTypeId=IgniteTests24Java8_PlatformNetLongRunning=pull%2F6062%2Fhead=buildTypeStatusDiv]

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11255) Fix test failure after IGNITE-7648.
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771928#comment-16771928 ]

Pavel Voronkin commented on IGNITE-11255:
-----------------------------------------

While running the tests I've found another bug: allNodesSupport is called while the SPI is not initialized.

> Fix test failure after IGNITE-7648.
> -----------------------------------
>
>                 Key: IGNITE-11255
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11255
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Assignee: Pavel Voronkin
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We need to fix:
> * CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False)
> * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3
> * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11358) Bug in ZK tests occurs periodically
Pavel Voronkin created IGNITE-11358:
---------------------------------------

             Summary: Bug in ZK tests occurs periodically
                 Key: IGNITE-11358
                 URL: https://issues.apache.org/jira/browse/IGNITE-11358
             Project: Ignite
          Issue Type: Bug
            Reporter: Pavel Voronkin

java.lang.NullPointerException
    at org.apache.ignite.spi.discovery.zk.ZookeeperDiscoverySpi.allNodesSupport(ZookeeperDiscoverySpi.java:342)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.isHandshakeWaitSupported(TcpCommunicationSpi.java:4109)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.access$400(TcpCommunicationSpi.java:277)
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2.onConnected(TcpCommunicationSpi.java:430)
    at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionOpened(GridNioFilterChain.java:251)
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
    at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionOpened(GridNioCodecFilter.java:66)
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
    at org.apache.ignite.internal.util.nio.GridConnectionBytesVerifyFilter.onSessionOpened(GridConnectionBytesVerifyFilter.java:58)
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionOpened(GridNioFilterAdapter.java:88)
    at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionOpened(GridNioServer.java:3525)
    at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionOpened(GridNioFilterChain.java:139)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.register(GridNioServer.java:2639)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1997)
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1818)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
    at java.lang.Thread.run(Thread.java:748)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
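The trace above shows allNodesSupport dereferencing internal SPI state before spiStart() has run. A hypothetical sketch of the defensive shape such a check could take (field and method names are stand-ins, not the actual Ignite code):

```java
public class NodeSupportGuard {
    // Stands in for the SPI's internal delegate, which is only assigned on spiStart().
    private Object impl;

    /**
     * Returns false instead of dereferencing a not-yet-initialized delegate,
     * so callers racing with SPI startup see "feature not supported" rather than an NPE.
     */
    boolean allNodesSupport(int feature) {
        Object i = impl;        // read once: the field may be assigned concurrently
        if (i == null)
            return false;       // called before spiStart(): be conservative
        return true;            // the real check would consult cluster nodes' feature sets
    }

    public static void main(String[] args) {
        NodeSupportGuard spi = new NodeSupportGuard();
        // Before initialization the guard answers false instead of throwing.
        System.out.println(spi.allNodesSupport(1));
    }
}
```

Whether "return false" or "wait for initialization" is the right behavior depends on the caller; the point is that the uninitialized window must be handled explicitly.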
[jira] [Commented] (IGNITE-10928) After huge load on cluster and restart with walCompactionEnabled=True errors on log
[ https://issues.apache.org/jira/browse/IGNITE-10928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771753#comment-16771753 ]

Pavel Voronkin commented on IGNITE-10928:
-----------------------------------------

Looks good to me

> After huge load on cluster and restart with walCompactionEnabled=True errors on log
> -----------------------------------------------------------------------------------
>
>                 Key: IGNITE-10928
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10928
>             Project: Ignite
>          Issue Type: Bug
>          Components: data structures
>    Affects Versions: 2.5
>            Reporter: ARomantsov
>            Assignee: Sergey Antonov
>            Priority: Critical
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java}
>  class="org.apache.ignite.configuration.DataRegionConfiguration">
> {code}
> {code:java}
> [15:30:56,809][INFO][wal-file-compressor-%null%-1-#68][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 28310114 size: -1, ptr=FileWALPointer [idx=35, fileOff=28310114, len=0]
> [15:30:56,811][INFO][wal-file-compressor-%null%-3-#70][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 28303753 size: -1, ptr=FileWALPointer [idx=36, fileOff=28303753, len=0]
> [15:30:56,811][SEVERE][wal-file-compressor-%null%-1-#68][FileWriteAheadLogManager] Compression of WAL segment [idx=35] was skipped due to unexpected error
> class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 28310114 size: -1
>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.handleRecordException(AbstractWalRecordsIterator.java:292)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:258)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advance(AbstractWalRecordsIterator.java:154)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.SingleSegmentLogicalRecordsIterator.advance(SingleSegmentLogicalRecordsIterator.java:119)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:123)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.onNext(AbstractWalRecordsIterator.java:52)
>     at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:41)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.compressSegmentToFile(FileWriteAheadLogManager.java:2039)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body0(FileWriteAheadLogManager.java:1974)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileCompressorWorker.body(FileWriteAheadLogManager.java:1950)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: class org.apache.ignite.IgniteCheckedException: Failed to read WAL record at position: 28310114 size: -1
>     at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV1Serializer.readWithCrc(RecordV1Serializer.java:394)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.serializer.RecordV2Serializer.readRecord(RecordV2Serializer.java:235)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.advanceRecord(AbstractWalRecordsIterator.java:243)
>     ... 10 more
> Caused by: java.nio.channels.ClosedByInterruptException
>     at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
>     at org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.read(RandomAccessFileIO.java:58)
>     at org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.read(FileIODecorator.java:51)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.io.SimpleFileInput.ensure(SimpleFileInput.java:119)
>     at org.apache.ignite.internal.processors.cache.persistence.wal.io.FileInput$Crc32CheckingFileInput.ensure(FileInput.java:89)
>     at
[jira] [Created] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.
Pavel Voronkin created IGNITE-11350: --- Summary: doInParallel interruption is not properly handled in ExchangeFuture. Key: IGNITE-11350 URL: https://issues.apache.org/jira/browse/IGNITE-11350 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.
[ https://issues.apache.org/jira/browse/IGNITE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11350:
------------------------------------
    Attachment: GridGain_Tests_8.4_Java_8_Binary_Objects_DR_7222.log

> doInParallel interruption is not properly handled in ExchangeFuture.
> --------------------------------------------------------------------
>
>                 Key: IGNITE-11350
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11350
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Priority: Major
>
> While sys pool tasks are interrupted on stop, detectLostPartitions() and resetLostPartitions() might end up with IgniteCheckedInterruptedException thrown, which will cause the node to hang on stop.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.
[ https://issues.apache.org/jira/browse/IGNITE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11350:
------------------------------------
    Description: While sys pool tasks are interrupted on stop, detectLostPartitions() and resetLostPartitions() might end up with IgniteCheckedInterruptedException thrown, which will cause the node to hang on stop.

> doInParallel interruption is not properly handled in ExchangeFuture.
> --------------------------------------------------------------------
>
>                 Key: IGNITE-11350
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11350
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Priority: Major
>
> While sys pool tasks are interrupted on stop, detectLostPartitions() and resetLostPartitions() might end up with IgniteCheckedInterruptedException thrown, which will cause the node to hang on stop.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11350) doInParallel interruption is not properly handled in ExchangeFuture.
[ https://issues.apache.org/jira/browse/IGNITE-11350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11350:
------------------------------------
    Attachment: (was: GridGain_Tests_8.4_Java_8_Binary_Objects_DR_7222.log)

> doInParallel interruption is not properly handled in ExchangeFuture.
> --------------------------------------------------------------------
>
>                 Key: IGNITE-11350
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11350
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Priority: Major
>
> While sys pool tasks are interrupted on stop, detectLostPartitions() and resetLostPartitions() might end up with IgniteCheckedInterruptedException thrown, which will cause the node to hang on stop.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
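The general pattern behind the fix described in this issue is standard interruption hygiene for a parallel helper: when the pool is interrupted during node stop, restore the thread's interrupt flag and let the caller unwind promptly instead of swallowing the signal. A minimal sketch, not the actual Ignite doInParallel implementation:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStopSafe {
    /**
     * Runs tasks in the pool. On interruption (e.g. node stop), re-sets the
     * interrupt flag so the stopping worker can observe it, and rethrows so the
     * caller fails fast instead of hanging on partially completed work.
     */
    static <T> List<Future<T>> doInParallel(ExecutorService pool, List<Callable<T>> tasks)
        throws InterruptedException {
        try {
            return pool.invokeAll(tasks); // waits for all tasks, is interruptible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the stop signal
            throw e;                            // unwind rather than swallow
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Callable<Integer>> tasks = Arrays.asList(() -> 1, () -> 2);
        List<Future<Integer>> res = doInParallel(pool, tasks);
        System.out.println(res.get(0).get() + res.get(1).get());
        pool.shutdown();
    }
}
```

The key detail is that the catch block does both things: restoring the flag alone would still leave the caller blocked on incomplete futures, and rethrowing alone would lose the interrupt status for code further up the stack.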
[jira] [Commented] (IGNITE-9113) Allocate memory for a data region when first cache assigned to this region is created
[ https://issues.apache.org/jira/browse/IGNITE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768268#comment-16768268 ]

Pavel Voronkin commented on IGNITE-9113:
----------------------------------------

[~NIzhikov] I also observe that persistent(true) is ignored and the region is always created non-persistent. Seems like a bug: we should either fail on start or support persistent regions.

> Allocate memory for a data region when first cache assigned to this region is created
> --------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9113
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>    Affects Versions: 2.6
>            Reporter: Valentin Kulichenko
>            Assignee: Nikolay Izhikov
>            Priority: Major
>             Fix For: 2.8
>
> Currently we do not create any regions or allocate any offheap memory on client nodes unless it's explicitly configured. This is good behavior; however, there is a usability issue caused by the fact that many users have the same config file for both servers and clients. This can lead to unexpected excessive memory usage on the client side and forces users to maintain two config files in most cases.
> The same issue applies to server nodes that do not store any data (e.g. nodes running only services).
> It's better to allocate memory dynamically, when the first cache assigned to a data region is created.
> More detailed discussion here: http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-9113) Allocate memory for a data region when first cache assigned to this region is created
[ https://issues.apache.org/jira/browse/IGNITE-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16768268#comment-16768268 ]

Pavel Voronkin edited comment on IGNITE-9113 at 2/14/19 1:51 PM:
-----------------------------------------------------------------

[~NIzhikov] I also observe that persistent(true) is ignored and the region is always created non-persistent on the client. Seems like a bug: we should either fail on start or support persistent regions.

  was (Author: voropava):
[~NIzhikov] I also observe that persistent(true) is ignored and the region is always created non-persistent. Seems like a bug: we should either fail on start or support persistent regions.

> Allocate memory for a data region when first cache assigned to this region is created
> --------------------------------------------------------------------------------------
>
>                 Key: IGNITE-9113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9113
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>    Affects Versions: 2.6
>            Reporter: Valentin Kulichenko
>            Assignee: Nikolay Izhikov
>            Priority: Major
>             Fix For: 2.8
>
> Currently we do not create any regions or allocate any offheap memory on client nodes unless it's explicitly configured. This is good behavior; however, there is a usability issue caused by the fact that many users have the same config file for both servers and clients. This can lead to unexpected excessive memory usage on the client side and forces users to maintain two config files in most cases.
> The same issue applies to server nodes that do not store any data (e.g. nodes running only services).
> It's better to allocate memory dynamically, when the first cache assigned to a data region is created.
> More detailed discussion here: http://apache-ignite-developers.2346864.n4.nabble.com/Data-regions-on-client-nodes-td32834.html

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11288) TcpDiscovery locks forever on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767054#comment-16767054 ]

Pavel Voronkin commented on IGNITE-11288:
-----------------------------------------

Thanks

> TcpDiscovery locks forever on SSLSocket.close().
> ------------------------------------------------
>
>                 Key: IGNITE-11288
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11288
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Assignee: Pavel Voronkin
>            Priority: Critical
>             Fix For: 2.8
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Root cause is a Java bug: SSLSocketImpl.close() locks on the write lock.
> // We create the socket with soTimeout(0) here, but setting it here won't help anyway.
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> // After a timeout, grid-timeout-worker blocks forever on SSLSocketImpl.close().
> According to the Java 8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
> {code}
> If SO_LINGER is not set, we fall back to this.writeLock.lock(), which can wait forever, because RingMessageWorker writes messages with SO_TIMEOUT zero.
> Solution:
> 1) Set a proper SO_TIMEOUT // that didn't help on Linux in case we drop packets using iptables.
> 2) Set SO_LINGER to some reasonable positive value.
> Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. They also ended up setting SO_LINGER.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
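The second workaround from the issue above is an ordinary socket-option tweak: with a bounded SO_LINGER, SSLSocketImpl.close() takes the tryLock(soLinger, SECONDS) branch shown in the decompiled snippet instead of the unbounded writeLock.lock() path. A sketch of applying it; the 5-second linger and 10-second read timeout are illustrative values, not Ignite's actual defaults:

```java
import java.net.Socket;
import java.net.SocketException;

public class DiscoverySocketTuning {
    /**
     * Bounds how long closing an SSL socket can block: with SO_LINGER enabled,
     * close() waits at most ~lingerSeconds for the close_notify write lock
     * instead of blocking forever on a peer that stopped reading.
     */
    static void tune(Socket sock) throws SocketException {
        sock.setSoLinger(true, 5);  // bounded close(): tryLock(5, SECONDS) path
        sock.setSoTimeout(10_000);  // also bound blocking reads on the ring socket
    }

    public static void main(String[] args) throws Exception {
        // Options can be set on an unconnected socket, before connect().
        try (Socket s = new Socket()) {
            tune(s);
            System.out.println("soLinger=" + s.getSoLinger());
        }
    }
}
```

Note the trade-off the issue hints at: SO_TIMEOUT alone did not help when packets were dropped via iptables, because the hang was on close(), not on a read, which is why the positive SO_LINGER is the decisive part.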
[jira] [Updated] (IGNITE-11308) Add soLinger parameter support in TcpDiscoverySpi .NET configuration.
[ https://issues.apache.org/jira/browse/IGNITE-11308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11308:
------------------------------------
    Description: The .NET client should support the TcpDiscovery soLinger parameter.

> Add soLinger parameter support in TcpDiscoverySpi .NET configuration.
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-11308
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11308
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Pavel Voronkin
>            Priority: Major
>
> The .NET client should support the TcpDiscovery soLinger parameter.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11308) Add soLinger parameter support in TcpDiscoverySpi .NET configuration.
Pavel Voronkin created IGNITE-11308: --- Summary: Add soLinger parameter support in TcpDiscoverySpi .NET configuration. Key: IGNITE-11308 URL: https://issues.apache.org/jira/browse/IGNITE-11308 Project: Ignite Issue Type: Improvement Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11288) TcpDiscovery locks forever on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Voronkin updated IGNITE-11288:
------------------------------------
    Description:
Root cause is a Java bug: SSLSocketImpl.close() locks on the write lock.
// We create the socket with soTimeout(0) here, but setting it here won't help anyway.
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
// After a timeout, grid-timeout-worker blocks forever on SSLSocketImpl.close().
According to the Java 8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}
{code}
If SO_LINGER is not set, we fall back to this.writeLock.lock(), which can wait forever, because RingMessageWorker writes messages with SO_TIMEOUT zero.
Solution:
1) Set a proper SO_TIMEOUT // that didn't help on Linux in case we drop packets using iptables.
2) Set SO_LINGER to some reasonable positive value.
Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. They also ended up setting SO_LINGER.

  was:
Root cause is a Java bug: SSLSocketImpl.close() locks on the write lock.
// We create the socket with soTimeout(0) here, but setting it here won't help anyway.
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
// After a timeout, grid-timeout-worker blocks forever, and SSLSocketImpl.close() on timeout hangs on the writeLock.
According to the Java 8 SSLSocketImpl:
{code:java}
if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}
{code}
If SO_LINGER is not set, we fall back to this.writeLock.lock(), which can wait forever, because RingMessageWorker writes messages with SO_TIMEOUT zero.
Solution:
1) Set a proper SO_TIMEOUT // that didn't help on Linux in case we drop packets using iptables.
2) Set SO_LINGER to some reasonable positive value.
Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. They also ended up setting SO_LINGER.

> TcpDiscovery locks forever on SSLSocket.close().
> ------------------------------------------------
>
>                 Key: IGNITE-11288
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11288
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Voronkin
>            Assignee: Pavel Voronkin
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Root cause is a Java bug: SSLSocketImpl.close() locks on the write lock.
> // We create the socket with soTimeout(0) here, but setting it here won't help anyway.
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper);
> // After a timeout, grid-timeout-worker blocks forever on SSLSocketImpl.close().
> According to the Java 8 SSLSocketImpl:
> {code:java}
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 =
[jira] [Updated] (IGNITE-11288) TcpDiscovery locks forever on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Summary: TcpDiscovery locks forever on SSLSocket.close(). (was: TcpDiscovery deadlock on SSLSocket.close().) > TcpDiscovery locks forever on SSLSocket.close(). > > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock. > //we create the socket with soTimeout(0) here, but setting it here won't help > anyway. > RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > //After the timeout, grid-timeout-worker blocks forever because SSLSocketImpl.close() > hangs on the write lock. > According to the Java 8 SSLSocketImpl: > {code:java} > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > } else > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > else if (debug != null && Debug.isOn("ssl")) > { System.out.println(Thread.currentThread().getName() + ", received > Exception: " + var4); } > this.sess.invalidate(); > } > } catch (InterruptedException var14) > { var3 = true; } > if (var3) > { Thread.currentThread().interrupt(); } > } else > { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > }{code} > If SO_LINGER is not set, we fall back to this.writeLock.lock(), which > waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. 
> Solution: > 1) Set a proper SO_TIMEOUT //that didn't help on Linux when we drop packets > using iptables. > 2) Set SO_LINGER to some reasonable positive value. > Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. > The reporters there ended up setting SO_LINGER. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
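The two-step fix described in the issue can be sketched in plain Java. This is only an illustrative helper, not Ignite's actual patch; the class name, method name, and the concrete timeout values are assumptions:

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch of the proposed mitigation: bound both the read timeout and the
// close-time linger so SSLSocketImpl.close() cannot park forever.
// Values below are illustrative, not Ignite's defaults.
public class DiscoverySocketConfig {
    public static void configure(Socket sock) throws SocketException {
        // 1) A non-zero SO_TIMEOUT bounds blocking reads (0 means "wait forever").
        sock.setSoTimeout(5_000);
        // 2) A positive SO_LINGER makes SSLSocketImpl take the bounded
        //    writeLock.tryLock(soLinger, SECONDS) branch on close(), so a close
        //    against a dead peer fails with SSLException instead of hanging.
        sock.setSoLinger(true, 5);
    }

    public static void main(String[] args) throws Exception {
        Socket sock = new Socket();
        configure(sock);
        System.out.println("SO_TIMEOUT=" + sock.getSoTimeout()
            + " SO_LINGER=" + sock.getSoLinger());
        sock.close();
    }
}
```

The same setters apply to an SSLSocket, since javax.net.ssl.SSLSocket extends java.net.Socket.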
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: Rootcause is java bug locking on SSLSocketImpl.close() on write lock: //we create socket with soTimeout(0) here, but setting it here won't help anyway. RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() onTimeout hangs on writeLock. According to java8 SSLSocketImpl: {code:java} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } }{code} In case of soLinger is not set we fallback to this.writeLock.lock(); which wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero. Solution: 1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets using iptables. 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261]. Guys end up setting SO_LINGER. was: Rootcause is java bug locking on SSLSocketImpl.close() on write lock: //we create socket with soTimeout(0) here, but setting it here won't help anyway. 
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() onTimeout hangs on writeLock. According to java8 SSLSocketImpl: {code:java} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } }{code} In case of soLinger is not set we fallback to this.writeLock.lock(); which wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero. Solution: 1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets using iptables. 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261]. Guys end up setting SO_LINGER> > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Rootcause is java bug locking on SSLSocketImpl.close() on write lock: > //we create socket with soTimeout(0) here, but setting it here won't help > anyway. 
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() > onTimeout hangs on writeLock. > According to java8 SSLSocketImpl: > {code:java} > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > } else > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > else if (debug != null && Debug.isOn("ssl")) > { System.out.println(Thread.currentThread().getName() + ", received > Exception: " + var4); } > this.sess.invalidate();
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: Rootcause is java bug locking on SSLSocketImpl.close() on write lock: //we create socket with soTimeout(0) here, but setting it here won't help anyway. RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() onTimeout hangs on writeLock. According to java8 SSLSocketImpl: {code:java} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } }{code} In case of soLinger is not set we fallback to this.writeLock.lock(); which wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero. Solution: 1) Set proper SO_TIMEOUT //that didn't help on Linux in case we drop packets using iptables. 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261]. Guys end up setting SO_LINGER> was: Rootcause is java bug locking on SSLSocketImpl.close() on write lock: //we create socket with soTimeout(0) here, but setting it here won't help anyway. 
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() onTimeout hangs on writeLock. According to java8 SSLSocketImpl: {code:java} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } }{code} In case of soLinger is not set we fallback to this.writeLock.lock(); which wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero. Solution: 1) Set proper SO_TIMEOUT //we checked that didn' help on Linux if drop packets using iptables . 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261]. > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Rootcause is java bug locking on SSLSocketImpl.close() on write lock: > //we create socket with soTimeout(0) here, but setting it here won't help > anyway. 
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() > onTimeout hangs on writeLock. > According to java8 SSLSocketImpl: > {code:java} > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > } else > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > else if (debug != null && Debug.isOn("ssl")) > { System.out.println(Thread.currentThread().getName() + ", received > Exception: " + var4); } > this.sess.invalidate(); > } > } catch
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: Root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock. //we create the socket with soTimeout(0) here, but setting it here won't help anyway. RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After the timeout, grid-timeout-worker blocks forever because SSLSocketImpl.close() hangs on the write lock. According to the Java 8 SSLSocketImpl: {code} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } {code} If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. Solution: 1) Set a proper SO_TIMEOUT //we verified this does not help on Linux when we drop packets using iptables. 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. 
was: Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); So the ring message worker blocks forever because SSLSocketImpl.close() hangs on the write lock. According to the Java 8 SSLSocketImpl: {code} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } {code} If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative. Solution: 1) Set a proper SO_TIMEOUT 2) Possibly add the ability to override SO_LINGER to some reasonable value. Similar bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock. > //we create the socket with soTimeout(0) here, but setting it here won't help > anyway. 
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > //After the timeout, grid-timeout-worker blocks forever because SSLSocketImpl.close() > hangs on the write lock. > According to the Java 8 SSLSocketImpl: > {code} > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > > finally { this.writeLock.unlock(); } > } else > > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > > else if (debug != null && Debug.isOn("ssl")) { > System.out.println(Thread.currentThread().getName() + ", received Exception: > " + var4); } > > this.sess.invalidate(); > } > } catch (InterruptedException var14) { var3 = true; } >
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: Rootcause is java bug locking on SSLSocketImpl.close() on write lock: //we create socket with soTimeout(0) here, but setting it here won't help anyway. RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() onTimeout hangs on writeLock. According to java8 SSLSocketImpl: {code:java} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } }{code} In case of soLinger is not set we fallback to this.writeLock.lock(); which wait forever, cause RingMessageWorker is writing message with SO_TIMEOUT zero. Solution: 1) Set proper SO_TIMEOUT //we checked that didn' help on Linux if drop packets using iptables . 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug [https://bugs.openjdk.java.net/browse/JDK-6668261]. was: Rootcause is java bug locking on SSLSocketImpl.close() on write lock: //we create socket with soTimeout(0) here, but setting it here won't help anyway. 
RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); //After the timeout, grid-timeout-worker blocks forever because SSLSocketImpl.close() hangs on the write lock. According to the Java 8 SSLSocketImpl: {code} if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } {code} If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. Solution: 1) Set a proper SO_TIMEOUT //we verified this does not help on Linux when we drop packets using iptables. 2) Set SO_LINGER to some reasonable positive value. Similar JDK bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Root cause is a Java bug: SSLSocketImpl.close() blocks on the write lock. > //we create the socket with soTimeout(0) here, but setting it here won't help > anyway. 
> RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > //After timeout grid-timeout-worker blocks forever but SSLSOcketImpl.close() > onTimeout hangs on writeLock. > According to java8 SSLSocketImpl: > {code:java} > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > } else > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > else if (debug != null && Debug.isOn("ssl")) > { System.out.println(Thread.currentThread().getName() + ", received > Exception: " + var4); } > this.sess.invalidate(); > } > } catch
[jira] [Assigned] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin reassigned IGNITE-11288: --- Assignee: Pavel Voronkin > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: > RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > So the ring message worker blocks forever because SSLSocketImpl.close() > hangs on the write lock. > > According to the Java 8 SSLSocketImpl: > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > > finally { this.writeLock.unlock(); } > } else > > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > > else if (debug != null && Debug.isOn("ssl")) { > System.out.println(Thread.currentThread().getName() + ", received Exception: > " + var4); } > > this.sess.invalidate(); > } > } catch (InterruptedException var14) { var3 = true; } > > if (var3) { Thread.currentThread().interrupt(); } > } else > > { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > } > > If SO_LINGER is not set, we fall back to this.writeLock.lock(), which > waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. > With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative. > > Solution: > 1) Set a proper SO_TIMEOUT > 2) Possibly add the ability to override SO_LINGER to some reasonable value. 
> > Similar bug [https://bugs.openjdk.java.net/browse/JDK-6668261]. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
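The bounded-versus-unbounded locking behavior at the heart of this hang can be reproduced with a plain ReentrantLock. This is a standalone illustration, not JDK or Ignite code; class names and timings are made up:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrates the two close() paths in the decompiled snippet above: with a
// linger timeout, tryLock gives up after the deadline; without one, lock()
// would block for as long as the holder keeps the lock.
public class TryLockDemo {
    public static void main(String[] args) throws Exception {
        ReentrantLock writeLock = new ReentrantLock();

        Thread writer = new Thread(() -> {
            writeLock.lock();           // simulates a write stuck on a dead peer
            try {
                Thread.sleep(3_000);
            } catch (InterruptedException ignored) {
            } finally {
                writeLock.unlock();
            }
        });
        writer.start();
        Thread.sleep(100);              // let the writer grab the lock first

        // SO_LINGER-style bounded wait: fails fast instead of hanging.
        boolean acquired = writeLock.tryLock(1, TimeUnit.SECONDS);
        System.out.println("acquired = " + acquired);   // false: holder is still busy
        if (acquired)
            writeLock.unlock();
        writer.join();
    }
}
```

Swapping the tryLock call for writeLock.lock() reproduces the unbounded fallback branch that makes grid-timeout-worker park forever.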
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); So the ring message worker blocks forever because SSLSocketImpl.close() hangs on the write lock. According to the Java 8 SSLSocketImpl: if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative. Solution: 1) Set a proper SO_TIMEOUT 2) Possibly add the ability to override SO_LINGER to some reasonable value. Similar bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. was: Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); So the ring message worker blocks forever because SSLSocketImpl.close() hangs on the write lock. 
According to the Java 8 SSLSocketImpl: if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative. Solution: 1) Set a proper SO_TIMEOUT 2) Possibly add the ability to override SO_LINGER to some reasonable value. Similar bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: > RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > So the ring message worker blocks forever because SSLSocketImpl.close() > hangs on the write lock. 
> > According to the Java 8 SSLSocketImpl: > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try > { this.writeRecordInternal(var1, var2); } > > finally { this.writeLock.unlock(); } > } else > > { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); } > > else if (debug != null && Debug.isOn("ssl")) { > System.out.println(Thread.currentThread().getName() + ", received Exception: > " + var4); } > > this.sess.invalidate(); > } > } catch (InterruptedException var14) { var3 = true; } > > if (var3) { Thread.currentThread().interrupt(); } > } else > > { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } > finally > { this.writeLock.unlock(); } > } > > If SO_LINGER is not set, we fall back to
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); So the ring message worker blocks forever because SSLSocketImpl.close() hangs on the write lock. According to the Java 8 SSLSocketImpl: if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever, because RingMessageWorker writes messages with SO_TIMEOUT zero. With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative. Solution: 1) Set a proper SO_TIMEOUT 2) Possibly add the ability to override SO_LINGER to some reasonable value. Similar bug: [https://bugs.openjdk.java.net/browse/JDK-6668261]. 
was: According to the Java 8 SSLSocketImpl: if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } If SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever. With SSL enabled, U.closeQuiet(socket) will hang if soLinger() is negative. We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0. Similar bug: https://bugs.openjdk.java.net/browse/JDK-6668261. > TcpDiscovery deadlock on SSLSocket.close(). > --- > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Root cause is that we do not set SO_TIMEOUT on the discovery socket on retry: > RingMessageWorker: 3152 sock = spi.openSocket(addr, timeoutHelper); > So the ring message worker blocks forever because SSLSocketImpl.close() > hangs on the write lock. 
>
> According to the Java 8 SSLSocketImpl:
>
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
>
> When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits
> forever, because RingMessageWorker is writing a message with SO_TIMEOUT zero.
> U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative.
>
> Solution:
> 1) Set a proper SO_TIMEOUT.
> 2) Possibly add the ability to override SO_LINGER to some reasonable value.
>
> Similar bug
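The proposed fix in the IGNITE-11288 messages above can be sketched as follows. This is a minimal illustration with a hypothetical helper name, not Ignite's actual API: bounding SO_TIMEOUT keeps a hung peer from blocking the ring message worker's reads, and enabling SO_LINGER makes SSLSocketImpl.close() take the bounded tryLock(getSoLinger()) path instead of writeLock.lock(), which can wait forever.

```java
import java.net.Socket;

// Sketch (hypothetical helper, not Ignite's actual code): configure the
// discovery socket before any blocking I/O so neither reads nor close()
// can hang indefinitely.
public class DiscoverySocketConfig {
    static Socket configure(Socket sock, int soTimeoutMs, int soLingerSec) throws Exception {
        sock.setSoTimeout(soTimeoutMs);      // bounds blocking reads on the socket
        sock.setSoLinger(true, soLingerSec); // bounds the close_notify write on SSL close
        return sock;
    }

    public static void main(String[] args) throws Exception {
        // Options can be set on an unconnected socket, before connect/handshake.
        Socket sock = configure(new Socket(), 5000, 5);
        System.out.println("SO_TIMEOUT=" + sock.getSoTimeout()
            + ", SO_LINGER=" + sock.getSoLinger());
    }
}
```

The same options apply to an SSLSocket created unconnected via SSLSocketFactory, which is the case that deadlocks in the ticket.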
[jira] [Updated] (IGNITE-11288) TcpDiscovery deadlock on SSLSocket.close().
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Summary: TcpDiscovery deadlock on SSLSocket.close(). (was: Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.)

> TcpDiscovery deadlock on SSLSocket.close().
> ---
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
> Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
> According to the Java 8 SSLSocketImpl:
>
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
>
> When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.
> U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative.
> We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0.
>
> Similar bug https://bugs.openjdk.java.net/browse/JDK-6668261.
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: According to the Java 8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever. U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative. We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0. Similar bug: https://bugs.openjdk.java.net/browse/JDK-6668261.
was: According to the Java 8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever. U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative. We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0.

> Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing
> SSLSocket.close() deadlock.
> -
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
> Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
> Time Spent: 10m
> Remaining Estimate: 0h
>
> According to the Java 8 SSLSocketImpl:
>
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
>
> When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.
> U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative.
> We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0.
>
> Similar bug https://bugs.openjdk.java.net/browse/JDK-6668261.
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: According to the Java 8 SSLSocketImpl:

if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
    boolean var3 = Thread.interrupted();
    try {
        if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
            try {
                this.writeRecordInternal(var1, var2);
            } finally {
                this.writeLock.unlock();
            }
        } else {
            SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
            if (this.isLayered() && !this.autoClose) {
                this.fatal((byte)-1, (Throwable)var4);
            } else if (debug != null && Debug.isOn("ssl")) {
                System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
            }
            this.sess.invalidate();
        }
    } catch (InterruptedException var14) {
        var3 = true;
    }
    if (var3) {
        Thread.currentThread().interrupt();
    }
} else {
    this.writeLock.lock();
    try {
        this.writeRecordInternal(var1, var2);
    } finally {
        this.writeLock.unlock();
    }
}

When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever. U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative. We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0.
was: According to java8 SSLSocketImpl: if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } In case of soLinger is not set we fallback to this.writeLock.lock(); which might fail forever. > Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing > SSLSocket.close() deadlock. 
> -
>
> Key: IGNITE-11288
> URL: https://issues.apache.org/jira/browse/IGNITE-11288
> Project: Ignite
> Issue Type: Bug
>Reporter: Pavel Voronkin
>Priority: Critical
>
> According to the Java 8 SSLSocketImpl:
>
> if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) {
>     boolean var3 = Thread.interrupted();
>     try {
>         if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) {
>             try {
>                 this.writeRecordInternal(var1, var2);
>             } finally {
>                 this.writeLock.unlock();
>             }
>         } else {
>             SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent.");
>             if (this.isLayered() && !this.autoClose) {
>                 this.fatal((byte)-1, (Throwable)var4);
>             } else if (debug != null && Debug.isOn("ssl")) {
>                 System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4);
>             }
>             this.sess.invalidate();
>         }
>     } catch (InterruptedException var14) {
>         var3 = true;
>     }
>     if (var3) {
>         Thread.currentThread().interrupt();
>     }
> } else {
>     this.writeLock.lock();
>     try {
>         this.writeRecordInternal(var1, var2);
>     } finally {
>         this.writeLock.unlock();
>     }
> }
>
> When SO_LINGER is not set, we fall back to this.writeLock.lock(), which waits forever.
> U.closeQuiet(socket) will hang with SSL enabled if soLinger() is negative.
> We need to make it configurable for TcpCommSpi and TcpDisco. I suggest a default value of 0.
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.
[ https://issues.apache.org/jira/browse/IGNITE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11288: Description: According to java8 SSLSocketImpl: if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { boolean var3 = Thread.interrupted(); try { if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } else { SSLException var4 = new SSLException("SO_LINGER timeout, close_notify message cannot be sent."); if (this.isLayered() && !this.autoClose) { this.fatal((byte)-1, (Throwable)var4); } else if (debug != null && Debug.isOn("ssl")) { System.out.println(Thread.currentThread().getName() + ", received Exception: " + var4); } this.sess.invalidate(); } } catch (InterruptedException var14) { var3 = true; } if (var3) { Thread.currentThread().interrupt(); } } else { this.writeLock.lock(); try { this.writeRecordInternal(var1, var2); } finally { this.writeLock.unlock(); } } In case of soLinger is not set we fallback to this.writeLock.lock(); which might fail forever. > Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing > SSLSocket.close() deadlock. 
> - > > Key: IGNITE-11288 > URL: https://issues.apache.org/jira/browse/IGNITE-11288 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Critical > > According to java8 SSLSocketImpl: > if (var1.isAlert((byte)0) && this.getSoLinger() >= 0) { > boolean var3 = Thread.interrupted(); > try { > if (this.writeLock.tryLock((long)this.getSoLinger(), TimeUnit.SECONDS)) { > try { > this.writeRecordInternal(var1, var2); > } finally { > this.writeLock.unlock(); > } > } else { > SSLException var4 = new SSLException("SO_LINGER timeout, close_notify > message cannot be sent."); > if (this.isLayered() && !this.autoClose) { > this.fatal((byte)-1, (Throwable)var4); > } else if (debug != null && Debug.isOn("ssl")) { > System.out.println(Thread.currentThread().getName() + ", received Exception: > " + var4); > } > this.sess.invalidate(); > } > } catch (InterruptedException var14) { > var3 = true; > } > if (var3) { > Thread.currentThread().interrupt(); > } > } else { > this.writeLock.lock(); > try { > this.writeRecordInternal(var1, var2); > } finally { > this.writeLock.unlock(); > } > } > > In case of soLinger is not set we fallback to this.writeLock.lock(); which > might fail forever. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11288) Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock.
Pavel Voronkin created IGNITE-11288: --- Summary: Missing SO_LINGER in TcpDiscovery and TcpCommunicationSpi causing SSLSocket.close() deadlock. Key: IGNITE-11288 URL: https://issues.apache.org/jira/browse/IGNITE-11288 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11255: Summary: Fix test failure after IGNITE-7648. (was: Fix test failures after IGNITE-7648.) > Fix test failure after IGNITE-7648. > --- > > Key: IGNITE-11255 > URL: https://issues.apache.org/jira/browse/IGNITE-11255 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Labels: MakeTeamcityGreenAgain > Time Spent: 10m > Remaining Estimate: 0h > > We need to fix: > > * > CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False) > * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 > * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
[ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-7648: --- Description: The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in IGNITE-5718 as a way to prevent unnecessary node drops in case of short network problems. I suppose it's the wrong decision to fix it in such a way. We had faced some issues in our production due to the lack of automatic kicking of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we realised the necessity of changing the default behavior via the property. The right solution is to kick nodes only if the failure threshold is reached. Such behavior should always be enabled. UPDATE: During a discussion it was decided that the property will remain disabled by default. We decided to change the timeout logic when failure detection is enabled: connect and handshake attempts start from 500 ms, increasing the timeout using an exponential backoff strategy. was: The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in IGNITE-5718 as a way to prevent unnecessary node drops in case of short network problems. I suppose it's the wrong decision to fix it in such a way. We had faced some issues in our production due to the lack of automatic kicking of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we realised the necessity of changing the default behavior via the property. The right solution is to kick nodes only if the failure threshold is reached. Such behavior should always be enabled. UPDATE: During a discussion it was decided that the property will remain disabled by default. > Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. 
> - > > Key: IGNITE-7648 > URL: https://issues.apache.org/jira/browse/IGNITE-7648 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.3 >Reporter: Alexei Scherbakov >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > > IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in > IGNITE-5718 as a way to prevent unnecessary node drops in case of short > network problems. > I suppose it's wrong decision to fix it in such way. > We had faced some issues in our production due to lack of automatic kicking > of ill-behaving nodes (on example, hanging due to long GC pauses) until we > realised the necessity of changing default behavior via property. > Right solution is to kick nodes only if failure threshold is reached. Such > behavior should be always enabled. > UPDATE: During a discussion it was decided what the property will remain > disabled by default. > We decided to change timeout logic in case of failure detection enabled. We > start performing connect and handshake from 500ms increasing using > exponential backoff strategy. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
[ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-7648: --- Comment: was deleted (was: We changed the behaviour of IGNITE_ENABLE_FORCIBLE_NODE_KILL=true in this ticket. Only a server node can kill a client node if the property is enabled; a client can't kill a server, and a server can't kill a server. The timeout logic changed for the failure-detection-enabled scenario: we start connect and handshake with a timeout of 500 ms. If that fails we increase the timeout using an exponential backoff strategy: timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection) ) > Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. > - > > Key: IGNITE-7648 > URL: https://issues.apache.org/jira/browse/IGNITE-7648 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.3 >Reporter: Alexei Scherbakov >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > > The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in > IGNITE-5718 as a way to prevent unnecessary node drops in case of short > network problems. > I suppose it's the wrong decision to fix it in such a way. > We had faced some issues in our production due to the lack of automatic kicking > of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we > realised the necessity of changing the default behavior via the property. > The right solution is to kick nodes only if the failure threshold is reached. Such > behavior should always be enabled. > UPDATE: During a discussion it was decided that the property will remain > disabled by default. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
[ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763557#comment-16763557 ] Pavel Voronkin commented on IGNITE-7648: We changed the behaviour of IGNITE_ENABLE_FORCIBLE_NODE_KILL=true in this ticket. Only a server node can kill a client node if the property is enabled; a client can't kill a server, and a server can't kill a server. The timeout logic changed for the failure-detection-enabled scenario: we start connect and handshake with a timeout of 500 ms. If that fails we increase the timeout using an exponential backoff strategy: timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection) > Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. > - > > Key: IGNITE-7648 > URL: https://issues.apache.org/jira/browse/IGNITE-7648 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.3 >Reporter: Alexei Scherbakov >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > > The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in > IGNITE-5718 as a way to prevent unnecessary node drops in case of short > network problems. > I suppose it's the wrong decision to fix it in such a way. > We had faced some issues in our production due to the lack of automatic kicking > of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we > realised the necessity of changing the default behavior via the property. > The right solution is to kick nodes only if the failure threshold is reached. Such > behavior should always be enabled. > UPDATE: During a discussion it was decided that the property will remain > disabled by default. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
[ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763556#comment-16763556 ] Pavel Voronkin edited comment on IGNITE-7648 at 2/8/19 12:48 PM: - We changed the behaviour of IGNITE_ENABLE_FORCIBLE_NODE_KILL=true in this ticket. Only a server node can kill a client node if the property is enabled; a client can't kill a server, and a server can't kill a server. The timeout logic changed for the failure-detection-enabled scenario: we start connect and handshake with a timeout of 500 ms. If that fails we increase the timeout using an exponential backoff strategy: timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection) was (Author: voropava): We changed the behaviour of IGNITE_ENABLE_FORCIBLE_NODE_KILL=true in this ticket. Only a server node can kill a client node if the property is enabled; a client can't kill a server, and a server can't kill a server. The timeout logic changed for the failure-detection-enabled scenario: we start connect and handshake with a timeout of 500 ms. If that fails we increase the timeout using an exponential backoff strategy: timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection) > Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. > - > > Key: IGNITE-7648 > URL: https://issues.apache.org/jira/browse/IGNITE-7648 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.3 >Reporter: Alexei Scherbakov >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > > The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in > IGNITE-5718 as a way to prevent unnecessary node drops in case of short > network problems. > I suppose it's the wrong decision to fix it in such a way. > We had faced some issues in our production due to the lack of automatic kicking > of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we > realised the necessity of changing the default behavior via the property. > The right solution is to kick nodes only if the failure threshold is reached. 
Such > behavior should always be enabled. > UPDATE: During a discussion it was decided that the property will remain > disabled by default. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
[ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763556#comment-16763556 ] Pavel Voronkin commented on IGNITE-7648: We changed the behaviour of IGNITE_ENABLE_FORCIBLE_NODE_KILL=true in this ticket. Only a server node can kill a client node if the property is enabled; a client can't kill a server, and a server can't kill a server. The timeout logic changed for the failure-detection-enabled scenario: we start connect and handshake with a timeout of 500 ms. If that fails we increase the timeout using an exponential backoff strategy: timeout = Math.min(Math.min(timeout * 2, maxTimeout), remainingTimeTillFailureDetection) > Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. > - > > Key: IGNITE-7648 > URL: https://issues.apache.org/jira/browse/IGNITE-7648 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.3 >Reporter: Alexei Scherbakov >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > > The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in > IGNITE-5718 as a way to prevent unnecessary node drops in case of short > network problems. > I suppose it's the wrong decision to fix it in such a way. > We had faced some issues in our production due to the lack of automatic kicking > of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we > realised the necessity of changing the default behavior via the property. > The right solution is to kick nodes only if the failure threshold is reached. Such > behavior should always be enabled. > UPDATE: During a discussion it was decided that the property will remain > disabled by default. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
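The backoff formula in the comments above can be sketched as follows. The names are illustrative rather than Ignite's actual fields: each failed connect or handshake attempt doubles the timeout, capped both by a configured maximum and by the time remaining until failure detection fires.

```java
// Sketch of the exponential backoff described in the IGNITE-7648 comments
// (method and variable names are illustrative, not the actual Ignite code).
public class BackoffTimeout {
    static long next(long timeout, long maxTimeout, long remainingTillFailureDetection) {
        // Double the timeout, but never exceed the max timeout or the
        // time left before the failure detection threshold is reached.
        return Math.min(Math.min(timeout * 2, maxTimeout), remainingTillFailureDetection);
    }

    public static void main(String[] args) {
        long t = 500;                // initial connect/handshake timeout, ms
        t = next(t, 5000, 30_000);   // 1000
        t = next(t, 5000, 30_000);   // 2000
        t = next(t, 5000, 30_000);   // 4000
        t = next(t, 5000, 30_000);   // capped at maxTimeout = 5000
        System.out.println(t);       // prints 5000
    }
}
```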
[jira] [Updated] (IGNITE-11255) Fix test failures after IGNITE-7648.
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11255: Summary: Fix test failures after IGNITE-7648. (was: Fix test failure after IGNITE-7648.) > Fix test failures after IGNITE-7648. > > > Key: IGNITE-11255 > URL: https://issues.apache.org/jira/browse/IGNITE-11255 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Labels: MakeTeamcityGreenAgain > Time Spent: 10m > Remaining Estimate: 0h > > We need to fix: > > * > CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False) > * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 > * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648.
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11255: Summary: Fix test failure after IGNITE-7648. (was: Fix test failure after IGNITE-7648) > Fix test failure after IGNITE-7648. > --- > > Key: IGNITE-11255 > URL: https://issues.apache.org/jira/browse/IGNITE-11255 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Labels: MakeTeamcityGreenAgain > Time Spent: 10m > Remaining Estimate: 0h > > We need to fix: > > * > CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False) > * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 > * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-11255) Fix test failure after IGNITE-7648
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin reassigned IGNITE-11255: --- Assignee: Pavel Voronkin > Fix test failure after IGNITE-7648 > -- > > Key: IGNITE-11255 > URL: https://issues.apache.org/jira/browse/IGNITE-11255 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Labels: MakeTeamcityGreenAgain > Time Spent: 10m > Remaining Estimate: 0h > > We need to fix: > > * > CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False) > * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 > * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11255: Description: We need to fix: * CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False) * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes > Fix test failure after IGNITE-7648 > -- > > Key: IGNITE-11255 > URL: https://issues.apache.org/jira/browse/IGNITE-11255 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > Labels: MakeTeamcityGreenAgain > Time Spent: 10m > Remaining Estimate: 0h > > We need to fix: > > * > CacheQueriesRestartServerTest.Test_ScanQueryAfterClientReconnect_ReturnsResults(False) > * ZookeeperDiscoveryClientDisconnectTest.testReconnectServersRestart_3 > * IgniteTwoRegionsRebuildIndexTest.testRebuildIndexes > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11255) Fix test failure after IGNITE-7648
[ https://issues.apache.org/jira/browse/IGNITE-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11255: Labels: MakeTeamcityGreenAgain (was: ) > Fix test failure after IGNITE-7648 > -- > > Key: IGNITE-11255 > URL: https://issues.apache.org/jira/browse/IGNITE-11255 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > Labels: MakeTeamcityGreenAgain > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11255) Fix test failure after IGNITE-7648
Pavel Voronkin created IGNITE-11255: --- Summary: Fix test failure after IGNITE-7648 Key: IGNITE-11255 URL: https://issues.apache.org/jira/browse/IGNITE-11255 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-11221) Refactor timeout logic in TcpDiscovery
[ https://issues.apache.org/jira/browse/IGNITE-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin reassigned IGNITE-11221: --- Assignee: Stanilovsky Evgeny > Refactor timeout logic in TcpDiscovery > -- > > Key: IGNITE-11221 > URL: https://issues.apache.org/jira/browse/IGNITE-11221 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Stanilovsky Evgeny >Priority: Major > > We need to reimplement IgniteSpiOperationTimeoutHelper, cause it's mixing > exception handling and timeout calculation. > We need to reuse ExponentialBackoffTimeout to encapsulate logic of > calculating different sets of timeout separately and get rid of many local > variables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11221) Refactor timeout logic in TcpDiscovery
[ https://issues.apache.org/jira/browse/IGNITE-11221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11221: Description: We need to replace IgniteSpiOperationTimeoutHelper with the TimeoutStrategy introduced in IGNITE-7648, because it mixes exception handling and timeout calculation. We need to reuse ExponentialBackoffTimeout to encapsulate the logic of calculating the different timeouts separately and get rid of many local variables. was: We need to reimplement IgniteSpiOperationTimeoutHelper, because it mixes exception handling and timeout calculation. We need to reuse ExponentialBackoffTimeout to encapsulate the logic of calculating the different timeouts separately and get rid of many local variables. > Refactor timeout logic in TcpDiscovery > -- > > Key: IGNITE-11221 > URL: https://issues.apache.org/jira/browse/IGNITE-11221 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Stanilovsky Evgeny >Priority: Major > > We need to replace IgniteSpiOperationTimeoutHelper with the TimeoutStrategy > introduced in IGNITE-7648, because it mixes exception handling and timeout > calculation. > We need to reuse ExponentialBackoffTimeout to encapsulate the logic of > calculating the different timeouts separately and get rid of many local > variables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
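The separation proposed in IGNITE-11221 above might look like the following sketch. The interface and class names follow the ticket's wording but are hypothetical, not the actual implementation: timeout calculation lives in one strategy object, so callers no longer mix it with exception handling or juggle many local variables.

```java
// Sketch (hypothetical API, not Ignite's actual classes): a strategy that
// owns both the per-attempt backoff and the overall operation deadline.
interface TimeoutStrategy {
    long nextTimeout();      // timeout for the next connect/handshake attempt, ms
    boolean checkTimeout();  // true once the overall operation deadline has passed
}

class ExponentialBackoffTimeoutStrategy implements TimeoutStrategy {
    private final long maxTimeout;
    private final long deadline; // absolute time when the whole operation fails
    private long curTimeout;

    ExponentialBackoffTimeoutStrategy(long initTimeout, long maxTimeout, long totalTimeout) {
        this.curTimeout = initTimeout;
        this.maxTimeout = maxTimeout;
        this.deadline = System.currentTimeMillis() + totalTimeout;
    }

    @Override public long nextTimeout() {
        long t = curTimeout;
        curTimeout = Math.min(curTimeout * 2, maxTimeout); // backoff for next attempt
        // Never hand out more time than remains until the deadline.
        return Math.min(t, Math.max(deadline - System.currentTimeMillis(), 0));
    }

    @Override public boolean checkTimeout() {
        return System.currentTimeMillis() >= deadline;
    }
}

public class TimeoutStrategyDemo {
    public static void main(String[] args) {
        TimeoutStrategy s = new ExponentialBackoffTimeoutStrategy(500, 5000, 60_000);
        System.out.println(s.nextTimeout()); // 500 on the first attempt
        System.out.println(s.nextTimeout()); // 1000 on the second attempt
    }
}
```

A caller would loop on nextTimeout() per attempt and abort when checkTimeout() returns true, keeping exception handling entirely in the caller.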
[jira] [Updated] (IGNITE-6324) Transactional cache data partially available after crash.
[ https://issues.apache.org/jira/browse/IGNITE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-6324: --- Summary: Transactional cache data partially available after crash. (was: Transactional cache data partially available after crash) > Transactional cache data partially available after crash. > - > > Key: IGNITE-6324 > URL: https://issues.apache.org/jira/browse/IGNITE-6324 > Project: Ignite > Issue Type: Bug > Components: persistence >Affects Versions: 1.9, 2.1 >Reporter: Stanilovsky Evgeny >Assignee: Dmitriy Govorukhin >Priority: Major > Fix For: 2.8 > > Attachments: InterruptCommitedThreadTest.java > > > If an InterruptedException is raised in client code during PDS store operations, we > can obtain an inconsistent cache after restart. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11221) Refactor timeout logic in TcpDiscovery
Pavel Voronkin created IGNITE-11221: --- Summary: Refactor timeout logic in TcpDiscovery Key: IGNITE-11221 URL: https://issues.apache.org/jira/browse/IGNITE-11221 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin We need to reimplement IgniteSpiOperationTimeoutHelper, because it mixes exception handling and timeout calculation. We need to reuse ExponentialBackoffTimeout to encapsulate the logic of calculating different timeouts separately and get rid of many local variables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-7648) Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property.
[ https://issues.apache.org/jira/browse/IGNITE-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin reassigned IGNITE-7648: -- Assignee: Pavel Voronkin (was: Alexei Scherbakov) > Fix IGNITE_ENABLE_FORCIBLE_NODE_KILL system property. > - > > Key: IGNITE-7648 > URL: https://issues.apache.org/jira/browse/IGNITE-7648 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.3 >Reporter: Alexei Scherbakov >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > > The IGNITE_ENABLE_FORCIBLE_NODE_KILL system property was introduced in > IGNITE-5718 as a way to prevent unnecessary node drops in case of short > network problems. > I suppose it was the wrong decision to fix it in such a way. > We faced some issues in our production due to the lack of automatic kicking > of ill-behaving nodes (for example, nodes hanging due to long GC pauses) until we > realised the necessity of changing the default behavior via the property. > The right solution is to kick nodes only when the failure threshold is reached. Such > behavior should always be enabled. > UPDATE: During a discussion it was decided that the property will remain > disabled by default. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented.
[ https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin resolved IGNITE-11201. - Resolution: Duplicate > ConnectorConfiguration and TransactionConfiguration toString is not properly > implemented. > - > > Key: IGNITE-11201 > URL: https://issues.apache.org/jira/browse/IGNITE-11201 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > The Ignite configuration is printed on startup, but ConnectorConfiguration and > TransactionConfiguration are not properly printed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented.
[ https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11201: Description: The Ignite configuration is printed on startup, but ConnectorConfiguration and TransactionConfiguration are not properly printed. > ConnectorConfiguration and TransactionConfiguration toString is not properly > implemented. > - > > Key: IGNITE-11201 > URL: https://issues.apache.org/jira/browse/IGNITE-11201 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > The Ignite configuration is printed on startup, but ConnectorConfiguration and > TransactionConfiguration are not properly printed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented
[ https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11201: Summary: ConnectorConfiguration and TransactionConfiguration toString is not properly implemented (was: ConnectorConfdiguration and TransactionConfiguration toString is not properly implemented.) > ConnectorConfiguration and TransactionConfiguration toString is not properly > implemented > > > Key: IGNITE-11201 > URL: https://issues.apache.org/jira/browse/IGNITE-11201 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11201) ConnectorConfiguration and TransactionConfiguration toString is not properly implemented.
[ https://issues.apache.org/jira/browse/IGNITE-11201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11201: Summary: ConnectorConfiguration and TransactionConfiguration toString is not properly implemented. (was: ConnectorConfiguration and TransactionConfiguration toString is not properly implemented) > ConnectorConfiguration and TransactionConfiguration toString is not properly > implemented. > - > > Key: IGNITE-11201 > URL: https://issues.apache.org/jira/browse/IGNITE-11201 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11201) ConnectorConfdiguration and TransactionConfiguration toString is not properly implemented.
Pavel Voronkin created IGNITE-11201: --- Summary: ConnectorConfdiguration and TransactionConfiguration toString is not properly implemented. Key: IGNITE-11201 URL: https://issues.apache.org/jira/browse/IGNITE-11201 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11172) While handling duplicated connections we got exception on writing message to stale connection.
[ https://issues.apache.org/jira/browse/IGNITE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11172: Summary: While handling duplicated connections we got exception on writing message to stale connection. (was: On receiving duplicated connections we got exception on writing message on stale connections.) > While handling duplicated connections we got exception on writing message to > stale connection. > -- > > Key: IGNITE-11172 > URL: https://issues.apache.org/jira/browse/IGNITE-11172 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > [2019-01-31 16:10:19,072][INFO > ][grid-nio-worker-tcp-comm-5-#45][TcpCommunicationSpi] Received incoming > connection from remote node while connecting to this node, rejecting > [locNode=e0668107-3c19-41ba-b9f5-9f073711d3ce, locNodeOrder=1, > rmtNode=848095e3-29bf-4d67-a5d7-117f44001b70, rmtNodeOrder=2] > [2019-01-31 > 16:10:20,310][ERROR][grid-nio-worker-tcp-comm-6-#46][TcpCommunicationSpi] > Failed to process selector key [ses=GridSelectorNioSessionImpl > [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, > finished=false, hashCode=848731852, interrupted=false, > runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, > bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, > super=]DirectNioClientWorker [super=], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], 
connected=false, > connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, > rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, > closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, > bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, > lastRcvTime=1548940219115, readsPaused=false, > filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL > filter], accepted=false, markedForClose=true]]] > javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) > [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl > [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, > finished=false, hashCode=848731852, interrupted=false, > runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, > bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, > super=]DirectNioClientWorker [super=], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], 
discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, queueLimit=4096,
[jira] [Created] (IGNITE-11172) On receiving duplicated connections we got exception.
Pavel Voronkin created IGNITE-11172: --- Summary: On receiving duplicated connections we got exception. Key: IGNITE-11172 URL: https://issues.apache.org/jira/browse/IGNITE-11172 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin [2019-01-31 16:10:19,072][INFO ][grid-nio-worker-tcp-comm-5-#45][TcpCommunicationSpi] Received incoming connection from remote node while connecting to this node, rejecting [locNode=e0668107-3c19-41ba-b9f5-9f073711d3ce, locNodeOrder=1, rmtNode=848095e3-29bf-4d67-a5d7-117f44001b70, rmtNodeOrder=2] [2019-01-31 16:10:20,310][ERROR][grid-nio-worker-tcp-comm-6-#46][TcpCommunicationSpi] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, finished=false, hashCode=848731852, interrupted=false, runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList [172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList [172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, intOrder=2, 
lastExchangeTime=1548940115834, loc=false, ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, lastRcvTime=1548940219115, readsPaused=false, filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=true]]] javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, finished=false, hashCode=848731852, interrupted=false, runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList [172.25.1.12], sockAddrs=HashSet [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, addrs=ArrayList [172.25.1.12], sockAddrs=HashSet 
[lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1548940115834, loc=false, ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, lastRcvTime=1548940219115, readsPaused=false, filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL filter], accepted=false, markedForClose=true]]] at org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380) at
[jira] [Updated] (IGNITE-11172) On receiving duplicated connections we got exception on writing message on stale connections.
[ https://issues.apache.org/jira/browse/IGNITE-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11172: Summary: On receiving duplicated connections we got exception on writing message on stale connections. (was: On receiving duplicated connections we got exception.) > On receiving duplicated connections we got exception on writing message on > stale connections. > - > > Key: IGNITE-11172 > URL: https://issues.apache.org/jira/browse/IGNITE-11172 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > [2019-01-31 16:10:19,072][INFO > ][grid-nio-worker-tcp-comm-5-#45][TcpCommunicationSpi] Received incoming > connection from remote node while connecting to this node, rejecting > [locNode=e0668107-3c19-41ba-b9f5-9f073711d3ce, locNodeOrder=1, > rmtNode=848095e3-29bf-4d67-a5d7-117f44001b70, rmtNodeOrder=2] > [2019-01-31 > 16:10:20,310][ERROR][grid-nio-worker-tcp-comm-6-#46][TcpCommunicationSpi] > Failed to process selector key [ses=GridSelectorNioSessionImpl > [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, > finished=false, hashCode=848731852, interrupted=false, > runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, > bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, > super=]DirectNioClientWorker [super=], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, 
queueLimit=4096, reserveCnt=17, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], > super=GridNioSessionImpl [locAddr=/172.25.1.11:58372, > rmtAddr=lab12.gridgain.local/172.25.1.12:47100, createTime=1548940219095, > closeTime=0, bytesSent=5750672, bytesRcvd=23544, bytesSent0=5750672, > bytesRcvd0=23544, sndSchedTime=1548940219095, lastSndTime=1548940219306, > lastRcvTime=1548940219115, readsPaused=false, > filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL > filter], accepted=false, markedForClose=true]]] > javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) > [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl > [worker=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, > finished=false, hashCode=848731852, interrupted=false, > runner=grid-nio-worker-tcp-comm-6-#46]AbstractNioClientWorker [idx=6, > bytesRcvd=28540977, bytesSent=0, bytesRcvd0=30504, bytesSent0=0, select=true, > super=]DirectNioClientWorker [super=], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32511 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, 
lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], > outRecovery=GridNioRecoveryDescriptor [acked=484549, resendCnt=0, > rcvCnt=443208, sentCnt=532641, reserved=true, lastAck=443200, nodeLeft=false, > node=TcpDiscoveryNode [id=848095e3-29bf-4d67-a5d7-117f44001b70, > addrs=ArrayList [172.25.1.12], sockAddrs=HashSet > [lab12.gridgain.local/172.25.1.12:47500], discPort=47500, order=2, > intOrder=2, lastExchangeTime=1548940115834, loc=false, > ver=2.5.5#20190131-sha1:38e914f7, isClient=false], connected=false, > connectCnt=16, queueLimit=4096, reserveCnt=17, pairedConnections=false], >
[jira] [Updated] (IGNITE-11126) Rework TcpCommunicationSpi.createShmemClient failure detection logic.
[ https://issues.apache.org/jira/browse/IGNITE-11126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11126: Description: We need to rework createShmemClient() logic to support the failure detection/exponential backoff timeout logic introduced in IGNITE-7648. Also, an isRecoverableError() sleep loop needs to be implemented in case of exceptions. was:We need to rework > Rework TcpCommunicationSpi.createShmemClient failure detection logic. > - > > Key: IGNITE-11126 > URL: https://issues.apache.org/jira/browse/IGNITE-11126 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Voronkin >Priority: Major > > We need to rework createShmemClient() logic to support the failure > detection/exponential backoff timeout logic introduced in IGNITE-7648. > Also, an isRecoverableError() sleep loop needs to be implemented in case of > exceptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
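The "isRecoverableError() sleep loop" the ticket asks for could be sketched roughly as below. The helper names (isRecoverableError, withRetries, attemptLimit) are illustrative assumptions, not the actual TcpCommunicationSpi API:

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical retry-on-recoverable-error helper in the spirit of IGNITE-11126.
public class RetryOnRecoverable {
    /** In the real SPI this check would cover connection resets, timeouts, etc. */
    static boolean isRecoverableError(Exception e) {
        return e instanceof ConnectException;
    }

    /** Retries op up to attemptLimit times, sleeping between attempts on recoverable errors. */
    static <T> T withRetries(Callable<T> op, int attemptLimit, long sleepMs) throws Exception {
        Exception last = null;

        for (int i = 0; i < attemptLimit; i++) {
            try {
                return op.call();
            }
            catch (Exception e) {
                if (!isRecoverableError(e))
                    throw e; // non-recoverable: fail fast

                last = e;

                Thread.sleep(sleepMs); // back off before the next attempt
            }
        }

        throw last; // attempts exhausted
    }
}
```

In the real rework the fixed sleep would presumably come from the backoff timeout strategy rather than a constant.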
[jira] [Updated] (IGNITE-11126) Rework TcpCommunicationSpi.createShmemClient failure detection logic.
[ https://issues.apache.org/jira/browse/IGNITE-11126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11126: Description: We need to rework > Rework TcpCommunicationSpi.createShmemClient failure detection logic. > - > > Key: IGNITE-11126 > URL: https://issues.apache.org/jira/browse/IGNITE-11126 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Voronkin >Priority: Major > > We need to rework -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11126) Rework TcpCommunicationSpi.createShmemClient failure detection logic.
Pavel Voronkin created IGNITE-11126: --- Summary: Rework TcpCommunicationSpi.createShmemClient failure detection logic. Key: IGNITE-11126 URL: https://issues.apache.org/jira/browse/IGNITE-11126 Project: Ignite Issue Type: Improvement Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8894) Provide information about coordinator in control.sh output
[ https://issues.apache.org/jira/browse/IGNITE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753999#comment-16753999 ] Pavel Voronkin commented on IGNITE-8894: Looks good to me. > Provide information about coordinator in control.sh output > -- > > Key: IGNITE-8894 > URL: https://issues.apache.org/jira/browse/IGNITE-8894 > Project: Ignite > Issue Type: Improvement >Affects Versions: 2.5 >Reporter: Sergey Kosarev >Assignee: Sergey Kosarev >Priority: Minor > Fix For: 2.8 > > Time Spent: 10m > Remaining Estimate: 0h > > Information about the coordinator can be added to an existing command (i.e. > --state, --baseline), > or a new command can be introduced. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-10876) "Affinity changes (coordinator) applied" can be executed in parallel
[ https://issues.apache.org/jira/browse/IGNITE-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin reassigned IGNITE-10876: --- Assignee: Pavel Voronkin > "Affinity changes (coordinator) applied" can be executed in parallel > > > Key: IGNITE-10876 > URL: https://issues.apache.org/jira/browse/IGNITE-10876 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > > There is a for loop over all cache groups which executes N*P operations in the > exchange worker, where N is the number of cache groups and P is the number of partitions. > We spend 80% of the time in this loop: > for (CacheGroupContext grp : cctx.cache().cacheGroups()){ > GridDhtPartitionTopology top = grp != null ? grp.topology() : > cctx.exchange().clientTopology(grp.groupId(), events().discoveryCache()); > top.beforeExchange(this, true, true); > } > I believe we can execute it in parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
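The per-group iterations in that loop are independent of each other, so one way to parallelize it is to submit one task per cache group and then wait for all of them. A minimal sketch, where GroupTopology stands in for GridDhtPartitionTopology and the method shape is hypothetical, not the actual Ignite code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Illustrative fan-out of independent per-cache-group work (IGNITE-10876 sketch).
public class ParallelExchange {
    /** Stand-in for GridDhtPartitionTopology; only the call we parallelize. */
    interface GroupTopology {
        void beforeExchange();
    }

    /** Runs beforeExchange() for every group on the pool and waits for all of them. */
    static void beforeExchangeAll(List<GroupTopology> tops, ExecutorService pool)
        throws InterruptedException, ExecutionException {
        List<Future<?>> futs = new ArrayList<>();

        for (GroupTopology top : tops)
            futs.add(pool.submit(top::beforeExchange)); // one task per cache group

        for (Future<?> f : futs)
            f.get(); // barrier: rethrows the first failure, if any
    }
}
```

The barrier at the end matters: the exchange worker must not proceed until every group's topology has been prepared, so the parallel version keeps the same completion guarantee as the sequential loop.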
[jira] [Updated] (IGNITE-11017) Visor doesn't show cacheSize metrics.
[ https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11017: Description: VisorCache doesn't show cacheSize(), even though the VisorCache object contains cacheSize. was:GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over not EVICTED partitions on calculating entries size. > Visor doesn't show cacheSize metrics. > - > > Key: IGNITE-11017 > URL: https://issues.apache.org/jira/browse/IGNITE-11017 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > VisorCache doesn't show cacheSize(), even though the VisorCache object contains > cacheSize. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11017) Visor doesn't show cacheSize metrics.
[ https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11017: Description: VisorCache doesn't show cacheSize() on the caches screen, even though the VisorCache object contains cacheSize. was: VisorCache doesn't show cacheSize() nevertheless VisorCache object contains cacheSize. > Visor doesn't show cacheSize metrics. > - > > Key: IGNITE-11017 > URL: https://issues.apache.org/jira/browse/IGNITE-11017 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > VisorCache doesn't show cacheSize() on the caches screen, even though > the VisorCache object contains cacheSize. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11017) Visor doesn't show cacheSize metrics.
[ https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11017: Summary: Visor doesn't show cacheSize metrics. (was: OffheapEntriesCount metrics calculate size on all not EVICTED partitions) > Visor doesn't show cacheSize metrics. > - > > Key: IGNITE-11017 > URL: https://issues.apache.org/jira/browse/IGNITE-11017 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over not > EVICTED partitions on calculating entries size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (IGNITE-11061) Сopyright still points out 2018
[ https://issues.apache.org/jira/browse/IGNITE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin closed IGNITE-11061. --- Ignite Flags: (was: Docs Required) Created ticket by mistake > Сopyright still points out 2018 > --- > > Key: IGNITE-11061 > URL: https://issues.apache.org/jira/browse/IGNITE-11061 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-11061) Сopyright still points out 2018
[ https://issues.apache.org/jira/browse/IGNITE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin resolved IGNITE-11061. - Resolution: Invalid > Сopyright still points out 2018 > --- > > Key: IGNITE-11061 > URL: https://issues.apache.org/jira/browse/IGNITE-11061 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-11061) Сopyright still points out 2018
[ https://issues.apache.org/jira/browse/IGNITE-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751194#comment-16751194 ] Pavel Voronkin edited comment on IGNITE-11061 at 1/24/19 2:38 PM: -- Created ticket by mistake. was (Author: voropava): Created ticket by mistake > Сopyright still points out 2018 > --- > > Key: IGNITE-11061 > URL: https://issues.apache.org/jira/browse/IGNITE-11061 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11061) Сopyright still points out 2018
Pavel Voronkin created IGNITE-11061: --- Summary: Сopyright still points out 2018 Key: IGNITE-11061 URL: https://issues.apache.org/jira/browse/IGNITE-11061 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Fix Version/s: 2.8 > RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data > (SSL engine error)". > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > Attachments: IgniteClientConnectSslTest.java > > Time Spent: 0.5h > Remaining Estimate: 0h > > Problem: > In case the initiator node hasn't joined topology yet (it doesn't exist in > DiscoCache, but exists in the TcpDiscovery ring), > we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the below > else clause: > if (unknownNode) > { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", > ses=" + ses + ']'); ses.close(); } > else { > ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new > CI1>() { > @Override public void apply(IgniteInternalFuture fut) > { ses.close(); } > }); > } > In the case of SSL, this code encrypts and sends concurrently with > session.close(), which results in an exception: > javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) > [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl > [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, > igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, > hashCode=1324367867, interrupted=false, > runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker > [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, > select=true, super=]DirectNioClientWorker [super=], > writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], > readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], > inRecovery=null, outRecovery=null, super=GridNioSessionImpl > 
[locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, > createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, > bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, > lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, > filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL > filter], accepted=true, markedForClose=true]]] > at > org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380) > at > org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270) > at > org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465) > at > org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326) > at > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374) > at > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138) > at > org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:745) > > So the initiator receives a closed-session exception instead of the NEED_WAIT message, which > leads to the exception scenario. > As a result, instead of the NEED_WAIT loop, we retry with an exception N times and fail. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
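The race described above can be modeled in miniature: the session must not be closed while the SSL filter may still be encrypting the NEED_WAIT reply, so the close has to be chained onto completion of the write. A minimal sketch, where Session is a stand-in for GridNioSession and the future type is simplified to CompletableFuture (assumptions, not the actual Ignite types):

```java
import java.util.concurrent.CompletableFuture;

// Illustrative model of the IGNITE-11016 fix: serialize "send NEED_WAIT"
// and "close session" instead of letting them race.
public class NeedWaitReply {
    /** Stand-in for GridNioSession with an async send. */
    interface Session {
        CompletableFuture<Void> send(String msg);
        void close();
    }

    /** Closes the session strictly after the reply write has completed (or failed). */
    static CompletableFuture<Void> sendNeedWaitAndClose(Session ses) {
        return ses.send("NEED_WAIT").whenComplete((res, err) -> ses.close());
    }
}
```

With this ordering the peer observes the NEED_WAIT message before the connection is torn down, so it can enter the wait loop instead of treating the closed connection as a hard failure.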
[jira] [Updated] (IGNITE-11054) GridNioServer.processWrite() reordered socket.write and onMessageWritten callback.
[ https://issues.apache.org/jira/browse/IGNITE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11054: Description: We have a bug in processWrite() > GridNioServer.processWrite() reordered socket.write and onMessageWritten > callback. > -- > > Key: IGNITE-11054 > URL: https://issues.apache.org/jira/browse/IGNITE-11054 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > We have a bug in processWrite() > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Ignite Flags: (was: Docs Required) > RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data > (SSL engine error)". > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > Attachments: IgniteClientConnectSslTest.java > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11054) GridNioServer.processWrite() reordered socket.write and onMessageWritten callback.
[ https://issues.apache.org/jira/browse/IGNITE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11054: Description: We have a bug in processWrite(): SessionWriteRequest.onMessageWritten() is invoked before the actual write to the socket. was: We have a bug in processWrite() > GridNioServer.processWrite() reordered socket.write and onMessageWritten > callback. > -- > > Key: IGNITE-11054 > URL: https://issues.apache.org/jira/browse/IGNITE-11054 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > We have a bug in processWrite(): > SessionWriteRequest.onMessageWritten() is invoked before the actual write to the > socket. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
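The ordering bug in IGNITE-11054 can be illustrated with a minimal sketch. `WriteRequest` and `processWrite` below are hypothetical stand-ins, not GridNioServer's actual types; the point is only that the write-completion callback must fire after the bytes have actually been handed to the socket, never before.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the correct ordering: socket write first,
// onMessageWritten() callback second (the bug was the reverse order).
public class WriteOrdering {
    interface WriteRequest {
        void onMessageWritten();
        ByteBuffer buffer();
    }

    // Records the order of events so the ordering is observable.
    static final List<String> events = new ArrayList<>();

    static void processWrite(WriteRequest req) {
        // Stand-in for socketChannel.write(buf): consume the buffer.
        int written = req.buffer().remaining();
        events.add("wrote " + written + " bytes");

        // Only now, after the write, notify the request.
        req.onMessageWritten();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3});
        processWrite(new WriteRequest() {
            public void onMessageWritten() { events.add("callback"); }
            public ByteBuffer buffer() { return buf; }
        });
        System.out.println(events);
    }
}
```

A callback that fires before the write lets callers (e.g. recovery bookkeeping) act on a message that may still fail to reach the wire.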
[jira] [Created] (IGNITE-11054) GridNioServer.processWrite() reordered socket.write and onMessageWritten callback.
Pavel Voronkin created IGNITE-11054: --- Summary: GridNioServer.processWrite() reordered socket.write and onMessageWritten callback. Key: IGNITE-11054 URL: https://issues.apache.org/jira/browse/IGNITE-11054 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay in .NET.
[ https://issues.apache.org/jira/browse/IGNITE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin resolved IGNITE-11026. - Resolution: Won't Fix We decided not to introduce new parameters. > Support TcpCommunicationSpi.NeedWaitDelay, > TcpCommunicationSpi.MaxNeedWaitDelay in .NET. > > > Key: IGNITE-11026 > URL: https://issues.apache.org/jira/browse/IGNITE-11026 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750679#comment-16750679 ] Pavel Voronkin commented on IGNITE-11016: - I agree that we need to add failure-detection logic here, in another JIRA; maybe that will be IGNITE-7648. I will change the odd initial delay of 1 ms and implement the failure-detection logic in another JIRA. > RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data > (SSL engine error)". > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Attachments: IgniteClientConnectSslTest.java > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749517#comment-16749517 ] Pavel Voronkin commented on IGNITE-10877: - Thanks for your feedback [~ascherbakov], i've resolved them. > GridAffinityAssignment.initPrimaryBackupMaps memory pressure > > > Key: IGNITE-10877 > URL: https://issues.apache.org/jira/browse/IGNITE-10877 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Fix For: 2.8 > > Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, > image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, > image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, > image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, > image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, > image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, > image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > 1) While running tests with JFR we observe huge memory allocation pressure > produced by: > *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)* > java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 > 100 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 > 784 100 > java.util.HashMap.put(Object, Object) 481 298 044 784 100 > java.util.HashSet.add(Object) 480 297 221 040 99,724 > > org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps() > 1 823 744 0,276 > org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.(AffinityTopologyVersion, > List, List) 1 823 744 0,276 > *Allocation stats* > Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB > Size(bytes) Total TLAB Size(bytes) Pressure(%) > java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876 > java.lang.Object[] 1 470,115 
461 616 314 655 019,236 205 676 040 22,687 > java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 > 11,341 > java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654 > java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198 > 2) Also, another hot spot was found: > Stack Trace TLABs Total TLAB Size(bytes) Pressure(%) > java.util.ArrayList.grow(int) 7 5 766 448 9,554 > java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554 > java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554 > java.util.ArrayList.add(Object) 7 5 766 448 9,554 > > org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int, > AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 > 766 448 9,554 > The reason of that is defail > I think we need to improve memory efficiency by switching from Sets to > BitSets > > JFR attached, see Allocations in 12:50:28 - 12:50:29 > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
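A rough sketch of the Set-to-BitSet idea proposed above. The class below is illustrative only (it is not GridAffinityAssignment): it shows why a BitSet fits dense, small partition ids better than HashSet<Integer>, which is the allocation hot spot in the JFR data: one long word per 64 partitions instead of a boxed Integer plus a HashMap.Node per entry.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed optimization: partition ids are small dense ints,
// so a BitSet avoids one boxed Integer and one HashMap.Node per id.
public class PartitionSetDemo {
    public static void main(String[] args) {
        int parts = 1024; // typical partition count; illustrative value

        Set<Integer> boxed = new HashSet<>(); // ~2 heap objects per entry
        BitSet compact = new BitSet(parts);   // one long[] for all entries

        // Mark every even partition as "primary" in both structures.
        for (int p = 0; p < parts; p += 2) {
            boxed.add(p);
            compact.set(p);
        }

        // Both answer the same membership queries.
        System.out.println(boxed.size() == compact.cardinality());
        System.out.println(compact.get(0) && !compact.get(1));
    }
}
```

Iteration stays cheap too: `nextSetBit(int)` walks the set partitions without allocating an iterator per call.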
[jira] [Created] (IGNITE-11031) Improve test coverage on ssl and fix existing ssl tcp communication spi tests.
Pavel Voronkin created IGNITE-11031: --- Summary: Improve test coverage on ssl and fix existing ssl tcp communication spi tests. Key: IGNITE-11031 URL: https://issues.apache.org/jira/browse/IGNITE-11031 Project: Ignite Issue Type: Improvement Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay.
Pavel Voronkin created IGNITE-11026: --- Summary: Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay. Key: IGNITE-11026 URL: https://issues.apache.org/jira/browse/IGNITE-11026 Project: Ignite Issue Type: Improvement Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11026) Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay in .NET.
[ https://issues.apache.org/jira/browse/IGNITE-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11026: Summary: Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay in .NET. (was: Support TcpCommunicationSpi.NeedWaitDelay, TcpCommunicationSpi.MaxNeedWaitDelay.) > Support TcpCommunicationSpi.NeedWaitDelay, > TcpCommunicationSpi.MaxNeedWaitDelay in .NET. > > > Key: IGNITE-11026 > URL: https://issues.apache.org/jira/browse/IGNITE-11026 > Project: Ignite > Issue Type: Improvement >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin reassigned IGNITE-11016: --- Assignee: Pavel Voronkin > RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data > (SSL engine error)". > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Assignee: Pavel Voronkin >Priority: Major > Attachments: IgniteClientConnectSslTest.java > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11017) OffheapEntriesCount metrics calculate size on all not EVICTED partitions
[ https://issues.apache.org/jira/browse/IGNITE-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11017: Description: GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over all non-EVICTED partitions when calculating the entry count. > OffheapEntriesCount metrics calculate size on all not EVICTED partitions > > > Key: IGNITE-11017 > URL: https://issues.apache.org/jira/browse/IGNITE-11017 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > > GridDhtPartitionTopologyImpl.CurrentPartitionsIterator iterates over all > non-EVICTED partitions when calculating the entry count. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-11017) OffheapEntriesCount metrics calculate size on all not EVICTED partitions
Pavel Voronkin created IGNITE-11017: --- Summary: OffheapEntriesCount metrics calculate size on all not EVICTED partitions Key: IGNITE-11017 URL: https://issues.apache.org/jira/browse/IGNITE-11017 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748456#comment-16748456 ] Pavel Voronkin commented on IGNITE-11016: - Reproducer attached. > RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data > (SSL engine error)". > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > Attachments: IgniteClientConnectSslTest.java > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Attachment: IgniteClientConnectSslTest.java > RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data > (SSL engine error)". > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > Attachments: IgniteClientConnectSslTest.java > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Summary: RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL (was: NEED_WAIT write message failed in case of SSL) > RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug >Reporter: Pavel Voronkin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Description: Problem: If the initiator node has not joined the topology yet (it does not exist in DiscoCache, but exists in the TcpDiscovery ring), we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause below: if (unknownNode) { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", ses=" + ses + ']'); ses.close(); } else { ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new CI1>() { @Override public void apply(IgniteInternalFuture fut) { ses.close(); } }); } In the SSL case this code encrypts and sends concurrently with session.close(), which results in an exception: javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, hashCode=1324367867, interrupted=false, runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%|#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, select=true, super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL filter], accepted=true, markedForClose=true]]] at org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380) at 
org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270) at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465) at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326) at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374) at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138) at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:745) So the initiator receives a "closed" exception instead of the NEED_WAIT message, which leads to the failure scenario. As a result, instead of the NEED_WAIT loop, we retry with an exception N times and fail. was: The problem is that when the initiator node does not exist in DiscoCache but exists in the TcpDiscovery ring, we write back a new RecoveryLastReceivedMessage(NEED_WAIT) in the else clause below: if (unknownNode) { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", ses=" + ses + ']'); ses.close(); } else { ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new CI1>() { @Override public void apply(IgniteInternalFuture fut) { ses.close(); } }); } In the SSL case this code encrypts and sends concurrently with close, which results in: javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, hashCode=1324367867, interrupted=false, runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, 
bytesSent0=0, select=true, super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL filter], accepted=true, markedForClose=true]]] at
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Description: The problem is that when the initiator node doesn't exist in DiscoCache but exists in the TcpDiscovery ring, we write back new RecoveryLastReceivedMessage(NEED_WAIT) in the else branch below: if (unknownNode) { U.warn(log, "Close incoming connection, unknown node [nodeId=" + sndId + ", ses=" + ses + ']'); ses.close(); } else { ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)).listen(new CI1<IgniteInternalFuture<?>>() { @Override public void apply(IgniteInternalFuture fut) { ses.close(); } }); } With SSL enabled, this code encrypts and sends concurrently with the close, which results in: javax.net.ssl.SSLException: Failed to encrypt data (SSL engine error) [status=CLOSED, handshakeStatus=NEED_UNWRAP, ses=GridSelectorNioSessionImpl [worker=GridWorker [name=grid-nio-worker-tcp-comm-10, igniteInstanceName=DPL_GRID%DplGridNodeName, finished=false, hashCode=1324367867, interrupted=false, runner=grid-nio-worker-tcp-comm-10-#130%DPL_GRID%DplGridNodeName%]AbstractNioClientWorker [idx=10, bytesRcvd=121406754, bytesSent=0, bytesRcvd0=16659, bytesSent0=0, select=true, super=]DirectNioClientWorker [super=], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=10 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/10.116.69.208:47100, rmtAddr=/10.53.15.23:55380, createTime=1544502852482, closeTime=0, bytesSent=4076, bytesRcvd=4346, bytesSent0=4076, bytesRcvd0=4346, sndSchedTime=1544502852522, lastSndTime=1544502852522, lastRcvTime=1544502852522, readsPaused=false, filterChain=FilterChain[filters=[, GridConnectionBytesVerifyFilter, SSL filter], accepted=true, markedForClose=true]]] at org.apache.ignite.internal.util.nio.ssl.GridNioSslHandler.encrypt(GridNioSslHandler.java:380) at org.apache.ignite.internal.util.nio.ssl.GridNioSslFilter.encrypt(GridNioSslFilter.java:270) at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWriteSsl(GridNioServer.java:1465) at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1326) at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2374) at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2138) at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1792) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) at java.lang.Thread.run(Thread.java:745) > RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL > -- > > Key: IGNITE-11016 > URL: https://issues.apache.org/jira/browse/IGNITE-11016 > Project: Ignite > Issue Type: Bug > Reporter: Pavel Voronkin > Priority: Major
[jira] [Updated] (IGNITE-11016) RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)".
[ https://issues.apache.org/jira/browse/IGNITE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-11016: Summary: RecoveryLastReceivedMessage(NEED_WAIT) fails with "Failed to encrypt data (SSL engine error)". (was: RecoveryLastReceivedMessage(NEED_WAIT) write message failed in case of SSL) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
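The failure above reduces to an ordering problem: the session must not be closed (flipping the SSL engine to CLOSED) while the NEED_WAIT write is still pending in the NIO worker. A minimal Java sketch of the safe ordering; the names (NeedWaitOrdering, sendNeedWait, engineClosed) are illustrative stand-ins, not Ignite's API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the NEED_WAIT write/close race (not Ignite code). The "SSL
// engine" refuses to encrypt once closed; closing the session only in the
// completion stage of the write guarantees the encrypt happens first.
public class NeedWaitOrdering {
    static final AtomicBoolean engineClosed = new AtomicBoolean();

    // Models ses.send(new RecoveryLastReceivedMessage(NEED_WAIT)): the actual
    // encrypt + write is performed asynchronously by a worker thread.
    static CompletableFuture<Void> sendNeedWait() {
        return CompletableFuture.runAsync(() -> {
            if (engineClosed.get())
                throw new IllegalStateException("Failed to encrypt data: engine CLOSED");
            // encrypt and flush would happen here in the real code path
        });
    }

    public static void main(String[] args) {
        // Safe ordering: close strictly after the write future completes,
        // never concurrently with the pending encrypt.
        sendNeedWait().thenRun(() -> engineClosed.set(true)).join();
        System.out.println("NEED_WAIT sent before close: " + engineClosed.get());
    }
}
```

Closing inside the write's completion stage is the same shape as the `listen(...)` callback in the snippet above; the bug is any path that can close the session while the write is still queued.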
[jira] [Created] (IGNITE-11016) NEED_WAIT write message failed in case of SSL
Pavel Voronkin created IGNITE-11016: --- Summary: NEED_WAIT write message failed in case of SSL Key: IGNITE-11016 URL: https://issues.apache.org/jira/browse/IGNITE-11016 Project: Ignite Issue Type: Bug Reporter: Pavel Voronkin -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747703#comment-16747703 ] Pavel Voronkin edited comment on IGNITE-10877 at 1/21/19 6:59 AM: -- I don't think it breaks compatibility, because we have an Ignite property to roll back to the original behaviour for mixed environments. Moreover, GridAffinityAssignment serialization is broken right now. See IGNITE-10925; we need to fix all issues there. was (Author: voropava): I don't think it breaks compatibility, because we have an Ignite property to roll back to the original behaviour for mixed environments. Moreover, GridAffinityAssignment serialization is broken right now. See IGNITE-10925; we need to fix the issue there. > GridAffinityAssignment.initPrimaryBackupMaps memory pressure > > > Key: IGNITE-10877 > URL: https://issues.apache.org/jira/browse/IGNITE-10877 > Project: Ignite > Issue Type: Improvement > Reporter: Pavel Voronkin > Assignee: Pavel Voronkin > Priority: Major > Fix For: 2.8 > > Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr, image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png, image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png, image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png, image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png, image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png, image-2019-01-18-12-09-04-835.png, image-2019-01-18-12-09-32-876.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > 1) While running tests with JFR we observe huge memory allocation pressure produced by: > *Stack Trace TLABs Total TLAB Size(bytes) Pressure(%)* > java.util.HashMap.newNode(int, Object, Object, HashMap$Node) 481 298 044 784 100 > java.util.HashMap.putVal(int, Object, Object, boolean, boolean) 481 298 044 784 100 > java.util.HashMap.put(Object, Object) 481 298 044 784 100 > java.util.HashSet.add(Object) 480 297 221 040 99,724 > org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps() 1 823 744 0,276 > org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.<init>(AffinityTopologyVersion, List, List) 1 823 744 0,276 > *Allocation stats* > Class Average Object Size(bytes) Total Object Size(bytes) TLABs Average TLAB Size(bytes) Total TLAB Size(bytes) Pressure(%) > java.util.HashMap$Node 32 15 392 481 619 635,726 298 044 784 32,876 > java.lang.Object[] 1 470,115 461 616 314 655 019,236 205 676 040 22,687 > java.util.HashMap$Node[] 41 268,617 6 149 024 149 690 046,067 102 816 864 11,341 > java.lang.Integer 16 1 456 91 662 911,385 60 324 936 6,654 > java.util.ArrayList 24 1 608 67 703 389,97 47 127 128 5,198 > 2) Also another hot place was found: > Stack Trace TLABs Total TLAB Size(bytes) Pressure(%) > java.util.ArrayList.grow(int) 7 5 766 448 9,554 > java.util.ArrayList.ensureExplicitCapacity(int) 7 5 766 448 9,554 > java.util.ArrayList.ensureCapacityInternal(int) 7 5 766 448 9,554 > java.util.ArrayList.add(Object) 7 5 766 448 9,554 > org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int, AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) 7 5 766 448 9,554 > I think we need to improve memory efficiency by switching from Sets to BitSets. > JFR attached, see Allocations in 12:50:28 - 12:50:29 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
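The proposed Sets-to-BitSets switch can be sketched as follows (illustrative only; the class and method names are hypothetical, not GridAffinityAssignment's actual fields). Partition ids are small, dense integers, so a java.util.BitSet indexed by partition id can replace a HashSet<Integer> of boxed ids and avoid the HashMap$Node and Integer allocations that dominate the profile above:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: represent a set of partition ids as a BitSet
// (one bit per possible partition) instead of a HashSet of boxed Integers.
public class PartitionSet {
    /** Converts a boxed-id set into a BitSet sized for the partition universe. */
    public static BitSet toBitSet(Set<Integer> parts, int totalParts) {
        BitSet bits = new BitSet(totalParts);
        for (int p : parts)
            bits.set(p); // no boxing, no per-element node allocation
        return bits;
    }

    public static void main(String[] args) {
        Set<Integer> parts = new HashSet<>();
        parts.add(3);
        parts.add(1021);
        BitSet bits = toBitSet(parts, 1024);
        System.out.println("owned partitions: " + bits.cardinality()); // prints 2
    }
}
```

Membership tests become `bits.get(p)` and iteration `bits.nextSetBit(i)`, both allocation-free.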
[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747703#comment-16747703 ] Pavel Voronkin commented on IGNITE-10877: - I don't think it breaks compatibility, because we have an Ignite property to roll back to the original behaviour for mixed environments. Moreover, GridAffinityAssignment serialization is broken right now. See IGNITE-10925; we need to fix the issue there. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746083#comment-16746083 ] Pavel Voronkin edited comment on IGNITE-10877 at 1/18/19 9:34 AM: -- 16 parts, 16 nodes. HashSet: !image-2019-01-18-12-09-32-876.png! BitSet: !image-2019-01-18-12-09-04-835.png! In total (N = number of nodes, P = number of partitions): low P, low N - BitSet is better; high P, low N - BitSet is better; low P, high N - BitSet is slightly better; high P, high N - HashSet is better. At more than 500 nodes we need a compacted BitSet. was (Author: voropava): 16 parts, 16 nodes. HashSet: !image-2019-01-18-12-09-32-876.png! BitSet: !image-2019-01-18-12-09-04-835.png! In total (N = number of nodes, P = number of partitions): low P, low N - BitSet is better; high P, low N - BitSet is better; low P, high N - BitSet is slightly better; high P, high N - HashSet is better. I suggest a threshold of 500. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
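The matrix above suggests picking the representation by cluster size. A hypothetical chooser using the 500-node threshold mentioned in the comment (names and the exact cutover rule are illustrative, not Ignite code):

```java
// Hypothetical selection logic reflecting the measurement matrix above:
// BitSet wins in most cells, so fall back to HashSet only past a node-count
// threshold (500 is the value suggested in the comment; tune per workload).
public class RepresentationChooser {
    static final int NODE_THRESHOLD = 500;

    enum Repr { BIT_SET, HASH_SET }

    /** Picks the backing representation for a per-partition node set. */
    static Repr choose(int nodes) {
        return nodes > NODE_THRESHOLD ? Repr.HASH_SET : Repr.BIT_SET;
    }

    public static void main(String[] args) {
        System.out.println(choose(16));   // prints BIT_SET
        System.out.println(choose(4000)); // prints HASH_SET
    }
}
```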
[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746083#comment-16746083 ] Pavel Voronkin commented on IGNITE-10877: - 16 parts, 16 nodes. HashSet: !image-2019-01-18-12-09-32-876.png! BitSet: !image-2019-01-18-12-09-04-835.png! In total (N = number of nodes, P = number of partitions): low P, low N - BitSet is better; high P, low N - BitSet is better; low P, high N - BitSet is slightly better; high P, high N - HashSet is better. I suggest a threshold of 500. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-10877: Attachment: image-2019-01-18-12-09-32-876.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-10877: Attachment: image-2019-01-18-12-09-04-835.png -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745931#comment-16745931 ] Pavel Voronkin commented on IGNITE-10877: - 1024 partitions, 4K nodes. HashSet: !image-2019-01-18-11-56-10-339.png! BitSet: !image-2019-01-18-11-56-18-040.png! With such a small number of partitions, BitSet is roughly the same as HashSet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
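The "roughly the same" observation follows from back-of-envelope arithmetic. A sketch under stated assumptions: a 64-bit JVM with compressed oops, HashMap$Node at ~32 B and boxed Integer at ~16 B (matching the JFR averages quoted earlier); the helper names are hypothetical and table/header overheads are ignored:

```java
// Rough per-set footprint estimate for a node's partition set.
public class Footprint {
    // HashSet cost grows with the number of elements actually stored.
    static long hashSetBytes(int elements) {
        return elements * (32L + 16L); // entry node + boxed key per element
    }

    // BitSet cost grows with the size of the id universe, word-aligned.
    static long bitSetBytes(int universeSize) {
        return (universeSize + 63) / 64 * 8L; // one bit per possible id
    }

    public static void main(String[] args) {
        // Per-node partition set: ~2 owned partitions out of 1024.
        System.out.println("HashSet ~" + hashSetBytes(2) + " B");  // prints 96
        System.out.println("BitSet  ~" + bitSetBytes(1024) + " B"); // prints 128
    }
}
```

With few partitions the two are within the same order of magnitude, which matches the measurement; the BitSet advantage appears when a set holds many ids out of a small universe.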
[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-10877: Attachment: image-2019-01-18-11-56-18-040.png
> GridAffinityAssignment.initPrimaryBackupMaps memory pressure
>
> Key: IGNITE-10877
> URL: https://issues.apache.org/jira/browse/IGNITE-10877
> Project: Ignite
> Issue Type: Improvement
> Reporter: Pavel Voronkin
> Assignee: Pavel Voronkin
> Priority: Major
> Attachments: grid.srv.node.1.0-29.12.2018-12.50.15.jfr,
> image-2019-01-17-12-58-07-382.png, image-2019-01-17-12-59-52-137.png,
> image-2019-01-17-15-45-49-561.png, image-2019-01-17-15-45-53-043.png,
> image-2019-01-17-15-46-32-872.png, image-2019-01-18-11-36-57-451.png,
> image-2019-01-18-11-38-39-410.png, image-2019-01-18-11-55-39-496.png,
> image-2019-01-18-11-56-10-339.png, image-2019-01-18-11-56-18-040.png
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> 1) While running tests with JFR we observed heavy memory allocation pressure produced by:
>
> Stack Trace | TLABs | Total TLAB Size (bytes) | Pressure (%)
> java.util.HashMap.newNode(int, Object, Object, HashMap$Node) | 481 | 298 044 784 | 100
> java.util.HashMap.putVal(int, Object, Object, boolean, boolean) | 481 | 298 044 784 | 100
> java.util.HashMap.put(Object, Object) | 481 | 298 044 784 | 100
> java.util.HashSet.add(Object) | 480 | 297 221 040 | 99,724
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.initPrimaryBackupMaps() | 1 | 823 744 | 0,276
> org.apache.ignite.internal.processors.affinity.GridAffinityAssignment.<init>(AffinityTopologyVersion, List, List) | 1 | 823 744 | 0,276
>
> Allocation stats:
>
> Class | Average Object Size (bytes) | Total Object Size (bytes) | TLABs | Average TLAB Size (bytes) | Total TLAB Size (bytes) | Pressure (%)
> java.util.HashMap$Node | 32 | 15 392 | 481 | 619 635,726 | 298 044 784 | 32,876
> java.lang.Object[] | 1 470,115 | 461 616 | 314 | 655 019,236 | 205 676 040 | 22,687
> java.util.HashMap$Node[] | 41 268,617 | 6 149 024 | 149 | 690 046,067 | 102 816 864 | 11,341
> java.lang.Integer | 16 | 1 456 | 91 | 662 911,385 | 60 324 936 | 6,654
> java.util.ArrayList | 24 | 1 608 | 67 | 703 389,97 | 47 127 128 | 5,198
>
> 2) Another hot spot was found:
>
> Stack Trace | TLABs | Total TLAB Size (bytes) | Pressure (%)
> java.util.ArrayList.grow(int) | 7 | 5 766 448 | 9,554
> java.util.ArrayList.ensureExplicitCapacity(int) | 7 | 5 766 448 | 9,554
> java.util.ArrayList.ensureCapacityInternal(int) | 7 | 5 766 448 | 9,554
> java.util.ArrayList.add(Object) | 7 | 5 766 448 | 9,554
> org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.nodes(int, AffinityTopologyVersion, GridDhtPartitionState, GridDhtPartitionState[]) | 7 | 5 766 448 | 9,554
>
> I think we need to improve memory efficiency by switching from Sets to BitSets.
>
> JFR attached, see Allocations in 12:50:28 - 12:50:29.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
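The Set-to-BitSet proposal in the issue above can be sketched as follows. This is an illustrative comparison, not the actual Ignite patch: the class and method names below are hypothetical. The point is that a HashSet<Integer> of partition IDs costs one HashMap$Node (~32 bytes) plus a boxed Integer per entry (the two top allocators in the JFR table), while a BitSet over the partition space is a single long[] with no per-entry objects.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch comparing the two representations of a per-node
// partition-ID set, as discussed for GridAffinityAssignment.initPrimaryBackupMaps.
public class PartitionSetSketch {
    static final int PARTS = 1024; // assumed affinity partition count

    // Boxed approach: each add() allocates a HashMap$Node and (usually) a boxed Integer.
    static Set<Integer> boxedPartitions(int[] parts) {
        Set<Integer> set = new HashSet<>();
        for (int p : parts)
            set.add(p);
        return set;
    }

    // BitSet approach: one long[PARTS / 64] backing array, no per-entry allocation.
    static BitSet bitSetPartitions(int[] parts) {
        BitSet set = new BitSet(PARTS);
        for (int p : parts)
            set.set(p);
        return set;
    }

    public static void main(String[] args) {
        int[] parts = {0, 5, 63, 64, 1023};
        Set<Integer> boxed = boxedPartitions(parts);
        BitSet bits = bitSetPartitions(parts);
        // Both representations answer the same membership queries.
        for (int p : parts) {
            assert boxed.contains(p);
            assert bits.get(p);
        }
        System.out.println(bits.cardinality()); // 5
    }
}
```

The trade-off: BitSet membership and iteration (via nextSetBit) stay O(1)/O(n) over the fixed partition space, and since partition counts are bounded and dense, the bitmap is almost always smaller than the boxed set.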
[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-10877: Attachment: image-2019-01-18-11-56-10-339.png
[jira] [Updated] (IGNITE-10877) GridAffinityAssignment.initPrimaryBackupMaps memory pressure
[ https://issues.apache.org/jira/browse/IGNITE-10877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Voronkin updated IGNITE-10877: Attachment: image-2019-01-18-11-55-39-496.png