[ https://issues.apache.org/jira/browse/IGNITE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504922#comment-16504922 ]
Alexey Goncharuk commented on IGNITE-8657:
------------------------------------------

[~sergey-chugunov], I think I've found an issue in the tests. Take a look at the latest run of Binary Objects (Simple Mapper Basic):
https://ci.ignite.apache.org/viewLog.html?buildId=1367214&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperBasic&tab=buildResultsDiv

I see the following assertion in the log:
{code}
[16:30:59]W: [org.apache.ignite:ignite-core] java.lang.AssertionError: TcpDiscoveryNode [id=d089379e-11db-453f-99a0-a270bc200002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=341, intOrder=172, lastExchangeTime=1528378258963, loc=false, ver=2.6.0#20180607-sha1:8f8efe4f, isClient=false]
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.IgniteNeedReconnectException.<init>(IgniteNeedReconnectException.java:38)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.forceClientReconnect(GridDhtPartitionsExchangeFuture.java:2051)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1569)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:138)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:345)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:325)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2837)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2816)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
[16:30:59]W: [org.apache.ignite:ignite-core] at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[16:30:59]W: [org.apache.ignite:ignite-core] at java.lang.Thread.run(Thread.java:745)
{code}

It looks like the exception may be deserialized on a non-client node, so the assertion should be removed and that case handled properly on the receiving side.

> Simultaneous start of bunch of client nodes may lead to some clients hangs
> --------------------------------------------------------------------------
>
> Key: IGNITE-8657
> URL: https://issues.apache.org/jira/browse/IGNITE-8657
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.5
> Reporter: Sergey Chugunov
> Assignee: Sergey Chugunov
> Priority: Major
> Fix For: 2.6
>
>
> h3. Description
> PartitionExchangeManager uses the system property *IGNITE_EXCHANGE_HISTORY_SIZE* to limit the maximum number of exchange objects it keeps, which optimizes memory consumption.
> The default value of the property is 1000, but in scenarios with many caches and partitions it is reasonable to set the exchange history size to a smaller value, around a few dozen.
> If a user then starts more client nodes at once than the history size, some clients may hang because their exchange information was evicted and is no longer available.
> h3. Workarounds
> Two workarounds are possible:
> * Do not start more clients at once than the history size.
> * Restart the hanging client node.
> h3. Solution
> Forcing a client node to reconnect when the server detects that it has lost that client's exchange information prevents client nodes from hanging.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
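The hang described in IGNITE-8657 and the assertion in the comment can be illustrated with a minimal Java sketch under heavily simplified assumptions. This is not Ignite's GridCachePartitionExchangeManager code: the class ExchangeHistorySketch, the nested NeedReconnectException, and the methods record, lookupForClient and onReceive are all hypothetical. The sketch shows a history capped at a fixed size (the role IGNITE_EXCHANGE_HISTORY_SIZE plays in the description), a server forcing a reconnect when a client's exchange entry has been evicted instead of leaving the client waiting, and an exception constructor without a client-only assertion. It assumes, as isClient=false in the assertion message suggests, that the assertion checked that the node is a client; here that check is moved to the receiving side.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical, heavily simplified model of a bounded exchange history.
 * NOT Ignite's GridCachePartitionExchangeManager; all names are illustrative.
 */
class ExchangeHistorySketch {
    /** Hypothetical stand-in for IgniteNeedReconnectException. */
    static class NeedReconnectException extends RuntimeException {
        final UUID nodeId;

        NeedReconnectException(UUID nodeId, String msg) {
            super(msg);
            // Deliberately no "node must be a client" assertion here: per the comment,
            // the exception may be deserialized on a non-client node, so the
            // client/server decision is made by the receiver (see onReceive).
            this.nodeId = nodeId;
        }
    }

    /** Topology version -> exchange info, capped at maxSize entries. */
    private final Map<Long, String> history;

    ExchangeHistorySketch(int maxSize) {
        // Insertion-ordered map that drops its oldest entry once its size exceeds
        // maxSize; this stands in for IGNITE_EXCHANGE_HISTORY_SIZE in the sketch.
        this.history = new LinkedHashMap<Long, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Server side: remember the exchange triggered at the given topology version. */
    void record(long topVer, String exchangeInfo) {
        history.put(topVer, exchangeInfo);
    }

    /**
     * Server side: a client asks about its exchange. If the entry was evicted,
     * force a reconnect instead of leaving the client waiting forever.
     */
    String lookupForClient(UUID clientId, long topVer) {
        String info = history.get(topVer);
        if (info == null)
            throw new NeedReconnectException(clientId,
                "Exchange info for topVer=" + topVer + " was evicted, client must reconnect");
        return info;
    }

    /** Receiver side: only a client node reacts by reconnecting; a server node does not. */
    static String onReceive(NeedReconnectException e, boolean localNodeIsClient) {
        return localNodeIsClient ? "reconnect" : "ignore";
    }

    public static void main(String[] args) {
        // History holds 2 entries, but 3 clients join at once.
        ExchangeHistorySketch h = new ExchangeHistorySketch(2);
        h.record(1L, "client-A joined");
        h.record(2L, "client-B joined");
        h.record(3L, "client-C joined"); // evicts the entry for topVer=1

        try {
            h.lookupForClient(UUID.randomUUID(), 1L);
        } catch (NeedReconnectException e) {
            System.out.println(e.getMessage() + " -> " + onReceive(e, true));
        }
    }
}
```

The capped, insertion-ordered map is only the simplest way to model "old exchange entries disappear once more exchanges than the history size have happened"; the source does not show how the real exchange manager stores its history or what a server node does on receipt, so those parts are assumptions.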