[ 
https://issues.apache.org/jira/browse/IGNITE-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504922#comment-16504922
 ] 

Alexey Goncharuk commented on IGNITE-8657:
------------------------------------------

[~sergey-chugunov], I think I've found an issue in the tests.
Take a look at the latest run of Binary Objects (Simple Mapper Basic):
https://ci.ignite.apache.org/viewLog.html?buildId=1367214&buildTypeId=IgniteTests24Java8_BinaryObjectsSimpleMapperBasic&tab=buildResultsDiv
 
I see the following assertion in the log:
{code}
[16:30:59]W:             [org.apache.ignite:ignite-core] java.lang.AssertionError: TcpDiscoveryNode [id=d089379e-11db-453f-99a0-a270bc200002, addrs=[127.0.0.1], sockAddrs=[/127.0.0.1:47502], discPort=47502, order=341, intOrder=172, lastExchangeTime=1528378258963, loc=false, ver=2.6.0#20180607-sha1:8f8efe4f, isClient=false]
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.IgniteNeedReconnectException.<init>(IgniteNeedReconnectException.java:38)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.forceClientReconnect(GridDhtPartitionsExchangeFuture.java:2051)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1569)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:138)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:345)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:325)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2837)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2816)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[16:30:59]W:             [org.apache.ignite:ignite-core]        at java.lang.Thread.run(Thread.java:745)
{code}

It looks like the exception may be deserialized on a non-client node, so the 
assertion should be removed and the exception handled properly on receive.
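
A minimal sketch of that direction, not the actual Ignite code: the exception type carries no role assertion in its constructor, and the receiving side decides what to do based on the local node's role. All names here (NeedReconnectSketch, Node, ReconnectRequired, Action, onReceive) are hypothetical, and what a server node should really do on receive is not stated above; ignoring the exception is only an assumption for illustration.

```java
public class NeedReconnectSketch {
    /** Hypothetical stand-in for a cluster node descriptor. */
    public static final class Node {
        public final String id;
        public final boolean client;

        public Node(String id, boolean client) {
            this.id = id;
            this.client = client;
        }
    }

    /**
     * Hypothetical exception type. The constructor performs no role check,
     * so the object can be created or deserialized on any node.
     */
    public static final class ReconnectRequired extends Exception {
        public ReconnectRequired(String msg) {
            super(msg);
        }
    }

    /** Possible reactions on the receiving side (assumed for this sketch). */
    public enum Action { RECONNECT, IGNORE }

    /** The role check happens where the exception is handled, not where it is built. */
    public static Action onReceive(Node locNode, ReconnectRequired err) {
        return locNode.client ? Action.RECONNECT : Action.IGNORE;
    }

    public static void main(String[] args) {
        ReconnectRequired err = new ReconnectRequired("exchange history lost");

        System.out.println(onReceive(new Node("client-1", true), err));
        System.out.println(onReceive(new Node("server-1", false), err));
    }
}
```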

> Simultaneous start of bunch of client nodes may lead to some clients hangs
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-8657
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8657
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.5
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>             Fix For: 2.6
>
>
> h3. Description
> PartitionExchangeManager uses the system property 
> *IGNITE_EXCHANGE_HISTORY_SIZE* to limit the maximum number of exchange objects and 
> optimize memory consumption.
> The default value of the property is 1000, but in scenarios with many caches and 
> partitions it is reasonable to set the exchange history size to a smaller value, 
> around a few dozen.
> If a user then starts more client nodes at once than the history size, some 
> clients may hang because their exchange information was evicted and is no 
> longer available.
> h3. Workarounds
> Two workarounds are possible: 
> * Do not start more clients at once than the history size.
> * Restart the hanging client node.
> h3. Solution
> Forcing a client node to reconnect when a server detects that the client's 
> exchange information has been lost prevents client nodes from hanging.
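
To illustrate the failure mode in the description, here is a small sketch of a size-bounded history that drops its oldest entry when full. It is not Ignite's actual exchange history structure; the class ExchangeHistorySketch, the helper boundedHistory, and the history size of 3 are assumptions made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExchangeHistorySketch {
    /** Insertion-ordered map that evicts its oldest entry once maxSize is exceeded. */
    public static <K, V> Map<K, V> boundedHistory(final int maxSize) {
        return new LinkedHashMap<K, V>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        // Hypothetical history size, far below the default of 1000.
        Map<String, String> history = boundedHistory(3);

        // Five clients join at once; each join adds one exchange record.
        for (int i = 1; i <= 5; i++)
            history.put("client-" + i, "exchange-" + i);

        // The records for client-1 and client-2 are gone by now.
        System.out.println(history.keySet()); // prints [client-3, client-4, client-5]
    }
}
```

With more simultaneous joins than slots, the earliest clients' records are evicted before they can be served, which matches the hang described in the issue.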



