[ 
https://issues.apache.org/jira/browse/IGNITE-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985178#comment-14985178
 ] 

Semen Boikov edited comment on IGNITE-1758 at 11/2/15 1:05 PM:
---------------------------------------------------------------

Found more issues: 
- TcpDiscoveryClientReconnectMessage did not have the TcpDiscoveryEnsureDelivery 
annotation, so a client could hang waiting for the reconnect message response
- the client's GridDhtPartitionsExchangeFuture could hang if all server nodes 
that were alive when the client joined had failed


was (Author: sboikov):
Found one more issue: TcpDiscoveryClientReconnectMessage did not have the 
TcpDiscoveryEnsureDelivery annotation, so a client could hang waiting for the 
reconnect message response. 

> Clients don't survive during massive servers shutdown
> -----------------------------------------------------
>
>                 Key: IGNITE-1758
>                 URL: https://issues.apache.org/jira/browse/IGNITE-1758
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: ignite-1.4
>            Reporter: Denis Magda
>            Assignee: Semen Boikov
>            Priority: Blocker
>             Fix For: 1.5
>
>         Attachments: ignite-1758-test.patch
>
>
> This is a real-world use case.
> Start a reasonable number of servers and clients.
> Perform cache operations within a transaction.
> Stop half of the servers. Clients must survive and keep executing their 
> transactions.
> Ran the following test:
> - Started 14 servers and 14 clients;
> - Clients execute transactional put operations;
> - Stopped 7 servers.
> Got different assertion errors on the client side:
> {noformat}
> [15:47:33,401][ERROR][tcp-client-disco-msg-worker-#521%internal.IgniteClientReconnectCacheMultiThreadedTest18][TcpDiscoverySpi]
>  Runtime error caught during grid runnable execution: IgniteSpiThread 
> [name=tcp-client-disco-msg-worker-#521%internal.IgniteClientReconnectCacheMultiThreadedTest18]
> java.lang.AssertionError: lastVer=29, newVer=32, locNode=TcpDiscoveryNode 
> [id=80f14def-9d49-43a0-96bc-6b83aedb3008, addrs=[127.0.0.1], 
> sockAddrs=[/127.0.0.1:0], discPort=0, order=26, intOrder=0, 
> lastExchangeTime=1445428036418, loc=true, ver=1.4.1#19700101-sha1:00000000, 
> isClient=true], msg=TcpDiscoveryNodeFailedMessage 
> [failedNodeId=3020dc65-ed3e-426f-8784-5bb766961003, order=4, warning=null, 
> super=TcpDiscoveryAbstractMessage 
> [sndNodeId=10c5cfe9-df07-4dfe-a5c0-460087aa9001, 
> id=eed3e3a8051-008a978d-28cc-4f0c-8728-4a815f858000, 
> verifierNodeId=800cf998-828e-4f56-af6a-c2760c5ed008, topVer=32, pendingIdx=0, 
> isClient=false]]
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.updateTopologyHistory(ClientImpl.java:720)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.access$2700(ClientImpl.java:118)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeFailedMessage(ClientImpl.java:1812)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:1543)
>       at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1467)
>       at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> {noformat}
> java.lang.AssertionError: Missed message future [rcvCnt=141, acked=0, 
> desc=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=0, 
> reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode 
> [id=6090f64b-e019-440b-9d0e-c3642bd3a006, addrs=[127.0.0.1], 
> sockAddrs=[/127.0.0.1:47503], discPort=47503, order=3, intOrder=3, 
> lastExchangeTime=1445428027468, loc=false, ver=1.4.1#19700101-sha1:00000000, 
> isClient=false], connected=false, connectCnt=1, queueLimit=5120]]
>       at 
> org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor.ackReceived(GridNioRecoveryDescriptor.java:181)
>       at 
> org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor.onHandshake(GridNioRecoveryDescriptor.java:251)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2331)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2084)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:1978)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1914)
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1880)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1066)
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1214)
>       at 
> org.apache.ignite.internal.processors.clock.GridClockSyncProcessor.publish(GridClockSyncProcessor.java:305)
>       at 
> org.apache.ignite.internal.processors.clock.GridClockSyncProcessor.access$800(GridClockSyncProcessor.java:54)
>       at 
> org.apache.ignite.internal.processors.clock.GridClockSyncProcessor$TimeCoordinator.body(GridClockSyncProcessor.java:382)
>       at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
