[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729499#comment-15729499 ]
Andrey Gura commented on IGNITE-4003: ------------------------------------- Minor fixes. New PR created https://github.com/apache/ignite/pull/1328. {{TcpCommunicationSpi.ConnectGate}} should be optimized because now it has single contention point (mutex). Waiting for TC. Please, review. > Slow or faulty client can stall the whole cluster. > -------------------------------------------------- > > Key: IGNITE-4003 > URL: https://issues.apache.org/jira/browse/IGNITE-4003 > Project: Ignite > Issue Type: Bug > Components: cache, general > Affects Versions: 1.7 > Reporter: Vladimir Ozerov > Assignee: Andrey Gura > Priority: Critical > Fix For: 2.0 > > > Steps to reproduce: > 1) Start two server nodes and some data to cache. > 2) Start a client from Docker subnet, which is not visible from the outside. > Client will join the cluster. > 3) Try to put something to cache or start another node to force rabalance. > Cluster is stuck at this moment. Root cause - servers are constantly trying > to establish outgoing connection to the client, but fail as Docker subnet is > not visible from the outside. It may stop virtually all cluster operations. > Typical thread dump: > {code} > org.apache.ignite.IgniteCheckedException: Failed to send message (node may > have left the grid or TCP connection cannot be established due to firewall > issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, > addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, > /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, > lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, > isClient=true], topic=T4 [topic=TOPIC_CACHE, > id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, > id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage > [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, > data=null, futId=null], policy=2] > at > org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799) > [ignite-core-1.5.23.jar:1.5.23] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_51] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_51] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51] > Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message > to remote node: TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, > addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, > /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, > lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, > isClient=true] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124) > [ignite-core-1.5.23.jar:1.5.23] > ... 32 common frames omitted > Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to > node (is node still alive?). Make sure that each GridComputeTask and > GridCacheTransaction has a timeout set in order to prevent parties from > waiting forever in case of network issues > [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[/172.17.0.6:47100, > /127.0.0.1:47100]] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) > [ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151) > ~[ignite-core-1.5.23.jar:1.5.23] > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149) > ~[ignite-core-1.5.23.jar:1.5.23] > ... 12 common frames omitted > Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect > to address: /172.17.0.6:47100 > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494) > ~[ignite-core-1.5.23.jar:1.5.23] > ... 35 common frames omitted > Caused by: java.net.SocketTimeoutException: null > at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118) > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353) > ... 35 common frames omitted > Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect > to address: /127.0.0.1:47100 > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494) > ~[ignite-core-1.5.23.jar:1.5.23] > ... 35 common frames omitted > Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is > not as expected [expected=a15d74c2-1ec2-4349-9640-aeacd70d8714, > rcvd=48cccf25-7c29-4048-bd52-704acdb552e6] > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604) > at > org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361) > ... 35 common frames omitted > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)