[ 
https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729499#comment-15729499
 ] 

Andrey Gura commented on IGNITE-4003:
-------------------------------------

Made minor fixes and created a new PR: https://github.com/apache/ignite/pull/1328

{{TcpCommunicationSpi.ConnectGate}} should be optimized, because it currently 
has a single point of contention (a mutex).
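
To illustrate the kind of optimization meant here, below is a minimal sketch of a gate that lets many threads pass concurrently and only serializes on close. It is not the actual {{ConnectGate}} code: the class name {{SharedGate}} and its methods are hypothetical, and using a read-write lock is just one possible approach, assumed here for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical gate sketch (not the Ignite implementation).
 * Threads entering the gate take a shared (read) lock, so they do not
 * block one another; only close() takes the exclusive (write) lock.
 */
class SharedGate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Guarded by the lock: written under the write lock, read under the read lock. */
    private boolean closed;

    /**
     * Tries to enter the gate.
     * Returns true if entered; the caller must then call leave().
     * Returns false if the gate is already closed (nothing to release).
     */
    boolean enter() {
        lock.readLock().lock();

        if (closed) {
            lock.readLock().unlock();
            return false;
        }

        return true;
    }

    /** Leaves the gate after a successful enter(). */
    void leave() {
        lock.readLock().unlock();
    }

    /**
     * Closes the gate. Blocks until all threads that entered have left.
     * Must not be called by a thread that is still inside the gate,
     * because a read lock cannot be upgraded to a write lock.
     */
    void close() {
        lock.writeLock().lock();

        try {
            closed = true;
        }
        finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SharedGate gate = new SharedGate();

        if (gate.enter()) {
            try {
                // Work that must not overlap with close() goes here.
            }
            finally {
                gate.leave();
            }
        }

        gate.close();

        System.out.println("Entered after close: " + gate.enter());
    }
}
```

With this shape, concurrent senders only share the read lock, while a plain mutex would make every sender wait for every other one. Whether this fits the real reconnect logic in {{ConnectGate}} would need to be checked against the PR.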

Waiting for TC. 

Please review.

> Slow or faulty client can stall the whole cluster.
> --------------------------------------------------
>
>                 Key: IGNITE-4003
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4003
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache, general
>    Affects Versions: 1.7
>            Reporter: Vladimir Ozerov
>            Assignee: Andrey Gura
>            Priority: Critical
>             Fix For: 2.0
>
>
> Steps to reproduce:
> 1) Start two server nodes and put some data into the cache.
> 2) Start a client from a Docker subnet that is not visible from the outside. 
> The client will join the cluster.
> 3) Try to put something into the cache or start another node to force a rebalance.
> The cluster gets stuck at this point. Root cause: the servers keep trying 
> to establish an outgoing connection to the client but fail, because the Docker 
> subnet is not visible from the outside. This can stall virtually all cluster operations.
> Typical stack trace:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may 
> have left the grid or TCP connection cannot be established due to firewall 
> issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, 
> /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, 
> lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, 
> isClient=true], topic=T4 [topic=TOPIC_CACHE, 
> id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, 
> id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage 
> [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, 
> data=null, futId=null], policy=2]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_51]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_51]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message 
> to remote node: TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, 
> /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, 
> lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, 
> isClient=true]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
>  [ignite-core-1.5.23.jar:1.5.23]
>       ... 32 common frames omitted
> Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to 
> node (is node still alive?). Make sure that each GridComputeTask and 
> GridCacheTransaction has a timeout set in order to prevent parties from 
> waiting forever in case of network issues 
> [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[/172.17.0.6:47100, 
> /127.0.0.1:47100]]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476)
>  [ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>       ... 12 common frames omitted
>       Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
> to address: /172.17.0.6:47100
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>               ... 35 common frames omitted
>       Caused by: java.net.SocketTimeoutException: null
>               at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353)
>               ... 35 common frames omitted
>       Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect 
> to address: /127.0.0.1:47100
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494)
>  ~[ignite-core-1.5.23.jar:1.5.23]
>               ... 35 common frames omitted
>       Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is 
> not as expected [expected=a15d74c2-1ec2-4349-9640-aeacd70d8714, 
> rcvd=48cccf25-7c29-4048-bd52-704acdb552e6]
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604)
>               at 
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361)
>               ... 35 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
