Vladimir Ozerov created IGNITE-4003:
---------------------------------------
Summary: Slow or faulty client can stall the whole cluster.
Key: IGNITE-4003
URL: https://issues.apache.org/jira/browse/IGNITE-4003
Project: Ignite
Issue Type: Bug
Components: cache, general
Affects Versions: 1.7
Reporter: Vladimir Ozerov
Priority: Critical
Fix For: 1.8
Steps to reproduce:
1) Start two server nodes and load some data into a cache.
2) Start a client from a Docker subnet that is not reachable from the outside.
The client joins the cluster.
3) Try to put something into the cache, or start another node to force rebalancing.
The cluster hangs at this point. Root cause: the servers keep trying to establish
outgoing connections to the client and fail, because the Docker subnet is not
reachable from the outside. Each attempt blocks the sending thread until it fails,
so this can stall virtually all cluster operations.
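To see why neither address the client advertised is usable, here is a small stdlib-only Java sketch (not Ignite code; `AdvertisedAddressCheck` and `classify` are hypothetical names) that classifies the two addresses from the log, `127.0.0.1` and `172.17.0.6`, from the point of view of a server on another host. It assumes the client container sits on Docker's default bridge network, `172.17.0.0/16`; user-defined Docker networks use other subnets.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Illustrative sketch, not Ignite code: how a server on another host sees the
 * addresses a Docker-hosted client advertises. All names are hypothetical.
 */
public class AdvertisedAddressCheck {

    /** True if the address is in 172.17.0.0/16, Docker's default bridge subnet (an assumption). */
    static boolean inDockerDefaultBridge(InetAddress addr) {
        byte[] b = addr.getAddress();
        return b.length == 4 && (b[0] & 0xFF) == 172 && (b[1] & 0xFF) == 17;
    }

    /** Returns a short verdict for one advertised IP literal. */
    public static String classify(String ipLiteral) {
        InetAddress addr;
        try {
            // For an IP literal, getByName only parses the string; no DNS lookup is made.
            addr = InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
        if (addr.isLoopbackAddress())
            return "LOOPBACK: on the server this is the server's own host, not the client";
        if (inDockerDefaultBridge(addr))
            return "DOCKER_BRIDGE: private to the client's Docker host, normally unreachable from other hosts";
        if (addr.isSiteLocalAddress())
            return "PRIVATE: reachable only if the networks are routed to each other";
        return "OTHER: possibly reachable";
    }

    public static void main(String[] args) {
        // The two addresses the client advertised, as shown in the log of this issue.
        for (String ip : new String[] {"127.0.0.1", "172.17.0.6"})
            System.out.println(ip + " -> " + classify(ip));
    }
}
```

On a server, `127.0.0.1` points back at the server's own host, which is consistent with the handshake in the trace receiving a different node ID than expected, while `172.17.0.6` matches the `SocketTimeoutException`.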
Typical stack trace:
{code}
org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true], topic=T4 [topic=TOPIC_CACHE, id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, data=null, futId=null], policy=2]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.access$1000(GridDhtPreloader.java:81) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:202) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.onMessage(GridDhtPreloader.java:200) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:877) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$MessageHandler.apply(GridDhtPreloader.java:859) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:582) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:280) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:204) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$000(GridCacheIoManager.java:80) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:163) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1058) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:836) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:104) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:799) [ignite-core-1.5.23.jar:1.5.23]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: org.apache.ignite.spi.IgniteSpiException: Failed to send message to remote node: TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1986) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124) [ignite-core-1.5.23.jar:1.5.23]
    ... 32 common frames omitted
Caused by: org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each GridComputeTask and GridCacheTransaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[/172.17.0.6:47100, /127.0.0.1:47100]]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2489) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2130) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2024) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1960) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1926) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1124) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) [ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture$MiniFuture.onResult(GridDhtLockFuture.java:1213) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onResult(GridDhtLockFuture.java:529) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processDhtLockResponse(GridDhtTransactionalCacheAdapter.java:639) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$100(GridDhtTransactionalCacheAdapter.java:89) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:151) ~[ignite-core-1.5.23.jar:1.5.23]
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$5.apply(GridDhtTransactionalCacheAdapter.java:149) ~[ignite-core-1.5.23.jar:1.5.23]
    ... 12 common frames omitted
    Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address: /172.17.0.6:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494) ~[ignite-core-1.5.23.jar:1.5.23]
        ... 35 common frames omitted
    Caused by: java.net.SocketTimeoutException: null
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2353)
        ... 35 common frames omitted
    Suppressed: org.apache.ignite.IgniteCheckedException: Failed to connect to address: /127.0.0.1:47100
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2494) ~[ignite-core-1.5.23.jar:1.5.23]
        ... 35 common frames omitted
    Caused by: org.apache.ignite.IgniteCheckedException: Remote node ID is not as expected [expected=a15d74c2-1ec2-4349-9640-aeacd70d8714, rcvd=48cccf25-7c29-4048-bd52-704acdb552e6]
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.safeHandshake(TcpCommunicationSpi.java:2604)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2361)
        ... 35 common frames omitted
{code}
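Reading the frames bottom-up: a pool thread (`GridIoManager$5.run`) handles a cache message (a force-keys response in the top-level trace, a DHT lock response in the cause), the entry update fires a continuous query event (`CacheContinuousQueryHandler.onEntryUpdate`), and the notification is sent synchronously through `GridContinuousProcessor.sendWithRetries` down to `TcpCommunicationSpi.createTcpClient`, where both connect attempts fail. The toy model below illustrates why this stalls the thread: each message waits for retries x (timed-out addresses) x connect timeout before the next one is processed. It is a stdlib-only simulation with a fake clock; `StallModel`, `Attempt`, `drain` and all numbers are hypothetical, and its `sendWithRetries` only mirrors the name of the frame in the trace, not Ignite's actual retry or back-off logic.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy model, not Ignite code: one message-processing thread that must synchronously
 * notify a client before it can handle the next cache message. Time is simulated
 * with a counter; nothing sleeps or opens sockets. All numbers are hypothetical.
 */
public class StallModel {
    /** Outcome of one connection attempt to one advertised client address. */
    public enum Attempt { TIMEOUT, WRONG_NODE, OK }

    private long nowMs = 0; // simulated clock, in milliseconds

    /** One pass over the client's addresses; returns true on the first successful connect. */
    boolean trySend(List<Attempt> addrs, long connectTimeoutMs) {
        for (Attempt a : addrs) {
            switch (a) {
                case TIMEOUT:
                    nowMs += connectTimeoutMs; // thread blocked until the connect times out
                    break;
                case WRONG_NODE:
                    break; // connect succeeds, but the handshake rejects the node ID quickly
                case OK:
                    return true;
            }
        }
        return false;
    }

    /** Retries the whole address list; the calling thread is blocked the entire time. */
    boolean sendWithRetries(List<Attempt> addrs, long connectTimeoutMs, int retries) {
        for (int i = 0; i < retries; i++) {
            if (trySend(addrs, connectTimeoutMs))
                return true;
        }
        return false;
    }

    /**
     * Handles cache messages one after another on the same thread. Each update
     * (assumed to cost 0 ms itself) triggers a notification to the client.
     * Returns the simulated time at which the last message is done.
     */
    public long drain(int messages, List<Attempt> clientAddrs, long connectTimeoutMs, int retries) {
        for (int m = 0; m < messages; m++)
            sendWithRetries(clientAddrs, connectTimeoutMs, retries);
        return nowMs;
    }

    public static void main(String[] args) {
        // Same shape as the log: 172.17.0.6 times out, 127.0.0.1 answers with another node's ID.
        List<Attempt> client = Arrays.asList(Attempt.TIMEOUT, Attempt.WRONG_NODE);
        long total = new StallModel().drain(10, client, 1_000, 3);
        System.out.println("10 updates done after " + total + " ms"); // prints 30000 ms
    }
}
```

With a reachable client (`Attempt.OK`) the same ten updates finish at 0 ms of simulated time, which is the contrast the issue describes: one unreachable client turns cheap cache updates into long blocking calls on a shared thread.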