[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430419#comment-16430419
 ] 

ASF GitHub Bot commented on IGNITE-7944:


Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3737


> Disconnected client node tries to send JOB_CANCEL message
> -
>
> Key: IGNITE-7944
> URL: https://issues.apache.org/jira/browse/IGNITE-7944
> Project: Ignite
> Issue Type: Bug
> Components: messaging
> Affects Versions: 1.9, 2.3
> Reporter: Roman Guseinov
> Assignee: Roman Guseinov
> Priority: Major
> Fix For: 2.5
>
> Attachments: Reproducer7944.java
>
>
> If the network is blocked (socket connections are not closed) and a failure 
> is detected, the tcp-client-disco-msg-worker thread can get stuck while 
> creating a TcpClient:
> {code:java}
> "tcp-client-disco-msg-worker-#4%wd5prsvtots0016a-tg-QueryFabric%" #494 prio=5 os_prio=0 tid=0x7f94c067c800 nid=0x2bdf runnable [0x7f960ecf1000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.Net.poll(Native Method)
>     at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
>     - locked <0x7fa140f520c0> (a java.lang.Object)
>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
>     - locked <0x7fa140f520b0> (a java.lang.Object)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2950)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2681)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2568)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2429)
>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2393)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1590)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1659)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.cancelChildren(GridTaskWorker.java:1305)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1609)
>     at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1581)
>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.onDisconnected(GridTaskProcessor.java:168)
>     at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3460)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:601)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2407)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2386)
>     at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1714)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> It looks like msg-worker is trying to send a JOB_CANCEL message for each job, 
> with a timeout equal to failureDetectionTimeout.
> Reproducer is attached.
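
To make the reported stall concrete, here is a rough sketch of the arithmetic, assuming (as the description suggests, without confirming it) that the worker sends one JOB_CANCEL per job sequentially and that each attempt blocks for the full failureDetectionTimeout. The class name, the job count, and the timeout value are illustrative assumptions, not values taken from the issue or from Ignite defaults.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: estimates how long one thread stays blocked. */
final class CancelStallSketch {
    /**
     * Worst case when every cancel attempt waits for the whole timeout
     * and attempts run one after another on the same thread.
     */
    static long worstCaseBlockMillis(int activeJobs, long failureDetectionTimeoutMillis) {
        return Math.multiplyExact((long) activeJobs, failureDetectionTimeoutMillis);
    }

    public static void main(String[] args) {
        // Assumed numbers for illustration only: 50 in-flight jobs, 10 s timeout.
        long ms = worstCaseBlockMillis(50, 10_000L);
        System.out.println("worst case: " + TimeUnit.MILLISECONDS.toSeconds(ms) + " s");
        // prints "worst case: 500 s"
    }
}
```

Under these assumptions the blocked time grows linearly with the number of jobs, which is why skipping the sends on a disconnected client matters more than shortening a single timeout.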



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-08 Thread Roman Guseinov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429674#comment-16429674
 ] 

Roman Guseinov commented on IGNITE-7944:


[~vozerov], thanks for your comments.
1) Yes. It does not lead to uncompleted futures. Actually, it prevents 
uncompleted futures when createTcpClient returns null.
2) I agree with that. I added an exception instead of the log message. Even if 
we try to cancel jobs via {{worker.finishTask(null, err, true)}} (with 
cancelChildren set to true), the futures will be completed and we 
will get IgniteClientDisconnectedException.

TC results (Run ALL): 
[https://ci.ignite.apache.org/viewLog.html?buildId=1185411&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll]

"Ignite Platform CPP Win32" has compilation issues that are unrelated to the 
fix.

New failed tests - 41. Restarted tests:
 * 
[https://ci.ignite.apache.org/viewLog.html?buildId=1185415&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_Basic2]
 * 
[https://ci.ignite.apache.org/viewLog.html?buildId=1185413&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_ActivateDeactivateCluster]

Other failed tests look flaky.



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-06 Thread Vladimir Ozerov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428181#comment-16428181
 ] 

Vladimir Ozerov commented on IGNITE-7944:
-

[~guseinov], [~dpavlov], the patch looks simple to me, but I have some doubts 
anyway:
1) We now do not cancel child tasks on disconnect. Is that right? Could it lead 
to uncompleted futures or similar problems?
2) When the client is not connected, we do not throw an exception, but just exit 
the {{send}} method. The ack closure is not notified either. Can we throw an 
exception instead? What would be the consequences?
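
The trade-off raised in point 2 can be sketched as follows. This is not the code from the patch: the class, the exception type, and the shape of the ack closure are hypothetical stand-ins chosen only to contrast a silent return with a fail-fast exception.

```java
import java.util.function.Consumer;

/** Hypothetical sender used only to contrast the two behaviours. */
final class SendGuardSketch {
    /** Stand-in for a "client disconnected" error; not an Ignite class. */
    static final class DisconnectedException extends Exception {
        DisconnectedException(String msg) { super(msg); }
    }

    private final boolean connected;

    SendGuardSketch(boolean connected) { this.connected = connected; }

    /** Variant A: exits quietly, so the caller and the ack closure learn nothing. */
    void sendSilently(String msg, Consumer<Exception> ackClosure) {
        if (!connected)
            return; // message dropped, closure never invoked

        ackClosure.accept(null); // null means "delivered" in this sketch
    }

    /** Variant B: fails fast, so the caller can complete its futures with an error. */
    void sendOrThrow(String msg, Consumer<Exception> ackClosure) throws DisconnectedException {
        if (!connected)
            throw new DisconnectedException("Client node is disconnected, message not sent: " + msg);

        ackClosure.accept(null);
    }
}
```

With variant A, anything waiting on the ack closure could wait indefinitely. With variant B, the caller gets an explicit error it must handle, which is the consequence the question asks about.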



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-06 Thread Dmitriy Pavlov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428163#comment-16428163
 ] 

Dmitriy Pavlov commented on IGNITE-7944:


The patch looks good to me. Let's treat this test failure as unrelated to the 
fix and keep an eye on TeamCity.

However, I would like to get approval from an Ignite veteran, so [~vozerov], it 
would be great if you could take a look.



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-05 Thread Roman Guseinov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427837#comment-16427837
 ] 

Roman Guseinov commented on IGNITE-7944:


[~dpavlov], thanks for your comment and for restarting "Ignite Queries [1]".

The test passed: 
[https://ci.ignite.apache.org/viewLog.html?buildId=1181645&buildTypeId=IgniteTests24Java8_IgniteQueries&tab=buildResultsDiv]



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-05 Thread Dmitriy Pavlov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427039#comment-16427039
 ] 

Dmitriy Pavlov commented on IGNITE-7944:


The following test looks like a newly introduced failure:
 Ignite Queries [1] [ tests 1 ]
   IgniteBinaryCacheQueryTestSuite: 
IgniteSqlSplitterSelfTest.testReplicatedTablesUsingPartitionedCacheClientRO 
(master fail rate 0.0%) 

[~guseinov], could you please check the cause of this failure?



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-05 Thread Roman Guseinov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427003#comment-16427003
 ] 

Roman Guseinov commented on IGNITE-7944:


[~dpavlov],

could you please review PR [https://github.com/apache/ignite/pull/3737]?

TC results: 
[https://ci.ignite.apache.org/viewLog.html?buildId=1180152&tab=queuedBuildOverviewTab]

There are 23 new failed tests. I checked them and they look flaky.

Thanks.



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-05 Thread Roman Guseinov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426998#comment-16426998
 ] 

Roman Guseinov commented on IGNITE-7944:


[~amashenkov], thank you.



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-05 Thread Andrew Mashenkov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426992#comment-16426992
 ] 

Andrew Mashenkov commented on IGNITE-7944:
--

[~guseinov],

The fix looks good to me and can be merged if there are no TC issues.

> Disconnected client node tries to send JOB_CANCEL message
> -
>
> Key: IGNITE-7944
> URL: https://issues.apache.org/jira/browse/IGNITE-7944
> Project: Ignite
>  Issue Type: Bug
>  Components: messaging
>Affects Versions: 1.9, 2.3
>Reporter: Roman Guseinov
>Assignee: Roman Guseinov
>Priority: Major
> Fix For: 2.5
>
> Attachments: Reproducer7944.java
>
>
> In case the network is blocked (socket connections not closed) and failure is 
> detected, tcp-client-disco-msg-worker thread can be stuck in process of 
> TcpClient creating:
> {code:java}
> "tcp-client-disco-msg-worker-#4%wd5prsvtots0016a-tg-QueryFabric%" #494 prio=5 os_prio=0 tid=0x7f94c067c800 nid=0x2bdf runnable [0x7f960ecf1000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.Net.poll(Native Method)
> at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:954)
> - locked <0x7fa140f520c0> (a java.lang.Object)
> at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:110)
> - locked <0x7fa140f520b0> (a java.lang.Object)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2950)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2681)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2568)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2429)
> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2393)
> at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1590)
> at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1659)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.cancelChildren(GridTaskWorker.java:1305)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1609)
> at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1581)
> at org.apache.ignite.internal.processors.task.GridTaskProcessor.onDisconnected(GridTaskProcessor.java:168)
> at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3460)
> at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:601)
> at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2407)
> at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2386)
> at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1714)
> at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> It looks like the msg-worker thread tries to send a JOB_CANCEL message for each 
> job, with a timeout equal to failureDetectionTimeout.
> Reproducer is attached.
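
To make the cost of that behaviour concrete, here is a rough back-of-the-envelope sketch. It assumes the sends happen one after another on the single msg-worker thread and that each one blocks for the full timeout, which the stack trace suggests but does not prove. The class name, method name and the numbers are made up for illustration and are not taken from Ignite.

```java
public class CancelStallEstimate {
    /**
     * Upper bound on how long the worker thread could be blocked if every
     * JOB_CANCEL send waits for the full timeout and sends run sequentially.
     * Both inputs are hypothetical values supplied by the caller.
     */
    public static long worstCaseStallMs(int jobCount, long failureDetectionTimeoutMs) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact((long) jobCount, failureDetectionTimeoutMs);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 100 pending jobs, 10 s timeout per send.
        System.out.println(worstCaseStallMs(100, 10_000L) + " ms");
    }
}
```

With these example inputs the bound grows linearly with the number of jobs, which is why skipping the sends entirely on a disconnected client matters more than tuning the timeout.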



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-05 Thread Roman Guseinov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426898#comment-16426898
 ] 

Roman Guseinov commented on IGNITE-7944:


[~amashenkov], following your comments, I made these changes:
* synchronize jobs with a latch instead of Thread.sleep()
* set the test timeout to 2 minutes
* use existing LOCAL_IP_FINDER
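
For readers unfamiliar with the first change, a minimal sketch of latch-based synchronization follows. It is not the code from the pull request; the class and method names are hypothetical, and a plain thread stands in for a compute job.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSyncSketch {
    /**
     * Starts a worker thread and waits until it reports that the job body
     * has been entered. Returns false if the timeout elapses first or the
     * waiting thread is interrupted.
     */
    public static boolean awaitJobStart(long timeoutMs) {
        CountDownLatch jobStarted = new CountDownLatch(1);

        Thread job = new Thread(() -> {
            // Signal the waiting test thread as soon as the job begins.
            jobStarted.countDown();
        });
        job.start();

        try {
            // Unlike Thread.sleep(), this returns as soon as the signal arrives.
            boolean started = jobStarted.await(timeoutMs, TimeUnit.MILLISECONDS);
            job.join();
            return started;
        }
        catch (InterruptedException e) {
            // Restore the interrupt flag and report failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("job started: " + awaitJobStart(5_000));
    }
}
```

The benefit over a fixed sleep is that the test proceeds exactly when the job is known to be running, so it is neither flaky on slow machines nor needlessly slow on fast ones.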

"assert ignite instanceof IgniteKernal" wasn't added because some tests use 
mock objects instead of IgniteKernal 
(GridTcpCommunicationSpiConcurrentConnectSelfTest, for example).

Please take a look.



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-03 Thread Roman Guseinov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423663#comment-16423663
 ] 

Roman Guseinov commented on IGNITE-7944:


TC results: 
[https://ci.ignite.apache.org/viewLog.html?buildId=1174100&tab=queuedBuildOverviewTab]

 



[jira] [Commented] (IGNITE-7944) Disconnected client node tries to send JOB_CANCEL message

2018-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423426#comment-16423426
 ] 

ASF GitHub Bot commented on IGNITE-7944:


GitHub user gromtech opened a pull request:

https://github.com/apache/ignite/pull/3737

IGNITE-7944 Disconnected client node tries to send JOB_CANCEL message

* Skip sending messages if client disconnected.
* Throw IgniteCheckedException if a client node is disconnected and 
communication client is null.
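
The two points above could be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the actual patch: the class, its fields and SendException are hypothetical stand-ins (the real change uses IgniteCheckedException inside Ignite's own classes), and a plain Object models the communication client.

```java
public class DisconnectGuardSketch {
    /** Hypothetical stand-in for a checked "cannot send" exception. */
    public static class SendException extends Exception {
        public SendException(String msg) {
            super(msg);
        }
    }

    private volatile boolean disconnected;
    private volatile Object connection = new Object(); // null models "communication client is null"
    private int sent;

    /** Called when the local client node is notified that it left the cluster. */
    public void onDisconnected() {
        disconnected = true;
        connection = null;
    }

    /** Caller-side guard: skip cancel messages entirely once disconnected. */
    public int cancelJobs(int jobCount) throws SendException {
        if (disconnected)
            return 0; // Nothing is sent, so nothing can block on a connect attempt.

        for (int i = 0; i < jobCount; i++)
            send("JOB_CANCEL");

        return jobCount;
    }

    /** Transport-side guard: fail fast instead of trying to open a new connection. */
    public void send(String msg) throws SendException {
        if (connection == null && disconnected)
            throw new SendException("Client node is disconnected, message not sent: " + msg);

        sent++;
    }

    public int sentCount() {
        return sent;
    }
}
```

The first guard avoids the per-job send loop; the second makes any remaining caller fail immediately rather than wait for a connection timeout.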

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-7944

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3737


commit e043f87b34178d7e010c5e5396382aa557e5486f
Author: Roman Guseinov 
Date:   2018-03-16T02:55:18Z

IGNITE-7944 Disconnected client node tries to send JOB_CANCEL message

* Skip sending message if client disconnected.
* Throw IgniteCheckedException if a client node is disconnected and 
communication client is null.



