[ 
https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16799408#comment-16799408
 ] 

Ignite TC Bot commented on IGNITE-6587:
---------------------------------------

{panel:title=--> Run :: All: Possible 
Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform .NET (Core Linux){color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386454]]

{color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 0 TIMEOUT , Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386456]]

{color:#d04437}Client Nodes{color} [[tests 0 TIMEOUT , Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386458]]

{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure 
on metric |https://ci.ignite.apache.org/viewLog.html?buildId=3386476]]

{color:#d04437}Thin client: PHP{color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386482]]

{color:#d04437}Hibernate 5.3{color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386484]]

{color:#d04437}Thin client: Node.js{color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386486]]

{color:#d04437}Thin client: Python{color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386492]]

{color:#d04437}Spring (Data){color} [[tests 0 Exit Code 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386496]]

{color:#d04437}Queries 1{color} [[tests 
6|https://ci.ignite.apache.org/viewLog.html?buildId=3386460]]
* IgniteBinaryCacheQueryTestSuite: 
SchemaExchangeSelfTest.testServerRestartWithNewTypes - 0,0% fails in last 422 
master runs.

{color:#d04437}Cache 1{color} [[tests 
10|https://ci.ignite.apache.org/viewLog.html?buildId=3386462]]
* IgniteBinaryCacheTestSuite: 
DataStreamerClientReconnectAfterClusterRestartTest.testTwoClientsAllowOverwrite 
- 0,0% fails in last 419 master runs.
* IgniteBinaryCacheTestSuite: 
DataStreamerClientReconnectAfterClusterRestartTest.testOneClientAllowOverwrite 
- 0,0% fails in last 419 master runs.
* IgniteBinaryCacheTestSuite: 
DataStreamerClientReconnectAfterClusterRestartTest.testTwoClients - 0,0% fails 
in last 419 master runs.
* IgniteBinaryCacheTestSuite: 
DataStreamerClientReconnectAfterClusterRestartTest.testOneClient - 0,0% fails 
in last 419 master runs.

{color:#d04437}PDS (Indexing){color} [[tests 3 Out Of Memory Error 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386464]]
* IgnitePdsWithIndexingCoreTestSuite: 
IgniteLogicalRecoveryTest.testRecoveryOnDynamicallyStartedCaches - 0,0% fails 
in last 414 master runs.
* IgnitePdsWithIndexingCoreTestSuite: 
IgnitePdsThreadInterruptionTest.testInterruptsOnWALWrite - 0,0% fails in last 
414 master runs.

{color:#d04437}Cache 3{color} [[tests 
3|https://ci.ignite.apache.org/viewLog.html?buildId=3386466]]
* IgniteBinaryObjectsCacheTestSuite3: 
CacheMetricsManageTest.testJmxPdsStatisticsEnable
* IgniteBinaryObjectsCacheTestSuite3: 
CacheGroupsMetricsRebalanceTest.testRebalanceEstimateFinishTime

{color:#d04437}Queries 2{color} [[tests 
13|https://ci.ignite.apache.org/viewLog.html?buildId=3386468]]
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentTransactionalReplicatedSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
IgniteCacheQueryNodeRestartSelfTest2.testRestarts - 0,0% fails in last 0 master 
runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentAtomicReplicatedSelfTest.testClientReconnectWithNonDynamicCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicIndexReplicatedAtomicConcurrentSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentAtomicPartitionedSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentAtomicReplicatedSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicIndexPartitionedTransactionalConcurrentSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentTransactionalPartitionedSelfTest.testClientReconnectWithNonDynamicCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicIndexPartitionedAtomicConcurrentSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentTransactionalReplicatedSelfTest.testClientReconnectWithNonDynamicCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicIndexReplicatedTransactionalConcurrentSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentAtomicPartitionedSelfTest.testClientReconnectWithNonDynamicCacheRestart
 - 0,0% fails in last 426 master runs.
* IgniteBinaryCacheQueryTestSuite2: 
DynamicColumnsConcurrentTransactionalPartitionedSelfTest.testClientReconnectWithCacheRestart
 - 0,0% fails in last 426 master runs.

{color:#d04437}ZooKeeper (Discovery) 2{color} [[tests 
4|https://ci.ignite.apache.org/viewLog.html?buildId=3386470]]
* ZookeeperDiscoverySpiTestSuite2: IgniteClientReconnectCacheTest.testReconnect 
- 0,0% fails in last 416 master runs.
* ZookeeperDiscoverySpiTestSuite2: 
IgniteClientReconnectCacheTest.testReconnectClusterRestart - 0,0% fails in last 
416 master runs.
* ZookeeperDiscoverySpiTestSuite2: 
IgniteClientReconnectCacheTest.testReconnectCacheDestroyedAndCreated - 0,0% 
fails in last 416 master runs.

{color:#d04437}Cache 2{color} [[tests 
2|https://ci.ignite.apache.org/viewLog.html?buildId=3386472]]
* IgniteCacheTestSuite2: 
IgniteCacheClientNodeChangingTopologyTest.testPessimisticTx2 - 0,0% fails in 
last 418 master runs.
* IgniteCacheTestSuite2: 
IgniteClientCacheStartFailoverTest.testClientStartLastServerFailsTx - 0,0% 
fails in last 418 master runs.

{color:#d04437}Continuous Query 1{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=3386480]]
* IgniteCacheQuerySelfTestSuite3: 
CacheContinuousQueryConcurrentPartitionUpdateTest.testConcurrentUpdatesAndQueryStartTx
 - 0,0% fails in last 422 master runs.

{color:#d04437}Web Sessions{color} [[tests 
4|https://ci.ignite.apache.org/viewLog.html?buildId=3386490]]
* IgniteWebSessionSelfTestSuite: WebSessionSelfTest.testClientReconnectRequest 
- 0,0% fails in last 426 master runs.

{color:#d04437}Basic 3{color} [[tests 
1|https://ci.ignite.apache.org/viewLog.html?buildId=3386494]]
* IgniteBasicWithPersistenceTestSuite: 
PluginNodeValidationTest.testValidationException

{color:#d04437}Platform C++ (Win x64 | Release){color} [[tests 9 Failure on 
metric , BuildFailureOnMessage 
|https://ci.ignite.apache.org/viewLog.html?buildId=3386478]]
* IgniteOdbcTest: QueriesTestSuite: TestNotFullInsertBatchSelect4096 - 0,8% 
fails in last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestManyCursorsSelectMerge2 - 0,8% fails in 
last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestManyCursorsTwoSelects2 - 0,8% fails in 
last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect1025 - 0,8% fails in 
last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect100 - 0,8% fails in 
last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect2000 - 0,8% fails in 
last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect1000 - 0,8% fails in 
last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestNotFullInsertBatchSelect1500 - 0,8% 
fails in last 790 master runs.
* IgniteOdbcTest: QueriesTestSuite: TestInsertBatchSelect1024 - 0,6% fails in 
last 790 master runs.

{panel}
[TeamCity *--> Run :: All* 
Results|https://ci.ignite.apache.org/viewLog.html?buildId=3372451&buildTypeId=IgniteTests24Java8_RunAll]

> Ignite watchdog service
> -----------------------
>
>                 Key: IGNITE-6587
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6587
>             Project: Ignite
>          Issue Type: Improvement
>          Components: general
>    Affects Versions: 2.2
>            Reporter: Alexey Goncharuk
>            Assignee: Andrey Kuznetsov
>            Priority: Major
>              Labels: IEP-5
>             Fix For: 2.7
>
>         Attachments: watchdog.sh
>
>
> As described in [1], each Ignite node has a number of system-critical 
> threads. We should implement a periodic check that calls the failure handler 
> when one of the following conditions is detected:
> * Critical thread is not alive anymore.
> * Critical thread 'hangs' for a long time, e.g. while executing a task 
> extracted from task queue.
> If a failure condition is detected, the call stacks of all threads should be 
> logged before the failure handler is invoked.
> The actual list of system-critical threads can be found at [1].
> Implementations based on a separate diagnostic thread seem fragile, because 
> this thread becomes a vulnerable point with respect to thread termination and 
> CPU resource starvation. So we are to use a self-monitoring approach: critical 
> threads themselves should monitor each other.
> Currently we have the {{o.a.i.internal.worker.WorkersRegistry}} facility, 
> which is best suited to storing and tracking system-critical threads. All of 
> them should be refactored into {{GridWorker}}s and added to 
> {{WorkersRegistry}}. Each worker should periodically choose some subset of 
> peer workers and check whether
> * All of them are alive.
> * All of them are actively running.
> A 'heartbeat' timestamp must be added to each worker in order to implement 
> the latter check. Additionally, infinite queue polls, waits on monitors, and 
> thread parks should be refactored to their timed equivalents in 
> system-critical threads.
> Monitoring parameters (enable/disable, check interval, thread 'hang' 
> threshold, etc.) are to be set via system properties.
> [1] 
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-14+Ignite+failures+handling
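
The self-monitoring idea in the description above can be illustrated with a minimal sketch. Everything here is hypothetical: HeartbeatSketch, Worker, Registry and checkPeers are invented names and are not the Ignite GridWorker or WorkersRegistry API. For brevity the sketch checks every peer rather than a chosen subset, and takes timestamps as plain arguments instead of reading a clock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch of heartbeat-based peer checks; not Ignite code. */
final class HeartbeatSketch {
    /** A critical worker that publishes a heartbeat timestamp. */
    static final class Worker {
        final String name;
        private volatile long heartbeatTs;
        private volatile boolean alive = true;

        Worker(String name, long now) {
            this.name = name;
            this.heartbeatTs = now;
        }

        /** The worker calls this itself on each iteration of its main loop. */
        void updateHeartbeat(long now) { heartbeatTs = now; }

        /** Simulates termination of the worker thread. */
        void markDead() { alive = false; }

        boolean isAlive() { return alive; }

        long heartbeatTs() { return heartbeatTs; }
    }

    /** Stores the workers so that each one can inspect its peers. */
    static final class Registry {
        private final List<Worker> workers = new CopyOnWriteArrayList<>();

        void register(Worker w) { workers.add(w); }

        /** Returns a description of every peer of {@code self} that is dead or stalled. */
        List<String> checkPeers(Worker self, long now, long hangThresholdMs) {
            List<String> problems = new ArrayList<>();
            for (Worker w : workers) {
                if (w == self)
                    continue; // A worker does not check itself.
                if (!w.isAlive())
                    problems.add(w.name + ": terminated");
                else if (now - w.heartbeatTs() > hangThresholdMs)
                    problems.add(w.name + ": blocked");
            }
            return problems;
        }
    }

    public static void main(String[] args) {
        Registry reg = new Registry();
        Worker a = new Worker("a", 0L);
        Worker b = new Worker("b", 0L);
        Worker c = new Worker("c", 0L);
        reg.register(a);
        reg.register(b);
        reg.register(c);

        a.updateHeartbeat(900L); // "a" is making progress.
        c.markDead();            // "c" has terminated; "b" never updates.

        // A real implementation would log thread dumps and call the failure handler here.
        System.out.println(reg.checkPeers(a, 1_000L, 500L));
    }
}
```

In this sketch worker "a" reports "b" as blocked because its heartbeat is older than the threshold, and "c" as terminated. The timed waits requested in the description matter for the same reason: a worker parked indefinitely could neither update its own heartbeat nor run the peer check.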



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
