[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test

2020-07-09 Thread Aleksey Plekhanov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154230#comment-17154230
 ] 

Aleksey Plekhanov commented on IGNITE-3212:
---

[~cguo1], can you check your case with the actual Ignite version (2.8.1). There 
are a lot of improvements were made to partition-map-exchange in 2.8 release, 
perhaps this ticket is not actual anymore.

> Servers get stuck with the warning "Failed to wait for initial partition map 
> exchange" during falover test
> --
>
> Key: IGNITE-3212
> URL: https://issues.apache.org/jira/browse/IGNITE-3212
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Ksenia Rybakova
>Priority: Critical
> Fix For: 3.0, 2.9
>
>
> Servers being restarted during failover test get stuck after some time with 
> the warning "Failed to wait for initial partition map exchange". 
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=
> [10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, 
> /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, 
> lastExchangeTime=1464
> 363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=
> [10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, 
> /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, 
> lastExchangeTime=1464
> 363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] 
> Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait 
> for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting 
> for initial partition map exchange ...
> {noformat}
> "Failed to wait for partition release future" warnings are on other nodes.
> {noformat}
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridDhtPartitionsExchangeFuture] Failed to wait 
> for partition release future [topVer=AffinityTopologyVersion [topVer=29, 
> minorTopVer=0], node=cab5d0e0-7365-4774-8f99-d9f131c5d896]. Dumping pending 
> objects that might be the cause:
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Ready 
> affinity version: AffinityTopologyVersion [topVer=28, minorTopVer=1]
> [08:09:45,826][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Last exchange 
> future: GridDhtPartitionsExchangeFuture ...
> {noformat}
> Load config:
> - 1 client, 20 servers (5 servers per 1 host)
> - warmup 60
> - duration 66h
> - preload 5M
> - key range 10M
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL 
> PUT_IF_ABSENT REPLACE
> - backups count 3
> - 3 servers restart every 15 min with 30 sec step, pause between stop and 
> start 5min



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test

2020-06-08 Thread Cong Guo (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17128626#comment-17128626
 ] 

Cong Guo commented on IGNITE-3212:
--

This issue still exists. Do you have any clue?

We are using Ignite 2.7.0 and seeing this issue sometimes. We have three server 
nodes embedded in our application nodes. For example, with three nodes A, B, C, 
when we restart A,  the whole cluster gets stuck during partition map exchange.

A logs:

"Failed to wait for initial partition map exchange."

then repeats forever:

"Failed to wait for partition map exchange" and a log of dump

The coordinator is C. C logs:

"Unable to await partitions release latch within timeout: ServerLatch 
[permits=1, pendingAcks=[070d58f2-a1a8-4b5f-b986-06a63ac18fb2], 
super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=87, 
minorTopVer=0]]]"

and "Failed to roll back transaction (cache may contain stale locks)"

The pending ACK should be from B (070d58f2-a1a8-4b5f-b986-06a63ac18fb2).

B logs:

"Failed to wait for partition release future" and dumps many pending 
transactions.

Our transaction time out is 35 seconds, but the duration of many of these 
transactions is much longer than 35 seconds, and the transaction state is 
MARKED_ROLLBACK and timeout=true.

 

 

 

 

 

 

 

 

 

> Servers get stuck with the warning "Failed to wait for initial partition map 
> exchange" during falover test
> --
>
> Key: IGNITE-3212
> URL: https://issues.apache.org/jira/browse/IGNITE-3212
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Ksenia Rybakova
>Priority: Critical
> Fix For: 3.0, 2.9
>
>
> Servers being restarted during failover test get stuck after some time with 
> the warning "Failed to wait for initial partition map exchange". 
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=
> [10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, 
> /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, 
> lastExchangeTime=1464
> 363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=
> [10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, 
> /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, 
> lastExchangeTime=1464
> 363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] 
> Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait 
> for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting 
> for initial partition map exchange ...
> {noformat}
> "Failed to wait for partition release future" warnings are on other nodes.
> {noformat}
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridDhtPartitionsExchangeFuture] Failed to wait 
> for partition release future [topVer=AffinityTopologyVersion [topVer=29, 
> minorTopVer=0], node=cab5d0e0-7365-4774-8f99-d9f131c5d896]. Dumping pending 
> objects that might be the cause:
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Ready 
> affinity version: AffinityTopologyVersion [topVer=28, minorTopVer=1]
> [08:09:45,826][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Last exchange 
> future: GridDhtPartitionsExchangeFuture ...
> {noformat}
> Load config:
> - 1 client, 20 servers (5 servers per 1 host)
> - warmup 60
> - duration 66h
> - preload 5M
> - key range 10M
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL 
> PUT_IF_ABSENT REPLACE
> - backups count 3
> - 3 servers restart every 15 min with 30 sec step, pause between stop and 
> start 5min



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test

2017-07-17 Thread Rohit Kanchan (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090627#comment-16090627
 ] 

Rohit Kanchan commented on IGNITE-3212:
---

We are trying to use Embedded ignite for our spark streaming application, we 
are also seeing similar issue where it is getting stuck with similar stack 
trace and thread dump. We are using Ignite 2.0 version. I will try to dig 
deeper and see if I can find something what is causing this. 

> Servers get stuck with the warning "Failed to wait for initial partition map 
> exchange" during falover test
> --
>
> Key: IGNITE-3212
> URL: https://issues.apache.org/jira/browse/IGNITE-3212
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Ksenia Rybakova
>Assignee: Ksenia Rybakova
>Priority: Critical
> Fix For: 1.9
>
>
> Servers being restarted during failover test get stuck after some time with 
> the warning "Failed to wait for initial partition map exchange". 
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=
> [10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, 
> /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, 
> lastExchangeTime=1464
> 363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=
> [10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, 
> /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, 
> lastExchangeTime=1464
> 363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] 
> Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait 
> for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting 
> for initial partition map exchange ...
> {noformat}
> "Failed to wait for partition release future" warnings are on other nodes.
> {noformat}
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridDhtPartitionsExchangeFuture] Failed to wait 
> for partition release future [topVer=AffinityTopologyVersion [topVer=29, 
> minorTopVer=0], node=cab5d0e0-7365-4774-8f99-d9f131c5d896]. Dumping pending 
> objects that might be the cause:
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Ready 
> affinity version: AffinityTopologyVersion [topVer=28, minorTopVer=1]
> [08:09:45,826][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Last exchange 
> future: GridDhtPartitionsExchangeFuture ...
> {noformat}
> Load config:
> - 1 client, 20 servers (5 servers per 1 host)
> - warmup 60
> - duration 66h
> - preload 5M
> - key range 10M
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL 
> PUT_IF_ABSENT REPLACE
> - backups count 3
> - 3 servers restart every 15 min with 30 sec step, pause between stop and 
> start 5min



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test

2016-06-17 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335890#comment-15335890
 ] 

Semen Boikov commented on IGNITE-3212:
--

Issues with GridCacheTxRecoveryFuture.onNodeLeft are fixed (commit #4e82af8), 
also verified that iteration over 'IgniteTxManager.completedVersHashMap' is not 
an issue.

> Servers get stuck with the warning "Failed to wait for initial partition map 
> exchange" during falover test
> --
>
> Key: IGNITE-3212
> URL: https://issues.apache.org/jira/browse/IGNITE-3212
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Ksenia Rybakova
>Assignee: Semen Boikov
> Fix For: 1.7
>
>
> Servers being restarted during failover test get stuck after some time with 
> the warning "Failed to wait for initial partition map exchange". 
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=
> [10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, 
> /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, 
> lastExchangeTime=1464
> 363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=
> [10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, 
> /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, 
> lastExchangeTime=1464
> 363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] 
> Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait 
> for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting 
> for initial partition map exchange ...
> {noformat}
> "Failed to wait for partition release future" warnings are on other nodes.
> {noformat}
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridDhtPartitionsExchangeFuture] Failed to wait 
> for partition release future [topVer=AffinityTopologyVersion [topVer=29, 
> minorTopVer=0], node=cab5d0e0-7365-4774-8f99-d9f131c5d896]. Dumping pending 
> objects that might be the cause:
> [08:09:45,822][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Ready 
> affinity version: AffinityTopologyVersion [topVer=28, minorTopVer=1]
> [08:09:45,826][WARN 
> ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Last exchange 
> future: GridDhtPartitionsExchangeFuture ...
> {noformat}
> Load config:
> - 1 client, 20 servers (5 servers per 1 host)
> - warmup 60
> - duration 66h
> - preload 5M
> - key range 10M
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL 
> PUT_IF_ABSENT REPLACE
> - backups count 3
> - 3 servers restart every 15 min with 30 sec step, pause between stop and 
> start 5min



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test

2016-06-02 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312057#comment-15312057
 ] 

Semen Boikov commented on IGNITE-3212:
--

Another issue, observed this stacktrace in thread dump at the moment when node 
logs 'Failed to wait for partition release future' message:

{noformat}
Thread [name="disco-event-worker-#78%null%", id=94, state=RUNNABLE, blockCnt=6, 
waitCnt=15904]
at 
o.a.i.i.processors.cache.transactions.IgniteTxManager.txsPreparedOrCommitted(IgniteTxManager.java:1830)
at 
o.a.i.i.processors.cache.transactions.IgniteTxManager.txsPreparedOrCommitted(IgniteTxManager.java:1638)
at 
o.a.i.i.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:189)
at 
o.a.i.i.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
at 
o.a.i.i.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
at 
o.a.i.i.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
at 
o.a.i.i.processors.cache.distributed.GridCacheTxRecoveryFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:404)
at 
o.a.i.i.processors.cache.GridCacheMvccManager$3.onEvent(GridCacheMvccManager.java:253)
at 
o.a.i.i.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:770)
at 
o.a.i.i.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:755)
at 
o.a.i.i.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:295)
at 
o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2078)
at 
o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2285)
at 
o.a.i.i.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2118)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
{noformat}

At least this execution should be moved out of 'disco-event-worker', also in 
'txsPreparedOrCommitted' there is an iteration over 
'IgniteTxManager.completedVersHashMap', need check how long it takes if 
completedVersHashMap has max allowed size (262144), probably it should be 
optimized if there are multiple GridCacheTxRecoveryFutures.

> Servers get stuck with the warning "Failed to wait for initial partition map 
> exchange" during falover test
> --
>
> Key: IGNITE-3212
> URL: https://issues.apache.org/jira/browse/IGNITE-3212
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Ksenia Rybakova
>Assignee: Semen Boikov
> Fix For: 1.7
>
>
> Servers being restarted during failover test get stuck after some time with 
> the warning "Failed to wait for initial partition map exchange". 
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=
> [10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, 
> /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, 
> lastExchangeTime=1464
> 363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=
> [10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, 
> /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, 
> lastExchangeTime=1464
> 363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] 
> Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait 
> for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting 
> for initial partition map exchange ...
> {noformat}
> "Failed to wait for partition release future" warnings 

[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during falover test

2016-06-02 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312014#comment-15312014
 ] 

Semen Boikov commented on IGNITE-3212:
--

Observed StackOverflowError in logs:
{noformat}
at java.lang.StringCoding.deref(StringCoding.java:63)
at java.lang.StringCoding.decode(StringCoding.java:179)
at java.lang.String.(String.java:416)
at java.lang.String.(String.java:481)
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:117)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2360)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2137)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2031)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1967)
at 
org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1933)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1285)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1354)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:707)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:856)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:170)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:173)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:173)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
...
...
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
at 
org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
{noformat}


> Servers get stuck with the warning "Failed to wait for initial partition map 
> exchange" during falover test
> --
>
> Key: IGNITE-3212
> URL: https://issues.apache.org/jira/browse/IGNITE-3212
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Ksenia Rybakova
>Assignee: Semen Boikov
> Fix For: 1.7
>
>
> Servers being restarted during failover test get stuck after some time with 
> the warning "Failed to wait for initial partition map exchange". 
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
> [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=
> [10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, 
> /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, 
> lastExchangeTime=1464
> 363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] 
> Added new node to topology: TcpDiscoveryNode 
>