[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2016-10-07 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554526#comment-15554526
 ] 

Semen Boikov commented on IGNITE-1924:
--

Looks good now. Just move the tests you added in IgniteCacheFailoverTestSuiteSsl 
to IgniteSpiCommunicationSelfTestSuite.

> Incomplete marshaller cache rebalancing causes Grid hangs
> -
>
> Key: IGNITE-1924
> URL: https://issues.apache.org/jira/browse/IGNITE-1924
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Priority: Critical
>  Labels: Muted_test
> Fix For: 1.8
>
>
> End of the log.
> [11:49:32] :   [org.apache.ignite:ignite-core] [11:49:32,947][INFO 
> ][exchange-worker-#220584%tcp.IgniteCacheSslStartStopSelfTest3%][GridDhtPartitionDemander]
>   Starting rebalancing 
> [cache=ignite-marshaller-sys-cache, mode=SYNC, 
> fromNode=108bffdb-1c1e-49aa-9525-b434784fa001, partitionsCount=7, 
> topology=AffinityTopologyVersion [topVer=594, minorTopVer=0], updateSeq=1]
> [11:49:32] :   [org.apache.ignite:ignite-core] [11:49:32,962][INFO 
> ][exchange-worker-#220584%tcp.IgniteCacheSslStartStopSelfTest3%][GridDhtPartitionDemander]
>   Starting rebalancing 
> [cache=ignite-marshaller-sys-cache, mode=SYNC, 
> fromNode=20660c29-91a1-4279-9dc1-88d192bc6002, partitionsCount=6, 
> topology=AffinityTopologyVersion [topVer=594, minorTopVer=0], updateSeq=1]
> [11:49:32] :   [org.apache.ignite:ignite-core] [11:49:32,962][INFO 
> ][exchange-worker-#220584%tcp.IgniteCacheSslStartStopSelfTest3%][GridDhtPartitionDemander]
>   Starting rebalancing 
> [cache=ignite-marshaller-sys-cache, mode=SYNC, 
> fromNode=00b3a75a-074d-46a5-a158-3956c0ec4000, partitionsCount=7, 
> topology=AffinityTopologyVersion [topVer=594, minorTopVer=0], updateSeq=1]
> [11:49:32] :   [org.apache.ignite:ignite-core] [11:49:32,963][INFO 
> ][ignite-#220587%marshaller-cache-tcp.IgniteCacheSslStartStopSelfTest3%][GridDhtPartitionDemander]
>   Completed rebalancing 
> [cache=ignite-marshaller-sys-cache, 
> fromNode=00b3a75a-074d-46a5-a158-3956c0ec4000, 
> topology=AffinityTopologyVersion [topVer=594, minorTopVer=0], time=21 ms]
> [11:49:32] :   [org.apache.ignite:ignite-core] [11:49:32,963][INFO 
> ][ignite-#220586%marshaller-cache-tcp.IgniteCacheSslStartStopSelfTest3%][GridDhtPartitionDemander]
>   Completed rebalancing 
> [cache=ignite-marshaller-sys-cache, 
> fromNode=108bffdb-1c1e-49aa-9525-b434784fa001, 
> topology=AffinityTopologyVersion [topVer=594, minorTopVer=0], time=21 ms]
> Hang on:
> [11:51:56] :   [org.apache.ignite:ignite-core] Thread 
> [name="ignite-#220562%sys-tcp.IgniteCacheSslStartStopSelfTest3%", id=287517, 
> state=WAITING, blockCnt=0, waitCnt=3]
> [11:51:56] :   [org.apache.ignite:ignite-core] Lock 
> [object=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$RebalanceFuture@b402f89,
>  ownerName=null, ownerId=-1]
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> sun.misc.Unsafe.park(Native Method)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> o.a.i.i.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:157)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> o.a.i.i.util.future.GridFutureAdapter.get(GridFutureAdapter.java:115)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.waitForCacheRebalancing(GridDhtPartitionDemander.java:265)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.access$400(GridDhtPartitionDemander.java:85)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$3.call(GridDhtPartitionDemander.java:323)
> [11:51:56] :   [org.apache.ignite:ignite-core] at 
> o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander$3.call(GridDhtPartitionDemander.java:320)
> 

[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2016-10-06 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552169#comment-15552169
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

Semen, 
fixes applied, please check.


[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2016-10-06 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551498#comment-15551498
 ] 

Semen Boikov commented on IGNITE-1924:
--

Anton, 

I reviewed the fix and have some comments:
- The correct way to process REMINDER is to do it right after session 
registration (call ses.removeMeta and filterChain.onMessageReceived inside 
'register'); otherwise I think there is no guarantee that 
'processSelectedKeys' will be called.
- Why can't you use the 'decode' buffer itself as REMINDER here (just call 
decode.position(9))?
{noformat}
if (decode.limit() > 9) {
    ByteBuffer rem = ByteBuffer.allocate(decode.limit());

    rem.put(decode);

    rem.position(9);

    meta.put(REMINDER.ordinal(), rem);
}
{noformat}
- GridNioSslHandler:
{noformat}
ByteBuffer blockInBuf = ses.removeMeta(IN_BUFF.ordinal());

if (blockInBuf != null)
    inNetBuf = blockInBuf;
{noformat}
I think here you need to either assert that blockInBuf was allocated with 
'netBufSize', or not use 'blockInBuf' as inNetBuf at all: always allocate 
inNetBuf and copy the data into it.
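
A minimal sketch of the two buffer points above, using only 
java.nio.ByteBuffer. The class and method names (BufferReviewSketch, 
remainderByCopy, remainderInPlace, copyIntoNetBuffer) are hypothetical and are 
not taken from the patch. The offset 9 is copied from the snippet, and the 
sketch assumes that 'decode' starts at position 0.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferReviewSketch {
    /** Offset taken from the reviewed snippet; its meaning is an assumption here. */
    static final int SKIP = 9;

    /** Copying variant, mirroring the reviewed snippet (assumes decode.position() == 0). */
    static ByteBuffer remainderByCopy(ByteBuffer decode) {
        ByteBuffer rem = ByteBuffer.allocate(decode.limit());

        rem.put(decode);    // Consumes 'decode' and leaves 'rem' positioned at its end.
        rem.position(SKIP); // Bytes SKIP..limit are now the readable remainder.

        return rem;
    }

    /** Suggested variant: no allocation, just move the position of 'decode'. */
    static ByteBuffer remainderInPlace(ByteBuffer decode) {
        decode.position(SKIP);

        return decode;
    }

    /** Always allocates a buffer of netBufSize and copies the saved bytes into it. */
    static ByteBuffer copyIntoNetBuffer(ByteBuffer saved, int netBufSize) {
        if (saved.remaining() > netBufSize)
            throw new IllegalArgumentException("Saved data does not fit into the net buffer");

        ByteBuffer inNetBuf = ByteBuffer.allocate(netBufSize);

        inNetBuf.put(saved); // Buffer stays in write mode, ready for further reads.

        return inNetBuf;
    }

    /** Reads all remaining bytes of a buffer into an array. */
    static byte[] drain(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];

        buf.get(out);

        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[12];

        for (int i = 0; i < data.length; i++)
            data[i] = (byte)i;

        byte[] viaCopy = drain(remainderByCopy(ByteBuffer.wrap(data)));
        byte[] inPlace = drain(remainderInPlace(ByteBuffer.wrap(data)));

        System.out.println("Same remainder: " + Arrays.equals(viaCopy, inPlace));
    }
}
```

Both variants expose the same remaining bytes. The in-place one is only safe 
if nothing else reuses 'decode' before the remainder is consumed, which would 
have to be checked against the actual code.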




[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2016-10-06 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15551068#comment-15551068
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

Hi, 
The fix is being checked now; I'll provide an update once it has been verified.


[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2016-10-05 Thread Anand Kumar Sankaran (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549737#comment-15549737
 ] 

Anand Kumar Sankaran commented on IGNITE-1924:
--

@Anton.Vinogradov Is there a PR or a patch with this fix?  We (Workday) want to 
try this in our builds to see if it makes a difference to the bugs we see.


[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2016-10-04 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546206#comment-15546206
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

Situation:
The nodes are configured to use SSL.
Node 1 sends a message to node 2.
This causes a client to be created at node 1.
This causes a HandshakeMessage to be sent from node 1 to node 2.
This causes a client to be created at node 2 and a RecoveryLastReceivedMessage 
to be sent to node 1 (which should finish the handshake with node 1).

The current implementation finishes the client future at node 2 right after 
sending RecoveryLastReceivedMessage and allows the client to be used.
If TcpCommunicationSpi.sendMessage0() is called at node 2 right after the 
client is created, its message can be mixed with RecoveryLastReceivedMessage 
(both will be sent as one SSL message).

At node 1, safeHandshake does buf = ByteBuffer.allocate(1000); this buffer 
reads both messages, and the second message is lost.

So, this should be fixed.
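
The buffering effect described above can be sketched roughly as follows. This 
is an illustration only: it ignores SSL entirely, the class and method names 
(CoalescedReadSketch, readReplyKeepRest) are hypothetical, and the reply 
length of 9 bytes is an assumption. It shows that one read into a 1000-byte 
buffer can pull in bytes of the next message, and that those bytes have to be 
kept rather than dropped.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class CoalescedReadSketch {
    /** Assumed length of the handshake reply; hypothetical, for illustration only. */
    static final int REPLY_LEN = 9;

    /**
     * Reads the handshake reply with one large buffer and returns whatever
     * followed it in the same read, instead of discarding those bytes.
     */
    static ByteBuffer readReplyKeepRest(ReadableByteChannel ch) {
        ByteBuffer buf = ByteBuffer.allocate(1000);

        try {
            while (buf.position() < REPLY_LEN) {
                if (ch.read(buf) < 0)
                    throw new IOException("Channel closed before the reply was fully read");
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        buf.flip();
        buf.position(REPLY_LEN);

        return buf.slice(); // Bytes that belong to the next message.
    }

    public static void main(String[] args) {
        // The reply followed by 5 bytes of the next message, arriving together.
        byte[] wire = new byte[REPLY_LEN + 5];

        for (int i = REPLY_LEN; i < wire.length; i++)
            wire[i] = 42;

        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(wire));

        ByteBuffer rest = readReplyKeepRest(ch);

        System.out.println("Bytes of the next message already consumed: " + rest.remaining());
    }
}
```

If the caller only looks at the first REPLY_LEN bytes and throws the buffer 
away, the trailing bytes are gone from the socket and never reach the regular 
message handling path.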

I checked the following solutions:
1) Do not allow handshake messages to be mixed with others, by not finishing 
the client future until the first message is received from node 1.
Implemented and checked that it works (in my tests).
It can cause a hang while the client at node 1 is being created.

2) Change the code to send RecoveryLastReceivedMessage as a separate SSL 
message.
This allows reading both messages from the socket, then somehow (for example, 
by growing the length 1 byte at a time until decoding succeeds) finding the 
size of the first message, then obtaining the second SSL message, saving it to 
the session meta, and handling it in 
NioClientWorker.processSelectedKeysOptimized.

The problem is that an SSL message can't be decoded twice; this causes 
javax.net.ssl.SSLException: bad record MAC

3) Obtain both messages (decoded) and save the second to the session meta.
After that it can be added to the decoded buffer in SslHandler.onMessageReceived.
In this case I sometimes see java.lang.AssertionError: Missed message future 
when the whole message has not been read from the socket.
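
Option 1 above can be sketched roughly like this, using 
java.util.concurrent.CompletableFuture as a stand-in for the client future. 
All names here (DeferredClientFutureSketch, Client, onHandshakeReplySent, 
onFirstMessageReceived) are hypothetical and do not come from 
TcpCommunicationSpi; the sketch only shows the ordering idea.

```java
import java.util.concurrent.CompletableFuture;

public class DeferredClientFutureSketch {
    /** Hypothetical stand-in for a communication client. */
    static final class Client {
        final String remoteNodeId;

        Client(String remoteNodeId) {
            this.remoteNodeId = remoteNodeId;
        }
    }

    private final Client client;

    private final CompletableFuture<Client> clientFut = new CompletableFuture<>();

    DeferredClientFutureSketch(String remoteNodeId) {
        client = new Client(remoteNodeId);
    }

    /** Called after the handshake reply is written; deliberately leaves the future open. */
    void onHandshakeReplySent() {
        // No-op: senders waiting on the future must not get the client yet.
    }

    /** Called when the first message from the remote node arrives. */
    void onFirstMessageReceived() {
        clientFut.complete(client);
    }

    CompletableFuture<Client> clientFuture() {
        return clientFut;
    }

    public static void main(String[] args) {
        DeferredClientFutureSketch s = new DeferredClientFutureSketch("node-1");

        s.onHandshakeReplySent();
        System.out.println("Usable after reply sent: " + s.clientFuture().isDone());

        s.onFirstMessageReceived();
        System.out.println("Usable after first message: " + s.clientFuture().isDone());
    }
}
```

The drawback noted for this option shows up directly: if the first message 
never arrives, the future never completes, so anything waiting on it would 
need a bounded wait such as get(long, TimeUnit).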


[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2015-11-30 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031686#comment-15031686
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

Sure, 

Reopened as muted test.

> Incomplete marshaller cache rebalancing causes Grid hangs
> -
>
> Key: IGNITE-1924
> URL: https://issues.apache.org/jira/browse/IGNITE-1924
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>  Labels: Muted_test
>

[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2015-11-28 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030497#comment-15030497
 ] 

Yakov Zhdanov commented on IGNITE-1924:
---

Anton, a couple of comments:

# We never add comments like this in Ignite: {{//Can fail as described at 
https://issues.apache.org/jira/browse/IGNITE-1924}} - 
org/apache/ignite/spi/communication/tcp/IgniteCacheSslStartStopSelfTest.java:30
# Why is this test still running on TC and hanging it?

I think the test should be properly muted so that it never causes TC hangs again.

> Incomplete marshaller cache rebalancing causes Grid hangs
> -
>
> Key: IGNITE-1924
> URL: https://issues.apache.org/jira/browse/IGNITE-1924
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>Priority: Critical
>

[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2015-11-26 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029063#comment-15029063
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

In most cases the lost message is the first demand message sent to the 
coordinator node by a node that has just joined the topology.
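
To illustrate why a single lost demand message is enough to hang the grid, here is a minimal sketch. It is not Ignite's actual RebalanceFuture: the class LostDemandSketch, its methods, and the shortened node IDs are hypothetical and only model the behaviour visible in the log, where rebalancing starts from three nodes but completes for only two. The future completes only after every supplier has answered, so an unbounded get() parks forever, while a bounded wait at least reports which supplier is missing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of a rebalance future; not Ignite's real implementation. */
public class LostDemandSketch {
    /** Suppliers that have not yet answered their demand message. */
    private final Set<String> remaining = ConcurrentHashMap.newKeySet();

    /** Completes only once every supplier has answered. */
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    public LostDemandSketch(Set<String> suppliers) {
        remaining.addAll(suppliers);

        if (remaining.isEmpty())
            done.complete(null);
    }

    /** Called when a supplier's response to the demand message arrives. */
    public void onSupply(String nodeId) {
        if (remaining.remove(nodeId) && remaining.isEmpty())
            done.complete(null);
    }

    /** Bounded wait: returns false on timeout instead of parking forever. */
    public boolean await(long timeoutMs) throws InterruptedException {
        try {
            done.get(timeoutMs, TimeUnit.MILLISECONDS);

            return true;
        }
        catch (TimeoutException e) {
            return false;
        }
        catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Snapshot of suppliers that are still expected to answer. */
    public Set<String> remaining() {
        return new HashSet<>(remaining);
    }

    public static void main(String[] args) throws InterruptedException {
        LostDemandSketch fut = new LostDemandSketch(
            new HashSet<>(Arrays.asList("108bffdb", "20660c29", "00b3a75a")));

        // Two suppliers answer; the demand message to the third one is lost.
        fut.onSupply("00b3a75a");
        fut.onSupply("108bffdb");

        // An unbounded get() would hang here; the bounded wait times out.
        System.out.println("completed=" + fut.await(100));
        System.out.println("still waiting for " + fut.remaining());

        // Re-sending the demand (modelled as a late response) unblocks the waiter.
        fut.onSupply("20660c29");
        System.out.println("completed=" + fut.await(100));
    }
}
```

Whether a timeout with a retried demand is the right fix for Ignite is a separate question; the sketch only shows the failure mode.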

> Incomplete marshaller cache rebalancing causes Grid hangs
> -
>
> Key: IGNITE-1924
> URL: https://issues.apache.org/jira/browse/IGNITE-1924
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>Priority: Critical
>

[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2015-11-26 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029069#comment-15029069
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

Closed as Won't Fix because it is not important at the moment.

> Incomplete marshaller cache rebalancing causes Grid hangs
> -
>
> Key: IGNITE-1924
> URL: https://issues.apache.org/jira/browse/IGNITE-1924
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>Priority: Critical
>

[jira] [Commented] (IGNITE-1924) Incomplete marshaller cache rebalancing causes Grid hangs

2015-11-23 Thread Anton Vinogradov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022162#comment-15022162
 ] 

Anton Vinogradov commented on IGNITE-1924:
--

Occurs mostly (only?) in IgniteCacheSslStartStopSelfTest.

> Incomplete marshaller cache rebalancing causes Grid hangs
> -
>
> Key: IGNITE-1924
> URL: https://issues.apache.org/jira/browse/IGNITE-1924
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Anton Vinogradov
>Priority: Critical
>