[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-12-14 Thread Pavel Kovalenko (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721420#comment-16721420
 ] 

Pavel Kovalenko commented on IGNITE-4210:
-

[~Alexey Kuznetsov] 
I've reviewed your solution and have several concerns about it's implementation 
and test.
1) Current implentation of cache loading mechanism doesn't fit into requirement 
that there should be no ongoing update operations during PME.
2) We're waiting for finishing that operations at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture#waitPartitionRelease
 method . Cache load operations are not covered in partitions release future 
and have their custom workflow which is big miss now and this should be fixed.
3) GridDhtInvalidPartitionException is throwed only when affinity for that 
partition is changed. What will be happened when some partitions are moved to 
other nodes, some not? In that case some of the cache loads will be finished 
with exception, some of them not. This will cause of data inconsistency between 
nodes after PME is finished.
4) For me test doesn't cover well case when cluster topology is changed. Future 
that starts new nodes can be completed before you run load to cache store. 
Moreover test don't check that event with ClusterTopologyCheckedException is 
fired always when topology is changed. It means that sometimes test checks 
negative (expected) scenario, sometimes positive scneario which leads to test 
flakiness. 

>From my side of view correct solution should be implemented as below:
1) Introduce new cache future in 
org.apache.ignite.internal.processors.cache.GridCacheMvccManager (as welll as 
locks, transactions, atomic updates futures)  that will indicate that cache 
loading is in progress.
2) Create and register this future at the beginning of 
org.apache.ignite.internal.processors.cache.store.CacheStoreManager#loadAll 
method.
3) Future should have ability to cancel. Add this future as a part of 
partitionsRelease future and cancel it when waitPartitionsRelease event is 
happened.
4) Divide whole keys set to micro-batches in loadAll. At the end of each of 
micro-batch, check that CacheLoadFuture is not cancelled by 
waitPartitionsRelease.
5) If future was cancelled, immediately finish this future to unblock 
waitPartitionsRelease and throw appropriate exception to user, that topology is 
changed and operation should be retired.
6) Fix test in a way to have 100% guarantees that it checks negative scenario 
with cluster topology changing.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-08-02 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566983#comment-16566983
 ] 

Alexey Kuznetsov commented on IGNITE-4210:
--

[~agura] Hi
Now, exception is thrown to user when topology changes during cache store 
loading.

Could you review the pr ?

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-07-12 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541377#comment-16541377
 ] 

Alexey Kuznetsov commented on IGNITE-4210:
--

[~agura] In case of topology changes during store loading I wanted to throw 
*ClusterTopologyCheckedException* with retry future, but it failed to be 
serialized.
Imagine client calling store load, and loading failed on remote node, 

In this case *ClusterTopologyCheckedException* would be returned back to 
client, but without retry future.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-07-04 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532671#comment-16532671
 ] 

Andrey Gura commented on IGNITE-4210:
-

[~Alexey Kuznetsov] I don't see any arguments for blocking PME. Any operations 
on the cluster are impossible during PME so, as I said early, we must not block 
PME.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-06-29 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527520#comment-16527520
 ] 

Alexey Kuznetsov commented on IGNITE-4210:
--

[~sergey-chugunov] [~agura] Hi
Andrey, you propose to cancel cache store loading if PME is started.
But there is a drawback in this idea : user might prefer PME hang than 
canceling cache store loading(and he could be confused seeing cache store 
loading was canceled by ignite).

I propose to block PME if cache store loading is in progress. We should warn 
user that cache store loading might cause massive data rebalancing.
This solution looks more naturally for me.

What do you think about it?

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.7
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-06-18 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515567#comment-16515567
 ] 

Andrey Gura commented on IGNITE-4210:
-

[~Alexey Kuznetsov] I think we should cancel cache store loading in mentioned 
cases.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-06-13 Thread Alexey Kuznetsov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511060#comment-16511060
 ] 

Alexey Kuznetsov commented on IGNITE-4210:
--

[~agura] 
Am I understand you correctly ? :
1) User starts 1 grid
2) User initiates cache store loading
3) Additional grids connect to cluster. During PME they observe cache store 
loading progress(certain future was created on initiator), 
they cancel cache store loading, pass exception to user.
4) User receives exception during mass node start. Cache contains some values, 
loaded from store.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-06-13 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510877#comment-16510877
 ] 

Andrey Gura commented on IGNITE-4210:
-

[~Alexey Kuznetsov] From my point of view it's a not good solution because it 
will block partition map exchange until loadCache finish data loading and then 
will lead to massive data rebalancing. Better way, I believe, it's passing 
exception to user code in case of topology changes and then user will have 
possibility to manage initial data loading (e.g. user can split whole data set 
on blocks and retry loading of block on which topology is changed).

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-05-29 Thread Andrey Gura (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493695#comment-16493695
 ] 

Andrey Gura commented on IGNITE-4210:
-

I'll take a look.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-05-24 Thread Dmitriy Pavlov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488759#comment-16488759
 ] 

Dmitriy Pavlov commented on IGNITE-4210:


[~agura], could you please take a look? 

TC run seems to be good, license failure is not related to fix.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-05-22 Thread Alexey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483807#comment-16483807
 ] 

Alexey Kuznetsov commented on IGNITE-4210:
--

[~dpavlov] hi. 
Can you plz review the ticket?
All failed tests doesn't correspond to the patch.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.6
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-04-27 Thread Alexey Kuznetsov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456279#comment-16456279
 ] 

Alexey Kuznetsov commented on IGNITE-4210:
--

The root cause of the bug identified.
grid.cache(DEFAULT_CACHE_NAME).loadCache(null) performs cache loading only on 
few nodes(ususally one) because other nodes are in the middle of process of 
joining cluster.

In unstable topology(multiple nodes join cluster) some entries aren't get 
loaded into the cache , because partitions cannot be reserved. Partitons 
concurrently are evicted and moved to other nodes while PME.

I put cache lock on topology before 
grid.cache(DEFAULT_CACHE_NAME).loadCache(null), and unlocked it after loading. 
Test passes.

So, we should lock topology before cache loading, or retry loading after 
topology is settled down.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-4210) CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose data.

2018-04-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447745#comment-16447745
 ] 

ASF GitHub Bot commented on IGNITE-4210:


GitHub user voipp opened a pull request:

https://github.com/apache/ignite/pull/3898

IGNITE-4210 cache store loading test fixed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/voipp/ignite IGNITE-4210

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3898


commit 6819c3474206652fb5e55c98f58ee45b6c42b8a2
Author: voipp 
Date:   2018-04-23T08:27:05Z

IGNITE-4210 cache store loading test fixed




> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> 
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Vinogradov
>Assignee: Alexey Kuznetsov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)