[jira] [Updated] (IGNITE-4424) REPLICATED cache isn't synced across nodes

Andrew Mashenkov (JIRA) Wed, 14 Dec 2016 02:10:45 -0800

     [ 
https://issues.apache.org/jira/browse/IGNITE-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andrew Mashenkov updated IGNITE-4424:
-------------------------------------
    Description: 
A REPLICATED cache sometimes fails to sync across nodes.
Please find the reproducer code attached.

All nodes are started at the same time on different machines:
* Ignition.start() // Blocks until the node is up
* Only one of the nodes then calls getOrCreateCache() followed by putAll().
* All the other nodes block until this completes before proceeding.
* All of the nodes then perform:
** getOrCreateCache() // Again
** cache.localSize(CachePeekMode.ALL)
All nodes should see a filled cache, but sometimes some nodes see an empty
cache. Replacing the localSize() call with iteration over the cache gives the
same result.
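
The ordering of the steps above can be modelled in plain Java without Ignite.
This is only an in-process sketch of the expected behaviour, not the attached
ReplicatedCacheFails.java: threads stand in for nodes, a CountDownLatch stands
in for whatever barrier the reproducer uses, and a shared map stands in for the
REPLICATED cache. The class name, the helper getOrCreateCache(cluster, name) and
the cache name "test" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// In-process model of the reproducer's step ordering. NOT Ignite code.
class ReplicatedCacheStepsModel {

    // Stand-in for getOrCreateCache(): the same name always yields the same cache.
    static Map<Integer, Integer> getOrCreateCache(
            Map<String, Map<Integer, Integer>> cluster, String name) {
        return cluster.computeIfAbsent(name, n -> new ConcurrentHashMap<>());
    }

    // Runs the steps on `nodes` threads and returns the size each "node" observed.
    static List<Integer> run(int nodes, int entries) {
        Map<String, Map<Integer, Integer>> cluster = new ConcurrentHashMap<>();
        CountDownLatch loaded = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < nodes; i++) {
                final boolean loader = (i == 0);
                results.add(pool.submit(() -> {
                    if (loader) {
                        // Only one node creates the cache and fills it with putAll().
                        Map<Integer, Integer> batch = new HashMap<>();
                        for (int k = 0; k < entries; k++) {
                            batch.put(k, k);
                        }
                        getOrCreateCache(cluster, "test").putAll(batch);
                        loaded.countDown();
                    } else {
                        // All the other nodes block until the load has finished.
                        loaded.await();
                    }
                    // Every node calls getOrCreateCache() again and checks the size.
                    return getOrCreateCache(cluster, "test").size();
                }));
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> f : results) {
                sizes.add(f.get());
            }
            return sizes;
        } catch (Exception e) {
            throw new IllegalStateException("model run failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Expected invariant: every node reports the full entry count.
        System.out.println(run(4, 1000));
    }
}
```

In this model every element of the returned list equals the number of loaded
entries. The bug described here corresponds to some nodes reporting 0 at the
final step even though the load completed before they proceeded.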

Much more rarely, the cluster degrades so that one part of the cluster sees an
empty cache while another part sees a filled cache. The logs contain no errors
at all. It takes about two hours of running the test in an infinite loop to
catch this rare error.

  was:
A REPLICATED cache sometimes fails to sync across nodes.
Please find the reproducer code attached.

All nodes are started at the same time on different machines:
* Ignition.start() // Blocks until the node is up
* Only one of the nodes then calls getOrCreateCache() followed by putAll().
* All the other nodes block until this completes before proceeding.
* All of the nodes then perform:
** getOrCreateCache() // Again
** cache.localSize(CachePeekMode.ALL)
All nodes should see a filled cache, but sometimes some nodes see an empty
cache. Replacing the localSize() call with iteration over the cache gives the
same result.

The logs show that more than one cluster was started unexpectedly, but they
contain no errors at all.

I ran the test in an infinite loop and it failed after two hours. It looks like
there is a race. It may be harder to reproduce on a single machine because of
the low latency.


> REPLICATED cache isn't synced across nodes
> ------------------------------------------
>
>                 Key: IGNITE-4424
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4424
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 1.8
>            Reporter: Andrew Mashenkov
>            Priority: Blocker
>             Fix For: 2.0
>
>         Attachments: ReplicatedCacheFails.java
>
>
> A REPLICATED cache sometimes fails to sync across nodes.
> Please find the reproducer code attached.
> All nodes are started at the same time on different machines:
> * Ignition.start() // Blocks until the node is up
> * Only one of the nodes then calls getOrCreateCache() followed by putAll().
> * All the other nodes block until this completes before proceeding.
> * All of the nodes then perform:
> ** getOrCreateCache() // Again
> ** cache.localSize(CachePeekMode.ALL)
> All nodes should see a filled cache, but sometimes some nodes see an empty
> cache. Replacing the localSize() call with iteration over the cache gives the
> same result.
> Much more rarely, the cluster degrades so that one part of the cluster sees
> an empty cache while another part sees a filled cache. The logs contain no
> errors at all. It takes about two hours of running the test in an infinite
> loop to catch this rare error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
