Re: Cache write synchronization mode

2023-07-26 Thread Raymond Wilson
I kicked off a separate thread for this as requested.

On Tue, Jul 25, 2023 at 9:43 PM Pavel Tupitsyn wrote:

> > If the primary node 'comes back' after the primary node failure would
> > you expect the new value to propagate to all nodes?
>
> I don't think so, but I'm not 100% sure - could you ask about this
> specific case in a separate thread?
>
> On Tue, Jul 25, 2023 at 8:50 AM Raymond Wilson 
> wrote:
>
>> >> However, if a primary node fails before at least 1 backup
>> >> node receives an update, then the update will be lost, and all nodes will
>> >> have the old value.
>>
>> Does this imply that it is a good idea to have the FullSync write
>> synchronization mode? If the primary node 'comes back' after the primary
>> node failure would you expect the new value to propagate to all nodes?
>>
>>
>> On Tue, Jul 25, 2023 at 5:22 PM Pavel Tupitsyn 
>> wrote:
>>
>>> > if a hard failure occurs to one of the backup servers in the
>>> > replicated cache will the server that failed have an inconsistent (old)
>>> > copy of that element in the replicated cache when it restarts
>>>
>>> If only a backup server fails and restarts, it will get new data from
>>> the primary node, no issue here.
>>> However, if a primary node fails before at least 1 backup node receives
>>> an update, then the update will be lost, and all nodes will have the old
>>> value.
>>>
>>> Related: CacheConfiguration.ReadFromBackup property is true by default,
>>> meaning that with PrimarySync it is possible to get old value from a backup
>>> node after an update, before backups receive new data.
>>>
>>> On Mon, Jul 24, 2023 at 11:51 PM Raymond Wilson <
>>> raymond_wil...@trimble.com> wrote:
>>>
>>>> Hi Pavel,
>>>>
>>>> I understand the differences between the sync modes in terms of when
>>>> the write returns. What I want to understand is if there are consistency
>>>> risks with the PrimarySync versus FullSync modes.
>>>>
>>>> For example, if I have 4 nodes participating in the replicated cache
>>>> (and am using the default PrimarySync mode), then the write will return
>>>> once the primary node in the replicated cache has completed the write. At
>>>> that point if a hard failure occurs to one of the backup servers in the
>>>> replicated cache will the server that failed have an inconsistent (old)
>>>> copy of that element in the replicated cache when it restarts?
>>>>
>>>> Raymond.



-- 

Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com




Cache write synchronization with replicated caches

2023-07-26 Thread Raymond Wilson
Hi,

I have a query regarding data safety of replicated caches in the case of
hard failure of the compute resource but where the storage resource is
available when the node returns.

We are using Ignite 2.15 with the C# client.

We have a number of replicated caches, each with four participating nodes,
all using the default PrimarySync write synchronization mode. All data
storage configurations are configured with WalMode = WalMode.Fsync.
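
For reference, the setup described above corresponds roughly to the following
Ignite.NET configuration. This is a minimal sketch, not our production code:
the cache name and data region name are hypothetical, and enabling native
persistence on the default data region is an assumption based on the WAL and
checkpoint behaviour discussed in this message.

```csharp
using Apache.Ignite.Core;
using Apache.Ignite.Core.Cache.Configuration;
using Apache.Ignite.Core.Configuration;

public static class ReplicatedCacheSetup
{
    public static IgniteConfiguration Build()
    {
        return new IgniteConfiguration
        {
            DataStorageConfiguration = new DataStorageConfiguration
            {
                // WAL mode as stated in this message.
                WalMode = WalMode.Fsync,
                DefaultDataRegionConfiguration = new DataRegionConfiguration
                {
                    Name = "example-region",    // hypothetical name
                    PersistenceEnabled = true   // assumption: native persistence is on
                }
            },
            CacheConfiguration = new[]
            {
                new CacheConfiguration("example-replicated-cache") // hypothetical name
                {
                    CacheMode = CacheMode.Replicated,
                    // PrimarySync is the default; set explicitly here for clarity.
                    WriteSynchronizationMode = CacheWriteSynchronizationMode.PrimarySync
                }
            }
        };
    }
}
```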

We have logic performing writes against these caches which will continue
once the primary node for the replicated cache has written the data item.

I am unsure of the guarantees made by Ignite at this point in the event of
failure, specifically hard/red-button failure of compute hardware
resources and/or abrupt (but recoverable) detachment of storage resources.

Scenario one: Primary node returns "OK", then immediately fails (before a
checkpoint). When the primary node returns, should I expect the replicated
value to be in the primary, and to appear in all other nodes too?

Scenario two: Primary node returns "OK", then a secondary node immediately
fails (before completing the write and so before any checkpoint). When the
secondary node returns, should I expect the replicated value to be in the
recovered secondary node?

In relation to these scenarios, does setting the cache write
synchronization mode to FullSync improve the safety of the write, given
that all nodes must acknowledge the write before it returns?

If there is an improvement in write safety in this instance, does this
imply the Fsync WalMode write pathway has opportunities for data loss in
these failure situations?
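
To be concrete about the alternative I am asking about, the change would look
roughly like the sketch below. The cache name is hypothetical, and the sketch
only shows where the mode is set; it does not by itself answer what is
guaranteed after a hard failure.

```csharp
using Apache.Ignite.Core.Cache.Configuration;

public static class FullSyncCacheSetup
{
    public static CacheConfiguration Build()
    {
        return new CacheConfiguration("example-replicated-cache") // hypothetical name
        {
            CacheMode = CacheMode.Replicated,
            // With FullSync the write call returns only after all
            // participating nodes have acknowledged the update.
            WriteSynchronizationMode = CacheWriteSynchronizationMode.FullSync
        };
    }
}
```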

Thanks,
Raymond.



