Re: Cache write synchronization with replicated caches

Raymond Wilson Mon, 21 Aug 2023 17:13:18 -0700

Thanks for the pointer to the read repair facility added in Ignite 2.14.

Unfortunately, the .WithReadRepair() extension does not seem to be present
in the Ignite C# client.


This means we either need to use the experimental control.sh support or
improve our tooling to do the same thing. I am curious why this is
labelled as experimental. Does that imply a risk if it is run against a
production grid?
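If the tooling route is taken, one option is a thin wrapper around the control.sh invocation suggested further down the thread. It walks every partition of a cache in batches and uses the CHECK_ONLY strategy, which according to the ReadRepairStrategy sources is meant to report inconsistencies without fixing them. This is a minimal sketch under stated assumptions. The cache is assumed to use the default affinity function with 1024 partitions, so pass the real count if yours differs. The batch size of 64, the /opt/ignite fallback for IGNITE_HOME, and the DRY_RUN switch are all illustrative. Whether the experimental command is safe on a production grid is exactly the open question above, so by default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper: check every partition of one cache with CHECK_ONLY,
# one batch of partitions per control.sh call. The control.sh flags come from
# the thread; the 1024-partition default, batch size 64, the IGNITE_HOME
# fallback and the DRY_RUN switch are assumptions.

CONTROL="${IGNITE_HOME:-/opt/ignite}/bin/control.sh"

# partition_batches TOTAL SIZE
#   Prints partitions 0..TOTAL-1 as comma-separated batches of at most SIZE,
#   one batch per line.
partition_batches() {
  total=$1
  size=$2
  i=0
  batch=""
  while [ "$i" -lt "$total" ]; do
    batch="${batch:+$batch,}$i"
    i=$((i + 1))
    if [ $((i % size)) -eq 0 ] || [ "$i" -eq "$total" ]; then
      echo "$batch"
      batch=""
    fi
  done
}

# check_cache CACHE [TOTAL] [SIZE]
#   DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
check_cache() {
  cache=$1
  for parts in $(partition_batches "${2:-1024}" "${3:-64}"); do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$CONTROL --enable-experimental --consistency repair" \
        "--cache $cache --strategy CHECK_ONLY --partitions $parts"
    else
      "$CONTROL" --enable-experimental --consistency repair \
        --cache "$cache" --strategy CHECK_ONLY --partitions "$parts"
    fi
  done
}
```

After sourcing the file, `check_cache default` prints 16 commands that cover partitions 0 to 1023. `DRY_RUN=0 check_cache default` would run them one after another. Running the batches sequentially keeps each invocation bounded and makes it easy to stop part-way through.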

Raymond.


On Mon, Aug 21, 2023 at 5:50 PM Николай Ижиков <nizhi...@apache.org> wrote:

> Hello.
>
> I don't know the cause of your issue, but we have a feature to
> overcome it [1].
>
> Consistency repair can be run from control.sh.
>
> ```
> ./bin/control.sh --enable-experimental
> ...
>   [EXPERIMENTAL]
>   Check/Repair cache consistency using Read Repair approach:
>     control.(sh|bat) --consistency repair cache-name partition
>
>     Parameters:
>       cache-name  - Cache to be checked/repaired.
>       partition   - Cache's partition to be checked/repaired.
>
>   [EXPERIMENTAL]
>   Cache consistency check/repair operations status:
>     control.(sh|bat) --consistency status
>
>   [EXPERIMENTAL]
>   Finalize partitions update counters:
>     control.(sh|bat) --consistency finalize
> ```
>
> It seems the help text for this command is incomplete.
> The command also accepts a strategy argument, so you can control your
> repair actions more precisely.
> Try running:
>
> ```
> ./bin/control.sh --enable-experimental --consistency repair --cache default \
>   --strategy CHECK_ONLY --partitions 1,2,3,…your_partitions_list...
> ```
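The same invocation can be wrapped as a reusable function for checking a specific list of partitions. It is followed by the `--consistency status` query from the help output above. This is a sketch only. The function name, the /opt/ignite fallback for IGNITE_HOME and the DRY_RUN switch are hypothetical, and only the control.sh flags come from this message. The function prints the commands by default so they can be reviewed before anything runs against a live grid.

```shell
#!/bin/sh
# Hypothetical helper: check a list of partitions with a Read Repair strategy
# (CHECK_ONLY by default), then query the status of consistency operations.
# The control.sh flags come from the thread; the IGNITE_HOME fallback and the
# DRY_RUN switch are assumptions.

CONTROL="${IGNITE_HOME:-/opt/ignite}/bin/control.sh"

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# check_partitions CACHE PARTITIONS [STRATEGY]
#   PARTITIONS is a comma-separated list such as 1,2,3.
check_partitions() {
  cache=$1
  parts=$2
  strategy=${3:-CHECK_ONLY}
  run "$CONTROL" --enable-experimental --consistency repair \
    --cache "$cache" --strategy "$strategy" --partitions "$parts"
  run "$CONTROL" --enable-experimental --consistency status
}
```

For example, `check_partitions default 1,2,3` prints the repair command and then the status command. You can pass a different strategy name from [2] as the third argument, but only once the CHECK_ONLY output has been reviewed.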
>
> The available strategies, with good descriptions, can be found in the sources [2].
>
>
> [1] https://ignite.apache.org/docs/latest/key-value-api/read-repair
> [2] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
>
>
>
> On 21 Aug 2023, at 07:46, Raymond Wilson <raymond_wil...@trimble.com>
> wrote:
>
> [Replying on the correct thread]
>
> As a follow-up to this email, we have started collecting evidence that
> replicated caches within our Ignite grid are failing to replicate values in
> a small number of cases.
>
> In the cases we have observed so far, with a cluster of 4 nodes participating
> in a replicated cache, only one node reports having the correct value for a
> key, and the other three report having no value for that key.
>
> The documentation is fairly emphatic that CacheWriteSynchronizationMode does
> not affect consistency for replicated caches. As noted below, we use
> PrimarySync (the default) for these caches, which would suggest a potential
> failure mode that prevents the backup copies from receiving their copy once
> the primary copy has been written.
>
> We are continuing to investigate and would be interested in any
> suggestions you may have as to the likely cause.
>
> Thanks,
> Raymond.
>
> On Thu, Jul 27, 2023 at 12:38 PM Raymond Wilson <
> raymond_wil...@trimble.com> wrote:
>
>> Hi,
>>
>> I have a question about the data safety of replicated caches in the event
>> of a hard failure of the compute resource, where the storage resource is
>> still available when the node returns.
>>
>> We are using Ignite 2.15 with the C# client.
>>
>> We have a number of replicated caches, each with four participating
>> nodes and all using the default PrimarySync write synchronization mode.
>> All data storage configurations are set with WalMode = WalMode.Fsync.
>>
>> Our logic performing writes against these caches continues once the
>> primary node for the replicated cache has written the data item.
>>
>> I am unsure what guarantees Ignite makes at this point in the event of a
>> failure; specifically, a hard ("red button") failure of compute hardware
>> and/or an abrupt (but recoverable) detachment of storage.
>>
>> Scenario one: the primary node returns "OK", then immediately fails (before
>> a checkpoint). When the primary node returns, should I expect the replicated
>> value to be present on the primary and to appear on all other nodes too?
>>
>> Scenario two: the primary node returns "OK", then a secondary node
>> immediately fails (before it has applied the write, and so before any
>> checkpoint). When the secondary node returns, should I expect the replicated
>> value to be present on the recovered secondary node?
>>
>> In relation to these scenarios, does setting the cache write
>> synchronization mode to FullSync improve the safety of the write, since all
>> nodes must acknowledge the write before it returns?
>>
>> If write safety does improve in this case, does that imply the Fsync
>> WalMode write path can lose data in these failure situations?
>>
>> Thanks,
>> Raymond.
>>
>>
>>
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> raymond_wil...@trimble.com
>>
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
>>
>
>
