Thanks for the pointer to the read repair facility added in Ignite 2.14. Unfortunately the .WithReadRepair() extension does not seem to be present in the Ignite C# client.
This means we either need to use the experimental control.sh support, or improve our tooling to do effectively the same thing. I am curious why this is labelled as experimental. Does it imply risk if run against a production grid?

Raymond.

On Mon, Aug 21, 2023 at 5:50 PM Николай Ижиков <nizhi...@apache.org> wrote:

> Hello.
>
> I don't know the cause of your issue.
> But we have a feature to overcome it [1].
>
> Consistency repair can be run from control.sh.
>
> ```
> ./bin/control.sh --enable-experimental
> ...
> [EXPERIMENTAL]
> Check/Repair cache consistency using Read Repair approach:
> control.(sh|bat) --consistency repair cache-name partition
>
> Parameters:
> cache-name - Cache to be checked/repaired.
> partition - Cache's partition to be checked/repaired.
>
> [EXPERIMENTAL]
> Cache consistency check/repair operations status:
> control.(sh|bat) --consistency status
>
> [EXPERIMENTAL]
> Finalize partitions update counters:
> control.(sh|bat) --consistency finalize
> ```
>
> It seems that the docs for the command are not complete.
> It also accepts a strategy argument, so you can manage your repair
> actions more accurately.
> Try to run:
>
> ```
> ❯ ./bin/control.sh --enable-experimental --consistency repair --cache default --strategy CHECK_ONLY --partitions 1,2,3,…your_partitions_list...
> ```
>
> The available strategies, with good descriptions, can be found in the
> sources [2].
>
> [1] https://ignite.apache.org/docs/latest/key-value-api/read-repair
> [2] https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/cache/ReadRepairStrategy.java
>
> On Aug 21, 2023, at 07:46, Raymond Wilson <raymond_wil...@trimble.com>
> wrote:
>
> [Replying onto correct thread]
>
> As a follow up to this email, we are starting to collect evidence that
> replicated caches within our Ignite grid are failing to replicate values
> in a small number of cases.
>
> In the cases we have observed so far, with a cluster of 4 nodes
> participating in a replicated cache, only one node reports having the
> correct value for a key, and the other three report having no value for
> that key.
>
> The documentation is pretty opinionated about the
> CacheWriteSynchronizationMode not being impactful with respect to
> consistency for replicated caches. As noted below, we use PrimarySync (the
> default) for these caches, which would suggest a potential failure mode
> preventing the backup copies from obtaining their copy once the primary
> copy has been written.
>
> We are continuing to investigate and would be interested in any
> suggestions you may have as to the likely cause.
>
> Thanks,
> Raymond.
>
> On Thu, Jul 27, 2023 at 12:38 PM Raymond Wilson <
> raymond_wil...@trimble.com> wrote:
>
>> Hi,
>>
>> I have a query regarding data safety of replicated caches in the case of
>> hard failure of the compute resource but where the storage resource is
>> available when the node returns.
>>
>> We are using Ignite 2.15 with the C# client.
>>
>> We have a number of these caches that have four nodes participating in
>> the replicated caches, all with the default PrimarySync write
>> synchronization mode. All data storage configurations are configured with
>> WalMode = WalMode.Fsync.
>>
>> We have logic performing writes against these caches which will continue
>> once the primary node for the replicated cache has written the data item.
>>
>> I am unsure of the guarantees made by Ignite at this point in the event
>> of failure. Specifically, hard/red-button failure of compute hardware
>> resources and/or abrupt (but recoverable) detachment of storage resources.
>>
>> Scenario one: Primary node returns "OK", then immediately fails (before
>> checkpoint). When the primary node returns, should I expect the replicated
>> value to be in the primary, and to appear in all other nodes too?
>>
>> Scenario two: Primary node returns "OK", then a secondary node
>> immediately fails (before achieving the write and so before any
>> checkpoint). When the secondary node returns, should I expect the
>> replicated value to be in the recovered secondary node?
>>
>> In relation to these scenarios, does setting the cache write
>> synchronization mode to FullSync improve the safety of the write, as all
>> nodes must acknowledge the write before it returns?
>>
>> If there is an improvement in write safety in this instance, does this
>> imply the Fsync WalMode write pathway has opportunities for data loss in
>> these failure situations?
>>
>> Thanks,
>> Raymond.
>>
>> --
>> <http://www.trimble.com/>
>> Raymond Wilson
>> Trimble Distinguished Engineer, Civil Construction Software (CCS)
>> 11 Birmingham Drive | Christchurch, New Zealand
>> raymond_wil...@trimble.com
>>
>> <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch>
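
P.S. In case it is useful to anyone following the thread, here is a minimal sketch of how the CHECK_ONLY run suggested in the quoted reply might be scripted over a range of partitions. The flags (--enable-experimental, --consistency repair, --cache, --strategy CHECK_ONLY, --partitions) are copied from that reply and may differ between Ignite versions. The helper names partition_list and check_command, the CONTROL_SH variable, the cache name "default" and the partition range are placeholders I have assumed for illustration. The script only prints the command; it does not run it against a grid.

```shell
#!/bin/sh
# Sketch only: builds the control.sh consistency check command from the
# quoted reply. CONTROL_SH, the cache name and the partition range are
# assumed placeholders, not values taken from a real deployment.

CONTROL_SH="${CONTROL_SH:-./bin/control.sh}"

# Print a comma-separated list of partition ids from $1 to $2 inclusive.
partition_list() {
  p="$1"
  out=""
  while [ "$p" -le "$2" ]; do
    out="${out:+$out,}$p"
    p=$((p + 1))
  done
  printf '%s\n' "$out"
}

# Print the check command for cache $1 and partitions $2..$3.
# Uses only the flags shown in the quoted reply.
check_command() {
  printf '%s --enable-experimental --consistency repair --cache %s --strategy CHECK_ONLY --partitions %s\n' \
    "$CONTROL_SH" "$1" "$(partition_list "$2" "$3")"
}

# Example: print (not execute) the command for partitions 0..3 of "default".
check_command default 0 3
```

Printing rather than executing keeps the sketch safe to try anywhere; the output can be reviewed and then pasted into a shell on a machine that has access to the grid.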