Igniters,
I'm glad to introduce the Read Repair feature [0], which provides an
additional consistency guarantee for Ignite.

1) Why do we need it?
The detailed explanation can be found in IEP-31 [1].
In short, bugs can leave the cluster in an inconsistent state.
We need additional features to handle this case.

Currently we are able to check the cluster using the Idle_verify [2]
feature, but it will not fix the data and will not even tell which entries
are broken.
Read Repair is a feature that identifies which entries are broken and fixes
them.

2) How does it work?
IgniteCache can now provide a special proxy [3] via withReadRepair().
This proxy guarantees that data is read from all owners and compared.
If a consistency violation is found, the data is recovered and a special
event is recorded.
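
To make the flow above concrete, here is a minimal toy model in plain Java.
It does not call Ignite and is not the code from the pull request. The class
ReadRepairSketch, its methods, and the "highest version wins" rule are all
assumptions made for illustration, and the model assumes every owner holds a
copy of the key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a read-repair style get. Hypothetical, not Ignite code. */
final class ReadRepairSketch {
    /** A value together with the version it was written with. */
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** One map per owner node: key -> versioned copy. */
    private final List<Map<String, Versioned>> owners = new ArrayList<>();

    /** Keys for which a violation was found (stands in for the recorded event). */
    final List<String> violations = new ArrayList<>();

    ReadRepairSketch(int ownerCount) {
        for (int i = 0; i < ownerCount; i++)
            owners.add(new HashMap<>());
    }

    /** Writes to a single owner only, to simulate a bug that makes copies diverge. */
    void putOnOwner(int owner, String key, String value, long version) {
        owners.get(owner).put(key, new Versioned(value, version));
    }

    Versioned readFromOwner(int owner, String key) {
        return owners.get(owner).get(key);
    }

    private static boolean same(Versioned a, Versioned b) {
        if (a == null || b == null)
            return a == b;
        return a.version == b.version && a.value.equals(b.value);
    }

    /** Reads the key from every owner, compares the copies, repairs on mismatch. */
    String getWithReadRepair(String key) {
        Versioned first = owners.get(0).get(key);
        Versioned newest = first;
        boolean consistent = true;

        for (Map<String, Versioned> owner : owners) {
            Versioned copy = owner.get(key);
            if (!same(first, copy))
                consistent = false;
            // Assumption: the copy with the highest version is the correct one.
            if (copy != null && (newest == null || copy.version > newest.version))
                newest = copy;
        }

        if (consistent)
            return first == null ? null : first.value;

        violations.add(key);                  // record the violation
        for (Map<String, Versioned> owner : owners)
            owner.put(key, newest);           // recover every owner
        return newest.value;
    }

    public static void main(String[] args) {
        ReadRepairSketch cache = new ReadRepairSketch(3);
        cache.putOnOwner(0, "k", "new", 2);
        cache.putOnOwner(1, "k", "new", 2);
        cache.putOnOwner(2, "k", "old", 1);   // diverged copy
        System.out.println(cache.getWithReadRepair("k")); // prints "new"
        System.out.println(cache.violations);             // prints "[k]"
    }
}
```

In the real feature the equivalent call would go through the proxy returned
by withReadRepair() [3]; the model only mirrors the read-all, compare,
recover, record sequence described above.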

3) Naming?
The feature name is based on Cassandra's Read Repair feature [4], which is
quite similar.

4) Limitations that can be fixed in the future?
  * MVCC and Near caches are not supported.
  * Atomic caches can be checked (a false positive is possible on this
check), but can't be recovered.
  * Partial entry removal can't be recovered.
  * Entries streamed using the data streamer (with an updater that is not
based on "cache.put") and entries loaded by cache.load are perceived as
inconsistent, since they may have different versions for the same keys.
  * Only explicit get operations are supported (getAndReplace, getAndPut,
etc. can be supported in the future).
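
The data streamer limitation can be illustrated with another small plain-Java
sketch. It is not Ignite code: the class VersionCheckSketch and both check
methods are hypothetical, and the only thing taken from the list above is
that copies with different versions count as inconsistent.

```java
import java.util.Arrays;
import java.util.List;

/** Toy comparison of version-based and value-based checks. Hypothetical. */
final class VersionCheckSketch {
    /** One owner's copy of an entry. */
    static final class Copy {
        final String value;
        final long version;

        Copy(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Version-based check: consistent only if every copy has the same version. */
    static boolean consistentByVersion(List<Copy> copies) {
        for (Copy c : copies)
            if (c.version != copies.get(0).version)
                return false;
        return true;
    }

    /** Value-based check, shown only for contrast. */
    static boolean consistentByValue(List<Copy> copies) {
        for (Copy c : copies)
            if (!c.value.equals(copies.get(0).value))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // Same value on both owners, but each owner assigned its own version,
        // as may happen when entries are streamed or loaded.
        List<Copy> copies = Arrays.asList(new Copy("v", 5), new Copy("v", 7));
        System.out.println(consistentByVersion(copies)); // prints "false"
        System.out.println(consistentByValue(copies));   // prints "true"
    }
}
```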

5) What's left?
  * SQL/ThinClient/etc support.
  * Metrics (found/repaired).
  * A simple per-partition recovery feature that can work in the background,
in addition to the per-entry recovery feature.

6) Is the code checked?
  * Pull Request #5656 [5] (feature) - has green TC.
  * Pull Request #6575 [6] (RunAll with the feature enabled for every get()
request) - has a limited amount of failures (because of data streamer,
cache.load, etc).

Thoughts?

[0] https://issues.apache.org/jira/browse/IGNITE-10663
[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
[2]
https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
[3]
https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
[4]
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
[5] https://github.com/apache/ignite/pull/5656
[6] https://github.com/apache/ignite/pull/6575
