Perhaps, it would be a good idea to think about the recovery tool/ control-utility command that will allow achieving the same goal. If I am not mistaken it was already proposed in the email thread.
Thanks, S. ср, 10 июл. 2019 г. в 15:33, Вячеслав Коптилин <slava.kopti...@gmail.com>: > Hi Anton, > > Well, the ReadRepair feature is finally merged and that is good news :) > > Unfortunately, I cannot find a consensus about the whole functionality in > any of these topics: > - > http://apache-ignite-developers.2346864.n4.nabble.com/Consistency-check-and-fix-review-request-td41629.html > - > http://apache-ignite-developers.2346864.n4.nabble.com/Read-Repair-ex-Consistency-Check-review-request-2-td42421.html > Also, there are no comments/discussion in JIRA. That makes me sad :( > especially when a feature is huge, not obvious and involves changing public > API (and that is the case, I think). > > Anyway, I left some comments in your pull-request at github. Please take a > look. The most annoying thing is poorly documented code :( > By the way, is it required to add test related to fail-over scenarios? > > I just checked, the IEP page and I still cannot find Jira tickets that > should cover existing limitations/improvements. > I would suggest creating the following tasks, at least: > - MVCC support > - Near caches > - Additional metrics (number of violations, number of repaired entries > etc) > - Ignite C++ (It looks like, .Net is already aware of that feature) > - Thin clients support > - Perhaps, it would be useful to support different strategies to resolve > inconsistencies > > What do you think? > > Thanks, > S. > > > ср, 10 июл. 2019 г. в 10:16, Anton Vinogradov <a...@apache.org>: > >> Folks, >> >> Thanks to everyone for tips and reviews. >> Yardstick checked, no performance drop found. >> Additional measurement: RR get() is just up to 7% slower than regular get >> on real benchmarks (8 clients, 4 servers, 3 backups) >> Code merged to the master. >> "Must have" tasks created and attached to the IEP. >> >> On Thu, Jul 4, 2019 at 12:18 PM Anton Vinogradov <a...@apache.org> wrote: >> >> > Folks, >> > >> > Just a minor update. >> > >> > RunAll [1] with enabled ReadRepair proxy is almost green now (~10 tests >> > left, started with 6k :)). >> > During the analisys, I've found some tests with >> > - unexpected repairs at tx caches >> > - inconsistent state after the test finished (different entries across >> the >> > topology) >> > For example, >> > - testInvokeAllAppliedOnceOnBinaryTypeRegistration generates obsolete >> > versions on backups in case of retry, fixed [2] >> > - initial cache load generates not equal versions on backups, fixed [3] >> > - testAccountTxNodeRestart causes unexpected repairs (entries have >> > different versions), to be investigated. >> > >> > What's next? >> > >> > 1) Going to merge the solution once "RunAll with ReadRepair enabled" >> > becomes fully green. >> > 2) Going to add special check after each test which will ensure caches >> > content after the test is consistent. >> > 2.1) The Same check can (should?) be injected to >> > awaitPartitionMapExchange() and similar methods. >> > 3) Update Jepsen tests with RR checks. >> > 4) Introduce per partition RR. >> > >> > So, the final goal is to be sure that Ignite produces only consistent >> data >> > and to have a feature to solve consistency in case we gain inconsistent >> > state somehow. >> > >> > Limitations? >> > >> > Currently, RR has some limitations, but they are not related to real >> > production cases. >> > In case someone interested to support, for example, MVCC or near caches, >> > please, feel free to contribute. >> > >> > [1] >> > >> https://mtcga.gridgain.com/pr.html?serverId=apache&suiteId=IgniteTests24Java8_RunAll&branchForTc=pull/6575/head&action=Latest >> > [2] >> > >> https://github.com/apache/ignite/pull/5656/commits/6f6ec4434095e692af209c61833a350f3013408c >> > [3] >> > >> https://github.com/apache/ignite/pull/5656/commits/255e552b474839e470c66a77e74e3c807bc76f13 >> > >> > On Fri, Jun 28, 2019 at 2:41 PM Anton Vinogradov <a...@apache.org> wrote: >> > >> >> Slava, >> >> >> >> >> I will take a look at your pull request if you don't mind. >> >> Great news! >> >> >> >> >> In any way, could you please update the IEP page with the list of >> >> >> constraints/limitations of the proposed approach, TODOs, etc? >> >> Not sure we should keep this at IEP until list (#4 from original >> letter) >> >> is not confirmed. >> >> >> >> Currently, I'm checking RunAll with RR enabled to almost each get >> request. >> >> "Almost" means: readRepair = !ctx.readThrough() && >> >> ctx.config().getBackups() > 0 && !ctx.isNear() && !ctx.mvccEnabled() >> >> For now I have 60 failed tests and amount decreasing. >> >> >> >> >> For instance, I would like to see all these limitations on the IEP >> >> page as >> >> >> JIRA tickets. Perhaps, it would be good to create an epic/umbrella >> >> ticket >> >> >> in order to track all activities related to `Read Repair` feature. >> >> Let's do this at merge day to be sure useless issues will not be >> created. >> >> >> >> On Fri, Jun 28, 2019 at 2:01 PM Вячеслав Коптилин < >> >> slava.kopti...@gmail.com> wrote: >> >> >> >>> Hi Anton, >> >>> >> >>> I will take a look at your pull request if you don't mind. >> >>> >> >>> In any way, could you please update the IEP page with the list of >> >>> constraints/limitations of the proposed approach, TODOs, etc? >> >>> For instance, I would like to see all these limitations on the IEP >> page >> >>> as >> >>> JIRA tickets. Perhaps, it would be good to create an epic/umbrella >> ticket >> >>> in order to track all activities related to `Read Repair` feature. >> >>> >> >>> Thanks, >> >>> S. >> >>> >> >>> чт, 20 июн. 2019 г. в 14:15, Anton Vinogradov <a...@apache.org>: >> >>> >> >>> > Igniters, >> >>> > I'm glad to introduce Read Repair feature [0] provides additional >> >>> > consistency guarantee for Ignite. >> >>> > >> >>> > 1) Why we need it? >> >>> > The detailed explanation can be found at IEP-31 [1]. >> >>> > In short, because of bugs, it's possible to gain an inconsistent >> state. >> >>> > We need additional features to handle this case. >> >>> > >> >>> > Currently we able to check cluster using Idle_verify [2] feature, >> but >> >>> it >> >>> > will not fix the data, will not even tell which entries are broken. >> >>> > Read Repair is a feature to understand which entries are broken and >> to >> >>> fix >> >>> > them. >> >>> > >> >>> > 1) How it works? >> >>> > IgniteCache now able to provide special proxy [3] withReadRepair(). >> >>> > This proxy guarantee that data will be gained from all owners and >> >>> compared. >> >>> > In the case of consistency violation situation, data will be >> recovered >> >>> and >> >>> > a special event recorded. >> >>> > >> >>> > 3) Naming? >> >>> > Feature name based on Cassandra's Read Repair feature [4], which is >> >>> pretty >> >>> > similar. >> >>> > >> >>> > 4) Limitations which can be fixed in the future? >> >>> > * MVCC and Near caches are not supported. >> >>> > * Atomic caches can be checked (false positive case is possible on >> >>> this >> >>> > check), but can't be recovered. >> >>> > * Partial entry removal can't be recovered. >> >>> > * Entries streamed using data streamer (using not a "cache.put" >> based >> >>> > updater) and loaded by cache.load >> >>> > are perceived as inconsistent since they may have different >> versions >> >>> for >> >>> > same keys. >> >>> > * Only explicit get operations are supported (getAndReplace, >> >>> getAndPut, >> >>> > etc can be supported in future). >> >>> > >> >>> > 5) What's left? >> >>> > * SQL/ThinClient/etc support. >> >>> > * Metrics (found/repaired). >> >>> > * Simple per-partition recovery feature able to work in the >> >>> background in >> >>> > addition to per-entry recovery feature. >> >>> > >> >>> > 6) Is code checked? >> >>> > * Pull Request #5656 [5] (feature) - has green TC. >> >>> > * Pull Request #6575 [6] (RunAll with the feature enabled for >> every >> >>> get() >> >>> > request) - has a limited amount of failures (because of data >> streamer, >> >>> > cache.load, etc). >> >>> > >> >>> > Thoughts? >> >>> > >> >>> > [0] https://issues.apache.org/jira/browse/IGNITE-10663 >> >>> > [1] >> >>> > >> >>> > >> >>> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix >> >>> > [2] >> >>> > >> >>> > >> >>> >> https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums >> >>> > [3] >> >>> > >> >>> > >> >>> >> https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140 >> >>> > [4] >> >>> > >> >>> > >> >>> >> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html >> >>> > [5] https://github.com/apache/ignite/pull/5656 >> >>> > [6] https://github.com/apache/ignite/pull/6575 >> >>> > >> >>> >> >> >> >