Hi Anton,

Thanks for sharing your ideas.
I think your approach should work in general. I'll just share my concerns about possible issues that may come up.

1) Equality of update counters doesn't imply equality of partition contents under load. For every update, the primary node generates an update counter, then the update is delivered to the backup node and applied with that counter. For example, suppose two transactions (A and B) update partition X as follows:
- A updates key1 in partition X on the primary node and increments the counter to 10
- B updates key2 in partition X on the primary node and increments the counter to 11
- While A is still updating other keys, B commits
- The update of key2 arrives at the backup node and sets its update counter to 11
An observer will see equal update counters (11), but the update of key1 is still missing in the backup partition. This is a fundamental problem which is being solved here: https://issues.apache.org/jira/browse/IGNITE-10078 "Online verify" should operate with the new complex update counters, which take such "update holes" into account. Otherwise, online verify may produce false-positive inconsistency reports.
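
Just to illustrate the "hole" problem (this is only a toy sketch, not the IGNITE-10078 design): a counter that stores just the highest applied value cannot distinguish "all updates up to 11 applied" from "update 11 applied, update 10 still in flight". To compare partitions safely, the counter has to expose whether there are gaps, e.g.:

    import java.util.TreeSet;

    // Toy model of an update counter that remembers out-of-order updates.
    class HoleAwareCounter {
        private long applied;                       // highest counter with no gaps below it
        private final TreeSet<Long> outOfOrder = new TreeSet<>(); // updates applied ahead of 'applied'

        synchronized void apply(long cntr) {
            if (cntr == applied + 1) {
                applied++;

                // Collapse consecutive out-of-order updates into the contiguous prefix.
                while (outOfOrder.remove(applied + 1))
                    applied++;
            }
            else
                outOfOrder.add(cntr);
        }

        synchronized boolean hasHoles() {
            return !outOfOrder.isEmpty();
        }

        synchronized long highWatermark() {
            return outOfOrder.isEmpty() ? applied : outOfOrder.last();
        }
    }

In the scenario above, the backup applies counter 11 while 10 is still in flight, so highWatermark() == 11 on both nodes, but hasHoles() == true on the backup, and online verify must not compare hashes until the hole is closed.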

2) Acquisition and comparison of update counters is fast, but partition hash calculation takes a long time. We should check that the update counter remains unchanged after every K keys processed, so that we abort (and retry later) as soon as the partition is modified instead of wasting the rest of the scan.
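
A rough sketch of what I mean, building on the snippet from your mail (K, grpCtx and the helper signature are illustrative, not an actual patch; I'm assuming we can re-read the partition's update counter cheaply while iterating):

    // Hypothetical helper: returns the updateCounter-partitionHash pair,
    // or null if the partition was modified during the scan and should be retried later.
    IgniteBiTuple<Long, Long> calcPartHash(GridDhtLocalPartition part, GridIterator<CacheDataRow> it)
        throws IgniteCheckedException {
        long cntrBefore = part.updateCounter();
        long partHash = 0;
        int handled = 0;

        while (it.hasNextX()) {
            CacheDataRow row = it.nextX();

            partHash += row.key().hashCode();
            partHash += Arrays.hashCode(row.value().valueBytes(grpCtx.cacheObjectContext()));

            // Every K keys, make sure the partition hasn't been modified concurrently.
            if (++handled % K == 0 && part.updateCounter() != cntrBefore)
                return null; // Hash is already stale, abort early and retry this partition later.
        }

        // Final check before trusting the hash.
        return part.updateCounter() == cntrBefore ? new IgniteBiTuple<>(cntrBefore, partHash) : null;
    }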

3)

Another hope is that we'll be able to pause/continue the scan: for example, we'll check 1/3 of the partitions today, 1/3 tomorrow, and in three days the whole cluster will be checked.
Totally makes sense.
We may find ourselves in a situation where some "hot" partitions are still unprocessed, and every next attempt to calculate their hash fails due to another concurrent update. We should be able to track the progress of validation (the % of calculation time wasted due to concurrent operations may be a good metric, with 100% being the worst case) and provide an option to stop/pause the activity. I think pause should return an "intermediate results report" with information about which partitions have been successfully checked. With such a report we can resume the activity later: partitions from the report will simply be skipped.
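
Something like this for the report shape (purely hypothetical names, just to illustrate the resume flow):

    import java.io.Serializable;
    import java.util.Set;

    // Hypothetical intermediate result of a paused/stopped online verify.
    class OnlineVerifyProgress implements Serializable {
        Set<Integer> verifiedParts;   // partitions whose hashes were successfully compared
        Set<Integer> pendingParts;    // partitions skipped so far due to concurrent updates
        double wastedTimeRatio;       // share of calculation time lost to retries (1.0 = worst case)
    }

On resume, partitions from verifiedParts are skipped and only pendingParts (plus anything not yet attempted) are scanned again.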

4)

Since "Idle verify" uses regular pagmem, I assume it replaces hot data with persisted.
So, we have to warm up the cluster after each check.
Are there any chances to check without cooling the cluster?
I don't see an easy way to achieve this with our page memory architecture. We definitely can't just read pages from disk directly: we need to synchronize page access with concurrent update operations and checkpoints. From my point of view, the correct way to solve this issue is to improve our page replacement [1] mechanics by making it truly scan-resistant.

P. S. There's another possible way of achieving online verify: instead of on-demand hash calculation, we can always keep an up-to-date hash value for every partition. We'll need to update the hash on every insert/update/remove operation, but there will be no reordering issues, since the function we use for aggregating hash results (+) is commutative. With a pre-calculated partition hash value, we can automatically detect inconsistent partitions on every PME. What do you think?
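
Roughly like this (a sketch with made-up names, not the actual API; partHash is assumed to be a per-partition field maintained by the data store):

    import java.util.Arrays;

    // Per-entry contribution, same formula idle verify uses today.
    static long entryHash(Object key, byte[] valBytes) {
        return key.hashCode() + Arrays.hashCode(valBytes);
    }

    // Maintained on every operation; since '+' is commutative and has an inverse,
    // primary and backups may apply the same updates in different order and still
    // converge to the same partHash value.
    void onInsert(Object key, byte[] val)                   { partHash += entryHash(key, val); }
    void onRemove(Object key, byte[] oldVal)                { partHash -= entryHash(key, oldVal); }
    void onUpdate(Object key, byte[] oldVal, byte[] newVal) { partHash += entryHash(key, newVal) - entryHash(key, oldVal); }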

[1] - https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Pagereplacement(rotationwithdisk)

Best Regards,
Ivan Rakov

On 29.04.2019 12:20, Anton Vinogradov wrote:
Igniters and especially Ivan Rakov,

"Idle verify" [1] is a really cool tool, to make sure that cluster is consistent.

1) But it requires operations to be paused during the cluster check.
On some clusters this check takes hours (3-4 hours in cases I saw).
I've checked the code of "idle verify" and it seems possible to make it "online" with some assumptions.

Idea:
Currently "Idle verify" checks that partitions hashes, generated this way
while (it.hasNextX()) {
CacheDataRow row = it.nextX();
partHash += row.key().hashCode();
partHash += Arrays.hashCode(row.value().valueBytes(grpCtx.cacheObjectContext()));
}
, are the same.

What if we generate the same updateCounter-partitionHash pairs, but compare hashes only when the counters are equal? So, for example, we'll ask the cluster to generate pairs for 64 partitions, then find that 55 have the same counters (were not updated during the check) and check them. The rest (64 - 55 = 9) partitions will be re-requested and rechecked together with an additional 55. This way we'll be able to check that the cluster is consistent even when operations are in progress (just retrying the modified partitions).

Risks and assumptions:
Using this strategy, we'll check the cluster's consistency ... eventually, and the check will take more time even on an idle cluster. If operationsPerTimeToGeneratePartitionHashes > partitionsCount, we'll definitely make no progress.
But if the load is not high, we'll be able to check the whole cluster.

Another hope is that we'll be able to pause/continue the scan: for example, we'll check 1/3 of the partitions today, 1/3 tomorrow, and in three days the whole cluster will be checked.

Have I missed something?

2) Since "Idle verify" uses regular pagmem, I assume it replaces hot data with persisted.
So, we have to warm up the cluster after each check.
Are there any chances to check without cooling the cluster?

[1] https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
