Anton, I did not know about the mechanic of locking entries on backups during the prepare phase. Thank you for pointing that out!
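As a side note for readers of the archive: the mechanic acknowledged here (entries stay locked on backups from the prepare phase until commit, so a checking read waits instead of observing a half-applied tx) can be modeled with a small sketch. This is not Ignite code; `PreparedBackup` and all method names are invented for illustration only.

```java
import java.util.concurrent.locks.ReentrantLock;

public class PreparedBackup {
    private final ReentrantLock entryLock = new ReentrantLock();
    private int staged;
    private volatile int value;

    /** Prepare phase: lock the entry on the backup and stage the new value. */
    public void prepare(int newValue) {
        entryLock.lock();   // held until commit(); checking reads block here
        staged = newValue;  // staged only; not visible to reads yet
    }

    /** Commit phase: apply the staged value, then release the entry lock. */
    public void commit() {
        value = staged;
        entryLock.unlock();
    }

    /** Checking read: waits for any in-flight prepare/commit to finish. */
    public int checkedRead() {
        entryLock.lock();
        try {
            return value;
        } finally {
            entryLock.unlock();
        }
    }
}
```

Because the entry lock spans prepare-to-commit, a reader can never see the staged-but-uncommitted state, which is exactly why the check gets no false positives under this scheme.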
On Fri, Jul 12, 2019 at 22:45, Ivan Rakov <ivan.glu...@gmail.com> wrote:
>
> Hi Anton,
>
> > Each get method now checks the consistency.
> > The check means:
> > 1) tx lock acquired on primary
> > 2) data gained from each owner (primary and backups)
> > 3) data compared
> Did you consider acquiring locks on the backups as well during your check,
> just like the 2PC prepare does?
> If there is HB between steps 1 (lock primary) and 2 (update primary +
> lock backup + update backup), you can be sure that there will be no
> false-positive results and no deadlocks either. The protocol won't be
> complicated: a checking read from a backup will just wait for the commit
> if one is in progress.
>
> Best Regards,
> Ivan Rakov
>
> On 12.07.2019 9:47, Anton Vinogradov wrote:
> > Igniters,
> >
> > Let me explain the problem in detail.
> > Read Repair in a pessimistic tx (locks acquired on primary, full sync,
> > 2PC) is able to see a consistency violation because the backups are not
> > updated yet.
> > It does not seem to be a good idea to "fix" the code so that the primary
> > is unlocked only once the backups are updated; that would definitely
> > cause a performance drop.
> > Currently, there is no explicit sync feature that allows waiting until
> > the backups have been updated by the previous tx.
> > The previous tx just sends GridNearTxFinishResponse to the originating
> > node.
> >
> > Bad ideas for handling this:
> > - retry several times (a false positive is still possible)
> > - lock the tx entry on the backups (will definitely break the failover
> > logic)
> > - wait for the same entry version on the backups during some timeout
> > (will require huge changes to the "get" logic, and a false positive is
> > still possible)
> >
> > Is there any simple fix for this issue?
> > Thanks in advance for the tips.
> >
> > Ivan,
> > thanks for your interest.
> >
> >>> 4. Very fast and lucky txB writes a value 2 for the key on primary and
> >>> backup.
> > AFAIK, reordering is not possible since the backups are "prepared" before
> > the primary releases the lock.
> > So, consistency is guaranteed by failover and by the "prepare" phase of
> > 2PC.
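The three-step Read Repair check quoted in this thread (tx lock held on the primary, read from every owner, compare) can be sketched as a toy model. This is a simplified illustration, not the actual Ignite Read Repair implementation; `Replica` and `isConsistent` are invented names.

```java
import java.util.List;
import java.util.Objects;

public class ReadRepairCheck {
    /** One partition owner (primary or backup) in this toy model. */
    public interface Replica {
        Object get(String key);
    }

    /**
     * Step 1 (the tx lock on the primary) is assumed to be already held by
     * the caller. Step 2 reads the value from every owner; step 3 compares.
     * Returns true when all owners agree on the value.
     */
    public static boolean isConsistent(String key, Replica primary, List<Replica> backups) {
        Object primaryValue = primary.get(key);       // step 2: read primary
        for (Replica backup : backups) {
            Object backupValue = backup.get(key);     // step 2: read each backup
            if (!Objects.equals(primaryValue, backupValue))
                return false;                         // step 3: mismatch found
        }
        return true;
    }
}
```

The thread's whole problem is visible in this shape: the check only holds the lock on the primary, so a backup that the previous tx has not yet committed to makes `isConsistent` return false even though no real inconsistency exists.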
> > The problem seems to be NOT with consistency in AI, but with the
> > consistency-detection implementation (RR) and its possible "false
> > positive" results.
> > BTW, I checked the 1PC case (only one data node in the test) and got no
> > issues.
> >
> > On Fri, Jul 12, 2019 at 9:26 AM Pavlukhin Ivan <vololo...@gmail.com> wrote:
> >
> >> Anton,
> >>
> >> Is such behavior observed for 2PC or for the 1PC optimization? Doesn't
> >> it mean that things can be even worse and an inconsistent write is
> >> possible on a backup? E.g. in this scenario:
> >> 1. txA writes a value 1 for the key on primary.
> >> 2. txA unlocks the key on primary.
> >> 3. txA freezes before updating the backup.
> >> 4. Very fast and lucky txB writes a value 2 for the key on primary and
> >> backup.
> >> 5. txA wakes up and writes 1 for the key on the backup.
> >> 6. As a result there is 2 on primary and 1 on backup.
> >>
> >> Naively it seems that locks should be released only after all replicas
> >> are updated.
> >>
> >> On Wed, Jul 10, 2019 at 16:36, Anton Vinogradov <a...@apache.org> wrote:
> >>> Folks,
> >>>
> >>> I am now investigating unexpected repairs [1] that happen when Read
> >>> Repair is used in testAccountTxNodeRestart.
> >>> I updated [2] the test to check whether any repairs happen.
> >>> The test's name is now "testAccountTxNodeRestartWithReadRepair".
> >>>
> >>> Each get method now checks the consistency.
> >>> The check means:
> >>> 1) tx lock acquired on primary
> >>> 2) data gained from each owner (primary and backups)
> >>> 3) data compared
> >>>
> >>> Sometimes a backup may have an obsolete value during such a check.
> >>>
> >>> It seems this happens because the tx commit on the primary goes the
> >>> following way (check the code [3] for details):
> >>> 1) performing localFinish (releases the tx lock)
> >>> 2) performing dhtFinish (commits on the backups)
> >>> 3) transferring control back to the caller
> >>>
> >>> So the problem here seems to be that "tx lock released on primary" does
> >>> not mean that the backups are updated, while "commit() method finished
> >>> in the caller's thread" does.
> >>> This means that, currently, there is no happens-before between
> >>> 1) thread 1 committed the data on primary and the tx lock can be
> >>> re-obtained
> >>> 2) thread 2 reads from a backup
> >>> but there is still a strong HB between "commit() finished" and "backup
> >>> updated".
> >>>
> >>> So it seems possible, for example, to get a notification from a
> >>> continuous query, then read from a backup and get an obsolete value.
> >>>
> >>> Is this "partial happens-before" behavior expected?
> >>>
> >>> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> >>> [2] https://github.com/apache/ignite/pull/6679/files
> >>> [3]
> >>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
> >>
> >>
> >> --
> >> Best regards,
> >> Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin
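The commit ordering described in the thread (localFinish releases the primary tx lock before dhtFinish updates the backups) can be illustrated with a deliberately simplified, single-threaded model. `CommitOrdering` and all names below are invented, not Ignite code; the model only makes the stale-read window between steps 1 and 2 explicit.

```java
public class CommitOrdering {
    static int primaryValue = 1;
    static int backupValue = 1;
    static boolean primaryLockHeld = false;

    /** Step 1: localFinish — commit on the primary and release the tx lock. */
    static void localFinish(int newValue) {
        primaryValue = newValue;
        primaryLockHeld = false;   // another tx may re-lock the key right now
    }

    /** Step 2: dhtFinish — only at this point is the backup updated. */
    static void dhtFinish(int newValue) {
        backupValue = newValue;
    }

    /** Runs one commit and probes the window between steps 1 and 2. */
    public static boolean staleWindowObservable() {
        primaryLockHeld = true;    // tx locks the key and writes value 2
        localFinish(2);
        // A reader at this point sees the lock free, yet the backup is stale.
        boolean stale = !primaryLockHeld && backupValue != primaryValue;
        dhtFinish(2);              // after this, primary and backup agree again
        return stale;
    }
}
```

The probe returns true: between the two finish steps the lock is already free while the backup still carries the old value, which is exactly the window in which Read Repair observes a "violation" and a continuous-query listener could read an obsolete backup value.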