Anton,

You referred to failover scenarios. I believe that everything is
described in the IEP. But to make this discussion self-sufficient, could
you please outline what prevents primary partition relocation while
Read Repair is in progress? Is it a transaction or an explicit lock?

Mon, Jul 15, 2019 at 23:49, Ivan Rakov <ivan.glu...@gmail.com>:
>
> Anton,
>
> > Step-by-step:
> > 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx
> > 2) backups locked on prepare
> > 3) primary unlocked on finish
> > 4) backups unlocked on finish (after the primary)
> > correct?
> Yes, this corresponds to my understanding of the transaction protocol, with a
> minor exception: steps 3 and 4 are inverted in case of one-phase commit.
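>
> For reference, here is a minimal self-contained sketch of that lifecycle from
> the user's side (public API only; the cache name, key and value are arbitrary).
> The backup locking itself happens inside the prepare/finish phases and is not
> visible at this level:
>
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.cache.CacheAtomicityMode;
> import org.apache.ignite.cache.CacheWriteSynchronizationMode;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
>
> public class LockLifecycleSketch {
>     public static void main(String[] args) {
>         try (Ignite ignite = Ignition.start()) {
>             CacheConfiguration<Integer, Integer> ccfg = new CacheConfiguration<>("accounts");
>
>             ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>             ccfg.setBackups(1);
>             ccfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>
>             IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(ccfg);
>
>             try (Transaction tx = ignite.transactions().txStart(
>                 TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
>                 cache.get(1);     // (1) primary locked on key mention.
>                 cache.put(1, 42);
>
>                 tx.commit();      // (2) backups locked on prepare,
>                                   // (3) primary unlocked on finish,
>                                   // (4) backups unlocked on finish (after the primary);
>                                   //     with 1PC, steps 3 and 4 are inverted.
>             }
>         }
>     }
> }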
>
> > Agree, but it seems there is no need to acquire the lock; we just have to
> > wait until the entry becomes unlocked.
> > - entry locked means that the previous tx's "finish" phase is in progress
> > - entry unlocked means the read value is up-to-date (the previous "finish"
> > phase finished)
> > correct?
> Diving deeper, an entry is locked if its GridCacheMapEntry.localCandidates
> queue is not empty (the first item in the queue is actually the transaction
> that owns the lock).
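>
> To make the shape of that check concrete, a tiny illustrative sketch (these
> types are stand-ins, not the real GridCacheMapEntry API; only the invariant
> matters: locked == non-empty candidate queue, head of the queue == lock owner):
>
> import java.util.ArrayDeque;
> import java.util.Deque;
> import java.util.UUID;
>
> public class LockCandidateSketch {
>     /** Stand-in for a single lock request (candidate) enqueued on the entry. */
>     static final class Candidate {
>         final UUID txId;
>
>         Candidate(UUID txId) { this.txId = txId; }
>     }
>
>     /** Stand-in for the entry's localCandidates queue. */
>     static final class EntryView {
>         final Deque<Candidate> localCandidates = new ArrayDeque<>();
>     }
>
>     /** Entry is locked as long as at least one candidate is enqueued. */
>     static boolean isLocked(EntryView entry) {
>         return !entry.localCandidates.isEmpty();
>     }
>
>     /** The first candidate in the queue is the transaction that owns the lock. */
>     static UUID lockOwner(EntryView entry) {
>         Candidate head = entry.localCandidates.peekFirst();
>
>         return head == null ? null : head.txId;
>     }
> }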
>
> > we just have to wait
> > until the entry becomes unlocked.
> This may work.
> If the consistency checking code has acquired the lock on the primary, the
> backup can be in one of two states:
> - not locked, and new locks won't appear since we are holding the lock on the primary
> - still locked by the transaction that owned the lock on the primary just before
> our checking code, in which case the checking code should just wait for the lock release
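>
> As a sketch (isLocked() below is a hypothetical hook into the check above, not
> a real API), the waiting part could be as simple as:
>
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.TimeoutException;
> import java.util.function.Predicate;
>
> public class AwaitUnlockedSketch {
>     /**
>      * Waits until the backup entry for the given key is unlocked. Since the checking
>      * code already holds the lock on the primary, no new lock can appear here; we only
>      * have to outwait the previous transaction's "finish" phase.
>      */
>     static <K> void awaitUnlocked(K key, Predicate<K> isLocked, long timeoutMs)
>         throws InterruptedException, TimeoutException {
>         long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
>
>         while (isLocked.test(key)) {
>             if (System.nanoTime() > deadline)
>                 throw new TimeoutException("Backup entry is still locked: " + key);
>
>             Thread.sleep(1); // A real implementation would subscribe to the lock-release future instead of polling.
>         }
>     }
> }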
>
> Best Regards,
> Ivan Rakov
>
> On 15.07.2019 9:34, Anton Vinogradov wrote:
> > Ivan R.
> >
> > Thanks for joining!
> >
> > I got the idea, but I'm not sure I see a way to fix it.
> >
> > AFAIK (I may be wrong, please correct me if necessary), with 2PC, locks are
> > acquired on backups during the "prepare" phase and released at the "finish"
> > phase after the primary has fully committed.
> > Step-by-step:
> > 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx
> > 2) backups locked on prepare
> > 3) primary unlocked on finish
> > 4) backups unlocked on finish (after the primary)
> > correct?
> >
> > So, acquiring locks on backups outside the "prepare" phase may cause
> > unexpected behavior in case of a primary failure or other errors.
> > It's definitely possible to update failover to solve this issue, but it
> > seems to be an overcomplicated way.
> > The main question here is: is there any simple way?
> >
> >>> checking read from backup will just wait for commit if it's in progress.
> > Agree, but it seems there is no need to acquire the lock; we just have to
> > wait until the entry becomes unlocked.
> > - entry locked means that the previous tx's "finish" phase is in progress
> > - entry unlocked means the read value is up-to-date (the previous "finish"
> > phase finished)
> > correct?
> >
> > On Mon, Jul 15, 2019 at 8:37 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >
> >> Anton,
> >>
> >> I did not know about the mechanics of locking entries on backups during the
> >> prepare phase. Thank you for pointing that out!
> >>
> >> Fri, Jul 12, 2019 at 22:45, Ivan Rakov <ivan.glu...@gmail.com>:
> >>> Hi Anton,
> >>>
> >>>> Each get method now checks the consistency.
> >>>> Check means:
> >>>> 1) tx lock acquired on primary
> >>>> 2) gained data from each owner (primary and backups)
> >>>> 3) data compared
> >>> Did you consider acquiring locks on backups as well during your check,
> >>> just like 2PC prepare does?
> >>> If there's HB between steps 1 (lock primary) and 2 (update primary +
> >>> lock backup + update backup), you can be sure that there will be no
> >>> false-positive results and no deadlocks either. The protocol won't be
> >>> complicated: checking read from backup will just wait for commit if it's
> >>> in progress.
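> >>>
> >>> In pseudo-code, the suggestion looks roughly like this (Locker/Reader are
> >>> hypothetical stand-ins, not a real Ignite API; the point is only the lock
> >>> order and the wait-on-backup behavior):
> >>>
> >>> import java.util.List;
> >>> import java.util.Objects;
> >>>
> >>> public class LockedCheckSketch {
> >>>     interface Locker<K> { void lock(K key); void unlock(K key); }
> >>>     interface Reader<K, V> { V read(K key); }
> >>>
> >>>     /** Returns true if the primary copy and every backup copy of the key match. */
> >>>     static <K, V> boolean checkConsistent(K key, Locker<K> primary, Reader<K, V> primaryReader,
> >>>         List<Locker<K>> backups, List<Reader<K, V>> backupReaders) {
> >>>         primary.lock(key); // Step 1: lock on the primary, like the tx lock.
> >>>
> >>>         try {
> >>>             for (Locker<K> b : backups)
> >>>                 b.lock(key); // Step 2: like 2PC prepare; blocks while a previous commit is finishing.
> >>>
> >>>             try {
> >>>                 V primaryVal = primaryReader.read(key);
> >>>
> >>>                 // Any mismatch seen here is a real inconsistency, not a racing commit.
> >>>                 for (Reader<K, V> r : backupReaders) {
> >>>                     if (!Objects.equals(primaryVal, r.read(key)))
> >>>                         return false;
> >>>                 }
> >>>
> >>>                 return true;
> >>>             }
> >>>             finally {
> >>>                 for (Locker<K> b : backups)
> >>>                     b.unlock(key);
> >>>             }
> >>>         }
> >>>         finally {
> >>>             primary.unlock(key);
> >>>         }
> >>>     }
> >>> }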
> >>>
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>> On 12.07.2019 9:47, Anton Vinogradov wrote:
> >>>> Igniters,
> >>>>
> >>>> Let me explain the problem in detail.
> >>>> Read Repair at a pessimistic tx (locks acquired on primary, full sync, 2PC)
> >>>> is able to see a consistency violation because backups are not updated yet.
> >>>> It seems to be not a good idea to "fix" the code to unlock the primary only
> >>>> when backups are updated; this would definitely cause a performance drop.
> >>>> Currently, there is no explicit sync feature that allows waiting until the
> >>>> backups are updated by the previous tx.
> >>>> The previous tx just sends GridNearTxFinishResponse to the originating node.
> >>>> Bad ideas on how to handle this:
> >>>> - retry a few times (a false positive is still possible; see the sketch below)
> >>>> - lock the tx entry on backups (will definitely break failover logic)
> >>>> - wait for the same entry version on backups during some timeout (will require
> >>>> huge changes in the "get" logic, and a false positive is still possible)
> >>>>
> >>>> Is there any simple fix for this issue?
> >>>> Thanks in advance for any tips.
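> >>>>
> >>>> Just to make the first "bad idea" concrete, a bounded-retry sketch
> >>>> (readAllCopies is a hypothetical supplier of the primary + backup values
> >>>> for the key, not a real API):
> >>>>
> >>>> import java.util.List;
> >>>> import java.util.function.Supplier;
> >>>>
> >>>> public class RetryCheckSketch {
> >>>>     /**
> >>>>      * Re-reads all copies of a key a few times, hoping the previous tx's "finish"
> >>>>      * reaches the backups in the meantime. A false positive is still possible if
> >>>>      * it does not fit into the retry budget.
> >>>>      */
> >>>>     static <V> boolean copiesConverge(Supplier<List<V>> readAllCopies, int attempts)
> >>>>         throws InterruptedException {
> >>>>         for (int i = 0; i < attempts; i++) {
> >>>>             List<V> copies = readAllCopies.get(); // Primary value first, then backups.
> >>>>
> >>>>             if (copies.stream().distinct().count() <= 1)
> >>>>                 return true; // All copies are equal.
> >>>>
> >>>>             Thread.sleep(10); // Give the previous "finish" phase a chance to complete.
> >>>>         }
> >>>>
> >>>>         return false; // Possibly a false positive rather than a real inconsistency.
> >>>>     }
> >>>> }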
> >>>>
> >>>> Ivan,
> >>>> thanks for your interest
> >>>>
> >>>>>> 4. Very fast and lucky txB writes a value 2 for the key on primary and backup.
> >>>> AFAIK, reordering is not possible since backups are "prepared" before the
> >>>> primary releases the lock.
> >>>> So, consistency is guaranteed by failover and by the "prepare" feature of 2PC.
> >>>> It seems the problem is NOT with consistency in AI, but with the consistency
> >>>> detection implementation (RR) and possible "false positive" results.
> >>>> BTW, I checked the 1PC case (only one data node in the test) and got no issues.
> >>>> On Fri, Jul 12, 2019 at 9:26 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> >>>>> Anton,
> >>>>>
> >>>>> Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
> >>>>> mean that things can be even worse and an inconsistent write is
> >>>>> possible on a backup? E.g. in this scenario:
> >>>>> 1. txA writes a value 1 for the key on primary.
> >>>>> 2. txA unlocks the key on primary.
> >>>>> 3. txA freezes before updating backup.
> >>>>> 4. Very fast and lucky txB writes a value 2 for the key on primary and
> >>>>> backup.
> >>>>> 5. txA wakes up and writes 1 for the key on backup.
> >>>>> 6. As a result, there is 2 on primary and 1 on backup.
> >>>>>
> >>>>> Naively it seems that locks should be released after all replicas are
> >>>>> updated.
> >>>>>
> >>>>> Wed, Jul 10, 2019 at 16:36, Anton Vinogradov <a...@apache.org>:
> >>>>>> Folks,
> >>>>>>
> >>>>>> I'm now investigating unexpected repairs [1] in case of ReadRepair usage
> >>>>>> at testAccountTxNodeRestart.
> >>>>>> I updated [2] the test to check whether any repairs happen.
> >>>>>> The test's name is now "testAccountTxNodeRestartWithReadRepair".
> >>>>>>
> >>>>>> Each get method now checks the consistency.
> >>>>>> Check means:
> >>>>>> 1) tx lock acquired on primary
> >>>>>> 2) gained data from each owner (primary and backups)
> >>>>>> 3) data compared
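> >>>>>>
> >>>>>> Roughly, the three steps look like this (a sketch, not the actual
> >>>>>> implementation; readFromNode() stands in for the internal request that
> >>>>>> asks an owner for its copy):
> >>>>>>
> >>>>>> import java.util.Objects;
> >>>>>> import java.util.function.BiFunction;
> >>>>>> import org.apache.ignite.Ignite;
> >>>>>> import org.apache.ignite.IgniteCache;
> >>>>>> import org.apache.ignite.cluster.ClusterNode;
> >>>>>> import org.apache.ignite.transactions.Transaction;
> >>>>>> import org.apache.ignite.transactions.TransactionConcurrency;
> >>>>>> import org.apache.ignite.transactions.TransactionIsolation;
> >>>>>>
> >>>>>> public class GetWithCheckSketch {
> >>>>>>     static boolean isConsistent(Ignite ignite, IgniteCache<Integer, Integer> cache, int key,
> >>>>>>         BiFunction<ClusterNode, Integer, Integer> readFromNode) {
> >>>>>>         try (Transaction tx = ignite.transactions().txStart(
> >>>>>>             TransactionConcurrency.PESSIMISTIC, TransactionIsolation.REPEATABLE_READ)) {
> >>>>>>             Integer primaryVal = cache.get(key); // 1) tx lock acquired on primary.
> >>>>>>
> >>>>>>             boolean consistent = true;
> >>>>>>
> >>>>>>             // 2) gain data from each owner (primary and backups) and 3) compare.
> >>>>>>             for (ClusterNode owner : ignite.affinity(cache.getName()).mapKeyToPrimaryAndBackups(key))
> >>>>>>                 consistent &= Objects.equals(primaryVal, readFromNode.apply(owner, key));
> >>>>>>
> >>>>>>             tx.commit();
> >>>>>>
> >>>>>>             return consistent;
> >>>>>>         }
> >>>>>>     }
> >>>>>> }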
> >>>>>>
> >>>>>> Sometimes, a backup may have an obsolete value during such a check.
> >>>>>>
> >>>>>> It seems this happens because the tx commit on the primary goes the
> >>>>>> following way (check the code [2] for details):
> >>>>>> 1) performing localFinish (releases the tx lock)
> >>>>>> 2) performing dhtFinish (commits on backups)
> >>>>>> 3) transferring control back to the caller
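> >>>>>>
> >>>>>> A simplified model of this ordering (plain Java futures, not the real
> >>>>>> GridDhtTxLocal code; see [3] for the actual logic):
> >>>>>>
> >>>>>> import java.util.concurrent.CompletableFuture;
> >>>>>> import java.util.function.Supplier;
> >>>>>>
> >>>>>> public class CommitOrderSketch {
> >>>>>>     static CompletableFuture<Void> commit(Runnable localFinish,
> >>>>>>         Supplier<CompletableFuture<Void>> dhtFinish) {
> >>>>>>         localFinish.run();      // 1) commit on the primary; the tx lock is released here.
> >>>>>>
> >>>>>>         return dhtFinish.get(); // 2) commit on backups asynchronously;
> >>>>>>                                 // 3) the caller's commit() returns only once this future
> >>>>>>                                 //    completes, so "commit() finished" happens-before
> >>>>>>                                 //    "backups updated", while "primary lock released" does not.
> >>>>>>     }
> >>>>>> }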
> >>>>>>
> >>>>>> So it seems the problem here is that "tx lock released on primary" does
> >>>>>> not mean that backups are updated, but "commit() method finished in the
> >>>>>> caller's thread" does.
> >>>>>> This means that, currently, there is no happens-before between
> >>>>>> 1) thread 1 committed data on primary and the tx lock can be re-obtained
> >>>>>> 2) thread 2 reads from backup
> >>>>>> but there is still a strong HB between "commit() finished" and "backup updated".
> >>>>>>
> >>>>>> So it seems possible, for example, to get a notification from a
> >>>>>> continuous query, then read from a backup and get an obsolete value.
> >>>>>>
> >>>>>> Is this "partial happens-before" behavior expected?
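> >>>>>>
> >>>>>> For example (a sketch; it assumes the node that registered the query
> >>>>>> happens to hold a backup copy of the key, so localPeek(BACKUP) can observe
> >>>>>> the stale value for a short window):
> >>>>>>
> >>>>>> import javax.cache.Cache;
> >>>>>> import javax.cache.event.CacheEntryEvent;
> >>>>>> import org.apache.ignite.IgniteCache;
> >>>>>> import org.apache.ignite.cache.CachePeekMode;
> >>>>>> import org.apache.ignite.cache.query.ContinuousQuery;
> >>>>>> import org.apache.ignite.cache.query.QueryCursor;
> >>>>>>
> >>>>>> public class StaleBackupReadSketch {
> >>>>>>     static void run(IgniteCache<Integer, Integer> cache) {
> >>>>>>         ContinuousQuery<Integer, Integer> qry = new ContinuousQuery<>();
> >>>>>>
> >>>>>>         qry.setLocalListener(evts -> {
> >>>>>>             for (CacheEntryEvent<? extends Integer, ? extends Integer> e : evts) {
> >>>>>>                 // If this node keeps a backup copy of the key, it may still hold the old value.
> >>>>>>                 Integer backupCopy = cache.localPeek(e.getKey(), CachePeekMode.BACKUP);
> >>>>>>
> >>>>>>                 if (backupCopy != null && !backupCopy.equals(e.getValue()))
> >>>>>>                     System.out.println("Obsolete backup value: " + backupCopy);
> >>>>>>             }
> >>>>>>         });
> >>>>>>
> >>>>>>         try (QueryCursor<Cache.Entry<Integer, Integer>> cur = cache.query(qry)) {
> >>>>>>             cache.put(1, 42); // The commit on the primary triggers the notification.
> >>>>>>         }
> >>>>>>     }
> >>>>>> }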
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> >>>>>> [2] https://github.com/apache/ignite/pull/6679/files
> >>>>>> [3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Ivan Pavlukhin
> >>>>>
> >>
> >>
> >> --
> >> Best regards,
> >> Ivan Pavlukhin
> >>



-- 
Best regards,
Ivan Pavlukhin
