Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx

Nikolay Izhikov Tue, 27 Feb 2018 06:28:37 -0800

Hello, Alexey.

Great mail, by the way.
I think it would be great to have this feature in Ignite.


> I haven't removed thread id completely from code.

Can we remove the thread id from the code completely?
Can you estimate how much effort that would take?

As far as I can see from the parent task [1], we need to implement some
complex tests.
Are they present in the prototype?

[1] 
https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655

On Mon, 26/02/2018 at 10:59 +0000, Aleksey Kuznetsov wrote:
> Hello Igniters!
> 
> Currently we have suspension/resuming implemented for optimistic
> transactions [1].
> Until suspend/resume is supported for pessimistic transactions, JTA isn't
> fully supported [4].
> 
> I’m working on a ticket "Suspension/resuming for pessimistic transactions"
> [2].
> The goal of the ticket is to support transaction suspend/resume for
> pessimistic transactions.
> 
> # Benefits of suspending/resuming transaction.
> 
>   1. Full JTA standard support.
>   2. Increased throughput in high-load scenarios.
>       A suspend operation would allow releasing Ignite threads and
> optionally performing some other logic.
> 
> Note that the current API supports suspend/resume only for optimistic
> transactions, which confuses users.
> 
> # Real life example.
> 
> Consider the following scenario:
> 
>   1. The application starts an Ignite transaction.
>   2. Business logic is executed inside the transaction.
>   3. To commit or roll back, the application needs an approval message from
> an external agent.
>   4. Currently, the thread inside Ignite is idle until the approval is
> received.
>   4a. Once suspend/resume support is implemented, the application can
> suspend the transaction and release the thread inside Ignite.
> 
> # How pessimistic transaction works.
> 
> When we perform put/get operations in pessimistic transactions, a lock
> request (`GridNearLockRequest`) is sent to remote nodes.
> The request contains the id of the thread in which the operation was
> performed (`IgniteTxAdapter#threadId`).
> In pessimistic mode, multiple transaction objects are created - on
> primary, on backup nodes, and on originating node:
> `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`, `GridDhtTxRemote`.
> 
> The thread id is used in the logic on these nodes, for instance to check
> whether a thread has successfully locked a key after a lock acquisition
> attempt, or to check whether an active transaction exists.
> 
> # Main challenge for implementation.
> 
> I've analysed implementation approaches and see the core issue:
> 
> The essential problem with suspending/resuming lies in thread id field
> transferred to remote nodes during put/get operations.
> 
> Imagine we want to suspend a transaction and resume it in another thread.
> See the code snippet below:
> 
> ```
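> // Assumes static imports of TransactionConcurrency.PESSIMISTIC and
> // TransactionIsolation.SERIALIZABLE.
> Transaction tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> 
> cache.put(1, 1); // The lock request carries the current thread id.
> 
> tx.suspend();
> // ...
> 
> // In another thread.
> tx.resume(); // Thread id will be changed in transaction instance.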
> ```
> 
> The original thread id is transferred to and saved on the remote node.
> After resuming, the thread id on the local node differs from the one saved
> on the remote node.
> I want to avoid an extra network round trip to change the thread id on the
> remote node after the transaction is resumed.
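
To make the mismatch concrete, here is a minimal sketch in plain Java. It is a toy model, not Ignite code: `ThreadIdLockTable` and `ThreadIdLockDemo` are hypothetical names, and the map only stands in for whatever a remote node keeps per locked key. It shows that an ownership check keyed by thread id stops recognising the lock once another thread takes over.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy stand-in for a remote node's lock table (assumption, not Ignite code). */
class ThreadIdLockTable {
    // key -> id of the thread that acquired the lock
    private final Map<Integer, Long> owners = new ConcurrentHashMap<>();

    /** Returns true if the key is now locked by the given thread. */
    boolean lock(int key, long threadId) {
        Long prev = owners.putIfAbsent(key, threadId);
        return prev == null || prev == threadId;
    }

    boolean isLockedBy(int key, long threadId) {
        Long owner = owners.get(key);
        return owner != null && owner == threadId;
    }
}

class ThreadIdLockDemo {
    /** Returns {ownedByStartingThread, ownedByResumingThread}. */
    static boolean[] run() {
        ThreadIdLockTable remote = new ThreadIdLockTable();
        long starter = Thread.currentThread().getId();
        remote.lock(1, starter);
        boolean before = remote.isLockedBy(1, starter);

        // The "resuming" thread asks about the same key with its own id.
        boolean[] after = new boolean[1];
        Thread resumer = new Thread(() ->
            after[0] = remote.isLockedBy(1, Thread.currentThread().getId()));
        resumer.start();
        try {
            resumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new boolean[] {before, after[0]};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("before suspend: " + r[0] + ", after resume: " + r[1]);
    }
}
```

The second check fails because the stored id still belongs to the thread that started the transaction.
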
> 
> # Design proposal.
> 
> The transaction id (`xid`) can be used instead of the thread id on remote
> nodes. The following solution is possible for the ticket:
> 
> Replace the thread id with the transaction id when sending to remote nodes.
> The thread id will be removed from the following classes:
> `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`.
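
As I understand the idea, keying lock ownership by `xid` makes the check independent of the calling thread. Below is a minimal sketch in plain Java, not Ignite code: `XidLockTable` and `XidLockDemo` are hypothetical names, and `java.util.UUID` is only a stand-in for the real transaction id type.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Toy stand-in for a remote lock table keyed by transaction id (assumption). */
class XidLockTable {
    // key -> id of the transaction that holds the lock
    private final Map<Integer, UUID> owners = new ConcurrentHashMap<>();

    /** Returns true if the key is now locked by the given transaction. */
    boolean lock(int key, UUID xid) {
        UUID prev = owners.putIfAbsent(key, xid);
        return prev == null || prev.equals(xid);
    }

    boolean isLockedBy(int key, UUID xid) {
        return xid.equals(owners.get(key));
    }
}

class XidLockDemo {
    /** Returns {sameXidRecognisedFromOtherThread, otherTransactionCanLock}. */
    static boolean[] run() {
        XidLockTable remote = new XidLockTable();
        UUID xid = UUID.randomUUID();
        remote.lock(1, xid); // done by the thread that started the transaction

        // Another thread continues the same transaction and checks ownership.
        boolean[] seen = new boolean[1];
        Thread resumer = new Thread(() -> seen[0] = remote.isLockedBy(1, xid));
        resumer.start();
        try {
            resumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // A different transaction still cannot take the key.
        boolean otherTxCanLock = remote.lock(1, UUID.randomUUID());
        return new boolean[] {seen[0], otherTxCanLock};
    }
}
```

No message to the remote side is needed when the thread changes, since the stored owner never refers to a thread.
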
> 
> I haven't removed thread id completely from code. Thread id is moved to
> `GridNearTxLocal`.
> We still need it in the near local transaction for many reasons, for
> example to ensure that only the thread that started the transaction can
> suspend it in `GridNearTxLocal#suspend()`.
> In the future we can remove the thread id completely. I propose to study
> this question in another ticket.
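
A minimal sketch of that ownership rule, in plain Java. `NearTxModel` and `NearTxModelDemo` are hypothetical names and this is not the `GridNearTxLocal` code; I am assuming that `resume()` simply rebinds the transaction to the calling thread.

```java
/** Toy model of the local ownership check (assumption, not Ignite code). */
class NearTxModel {
    enum State { ACTIVE, SUSPENDED }

    // Bound to the thread that constructs (starts) the transaction.
    private long ownerThreadId = Thread.currentThread().getId();
    private State state = State.ACTIVE;

    /** Only the thread that currently owns the transaction may suspend it. */
    synchronized void suspend() {
        if (state != State.ACTIVE)
            throw new IllegalStateException("Transaction is not active");
        if (ownerThreadId != Thread.currentThread().getId())
            throw new IllegalStateException("Only the owning thread can suspend");
        state = State.SUSPENDED;
        ownerThreadId = -1; // no owner while suspended
    }

    /** Any thread may resume a suspended transaction and becomes its owner. */
    synchronized void resume() {
        if (state != State.SUSPENDED)
            throw new IllegalStateException("Transaction is not suspended");
        ownerThreadId = Thread.currentThread().getId();
        state = State.ACTIVE;
    }

    synchronized State state() {
        return state;
    }
}

class NearTxModelDemo {
    /** Runs the action in a new thread; true if it threw IllegalStateException. */
    static boolean failsInOtherThread(Runnable action) {
        boolean[] failed = new boolean[1];
        Thread t = new Thread(() -> {
            try {
                action.run();
            } catch (IllegalStateException e) {
                failed[0] = true;
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failed[0];
    }
}
```

In this model the thread id never leaves the local node, which matches keeping it only in the near local transaction.
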
> 
> Also, the thread id remains in `GridDistributedLockRequest`.
> The lock request is used by cache locks, which need to transfer the thread
> id to remote nodes, for example to use cache locks along with cache
> put/get operations; see `GridNearTxLocal#updateExplicitVersion`.
> For pessimistic transactions, the thread id in `GridDistributedLockRequest`
> is set to `UNDEFINED_THREAD_ID`, which means it must not be used remotely.
> 
> Note that if a user suspends a transaction and forgets to resume it, the
> transaction will be rolled back once the timeout expires.
> 
> In my design, when a transaction is suspended, all locked keys remain
> locked.
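
A minimal sketch of these two rules in plain Java: locks survive `suspend()`, and a suspended transaction is rolled back once its timeout passes. `TimedTxModel` is a hypothetical name, and the explicit `onTick(now)` call is my assumption about how a timeout checker might drive it; the mail does not say how the timeout is detected.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of timeout handling for a suspended transaction (assumption). */
class TimedTxModel {
    enum State { ACTIVE, SUSPENDED, ROLLED_BACK }

    private final long deadlineMillis;
    private final Set<Integer> lockedKeys = new HashSet<>();
    private State state = State.ACTIVE;

    TimedTxModel(long startMillis, long timeoutMillis) {
        this.deadlineMillis = startMillis + timeoutMillis;
    }

    void lock(int key) {
        lockedKeys.add(key);
    }

    /** Suspending changes only the state; locked keys are kept. */
    void suspend() {
        if (state == State.ACTIVE)
            state = State.SUSPENDED;
    }

    /** Called by a hypothetical timeout checker with the current time. */
    void onTick(long nowMillis) {
        if (state != State.ROLLED_BACK && nowMillis >= deadlineMillis) {
            lockedKeys.clear(); // rollback releases the locks
            state = State.ROLLED_BACK;
        }
    }

    /** Returns false if the transaction can no longer be resumed. */
    boolean resume(long nowMillis) {
        onTick(nowMillis);
        if (state != State.SUSPENDED)
            return false;
        state = State.ACTIVE;
        return true;
    }

    boolean holds(int key) {
        return lockedKeys.contains(key);
    }

    State state() {
        return state;
    }
}
```

Passing the time in explicitly keeps the sketch deterministic; a real implementation would presumably rely on the existing transaction timeout machinery.
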
> 
> Please see my prototype implementation of the proposal [3].
> 
> The proposed changes are relatively small.
> They keep lock information consistent when the thread id changes within
> one transaction (through suspend/resume): the correct id will be used for
> locks on remote nodes. The work is painstaking, but the changes will not
> affect the logic of other components.
> 
> Please tell me what you think. Any suggestions and comments will be
> helpful.
> 
> If you agree with my design, I will also do benchmarking.
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-5712
> [2] https://issues.apache.org/jira/browse/IGNITE-5714
> [3] https://github.com/apache/ignite/pull/2789
> [4] Section 3.2.3
> http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
