Hello Igniters!

Currently, suspend/resume is implemented only for optimistic
transactions [1].
Until suspend/resume is also supported for pessimistic transactions, JTA
is not fully supported [4].

I’m working on the ticket "Suspension/resuming for pessimistic transactions"
[2].
The goal of the ticket is to support suspend/resume for pessimistic
transactions.

# Benefits of suspending/resuming transaction.

  1. Full JTA standard support.
  2. Higher throughput in high-load scenarios.
      A suspend operation would allow the application to release Ignite
threads and optionally perform some other logic.

Note that the current API offers suspend/resume only for optimistic
transactions, which confuses users.

# Real life example.

Consider the following scenario:

  1. The application starts an Ignite transaction.
  2. Business logic is executed inside the transaction.
  3. To commit or roll back, the application needs an approval message from
an external agent.
  4. Currently, the thread inside Ignite is idle until the approval is
received.
  4a. Once suspend/resume is supported, the application can suspend the
transaction and release the thread inside Ignite.

# How pessimistic transaction works.

When we perform put/get operations in a pessimistic transaction, a lock
request is sent to remote nodes as a `GridNearLockRequest`.
The request contains the id of the thread in which the operation was
performed (`IgniteTxAdapter#threadId`).
In pessimistic mode, multiple transaction objects are created, on
primary nodes, on backup nodes, and on the originating node:
`GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`, `GridDhtTxRemote`.

The thread id is used in the logic on these nodes, for instance to check
whether a thread has successfully locked a key after a lock acquisition
attempt, or to check whether an active transaction exists.

# Main challenge for implementation.

I've analysed implementation approaches and see the core issue:

The essential problem with suspending/resuming lies in the thread id field
that is transferred to remote nodes during put/get operations.

Imagine we want to suspend a transaction and resume it in another thread.
See the code snippet below:

```
Transaction tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);

cache.put(1, 1); // The lock request carries the current thread id to remote nodes.

tx.suspend();
....

// In another thread.
tx.resume(); // The thread id changes in the local transaction instance.
```

The original thread id is transferred to the remote node and saved there.
After resuming, the thread id on the local node differs from the one stored
on the remote node.
I want to avoid an extra network round trip to update the thread id on the
remote node after the transaction is resumed.
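
To make the mismatch concrete, here is a minimal toy model in plain Java. It is not Ignite code: `ThreadIdLockTable` and its methods are hypothetical names, and the map only stands in for whatever a remote node stores about lock owners. It assumes the remote side remembers the thread id from the lock request and later compares it with the id of the calling thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model (not Ignite code) of a remote node that records lock owners by thread id. */
final class ThreadIdLockTable {
    /** Key -> id of the thread that acquired the lock, as sent in the lock request. */
    private final Map<Integer, Long> owners = new ConcurrentHashMap<>();

    /** Records the lock if the key is free or already held by the same thread id. */
    boolean lock(int key, long threadId) {
        Long prev = owners.putIfAbsent(key, threadId);
        return prev == null || prev == threadId;
    }

    /** Ownership check that compares the stored thread id with the given one. */
    boolean isOwnedBy(int key, long threadId) {
        Long owner = owners.get(key);
        return owner != null && owner == threadId;
    }

    /** Locks key 1 from the calling thread, then checks ownership from a second thread. */
    static boolean ownedAfterThreadSwitch() {
        ThreadIdLockTable remote = new ThreadIdLockTable();
        remote.lock(1, Thread.currentThread().getId());

        boolean[] owned = new boolean[1];
        // The "resuming" thread has a different id, so the stored owner no longer matches.
        Thread resumer = new Thread(
            () -> owned[0] = remote.isOwnedBy(1, Thread.currentThread().getId()));
        resumer.start();
        try {
            resumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return owned[0];
    }

    public static void main(String[] args) {
        System.out.println("owned after thread switch: " + ownedAfterThreadSwitch());
    }
}
```

In this toy model the check from the second thread fails, which is the situation a resumed transaction would run into without an extra message to the remote node.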

# Design proposal.

The transaction id (`xid`) can be used instead of the thread id on remote
nodes.
I propose the following solution for the ticket:

Replace the thread id with the transaction id in messages sent to remote
nodes.
The thread id will be removed from the following classes:
`IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
`GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`.
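
A rough illustration of the idea, again as a toy model rather than Ignite code: if the remote side keys lock ownership by the transaction id, the check no longer depends on which thread performs it. `XidLockTable` is a hypothetical name, and `java.util.UUID` is only a stand-in for the real `xid` type.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model (not Ignite code) of a remote node that records lock owners by transaction id. */
final class XidLockTable {
    /** Key -> id of the transaction that holds the lock (UUID is a stand-in for xid). */
    private final Map<Integer, UUID> owners = new ConcurrentHashMap<>();

    /** Records the lock if the key is free or already held by the same transaction. */
    boolean lock(int key, UUID xid) {
        UUID prev = owners.putIfAbsent(key, xid);
        return prev == null || prev.equals(xid);
    }

    /** Ownership check that depends only on the transaction id, not on the caller's thread. */
    boolean isOwnedBy(int key, UUID xid) {
        return xid.equals(owners.get(key));
    }

    /** Locks key 1 in the calling thread, then checks ownership from a second thread. */
    static boolean ownedAfterThreadSwitch() {
        XidLockTable remote = new XidLockTable();
        UUID xid = UUID.randomUUID();
        remote.lock(1, xid);

        boolean[] owned = new boolean[1];
        // The second thread presents the same xid, so ownership is still recognised.
        Thread resumer = new Thread(() -> owned[0] = remote.isOwnedBy(1, xid));
        resumer.start();
        try {
            resumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return owned[0];
    }

    public static void main(String[] args) {
        System.out.println("owned after thread switch: " + ownedAfterThreadSwitch());
    }
}
```

Here the ownership check succeeds from the second thread, and nothing has to be sent to the remote node when the transaction changes threads.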

I haven't removed the thread id from the code completely; it has moved to
`GridNearTxLocal`.
We still need it in the near local transaction for many reasons, for example
to ensure that only the thread that started a transaction can suspend it in
`GridNearTxLocal#suspend()`.
In the future we can remove the thread id completely. I propose to study
this question in a separate ticket.
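
The kind of check I mean can be sketched as follows. This is a simplified stand-in and not the actual `GridNearTxLocal` code: `ToyNearTx`, `NO_THREAD` and `failsInOtherThread` are hypothetical names. I also assume here that the resuming thread becomes the new owner, which is only one possible behaviour.

```java
/** Toy stand-in for a near-local transaction; not the real GridNearTxLocal. */
final class ToyNearTx {
    enum State { ACTIVE, SUSPENDED }

    /** Hypothetical marker meaning "no owning thread". */
    static final long NO_THREAD = -1L;

    /** Kept only on the originating node; never sent to remote nodes in this sketch. */
    private long ownerThreadId = Thread.currentThread().getId();
    private State state = State.ACTIVE;

    /** Only the owning thread may suspend an active transaction. */
    synchronized void suspend() {
        if (state != State.ACTIVE)
            throw new IllegalStateException("Transaction is not active");
        if (ownerThreadId != Thread.currentThread().getId())
            throw new IllegalStateException("Only the owning thread may suspend");
        state = State.SUSPENDED;
        ownerThreadId = NO_THREAD;
    }

    /** Any thread may resume; it becomes the new owner (an assumption of this sketch). */
    synchronized void resume() {
        if (state != State.SUSPENDED)
            throw new IllegalStateException("Transaction is not suspended");
        ownerThreadId = Thread.currentThread().getId();
        state = State.ACTIVE;
    }

    synchronized long ownerThreadId() {
        return ownerThreadId;
    }

    /** Runs the action in a new thread and reports whether it threw IllegalStateException. */
    static boolean failsInOtherThread(Runnable action) {
        boolean[] failed = new boolean[1];
        Thread t = new Thread(() -> {
            try {
                action.run();
            } catch (IllegalStateException e) {
                failed[0] = true;
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failed[0];
    }

    public static void main(String[] args) {
        ToyNearTx tx = new ToyNearTx();
        System.out.println("foreign suspend rejected: " + failsInOtherThread(tx::suspend));
        tx.suspend();
        System.out.println("foreign resume rejected: " + failsInOtherThread(tx::resume));
    }
}
```

The point of the sketch is that the thread id is only consulted locally, so it can stay on the originating node without being part of any message.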

The thread id also remains in `GridDistributedLockRequest`.
The lock request is used by cache locks, which need to transfer the thread
id to remote nodes,
for example to use cache locks together with put/get cache operations; see
`GridNearTxLocal#updateExplicitVersion`.
For pessimistic transactions, the thread id in `GridDistributedLockRequest`
is set to `UNDEFINED_THREAD_ID`, which means it must not be used remotely.

Note that if a user suspends a transaction and forgets to resume it, the
transaction will be rolled back once its timeout expires.
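
As a small illustration of that rule, here is a deterministic toy state machine. It is not Ignite code: `ToyTimedTx` and `onTimeoutCheck` are hypothetical names, and the explicit `nowMillis` argument replaces a real clock or timeout worker so the behaviour is easy to follow.

```java
/** Toy model of a suspended transaction with a timeout; not Ignite code. */
final class ToyTimedTx {
    enum State { ACTIVE, SUSPENDED, ROLLED_BACK }

    private final long deadlineMillis;
    private State state = State.ACTIVE;

    ToyTimedTx(long startMillis, long timeoutMillis) {
        this.deadlineMillis = startMillis + timeoutMillis;
    }

    void suspend() {
        if (state != State.ACTIVE)
            throw new IllegalStateException("Cannot suspend, state=" + state);
        state = State.SUSPENDED;
    }

    void resume() {
        if (state != State.SUSPENDED)
            throw new IllegalStateException("Cannot resume, state=" + state);
        state = State.ACTIVE;
    }

    /** Hypothetical periodic check: rolls the transaction back once the deadline has passed. */
    void onTimeoutCheck(long nowMillis) {
        if (state != State.ROLLED_BACK && nowMillis >= deadlineMillis)
            state = State.ROLLED_BACK;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        ToyTimedTx tx = new ToyTimedTx(0L, 1_000L);
        tx.suspend();

        tx.onTimeoutCheck(500L);
        System.out.println(tx.state()); // prints SUSPENDED

        tx.onTimeoutCheck(1_000L);
        System.out.println(tx.state()); // prints ROLLED_BACK
    }
}
```

Once the toy transaction is rolled back, a late `resume()` fails with an `IllegalStateException`, so a forgotten transaction cannot come back to life.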

In my design, when a transaction is suspended, all locked keys remain locked.

Please see my prototype implementation of the proposal [3].

The proposed changes are relatively small.
They keep lock information consistent when the thread id changes within
one transaction (through suspend/resume),
because the correct id is used for locks on remote nodes. The work is
painstaking, but the changes will not affect the logic of other components.

Please tell me what you think. Any suggestions and comments will be
helpful.

If you agree with my design, I will also do benchmarking.

[1] https://issues.apache.org/jira/browse/IGNITE-5712
[2] https://issues.apache.org/jira/browse/IGNITE-5714
[3] https://github.com/apache/ignite/pull/2789
[4] Section 3.2.3
http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
-- 

*Best Regards,*

*Kuznetsov Aleksey*
