PR for this issue is ready to be reviewed.
Changes are relatively small.
1. No network messages have been changed.
2. Thread id usage replaced with xid usage when we work with candidates.
3. All current tests are passed [3]
Please, review. [1], [2]
[1] https://issues.apache.org/jira/browse/IGNITE-5714
[2] https://github.com/apache/ignite/pull/2789
[3]
https://ci.ignite.apache.org/viewLog.html?buildId=1229177&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll
28.02.2018 19:10, Aleksey Kuznetsov пишет:
> What if originating(or primary, or couple of primaries) node(s)
fails when tx is suspended?
Thanks for advice, I've added a couple of failover tests. With the
following scenarios :
Started transaction, perform some put, suspend it and primary node
fails. Works as expected : tx rollbacks.
In another scenario, the originating node fails. Works as expected :
tx rollbacks.
> Do we have tests for this feature?
yeah,we do.
ср, 28 февр. 2018 г. в 14:45, Nikolay Izhikov <nizhi...@apache.org
<mailto:nizhi...@apache.org>>:
Hello, Aleksey.
> Tests were added for suspending/resuming optimistic transaction
earlier for parent task.
> I adopted them for pessimistic mode. Also tx rollback tests were
aded.
I think, at least, we need new failover tests.
What if originating(or primary, or couple of primaries) node(s)
fails when tx is suspended?
> Note, that if user suspends transaction and forgets to resume it,
> transaction would be rolled back once timeout has occurred.
Do we have tests for this feature?
В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет:
> Thanx for the answer!
>
> > Can we remove thread id completely from code?
> > Can you estimate how much effort do we need?
>
> I haven't removed thread id completely from the code because we
tightly
> coupled to it in some parts.
> For instance, look at the following scenario :
>
> "When we suspend transaction, we must make sure only thread started
> transaction can suspend it. So we compare thread id in
> `GridNearTxLocal#suspend()`."
>
> Note that cache locks also make use of thread id, consider the
following
> scenario:
>
> "We create cache lock and perform put operation in the same
thread, which
> create implicit transaction. Transaction would compare both
therad ids in
> tx and cache lock and can apply the optimisation: tx would take
advantage
> of cache lock, and would not create its own lock on key."
>
> If we really needed to remove it completely from the code,
significant
> changes are required both to transactions and cache locks.
>
> It could take several weeks to completely remove thread id.
>
> > As far as I can see from parent task [1] we need some complex
tests to be
>
> implemented.
> > Are they presented in prototype?
>
> Tests were added for suspending/resuming optimistic transaction
earlier for
> parent task.
> I adopted them for pessimistic mode.
> Also tx rollback tests were aded. They check whether timeout
metrics are
> correctly calculated for suspended transactions. You can find
them in
> `GridCacheTransactionalAbstractMetricsSelfTest`.
>
> Please, feel free to propose additional tests for the ticket.
>
>
> вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov
<nizhi...@apache.org <mailto:nizhi...@apache.org>>:
>
> > Hello, Alexey.
> >
> > Great mail, by the way.
> > I think, it would be great to have this feature in Ignite.
> >
> > > I haven't removed thread id completely from code.
> >
> > Can we remove thread id completely from code?
> > Can you estimate how much effort do we need?
> >
> > As far as I can see from parent task [1] we need some complex
tests to be
> > implemented.
> > Are they presented in prototype?
> >
> > [1]
> >
https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
> >
> > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > > Hello Igniters!
> > >
> > > Currently we have suspension/resuming implemented for optimistic
> > > transactions [1].
> > > Unless suspend/resume isn't supported for pessimistic tx JTA
isn't fully
> > > supported [4].
> > >
> > > I’m working on a ticket "Suspension/resuming for pessimistic
> >
> > transactions"
> > > [2].
> > > Goal of the ticket is to support transaction suspend/resume for
> >
> > pessimistic
> > > transactions.
> > >
> > > # Benefits of suspending/resuming transaction.
> > >
> > > 1. Full JTA standart support.
> > > 2. Increase of throughput in high-load scenarios.
> > > Suspend operation would allow to release Ignite
threads and
> > > optionally perform some other logic.
> > >
> > > Note, current API has got suspend/resume only for optimistic
> >
> > transactions,
> > > which confuses users.
> > >
> > > # Real life example.
> > >
> > > Consider the following scenario:
> > >
> > > 1. Application starts Ignite transaction.
> > > 2. Business logic is executed inside transaction.
> > > 3. For commit/rollback application need approval message
from external
> > > agent.
> > > 4. Currently, thread inside Ignite is idle until approval
is received.
> > > 4a. When suspend/resume support is implemented,
application can perform
> > > suspend and release thread inside Ignite.
> > >
> > > # How pessimistic transaction works.
> > >
> > > When we perform put/get operations in pessimistic
transactions, lock
> > > request is sent to remote nodes by `GridNearLockRequest`.
> > > Request contains thread id `IgniteTxAdapter#threadId`, in
which operation
> > > was performed.
> > > In pessimistic mode, multiple transaction objects are
created - on
> > > primary, on backup nodes, and on originating node:
> > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> >
> > `GridDhtTxRemote`.
> > >
> > > Thread id is used in logic on these nodes.
> > > For instance, to check whether thread has successfully
locked the key,
> > > after lock acquisition attempt.
> > > Or to check whether active transaction exists.
> > >
> > > # Main challenge for implementation.
> > >
> > > I've analysed implementation approaches and see the core issue:
> > >
> > > The essential problem with suspending/resuming lies in
thread id field
> > > transferred to remote nodes during put/get operations.
> > >
> > > Imagine, we want to suspend transaction and resume it in
another thread.
> > > See code snippet below:
> > >
> > > ```
> > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> > >
> > > cache.put(1, 1);
> > >
> > > tx.suspend();
> > > ....
> > >
> > > // In another thread.
> > > tx.resume(); // Thread id will be changed in transaction
instance.
> > > ```
> > >
> > > Original thread id is transferred and saved on remote node.
> > > After resuming thread id on local node differs from remote node.
> > > I want to avoid one more network round trip to change thread
Id on remote
> > > node after transaction resuming.
> > >
> > > # Design proposal.
> > >
> > > Transaction id (`xid`) can be used instead of thread id on
remote nodes.
> > > The following solution is possible for the ticket :
> > >
> > > Replace thread id by transaction id for sending to remote nodes.
> > > Thread id will be removed from the following classes:
> > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > > `GridDistributedTxFinishRequest`,
`GridDistributedTxFinishResponse`,
> > > `GridDistributedTxPrepareRequest`.
> > >
> > > I haven't removed thread id completely from code. Thread id
is moved to
> > > `GridNearTxLocal`.
> > > We still need it in near local transaction for many reasons,
for example
> >
> > to
> > > assure only thread started transaction can suspend it in
> > > `GridNearTxLocal#suspend()`.
> > > In future we can remove thread id completely. I propose to
study this
> > > question in another ticket.
> > >
> > > Also, thread id is remained in `GridDistributedLockRequest`.
> > > Lock request used by cache locks and it need to transfer
thread id to
> > > remote nodes.
> > > For example to use cache locks along with cache operations
put/get, see
> > > `GridNearTxLocal#updateExplicitVersion`.
> > > As for pessimistic transaction, thread id in
`GridDistributedLockRequest`
> > > is set to `UNDEFINED_THREAD_ID`, which means we must not use
it remotely.
> > >
> > > Note, that if user suspends transaction and forgets to
resume it,
> > > transaction would be rolled back once timeout has occurred.
> > >
> > > In my design when transaction is suspended, all locked keys
remain
> >
> > locked.
> > >
> > > Please see my prototype of proposal implementation [3].
> > >
> > > Proposed changes are relatively small.
> > > They ensure consistency of information about locks, if
thread Id will be
> > > changed within one transaction (by suspend/resume).
> > > There will be used correct id for locks on remote nodes. It
also requires
> > > painstaking work, but changes will not affect the logic of oher
> >
> > components.
> > >
> > > Tell me please what do you think? Any suggestions and
comments will be
> > > helpful.
> > >
> > > If you agree with my design I also will do benchmarking.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > > [3] https://github.com/apache/ignite/pull/2789
> > > [4] Section 3.2.3
> > >
> >
> >
http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf
>
>
--
/Best Regards,/
*/Kuznetsov Aleksey/*