> What if originating(or primary, or couple of primaries) node(s) fails when tx is suspended?
Thanks for advice, I've added a couple of failover tests. With the following scenarios : Started transaction, perform some put, suspend it and primary node fails. Works as expected : tx rollbacks. In another scenario, the originating node fails. Works as expected : tx rollbacks. > Do we have tests for this feature? yeah,we do. ср, 28 февр. 2018 г. в 14:45, Nikolay Izhikov <nizhi...@apache.org>: > Hello, Aleksey. > > > Tests were added for suspending/resuming optimistic transaction earlier > for parent task. > > I adopted them for pessimistic mode. Also tx rollback tests were aded. > > I think, at least, we need new failover tests. > > What if originating(or primary, or couple of primaries) node(s) fails when > tx is suspended? > > > Note, that if user suspends transaction and forgets to resume it, > > transaction would be rolled back once timeout has occurred. > > Do we have tests for this feature? > > В Ср, 28/02/2018 в 10:29 +0000, Aleksey Kuznetsov пишет: > > Thanx for the answer! > > > > > Can we remove thread id completely from code? > > > Can you estimate how much effort do we need? > > > > I haven't removed thread id completely from the code because we tightly > > coupled to it in some parts. > > For instance, look at the following scenario : > > > > "When we suspend transaction, we must make sure only thread started > > transaction can suspend it. So we compare thread id in > > `GridNearTxLocal#suspend()`." > > > > Note that cache locks also make use of thread id, consider the following > > scenario: > > > > "We create cache lock and perform put operation in the same thread, which > > create implicit transaction. Transaction would compare both therad ids in > > tx and cache lock and can apply the optimisation: tx would take advantage > > of cache lock, and would not create its own lock on key." > > > > If we really needed to remove it completely from the code, significant > > changes are required both to transactions and cache locks. > > > > It could take several weeks to completely remove thread id. > > > > > As far as I can see from parent task [1] we need some complex tests to > be > > > > implemented. > > > Are they presented in prototype? > > > > Tests were added for suspending/resuming optimistic transaction earlier > for > > parent task. > > I adopted them for pessimistic mode. > > Also tx rollback tests were aded. They check whether timeout metrics are > > correctly calculated for suspended transactions. You can find them in > > `GridCacheTransactionalAbstractMetricsSelfTest`. > > > > Please, feel free to propose additional tests for the ticket. > > > > > > вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <nizhi...@apache.org>: > > > > > Hello, Alexey. > > > > > > Great mail, by the way. > > > I think, it would be great to have this feature in Ignite. > > > > > > > I haven't removed thread id completely from code. > > > > > > Can we remove thread id completely from code? > > > Can you estimate how much effort do we need? > > > > > > As far as I can see from parent task [1] we need some complex tests to > be > > > implemented. > > > Are they presented in prototype? > > > > > > [1] > > > > https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655 > > > > > > В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет: > > > > Hello Igniters! > > > > > > > > Currently we have suspension/resuming implemented for optimistic > > > > transactions [1]. > > > > Unless suspend/resume isn't supported for pessimistic tx JTA isn't > fully > > > > supported [4]. > > > > > > > > I’m working on a ticket "Suspension/resuming for pessimistic > > > > > > transactions" > > > > [2]. > > > > Goal of the ticket is to support transaction suspend/resume for > > > > > > pessimistic > > > > transactions. > > > > > > > > # Benefits of suspending/resuming transaction. > > > > > > > > 1. Full JTA standart support. > > > > 2. Increase of throughput in high-load scenarios. > > > > Suspend operation would allow to release Ignite threads and > > > > optionally perform some other logic. > > > > > > > > Note, current API has got suspend/resume only for optimistic > > > > > > transactions, > > > > which confuses users. > > > > > > > > # Real life example. > > > > > > > > Consider the following scenario: > > > > > > > > 1. Application starts Ignite transaction. > > > > 2. Business logic is executed inside transaction. > > > > 3. For commit/rollback application need approval message from > external > > > > agent. > > > > 4. Currently, thread inside Ignite is idle until approval is > received. > > > > 4a. When suspend/resume support is implemented, application can > perform > > > > suspend and release thread inside Ignite. > > > > > > > > # How pessimistic transaction works. > > > > > > > > When we perform put/get operations in pessimistic transactions, lock > > > > request is sent to remote nodes by `GridNearLockRequest`. > > > > Request contains thread id `IgniteTxAdapter#threadId`, in which > operation > > > > was performed. > > > > In pessimistic mode, multiple transaction objects are created - on > > > > primary, on backup nodes, and on originating node: > > > > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`, > > > > > > `GridDhtTxRemote`. > > > > > > > > Thread id is used in logic on these nodes. > > > > For instance, to check whether thread has successfully locked the > key, > > > > after lock acquisition attempt. > > > > Or to check whether active transaction exists. > > > > > > > > # Main challenge for implementation. > > > > > > > > I've analysed implementation approaches and see the core issue: > > > > > > > > The essential problem with suspending/resuming lies in thread id > field > > > > transferred to remote nodes during put/get operations. > > > > > > > > Imagine, we want to suspend transaction and resume it in another > thread. > > > > See code snippet below: > > > > > > > > ``` > > > > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE); > > > > > > > > cache.put(1, 1); > > > > > > > > tx.suspend(); > > > > .... > > > > > > > > // In another thread. > > > > tx.resume(); // Thread id will be changed in transaction instance. > > > > ``` > > > > > > > > Original thread id is transferred and saved on remote node. > > > > After resuming thread id on local node differs from remote node. > > > > I want to avoid one more network round trip to change thread Id on > remote > > > > node after transaction resuming. > > > > > > > > # Design proposal. > > > > > > > > Transaction id (`xid`) can be used instead of thread id on remote > nodes. > > > > The following solution is possible for the ticket : > > > > > > > > Replace thread id by transaction id for sending to remote nodes. > > > > Thread id will be removed from the following classes: > > > > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`, > > > > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`, > > > > `GridDistributedTxPrepareRequest`. > > > > > > > > I haven't removed thread id completely from code. Thread id is moved > to > > > > `GridNearTxLocal`. > > > > We still need it in near local transaction for many reasons, for > example > > > > > > to > > > > assure only thread started transaction can suspend it in > > > > `GridNearTxLocal#suspend()`. > > > > In future we can remove thread id completely. I propose to study this > > > > question in another ticket. > > > > > > > > Also, thread id is remained in `GridDistributedLockRequest`. > > > > Lock request used by cache locks and it need to transfer thread id to > > > > remote nodes. > > > > For example to use cache locks along with cache operations put/get, > see > > > > `GridNearTxLocal#updateExplicitVersion`. > > > > As for pessimistic transaction, thread id in > `GridDistributedLockRequest` > > > > is set to `UNDEFINED_THREAD_ID`, which means we must not use it > remotely. > > > > > > > > Note, that if user suspends transaction and forgets to resume it, > > > > transaction would be rolled back once timeout has occurred. > > > > > > > > In my design when transaction is suspended, all locked keys remain > > > > > > locked. > > > > > > > > Please see my prototype of proposal implementation [3]. > > > > > > > > Proposed changes are relatively small. > > > > They ensure consistency of information about locks, if thread Id > will be > > > > changed within one transaction (by suspend/resume). > > > > There will be used correct id for locks on remote nodes. It also > requires > > > > painstaking work, but changes will not affect the logic of oher > > > > > > components. > > > > > > > > Tell me please what do you think? Any suggestions and comments will > be > > > > helpful. > > > > > > > > If you agree with my design I also will do benchmarking. > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5712 > > > > [2] https://issues.apache.org/jira/browse/IGNITE-5714 > > > > [3] https://github.com/apache/ignite/pull/2789 > > > > [4] Section 3.2.3 > > > > > > > > > > > http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf > > > > -- *Best Regards,* *Kuznetsov Aleksey*