Mikhail Petrov created IGNITE-25538:
---------------------------------------
Summary: ROLLED_BACK transactions are not removed from active
transactions list
Key: IGNITE-25538
URL: https://issues.apache.org/jira/browse/IGNITE-25538
Project: Ignite
Issue Type: Bug
Reporter: Mikhail Petrov
A user can observe the following output of the `control.sh tx` command:
{code:java}
Matching transactions:
TcpDiscoveryNode [id=34fd49ed-c325-4a93-a32c-3726c1c19130,
addrs=[10.19.138.119], order=3, ver=16.1.3#20241226-sha1:900bfa69,
isClient=false, consistentId=epk_rb_si_pplad-pprbrbepk0071.ca.sbrf.ru]
Tx: [xid=0a2e8e50791-00000000-156e-2f01-0000-000000000013,
label=UcpSearchServiceDecorator.searchByClientId, state=ROLLED_BACK,
startTime=2025-05-26 23:53:58.515, duration=224437 sec,
isolation=READ_COMMITTED, concurrency=PESSIMISTIC, topVer=N/A, timeout=0 sec,
size=0, dhtNodes=[], nearXid=0a2e8e50791-00000000-156e-2f01-0000-000000000013,
parentNodeIds=[86cc9e5e]]
Tx: [xid=087e3040791-00000000-156e-2f01-0000-000000000030,
label=bs-ucp-4g-update-service, state=ROLLED_BACK, startTime=2025-05-25
23:45:45.961, duration=311329 sec, isolation=READ_COMMITTED,
concurrency=PESSIMISTIC, topVer=N/A, timeout=0 sec, size=0, dhtNodes=[],
nearXid=087e3040791-00000000-156e-2f01-0000-000000000030,
parentNodeIds=[60400a24]]
Tx: [xid=0e60d620791-00000000-156e-2f01-0000-000000000035,
label=CloudClientSearchService.byCriteria, state=ROLLED_BACK,
startTime=2025-05-24 23:49:05.016, duration=397530 sec,
isolation=READ_COMMITTED, concurrency=PESSIMISTIC, topVer=N/A, timeout=0 sec,
size=0, dhtNodes=[], nearXid=0e60d620791-00000000-156e-2f01-0000-000000000035,
parentNodeIds=[448e854c]]
TcpDiscoveryNode [id=9f11128e-c5a2-4700-af6b-c4777edfa31b,
addrs=[10.19.138.75], order=54, ver=16.1.3#20241226-sha1:900bfa69,
isClient=false, consistentId=epk_rb_si_pplad-pprbrbepk0025.ca.sbrf.ru]
Command [TX] finished with code: 0
{code}
From the user's perspective, the output above can be interpreted as a bunch of
LRTs (long-running transactions). Moreover, these transactions cannot be killed
through the control.sh --kill command and remain in the active transactions list
until the node is restarted.
Reproducer:
1. Start a server node.
2. Start a tx through a thin client with a timeout.
3. Inject a sleep into IgniteTxManager#onCreated, after the isCompleted check,
with a duration greater than the tx timeout. The same delay can occur in practice
if the thread that started the transaction is descheduled at that point.
4. Wait for the tx to complete with a timeout error.
The "hanging" transactions in the ROLLED_BACK state described above do not hold
any data key locks and do not affect PME in any way.
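
The suspected race from the reproducer can be modelled with a minimal,
self-contained sketch. This is not Ignite code: Tx, TxRegistry, onTimeout and
the afterCheck hook are hypothetical stand-ins, and the sketch assumes that the
timeout handler finishes its cleanup before the creating thread registers the
transaction. The hook plays the role of the sleep injected in step 3.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class RolledBackTxRaceSketch {
    enum State { ACTIVE, ROLLED_BACK }

    /** Hypothetical stand-in for a transaction; not an Ignite class. */
    static final class Tx {
        final String xid;
        final AtomicReference<State> state = new AtomicReference<>(State.ACTIVE);

        Tx(String xid) { this.xid = xid; }

        boolean isCompleted() { return state.get() != State.ACTIVE; }
    }

    /** Hypothetical stand-in for the active transactions list. */
    static final class TxRegistry {
        final Map<String, Tx> active = new ConcurrentHashMap<>();

        /**
         * Check-then-register. The afterCheck hook runs between the
         * completion check and the registration, modelling the delay
         * injected in step 3 of the reproducer.
         */
        boolean onCreated(Tx tx, Runnable afterCheck) {
            if (tx.isCompleted())
                return false;

            afterCheck.run();

            // The completion state is not re-checked here (assumption).
            active.put(tx.xid, tx);
            return true;
        }

        /**
         * Assumed timeout path: mark the tx rolled back and remove it from
         * the list. The removal is a no-op if the tx is not registered yet.
         */
        void onTimeout(Tx tx) {
            tx.state.set(State.ROLLED_BACK);
            active.remove(tx.xid);
        }
    }

    /** Returns the state of the tx left in the list, or null if none is left. */
    static String reproduce() {
        TxRegistry registry = new TxRegistry();
        Tx tx = new Tx("tx-1");

        // The timeout fires while the creating thread is "asleep".
        registry.onCreated(tx, () -> registry.onTimeout(tx));

        Tx leaked = registry.active.get(tx.xid);
        return leaked == null ? null : leaked.state.get().name();
    }

    public static void main(String[] args) {
        System.out.println("State of tx left in the active list: " + reproduce());
    }
}
```

In this model the entry is never removed because the only cleanup ran before
the registration. Re-checking the state after the put, or making the check and
the registration atomic, would close the window in the sketch; whether that is
the right fix for IgniteTxManager is not established here.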