[jira] [Assigned] (IGNITE-25665) Persist pending entries list in "aipersist" engine

Roman Puchkovskiy (Jira) Wed, 12 Nov 2025 03:00:09 -0800


     [ 
https://issues.apache.org/jira/browse/IGNITE-25665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Puchkovskiy reassigned IGNITE-25665:
------------------------------------------

    Assignee: Roman Puchkovskiy  (was: Kirill Tkalenko)

> Persist pending entries list in "aipersist" engine
> --------------------------------------------------
>
>                 Key: IGNITE-25665
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25665
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> We need to persistently track pending rows to ensure they are preserved after 
> a cluster restart. Otherwise, we risk losing them and inadvertently marking 
> transaction statuses as aborted (as described in the root issue). This could 
> lead to resolving write intents as aborted, resulting in permanent client 
> data loss.
> h3. Definition of done
> Pending rows are persisted and fully recovered upon cluster restart.
> h3. Design
> The idea is to have a persistent doubly linked list, built over the subset 
> of row versions that represent write intents.
> Currently, each version chain has the following structure:
> {code:java}
> Chain 1 = [timestamp, row] -> ... -> []
> Chain 2 = [timestamp, row] -> ... -> []{code}
> What we want to do is connect all the chains that have write intents as 
> their heads (i.e. {{timestamp == 0L}}), and enrich them with the data 
> needed to restore information about pending transactions:
> {code:java}
> ...
>           ^                     | 
>           |                     v
> Chain 1 = [rowId, timestamp, row] -> ... -> []
>           ^                     | 
>           |                     v
> Chain 2 = [rowId, timestamp, row] -> ... -> []
>           ^                     | 
>           |                     v
> ...{code}
> This means enriching the {{RowVersion}} class with:
>  * {{RowId}} (16 bytes).
>  * Link to the previous list node, "nullable", 6 bytes.
>  * Link to the next list node, "nullable", 6 bytes.
> 28 bytes in total, which is already a lot. The commit replication group ID 
> and transaction ID will be stored in a tree as metadata, because keeping 
> them in every row version would add another 22 bytes of constantly 
> duplicated data.
> Since version chains don't store a transaction ID, we will read it from the 
> version chain tree when starting the replica.
> {{// TODO it is possible to introduce a *getAll* operation on the B+Tree, 
> which should make this reading faster.}}
> A new partition storage API will be required to read this list.
> Obviously, the change must be backwards-compatible.
> We should probably disable it for {{aimem}}: there it is pure memory 
> overhead and provides nothing useful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
