[ https://issues.apache.org/jira/browse/IGNITE-25270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladislav Pyatkov updated IGNITE-25270:
---------------------------------------
Description:
h3. Motivation
Our transaction protocol needs to know which entries were used in a transaction
in a specific partition to switch their state from a write intent to a regular
entry.
{code:java|title=StorageUpdateHandler}
/** A container for rows that were inserted, updated or removed. */
private final PendingRows pendingRows = new PendingRows();
{code}
This set of rows is kept in volatile storage, so it is lost when the node
restarts. This leads to issues such as IGNITE-25079.
h3. Implementation details
The idea is to organize a doubly linked list that connects all version chain
heads that correspond to write intents. It serves the same purpose as a
“pending tree”, but it is a list rather than a tree.
The structure:
* Partition’s meta has a link to the list’s head.
* Each version chain element will be enriched with two links: “prev” and
“next”.
* If the element is not a write intent (its commit timestamp is not 0), both
links must be equal to 0.
* If the element is a write intent, it must be a node of the doubly linked
list.
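The structure above can be modelled in memory roughly as follows. This is only
a sketch under assumptions: the names WriteIntentListSketch, ChainHead,
addWriteIntent, commit and scan are hypothetical, a HashMap stands in for page
memory, and links are plain long values where 0 means "no link". It is not the
actual storage code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of the proposed list. Links are plain longs; 0 means "no link". */
class WriteIntentListSketch {
    static final long NULL_LINK = 0L;

    /** Hypothetical version chain head carrying the two extra links. */
    static final class ChainHead {
        final long link;
        long commitTimestamp = 0L; // 0 marks a write intent
        long prev = NULL_LINK;
        long next = NULL_LINK;

        ChainHead(long link) {
            this.link = link;
        }

        boolean isWriteIntent() {
            return commitTimestamp == 0L;
        }
    }

    /** Stands in for page memory: resolves a link to the element it points to. */
    private final Map<Long, ChainHead> elements = new HashMap<>();

    /** Link to the list's head, which the proposal keeps in the partition's meta. */
    private long headLink = NULL_LINK;

    long headLink() {
        return headLink;
    }

    /** Registers a new write intent and pushes it to the front of the list. */
    ChainHead addWriteIntent(long link) {
        if (link == NULL_LINK || elements.containsKey(link)) {
            throw new IllegalArgumentException("Invalid or duplicate link: " + link);
        }
        ChainHead node = new ChainHead(link);
        node.next = headLink;
        if (headLink != NULL_LINK) {
            elements.get(headLink).prev = link;
        }
        headLink = link;
        elements.put(link, node);
        return node;
    }

    /** Turns a write intent into a regular entry: unlinks it and zeroes both links. */
    void commit(long link, long commitTimestamp) {
        ChainHead node = elements.get(link);
        if (node == null || !node.isWriteIntent() || commitTimestamp == 0L) {
            throw new IllegalStateException("Cannot commit link " + link);
        }
        if (node.prev != NULL_LINK) {
            elements.get(node.prev).next = node.next;
        } else {
            headLink = node.next; // the node was the head
        }
        if (node.next != NULL_LINK) {
            elements.get(node.next).prev = node.prev;
        }
        node.prev = NULL_LINK;
        node.next = NULL_LINK;
        node.commitTimestamp = commitTimestamp;
    }

    /** Walks the list from the head and returns the links of all write intents. */
    List<Long> scan() {
        List<Long> links = new ArrayList<>();
        for (long l = headLink; l != NULL_LINK; l = elements.get(l).next) {
            links.add(l);
        }
        return links;
    }
}
```

Note that commit touches up to three elements (the node and both neighbours)
plus, possibly, the head link in the partition's meta.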
Upon restart:
* Scan the list to collect the RowIds of all write intents.
* Look up each RowId in the main partition tree to retrieve information about
its transaction, and build the volatile pending rows tree at the same time.
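The restart procedure could look roughly like the sketch below. All names here
are hypothetical (PendingRowsRecoverySketch, ListNode, recover), RowIds are
simplified to long values, transaction IDs to strings, and a TreeMap stands in
for the main partition tree. The point is only to show the two steps: a list
scan, plus one tree lookup per write intent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PendingRowsRecoverySketch {
    static final long NULL_LINK = 0L;

    /** Hypothetical list node: the RowId of a write intent and the link to the next node. */
    static final class ListNode {
        final long rowId;
        final long next;

        ListNode(long rowId, long next) {
            this.rowId = rowId;
            this.next = next;
        }
    }

    /**
     * Rebuilds the volatile "transaction -> pending RowIds" mapping.
     *
     * @param headLink Link to the list's head (read from the partition's meta).
     * @param listNodes Stand-in for page memory: link -> list node.
     * @param partitionTree Stand-in for the main partition tree: RowId -> transaction ID.
     */
    static Map<String, Set<Long>> recover(
            long headLink,
            Map<Long, ListNode> listNodes,
            TreeMap<Long, String> partitionTree
    ) {
        Map<String, Set<Long>> pendingRows = new HashMap<>();

        for (long link = headLink; link != NULL_LINK; ) {
            ListNode node = listNodes.get(link);
            if (node == null) {
                throw new IllegalStateException("Dangling link: " + link);
            }

            // One tree lookup per write intent: this is the potentially expensive part.
            String txId = partitionTree.get(node.rowId);
            if (txId == null) {
                throw new IllegalStateException("No transaction info for RowId " + node.rowId);
            }

            pendingRows.computeIfAbsent(txId, k -> new TreeSet<>()).add(node.rowId);
            link = node.next;
        }

        return pendingRows;
    }
}
```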
Problems:
* A simple scan of the list is not enough: we have to perform a tree lookup
for each RowId that we find, which might be expensive.
* Constructing a linked list directly on top of tree nodes is impossible,
because a tree routinely moves data between its nodes.
Cleanup can be performed concurrently (see scheduleAsyncWriteIntentSwitch),
which means that our doubly linked list could be modified concurrently at any
position. Guaranteeing the correctness of concurrent modifications is not an
easy task and requires very careful consideration. This problem might be a
showstopper for this particular solution.
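To illustrate the concern, the sketch below guards every list modification
with one lock. This is an assumption made for illustration only, not something
proposed in this ticket: LockedIntentListSketch and its methods are
hypothetical names, and a single lock serializes all writers, which likely
conflicts with the "no performance impact" goal. It shows the simplest correct
baseline that a finer-grained scheme would have to improve on.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Doubly linked list whose modifications are serialized by a single lock. */
class LockedIntentListSketch {
    /** Hypothetical list node that identifies a write intent by its RowId. */
    static final class Node {
        final long rowId;
        Node prev;
        Node next;
        boolean linked;

        Node(long rowId) {
            this.rowId = rowId;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private Node head;
    private int size;

    /** Adds a write intent to the front of the list. */
    Node add(long rowId) {
        lock.lock();
        try {
            Node node = new Node(rowId);
            node.next = head;
            if (head != null) {
                head.prev = node;
            }
            head = node;
            node.linked = true;
            size++;
            return node;
        } finally {
            lock.unlock();
        }
    }

    /**
     * Unlinks a node. It updates the node and both neighbours, so without the lock
     * two concurrent removals of adjacent nodes could leave stale links behind.
     *
     * @return {@code true} if the node was unlinked, {@code false} if it was already removed.
     */
    boolean remove(Node node) {
        lock.lock();
        try {
            if (!node.linked) {
                return false;
            }
            if (node.prev != null) {
                node.prev.next = node.next;
            } else {
                head = node.next; // the node was the head
            }
            if (node.next != null) {
                node.next.prev = node.prev;
            }
            node.prev = null;
            node.next = null;
            node.linked = false;
            size--;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns the current number of linked nodes. */
    int size() {
        lock.lock();
        try {
            return size;
        } finally {
            lock.unlock();
        }
    }
}
```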
h3. Definition of done
Write a document that describes a solution.
Create the corresponding Jira tasks.
was:
h3. Motivation
Our transaction protocol needs to know which entries were used in a transaction
in a specific partition to switch their state from a write intent to a regular
entry.
{code:title=StorageUpdateHandler}
/** A container for rows that were inserted, updated or removed. */
private final PendingRows pendingRows = new PendingRows();
{code}
The set of the rows is in volatile storage (they disappear after the node
restart). It leads to issues like IGNITE-25079.
h3. Definition of done
Write a document where a solution would be described.
Jira tasks should be created.
> Come up with a solution that allows persisting pending entries and does not
> have a performance impact
> --------------------------------------------------------------------------------------------------
>
> Key: IGNITE-25270
> URL: https://issues.apache.org/jira/browse/IGNITE-25270
> Project: Ignite
> Issue Type: Test
> Reporter: Vladislav Pyatkov
> Assignee: Vladislav Pyatkov
> Priority: Major
> Labels: ignite-3
> Time Spent: 0.5h
> Remaining Estimate: 0h