[jira] [Updated] (IGNITE-24904) Design the way to distinguish the absence of tx state due to the transaction from the case of loss the transaction state due to data loss in commit partition

Denis Chudov (Jira) Fri, 20 Jun 2025 07:26:37 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-24904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Denis Chudov updated IGNITE-24904:
----------------------------------
    Description: 
See IGNITE-24817 for the scenario.

The fundamental problem is that there is currently no way to tell why a tx 
state is absent: it may never have existed, or it may have been lost due to 
data loss.

A possible starting point: each time a replication group restores its 
majority, it writes the current time to storage. Write intents contain their 
creation time. If, during write intent (WI) resolution, we see that the 
majority restoration time is greater than the write intent creation time, 
then the tx state has most likely been lost.
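
The comparison described above could be sketched roughly as follows. This is 
only an illustration of the heuristic, not a proposed implementation: the 
class, enum, and method names are hypothetical and do not exist in Ignite, 
timestamps are simplified to plain long values, and a missing restoration 
record is assumed to mean "no evidence of loss".

```java
/**
 * Illustrative sketch only: the class, enum, and method names are hypothetical
 * and are not part of Apache Ignite.
 */
final class TxStateAbsenceCheck {
    /** Outcome of the heuristic for a write intent whose tx state is missing. */
    enum Verdict {
        /** Majority was restored after the write intent was created. */
        LIKELY_LOST,
        /** No restoration recorded after creation; the absence has another cause. */
        NO_EVIDENCE_OF_LOSS
    }

    private TxStateAbsenceCheck() {
    }

    /**
     * @param writeIntentCreatedAt creation time stored in the write intent
     * @param lastMajorityRestoredAt time written by the replication group when it
     *     last restored majority, or null if no such time was recorded (assumption)
     */
    static Verdict classify(long writeIntentCreatedAt, Long lastMajorityRestoredAt) {
        if (lastMajorityRestoredAt == null) {
            return Verdict.NO_EVIDENCE_OF_LOSS;
        }
        // Strictly greater, following the wording of the description.
        return lastMajorityRestoredAt > writeIntentCreatedAt
                ? Verdict.LIKELY_LOST
                : Verdict.NO_EVIDENCE_OF_LOSS;
    }
}
```

The result is deliberately a "likely" verdict rather than a definite one, 
since the description itself only claims that the state is most likely lost 
in that case.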

We can also try to recover the latest known state from other cluster nodes, 
but it may already have been vacuumed there.

 

After the design is ready, we can move on to, for example:
 * lazily marking partitions with unresolvable write intents as DEGRADED (or 
another new status) and returning them to HEALTHY once the commit partition 
is recovered;
 * providing a CLI tool for listing, and possibly manually resolving, such 
write intents (may be blocked by IGNITE-25665, which introduces a way to get 
all write intents without a full storage scan).

  was:
See IGNITE-24817 for the scenario.

The fundamental problem is that there is currently no way to tell why a tx 
state is absent: it may never have existed, or it may have been lost due to 
data loss.

A possible starting point: each time a replication group restores its 
majority, it writes the current time to storage. Write intents contain their 
creation time. If, during write intent (WI) resolution, we see that the 
majority restoration time is greater than the write intent creation time, 
then the tx state has most likely been lost.

We can also try to recover the latest known state from other cluster nodes, 
but it may already have been vacuumed there.


> Design the way to distinguish the absence of tx state due to the transaction 
> from the case of loss the transaction state due to data loss in commit 
> partition
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-24904
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24904
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
