[jira] [Updated] (IGNITE-19692) Design Resilient Distributed Operations mechanism

Roman Puchkovskiy (Jira) Fri, 09 Jun 2023 00:10:07 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-19692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Roman Puchkovskiy updated IGNITE-19692:
---------------------------------------
    Description: 
We need a mechanism that would allow us to do the following:
 # Execute an operation on all (or some) of the partitions of a table
 # The whole operation is split into sub-operations (each of which operates on 
a single partition)
 # Each sub-operation must be resilient: that is, if the node that hosts it 
restarts or the partition moves to another node, the sub-operation should 
still proceed
 # When a sub-operation ends, it notifies the operation tracker/coordinator
 # When all sub-operations end, the tracker might take some action (like 
starting a subsequent operation)
 # The tracker is also resilient
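
To make the requirements above concrete, here is a minimal in-memory sketch of 
the tracker side only. It is not Ignite code: the class name OperationTracker 
and its methods are hypothetical, and persistence (which is what would make 
the tracker itself resilient) is deliberately left out. The one design point 
it illustrates is that completion notifications should probably be idempotent, 
since a sub-operation that resumes after a node restart or a partition move 
may report completion more than once.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory tracker; all names are illustrative, not Ignite API. */
final class OperationTracker {
    /** Partitions whose sub-operations have not reported completion yet. */
    private final Set<Integer> pending;

    /** Action to take once every sub-operation has ended. */
    private final Runnable onAllDone;

    /** Guards against running the follow-up action more than once. */
    private boolean finished;

    OperationTracker(Set<Integer> partitionIds, Runnable onAllDone) {
        this.pending = new HashSet<>(partitionIds);
        this.onAllDone = onAllDone;
    }

    /**
     * Records that the sub-operation for the given partition has ended.
     * Repeated notifications for the same partition are harmless.
     *
     * @return true only for the call that completed the whole operation.
     */
    synchronized boolean subOperationFinished(int partitionId) {
        pending.remove(partitionId);

        if (pending.isEmpty() && !finished) {
            finished = true;
            onAllDone.run();
            return true;
        }

        return false;
    }

    /** Snapshot of the partitions that are still being processed. */
    synchronized Set<Integer> pendingPartitions() {
        return new HashSet<>(pending);
    }
}
```

For example, a tracker created for partitions 0 and 1 runs its follow-up 
action exactly once, on the first notification that leaves no partition 
pending, no matter how many duplicate notifications arrive.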

We need such a mechanism in a few places in the system:
 # Transaction cleanup?
 # Index build
 # Table data validation as a part of a schema change that requires a 
validation (like a narrowing type change)

Probably, more applications of the mechanism will emerge.

 

On the possible implementation: the tracker could be collocated with the 
table's primary replica (which would guarantee that at most one tracker exists 
at any time). We could store the data needed to track the operation in the 
Meta-Storage under a prefix corresponding to the table, like 
'ops.<tableId>.<opType>.<opKey>'. We could store the completion status of each 
partition there, along with some operation-wide status.
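
A rough sketch of how such a key layout might look is given below. Only the 
'ops.<tableId>.<opType>.<opKey>' prefix comes from the description above. The 
'.part.<partitionId>' and '.status' suffixes, the "DONE" value, the int table 
ID, and the OpKeys class are assumptions made for illustration, and a plain 
Map stands in for the Meta-Storage rather than its real API.

```java
import java.util.Map;

/** Hypothetical key helpers; only the "ops.<tableId>.<opType>.<opKey>" prefix is from the issue. */
final class OpKeys {
    private OpKeys() {
    }

    /** Prefix under which everything related to one operation is stored. */
    static String opPrefix(int tableId, String opType, String opKey) {
        return "ops." + tableId + "." + opType + "." + opKey;
    }

    /** Key holding the completion status of one partition (suffix is an assumption). */
    static String partitionKey(String opPrefix, int partitionId) {
        return opPrefix + ".part." + partitionId;
    }

    /** Key holding the operation-wide status (suffix is an assumption). */
    static String statusKey(String opPrefix) {
        return opPrefix + ".status";
    }

    /**
     * Checks whether every partition in [0, partitionCount) is marked "DONE".
     * The Map is a stand-in for the Meta-Storage, not its real interface.
     */
    static boolean allPartitionsDone(Map<String, String> store, String opPrefix, int partitionCount) {
        for (int p = 0; p < partitionCount; p++) {
            if (!"DONE".equals(store.get(partitionKey(opPrefix, p)))) {
                return false;
            }
        }

        return true;
    }
}
```

With this layout, a restarted tracker could rebuild its view of an operation 
by reading all keys under the operation prefix, which is one way the stored 
per-partition statuses would support the resilience requirement.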

  was:
We need a mechanism that would allow us to do the following:
 # Execute an operation on all (or some) of the partitions of a table
 # The whole operation is split into sub-operations (each of which operates on 
a single partition)
 # Each sub-operation must be resilient: that is, if the node that hosts it 
restarts or the partition moves to another node, the sub-operation should 
still proceed
 # When a sub-operation ends, it notifies the operation tracker/coordinator
 # When all sub-operations end, the tracker might take some action (like 
starting a subsequent operation)
 # The tracker is also resilient

We need such a mechanism in a few places in the system:
 # Transaction cleanup?
 # Index build
 # Table data validation as a part of a schema change that requires a 
validation (like a narrowing type change)

Probably, more applications of the mechanism will emerge.


> Design Resilient Distributed Operations mechanism
> -------------------------------------------------
>
>                 Key: IGNITE-19692
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19692
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
