[ 
https://issues.apache.org/jira/browse/IGNITE-18640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-18640:
-------------------------------------
    Description: 
h3. Motivation

As a prerequisite, it's worth mentioning that the placement driver (PD) itself 
should be reliable and have corresponding fail-over logic: the placement driver 
service should be distributed so that if one of its nodes fails, another one 
picks up the flag. On the other hand, although it's valid to have more than one 
active PD actor (the one that checks topology, sends leaseGrant messages, etc.), 
it's better to have only one in order to reduce unnecessary calculations, 
duplicate messaging and so on. To sum up:
 * PD may work on top of meta storage, using it as a consensus provider.
 * There may be more than one active PD actor trying to evaluate the primary 
replica along with the corresponding lease, send leaseGrant messages, etc., 
meaning that actions should be idempotent or that we should be able to skip 
stale/concurrent triggers.
 * It is worth having at least best-effort single actor selection logic.
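
One possible way to skip stale/concurrent triggers is a conditional update keyed 
on a revision, sketched below. This is only an illustration: LeaseTable, 
tryGrant and the in-memory maps are hypothetical stand-ins, and it assumes the 
meta storage can offer some form of conditional write.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for lease records kept in meta storage. */
final class LeaseTable {
    private final Map<String, Long> revisions = new HashMap<>();
    private final Map<String, String> holders = new HashMap<>();

    /**
     * Grants the lease only if the caller saw the latest revision
     * (0 means "no lease yet"). A concurrent or stale actor simply fails.
     */
    synchronized boolean tryGrant(String partition, long expectedRevision, String holder) {
        long current = revisions.getOrDefault(partition, 0L);
        if (current != expectedRevision) {
            return false; // Someone else already updated the lease: skip.
        }
        revisions.put(partition, current + 1);
        holders.put(partition, holder);
        return true;
    }

    /** Returns the current lease holder, or null if none was granted. */
    synchronized String holder(String partition) {
        return holders.get(partition);
    }
}
```

With such a check, two actors that evaluate the same state concurrently cannot 
both apply their decision; the loser re-reads the state and retries or gives up.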

h3. Definition of Done
 * Almost always (not always, because of the best-effort nature) there's only 
one active PD actor if there's a majority in the meta storage (ms) group.
 * If the active actor fails for some reason, another one picks up the flag as 
fast as possible.
 * It's still valid to have multiple active actors at the same time. If you 
guys have any ideas on how to guarantee no more than one actor, please share 
them.

h3. Implementation Notes

Assuming that we have a distributed onLeaderElected(Peer leader, long term) 
callback, we may implement the following logic in 
PlacementDriverManager#start():
 * Register ms.onLeaderElected():
{code:java}
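ms.onLeaderElected((leader, term) -> {
    // lastSeenTerm is assumed to be a field of PlacementDriverManager.
    if (term > lastSeenTerm) {
        // Remember the term so that older notifications are skipped later.
        lastSeenTerm = term;

        if (leader.equals(localNode)) {
            // Become an active actor.
        } else {
            // Discard activeness.
        }
    } else {
        // No-op, just a stale update.
    }
});
{code}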

 * Call refreshLeader and apply exactly the same logic as above in order to 
become an active actor if a leader had already been elected by the time the 
listener was registered.
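
The two steps above can be sketched as a small self-contained class. This is a 
minimal illustration, not the actual Ignite 3 code: ActiveActorSelector, 
isActive and the use of String node names instead of Peer are assumptions made 
here so that the example compiles on its own.

```java
import java.util.Objects;

/** Hypothetical best-effort selector of the single active PD actor. */
final class ActiveActorSelector {
    private final String localNode;
    private long lastSeenTerm = -1; // No leader seen yet.
    private boolean active;

    ActiveActorSelector(String localNode) {
        this.localNode = Objects.requireNonNull(localNode);
    }

    /**
     * Handles a leader notification. The same method can be fed with the
     * result of the initial refreshLeader call, so a leader elected before
     * listener registration is not missed.
     */
    synchronized void onLeaderElected(String leader, long term) {
        if (term <= lastSeenTerm) {
            return; // No-op, just a stale or duplicate update.
        }
        lastSeenTerm = term;
        // Become active if the local node leads, otherwise discard activeness.
        active = localNode.equals(leader);
    }

    synchronized boolean isActive() {
        return active;
    }
}
```

Because the term check and the state change happen under one lock, a late 
notification with an older term cannot re-activate a node that has already 
seen a newer leader.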

  was:
h3. Motivation

As a prerequisite, it's worth mentioning that the placement driver (PD) itself 
should be reliable and have corresponding fail-over logic: the placement driver 
service should be distributed so that if one of its nodes fails, another one 
picks up the flag. On the other hand, although it's valid to have more than one 
active PD actor (the one that checks topology, sends leaseGrant messages, etc.), 
it's better to have only one in order to reduce unnecessary calculations, 
duplicate messaging and so on. To sum up:
 * PD may work on top of meta storage, using it as a consensus provider.
 * There may be more than one active PD actor trying to evaluate the primary 
replica along with the corresponding lease, send leaseGrant messages, etc., 
meaning that actions should be idempotent or that we should be able to skip 
stale/concurrent triggers.
 * It is worth having at least best-effort single actor selection logic.

h3. Definition of Done
 * Almost always (not always, because of the best-effort nature) there's only 
one active PD actor if there's a majority in the meta storage (ms) group.
 * If the active actor fails for some reason, another one picks up the flag as 
fast as possible.
 * It's still valid to have multiple active actors at the same time. If you 
guys have any ideas on how to guarantee no more than one actor, please share 
them.

h3. Implementation Notes

Assuming that we have a distributed onLeaderElected(Peer leader, long term) 
[callback|https://issues.apache.org/jira/browse/IGNITE-18639], we may implement 
the following logic in PlacementDriverManager#start():
 # Register ms.onLeaderElected():
{code:java}
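ms.onLeaderElected((leader, term) -> {
    // lastSeenTerm is assumed to be a field of PlacementDriverManager.
    if (term > lastSeenTerm) {
        // Remember the term so that older notifications are skipped later.
        lastSeenTerm = term;

        if (leader.equals(localNode)) {
            // Become an active actor.
        } else {
            // Discard activeness.
        }
    } else {
        // No-op, just a stale update.
    }
});
{code}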

 # Call refreshLeader and apply exactly the same logic as above in order to 
become an active actor if a leader had already been elected by the time the 
listener was registered.


> Implement placement driver best-effort single actor selector and fail-over
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-18640
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18640
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
