Moving open containers across pipelines may lead to other complications.
Each open container replica records a BCSID (block commit sequence ID),
which corresponds to the latest Ratis log transaction index applied to that
replica on its current pipeline. If open containers are moved to a new
pipeline, the container may reject further updates, or could even become
corrupted, because of a BCSID mismatch.
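
To make the concern concrete, here is a toy model of the mismatch. It is
only a sketch under assumptions: the class and method names are
hypothetical and this is not the actual datanode code. It assumes a replica
accepts an update only when its log index is ahead of the recorded BCSID,
and that a newly created pipeline starts its Ratis log from a low index.

```python
from dataclasses import dataclass


class BcsidMismatchError(Exception):
    """Raised when an update's log index is not ahead of the replica's BCSID."""


@dataclass
class OpenContainerReplica:
    # Hypothetical model of one open container replica on a datanode.
    container_id: int
    bcsid: int = 0  # latest log index applied on the pipeline that wrote it

    def apply_update(self, log_index: int) -> None:
        # Assumed rule: only accept updates that move the BCSID forward.
        if log_index <= self.bcsid:
            raise BcsidMismatchError(
                f"container {self.container_id}: log index {log_index} "
                f"is not ahead of BCSID {self.bcsid}"
            )
        self.bcsid = log_index


if __name__ == "__main__":
    replica = OpenContainerReplica(container_id=1, bcsid=120)
    replica.apply_update(121)  # next entry on the original pipeline: accepted
    try:
        # A new pipeline has its own log, so its first entries carry small
        # indexes that look stale to the replica.
        replica.apply_update(1)
    except BcsidMismatchError as err:
        print("rejected:", err)
```

Under these assumptions, moving an open container would need some way to
reconcile the BCSID with the new pipeline's log before writes resume.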

Thanks
Shashi

On Fri, Mar 13, 2020 at 1:27 PM timmycheng(程力) <[email protected]>
wrote:

> Hey Stephen,
>
> Thanks for this great write-up.
>
> Regarding Problem 3, on pipelines being long lived: have we considered
> options other than destroying the pipeline? AFAIK, destroying pipelines is
> expensive. Another option is to seal/pause the open containers and transfer
> them to a new pipeline. During the transition, the containers are read-only
> on the old pipeline; once they have been transferred to the new pipeline
> (meaning the new pipeline is created and fully registered in the SCM DB),
> they become writable on the new pipeline. We would probably need a
> ref-count per container to know how many reads are still in flight, to
> avoid race conditions.
> This could avoid some of the cost that destroying a pipeline would incur.
>
> -Li
>
> On 2020/3/12, 9:19 PM, "Stephen O'Donnell" <[email protected]>
> wrote:
>
>     We had a discussion yesterday with some of the team related to network
>     topology and we came up with the following list of proposals which
>     probably need to be implemented to cover some edge cases and make the
>     feature more supportable. I am sharing them here to gather any further
>     ideas, problems and feedback before we attempt to fix these issues.
>
>
>     Problem 1:
>
>     As of now, there is no tool to tell us if any containers are not
>     replicated on 2 racks.
>
>     Solution:
>
>     A feature should be added to Recon to check the replication and
>     highlight containers which are not on two racks.
>
>
>     Problem 2:
>
>     If closed containers somehow end up on only 1 rack, there is no
>     facility to correct that.
>
>     Solution:
>
>     Replication Manager should be extended to check for both under
>     replicated and mis-replicated containers and it should work to correct
>     them. It was also suggested that if a container has only 2 replicas on
>     1 rack, the cluster is rack aware, and no node is available from
>     another rack, replication manager should not schedule a 3rd copy on
>     the same rack. It should instead wait for a node on another rack to
>     become available.
>
>     Problem 3:
>
>     If pipelines get created which are not rack tolerant, then they will be
>     long lived and will create containers which are not rack tolerant for a
>     long time. This can happen if nodes from another rack are not available
>     when pipelines are being created, or 1 rack of a 2 rack cluster is
>     stopped.
>
>     Solution:
>
>     The existing pipeline scrubber should be extended to check for
>     pipelines which are not rack tolerant and also check if there are
>     nodes available from at least two racks. If so, it will destroy
>     non-rack tolerant pipelines in a controlled fashion.
>
>     For a badly configured cluster, e.g. rack_1 has 10 nodes and rack_2 has
>     1 node, we should never create non-rack tolerant pipelines even though
>     it will reduce the cluster throughput. That is, the fallback option
>     when creating pipelines should only be used when there is only 1 rack
>     available.
>
>
>     Problem 4:
>
>     With the existing design, pipelines start to be created as soon as 3
>     nodes have registered with SCM. If 3 nodes from the same rack register
>     first, the system does not know the cluster is rack aware as yet (the
>     current logic checks the number of racks which have checked in) and so
>     it will create a non-rack tolerant pipeline. The solution to problem 3
>     can take care of this, but it seems it would be better to try to
>     prevent these bad pipelines getting created to begin with.
>     Additionally, with multi-raft, it would be better to have most nodes
>     registered before creating pipelines to spread them out across the
>     cluster more evenly.
>
>     Solution:
>
>     SCM already has a Safemode check. It is the ideal place to add a check
>     like this and we decided it would make sense to have some safe mode
>     rules which must pass before pipelines can start to be created.
>     Several ideas were discussed:
>
>     1. Wait for a static number of nodes to register. This is simple, but a
>     static configuration that must be changed as the cluster grows is not
>     ideal. This check already exists for exiting safemode, but it would
>     need to be changed slightly to block pipeline creation too.
>
>     2. Wait for the node count to stabilize. In this way, the safemode rule
>     would check the node count has not changed during some interval of
>     time, implying all nodes have registered. A negative is slowing down
>     the startup time, but due to (3) below this would not be a problem on
>     an established cluster.
>
>     3. Wait for some percentage of the total expected containers to be
>     reported, which would imply most of the expected nodes have registered.
>     This check is already present to exit safe mode, so we would need it to
>     block pipeline creation too. The one negative is that it may not work
>     well for clusters with a small number of nodes or few containers (i.e.
>     new clusters). It would also be possible for all containers to be
>     reported with only one third of the nodes registered in an extreme
>     case.
>
>     4. Wait for at least 2 racks to be registered if the cluster is
>     configured as rack tolerant. This does help with ensuring the pipelines
>     are spread across all the nodes.
>
>     This area needs some more exploration to figure out which of these
>     ideas is best.
>
>
>     Problem 5:
>
>     The closed container replication policy is different from the pipeline
>     policy and it is possible to configure Replication Manager to use an
>     incompatible policy.
>
>     Solution:
>
>     It may not be possible or desirable to merge the closed container
>     placement policy with the pipeline policy, but we need to think about
>     unifying the configuration so it is not possible to set incompatible
>     options.
>
>     Thanks,
>
>     Stephen.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

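For reference, the placement rule suggested under Problem 2 in the quoted
proposal (do not schedule another copy on the same rack when the cluster is
rack aware; wait for a node on another rack instead) could be sketched
roughly as below. This is only an illustration under assumptions: the
function names, the node-to-rack dict and the default of 2 required racks
are hypothetical and not Replication Manager's real API. It covers only the
under-replicated case, not moving replicas of a fully replicated container.

```python
from typing import Dict, List, Optional


def is_mis_replicated(replica_racks: List[str], required_racks: int = 2) -> bool:
    """True if the replicas sit on fewer distinct racks than required."""
    return len(set(replica_racks)) < required_racks


def choose_target(
    replica_racks: List[str],
    available_nodes: Dict[str, str],  # hypothetical: node name -> rack name
    cluster_rack_aware: bool,
    replication_factor: int = 3,
    required_racks: int = 2,
) -> Optional[str]:
    """Pick a node for one extra replica, or None to wait."""
    if len(replica_racks) >= replication_factor:
        return None  # not under-replicated; nothing to schedule here
    used_racks = set(replica_racks)
    # On a rack aware cluster, the next copy must land on a new rack for as
    # long as the container is still mis-replicated.
    needs_new_rack = cluster_rack_aware and is_mis_replicated(
        replica_racks, required_racks
    )
    for node in sorted(available_nodes):
        if not needs_new_rack or available_nodes[node] not in used_racks:
            return node
    return None  # no suitable node yet: wait rather than reuse the rack


if __name__ == "__main__":
    replicas = ["rack1", "rack1"]
    print(choose_target(replicas, {"dn3": "rack1"}, True))
    print(choose_target(replicas, {"dn3": "rack1", "dn4": "rack2"}, True))
```

Returning None here stands for "wait for a node on another rack", which
trades a longer period of under-replication for rack tolerance.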