Re: Discuss: Network Topology edge cases(Internet mail)

Shashikant Banerjee Mon, 16 Mar 2020 01:41:27 -0700

Thanks timmycheng,

As long as there are no more updates once the transfers of the open
containers start, it should be fine. But that makes each container a
read-only replica, so temporarily disabling the containers effectively
means closing them. We can definitely choose to close the containers and
transfer them without destroying the pipeline.


Thanks
Shashi



On Mon, Mar 16, 2020 at 1:56 PM timmycheng(程力) <[email protected]>
wrote:

> Hey Shashi,
>
> The idea was to actually seal or lock down containers while they are
> being transferred. During the transfer, the BCSID can stop being updated
> because there will be no new container replicas.
> We can choose whether to accept reads based on cost. Overall, the idea is
> to temporarily 'disable' containers and re-enable them once they are in new
> healthy pipelines.
> DestroyPipeline costs more than this.
>
> -Li
>
> On 2020/3/13, 5:27 PM, "Shashikant Banerjee" <[email protected]>
> wrote:
>
>     Moving open containers across pipelines may lead to other
>     complications. Open containers are marked by a BCSID, which is
>     specific to the latest Ratis log transaction index on the container
>     replica on a given pipeline. If the open containers are moved to a
>     new pipeline, the container may not accept any updates, or could
>     potentially get corrupted, because of a BCSID mismatch.
>
>     Thanks
>     Shashi
>
>     On Fri, Mar 13, 2020 at 1:27 PM timmycheng(程力) <[email protected]>
>     wrote:
>
>     > Hey Stephen,
>     >
>     > Thanks for this great write-up.
>     >
>     > Regarding Problem 3 and long-lived pipelines, have we considered
>     > options other than destroying the pipeline? AFAIK, destroying
>     > pipelines is expensive. Another option is to seal/pause the open
>     > containers and transfer them to a new pipeline. During the
>     > transition, containers are read-only on the old pipeline, and after
>     > they are transferred to the new pipeline (meaning the new pipeline
>     > is created and has fully registered itself in the SCM DB), they
>     > become writable on the new pipeline. We probably need a ref-count
>     > per container to know how many reads are still in flight, to avoid
>     > race conditions.
>     > This could save some of the cost that destroying the pipeline may
>     > bring.
>     >
>     > -Li
>     >
>     > On 2020/3/12, 9:19 PM, "Stephen O'Donnell" <[email protected]>
>     > wrote:
>     >
>     >     We had a discussion yesterday with some of the team related to
>     >     network topology and we came up with the following list of
>     >     proposals which probably need to be implemented to cover some
>     >     edge cases and make the feature more supportable. I am sharing
>     >     them here to gather any further ideas, problems and feedback
>     >     before we attempt to fix these issues.
>     >
>     >
>     >     Problem 1:
>     >
>     >     As of now, there is no tool to tell us if any containers are not
>     >     replicated on 2 racks.
>     >
>     >     Solution:
>     >
>     >     A feature should be added to Recon to check the replication and
>     >     highlight containers which are not on two racks.
>     >
>     >
>     >     Problem 2:
>     >
>     >     If closed containers somehow end up on only 1 rack, there is no
>     >     facility to correct that.
>     >
>     >     Solution:
>     >
>     >     Replication Manager should be extended to check for both under
>     >     replicated and mis-replicated containers and it should work to
>     >     correct them. It was also suggested that if a container has only
>     >     2 replicas on 1 rack, the cluster is rack aware, and no node is
>     >     available from another rack, replication manager should not
>     >     schedule a 3rd copy on the same rack. It should instead wait for
>     >     a node on another rack to become available.
>     >
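The replication check described in the Problem 2 solution could look roughly like the sketch below. It is written in Python for brevity and is not taken from the Ozone code base: the `node_to_rack` map, the state names, and the functions `classify` and `pick_target` are all hypothetical, and a replication factor of 3 across at least 2 racks is assumed.

```python
def racks_used(replica_nodes, node_to_rack):
    """Return the set of racks that currently hold a replica."""
    return {node_to_rack[n] for n in replica_nodes}


def classify(replica_nodes, node_to_rack, replication_factor=3, required_racks=2):
    """Return the set of problems for one closed container (hypothetical states)."""
    states = set()
    if len(replica_nodes) < replication_factor:
        states.add("under_replicated")
    # A cluster with a single known rack cannot be asked to span two racks.
    cluster_racks = set(node_to_rack.values())
    needed = min(required_racks, len(cluster_racks))
    if len(racks_used(replica_nodes, node_to_rack)) < needed:
        states.add("mis_replicated")
    return states


def pick_target(replica_nodes, available_nodes, node_to_rack, required_racks=2):
    """Pick a node for a new replica, or None to wait for a better node."""
    used = racks_used(replica_nodes, node_to_rack)
    candidates = [n for n in available_nodes if n not in replica_nodes]
    rack_aware = len(set(node_to_rack.values())) >= required_racks
    if rack_aware and len(used) < required_racks:
        # The next copy must land on a rack that is not used yet.
        for node in candidates:
            if node_to_rack[node] not in used:
                return node
        return None  # do not schedule another copy on the same rack
    return candidates[0] if candidates else None
```

With two replicas on one rack and only same-rack nodes available, `pick_target` returns `None`, which corresponds to the "wait for a node on another rack" behaviour suggested above.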
>     >     Problem 3:
>     >
>     >     If pipelines get created which are not rack tolerant, then they
>     >     will be long lived and will create containers which are not rack
>     >     tolerant for a long time. This can happen if nodes from another
>     >     rack are not available when pipelines are being created, or 1
>     >     rack of a 2 rack cluster is stopped.
>     >
>     >     Solution:
>     >
>     >     The existing pipeline scrubber should be extended to check for
>     >     pipelines which are not rack tolerant and also check if there
>     >     are nodes available from at least two racks. If so, it will
>     >     destroy non-rack tolerant pipelines in a controlled fashion.
>     >
>     >     For a badly configured cluster, e.g. rack_1 has 10 nodes, rack_2
>     >     has 1 node, we should never create non-rack tolerant pipelines
>     >     even though it will reduce the cluster throughput. That is, the
>     >     fallback option when creating pipelines should only be used
>     >     when there is only 1 rack available.
>     >
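As a rough illustration of the scrubber rule in Problem 3, the sketch below selects pipelines for destruction only when nodes from at least two racks are available, and allows the non-rack-tolerant fallback only when a single rack is available. It is Python pseudocode-that-runs, not Ozone code; the `pipelines` dict, `healthy_nodes` list, `node_to_rack` map, and both function names are assumptions made for the example.

```python
def is_rack_tolerant(pipeline_nodes, node_to_rack):
    """A pipeline is rack tolerant if its members span at least two racks."""
    return len({node_to_rack[n] for n in pipeline_nodes}) >= 2


def pipelines_to_destroy(pipelines, healthy_nodes, node_to_rack):
    """Return ids of non-rack-tolerant pipelines that could now be rebuilt."""
    available_racks = {node_to_rack[n] for n in healthy_nodes}
    if len(available_racks) < 2:
        # Nothing better can be built yet, so leave existing pipelines alone.
        return []
    return [pid for pid, nodes in pipelines.items()
            if not is_rack_tolerant(nodes, node_to_rack)]


def may_use_fallback(healthy_nodes, node_to_rack):
    """Allow non-rack-tolerant pipelines only when one rack (or none) is available."""
    return len({node_to_rack[n] for n in healthy_nodes}) <= 1
```

In the badly configured example (10 nodes on rack_1, 1 node on rack_2), `may_use_fallback` is false, so every new pipeline would have to include the single rack_2 node.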
>     >
>     >     Problem 4:
>     >
>     >     With the existing design, pipelines start to be created as soon
>     >     as 3 nodes have registered with SCM. If 3 nodes from the same
>     >     rack register first, the system does not know the cluster is
>     >     rack aware as yet (the current logic checks the number of racks
>     >     which have checked in) and so it will create a non-rack tolerant
>     >     pipeline. The solution to problem 3 can take care of this, but
>     >     it seems it would be better to try to prevent these bad
>     >     pipelines getting created to begin with. Additionally, with
>     >     multi-raft, it would be better to have most nodes registered
>     >     before creating pipelines to spread them out across the cluster
>     >     more evenly.
>     >
>     >     Solution:
>     >
>     >     SCM already has a Safemode check. It is the ideal place to add a
>     >     check like this and we decided it would make sense to have some
>     >     safe mode rules which must pass before pipelines can start to be
>     >     created. Several ideas were discussed:
>     >
>     >     1. Wait for a static number of nodes to register. This is
>     >     simple, but a static configuration that must be changed as the
>     >     cluster grows is not ideal. This check already exists for
>     >     exiting safemode, but it would need to be changed slightly to
>     >     block pipeline creation too.
>     >
>     >     2. Wait for the node count to stabilize. In this way, the
>     >     safemode rule would check the node count has not changed during
>     >     some interval of time, implying all nodes have registered. A
>     >     negative is slowing down the startup time, but due to (3) below
>     >     this would not be a problem on an established cluster.
>     >
>     >     3. Wait for some percentage of the total expected containers to
>     >     be reported, which would imply most of the expected nodes have
>     >     registered. This check is already present to exit safe mode, so
>     >     we would need it to block pipeline creation too. The one
>     >     negative is that it may not work well for clusters with a small
>     >     number of nodes or few containers (i.e. new clusters). It would
>     >     also be possible for all containers to be reported with only
>     >     one third of the nodes registered in an extreme case.
>     >
>     >     4. Wait for at least 2 racks to be registered if the cluster is
>     >     configured as rack tolerant. This does help with ensuring the
>     >     pipelines are spread across all the nodes.
>     >
>     >     This area needs some more exploration to figure out which of
>     >     these ideas is best.
>     >
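Idea 2 (wait for the node count to stabilize) can be sketched as a small rule object. This is only an illustration in Python, not the SCM safemode API: the class name `NodeCountStableRule`, its methods, and the parameters `stable_interval_s` and `min_nodes` are hypothetical, and the caller is assumed to pass in the current time so the rule stays easy to test.

```python
class NodeCountStableRule:
    """Passes once the registered node count has stopped changing for a while."""

    def __init__(self, stable_interval_s, min_nodes=3):
        self.stable_interval_s = stable_interval_s
        self.min_nodes = min_nodes
        self._last_count = 0
        self._last_change = None  # time of the most recent change in node count

    def observe(self, node_count, now):
        """Record the node count seen at time `now` (seconds)."""
        if self._last_change is None or node_count != self._last_count:
            self._last_count = node_count
            self._last_change = now

    def passed(self, now):
        """True when enough nodes are registered and the count has been stable."""
        if self._last_change is None or self._last_count < self.min_nodes:
            return False
        return now - self._last_change >= self.stable_interval_s
```

Every new registration resets the timer, so pipeline creation would be delayed by at least `stable_interval_s` after the last node checks in. That is the startup cost mentioned under idea 2.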
>     >
>     >     Problem 5:
>     >
>     >     The closed container replication policy is different from the
>     >     pipeline policy and it is possible to configure Replication
>     >     Manager to use an incompatible policy.
>     >
>     >     Solution:
>     >
>     >     It may not be possible or desirable to merge the closed
>     >     container placement policy with the pipeline policy, but we need
>     >     to think about unifying the configuration so it is not possible
>     >     to set incompatible options.
>     >
>     >     Thanks,
>     >
>     >     Stephen.
>     >
>     >
>     >
>     > ---------------------------------------------------------------------
>     > To unsubscribe, e-mail: [email protected]
>     > For additional commands, e-mail: [email protected]
>     >
>     >
>
>
>
>
>
