Moving open containers across pipelines may lead to other complications. Open containers are tracked by a BCSID (Block Commit Sequence ID), which corresponds to the latest Ratis log transaction index applied on the container replica in a given pipeline. If open containers are moved to a new pipeline, the container may stop accepting updates, or may otherwise become corrupted, because of a BCSID mismatch.
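To illustrate the constraint, here is a minimal sketch of why a BCSID mismatch blocks writes when an open container changes pipelines. All names and the exact check are illustrative simplifications, not Ozone's actual implementation:

```python
# Illustrative sketch only -- not Ozone's real code. A replica tracks the
# highest Ratis log index it has applied (its BCSID), which is specific
# to the pipeline the replica was written through.

class ContainerReplica:
    def __init__(self, bcsid: int = 0):
        self.bcsid = bcsid  # highest committed Ratis log index

    def apply_write(self, txn_index: int) -> None:
        # Writes must arrive in log order. A replica carried over from an
        # old pipeline has a BCSID that the new pipeline's fresh Ratis log
        # knows nothing about, so the indices no longer line up.
        if txn_index != self.bcsid + 1:
            raise RuntimeError(
                f"BCSID mismatch: expected {self.bcsid + 1}, got {txn_index}")
        self.bcsid = txn_index

# A replica with BCSID 100 from the old pipeline rejects index 1, the
# first entry of a freshly created pipeline's log.
```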
Thanks,
Shashi

On Fri, Mar 13, 2020 at 1:27 PM timmycheng(程力) <[email protected]> wrote:

> Hey Stephen,
>
> Thanks for this great write-up.
>
> Regarding Problem 3 and making pipelines long-lived: have we considered
> other options rather than destroying the pipeline? AFAIK, destroying
> pipelines is expensive. Another option is to seal/pause the open
> containers and transfer them to a new pipeline. During the transition,
> the containers are read-only on the old pipeline, and after they are
> transferred to the new pipeline (meaning the new pipeline is created
> and fully registered in the SCM DB), the containers become writable on
> the new pipeline. We would probably need a ref-count per container, to
> know how many reads are still in flight and avoid race conditions.
> This could save some of the cost that destroying a pipeline brings.
>
> -Li
>
> On 2020/3/12, 9:19 PM, "Stephen O'Donnell" <[email protected]> wrote:
>
> We had a discussion yesterday with some of the team about network
> topology, and we came up with the following list of proposals which
> probably need to be implemented to cover some edge cases and make the
> feature more supportable. I am sharing them here to gather further
> ideas, problems, and feedback before we attempt to fix these issues.
>
> Problem 1:
>
> As of now, there is no tool to tell us whether any containers are not
> replicated on 2 racks.
>
> Solution:
>
> A feature should be added to Recon to check replication and highlight
> containers which are not on two racks.
>
> Problem 2:
>
> If closed containers somehow end up on only 1 rack, there is no
> facility to correct that.
>
> Solution:
>
> Replication Manager should be extended to check for both
> under-replicated and mis-replicated containers, and it should work to
> correct them.
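As an unquoted aside: the two checks just described might be sketched roughly as follows. The function names, signatures, and the fixed replication factor are illustrative, not the actual ReplicationManager API:

```python
# Rough sketch of the two per-container checks a Replication Manager
# extension would run. Names are illustrative, not Ozone's.

def is_under_replicated(replicas, replication_factor=3):
    # Fewer live replicas than the configured factor.
    return len(replicas) < replication_factor

def is_mis_replicated(replica_racks, required_racks=2):
    # Enough replicas, but all on one rack: the container survives a
    # node loss yet not a rack loss.
    return len(set(replica_racks)) < required_racks
```

The Recon report proposed under Problem 1 could reuse the same rack predicate, so both tools agree on what "not on two racks" means.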
> It was also suggested that if a container has only 2 replicas on 1
> rack, the cluster is rack aware, and no node is available on another
> rack, Replication Manager should not schedule a 3rd copy on the same
> rack. It should instead wait for a node on another rack to become
> available.
>
> Problem 3:
>
> If pipelines which are not rack tolerant get created, they will be
> long-lived and will create containers which are not rack tolerant for
> a long time. This can happen if nodes from another rack are not
> available when pipelines are being created, or if 1 rack of a 2-rack
> cluster is stopped.
>
> Solution:
>
> The existing pipeline scrubber should be extended to check for
> pipelines which are not rack tolerant, and to check whether nodes are
> available on at least two racks. If so, it will destroy
> non-rack-tolerant pipelines in a controlled fashion.
>
> For a badly configured cluster, e.g. rack_1 has 10 nodes and rack_2
> has 1 node, we should never create non-rack-tolerant pipelines, even
> though that will reduce cluster throughput. That is, the fallback
> option when creating pipelines should only be used when there is only
> 1 rack available.
>
> Problem 4:
>
> With the existing design, pipelines start to be created as soon as 3
> nodes have registered with SCM. If 3 nodes from the same rack register
> first, the system does not yet know the cluster is rack aware (the
> current logic checks the number of racks which have checked in), and
> so it will create a non-rack-tolerant pipeline. The solution to
> Problem 3 can take care of this, but it would be better to prevent
> these bad pipelines from being created in the first place.
> Additionally, with multi-raft, it would be better to have most nodes
> registered before creating pipelines, to spread them out across the
> cluster more evenly.
>
> Solution:
>
> SCM already has a Safemode check.
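As another aside, the rack-tolerance test that the extended scrubber under Problem 3 would apply to each pipeline might be sketched like this (names are illustrative, not Ozone's actual code):

```python
# Illustrative sketch of the scrubber decision from Problem 3.

def pipeline_is_rack_tolerant(pipeline_racks):
    # A 3-node pipeline needs its members spread over at least 2 racks.
    return len(set(pipeline_racks)) >= 2

def should_scrub(pipeline_racks, live_node_racks):
    # Destroy a non-rack-tolerant pipeline only once nodes from at least
    # two racks are actually available to rebuild a better one;
    # otherwise keep the pipeline rather than lose write capacity.
    return (not pipeline_is_rack_tolerant(pipeline_racks)
            and len(set(live_node_racks)) >= 2)
```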
> The Safemode check is the ideal place to add a rule like this, and we
> decided it would make sense to have some safe mode rules which must
> pass before pipelines can start to be created. Several ideas were
> discussed:
>
> 1. Wait for a static number of nodes to register. This is simple, but
> a static configuration that must be changed as the cluster grows is
> not ideal. This check already exists for exiting safemode, but it
> would need to be changed slightly to block pipeline creation too.
>
> 2. Wait for the node count to stabilize. Here the safemode rule would
> check that the node count has not changed during some interval of
> time, implying all nodes have registered. A negative is that it slows
> down startup, but due to (3) below this would not be a problem on an
> established cluster.
>
> 3. Wait for some percentage of the total expected containers to be
> reported, which would imply most of the expected nodes have
> registered. This check is already present for exiting safe mode, so we
> would need it to block pipeline creation too. The one negative is that
> it may not work well for clusters with a small number of nodes or few
> containers (i.e. new clusters). In an extreme case, it would also be
> possible for all containers to be reported with only one third of the
> nodes registered.
>
> 4. Wait for at least 2 racks to be registered if the cluster is
> configured as rack tolerant. This helps ensure the pipelines are
> spread across all the nodes.
>
> This area needs some more exploration to figure out which of these
> ideas is best.
>
> Problem 5:
>
> The closed container replication policy is different from the pipeline
> policy, and it is possible to configure Replication Manager to use an
> incompatible policy.
>
> Solution:
>
> It may not be possible or desirable to merge the closed container
> placement policy with the pipeline policy, but we need to think about
> unifying the configuration so it is not possible to set incompatible
> options.
>
> Thanks,
>
> Stephen.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
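As a final unquoted aside on idea (2) under Problem 4: a node-count-stabilization safemode rule might be sketched as follows. This is purely illustrative; SCM's actual safemode rule interfaces and naming differ, and the clock injection is only there to make the sketch testable.

```python
import time

class NodeCountStabilizedRule:
    """Block pipeline creation until the registered-node count has not
    changed for `window` seconds. Illustrative sketch only."""

    def __init__(self, window: float = 60.0, now=time.monotonic):
        self.window = window
        self.now = now              # injectable clock for testing
        self.last_count = -1
        self.last_change = now()

    def on_node_count(self, count: int) -> None:
        # Called whenever SCM observes the current registered-node count;
        # any change restarts the quiet window.
        if count != self.last_count:
            self.last_count = count
            self.last_change = self.now()

    def satisfied(self) -> bool:
        # On an established cluster most nodes register quickly, so the
        # rule passes after one quiet window; a trickle of late
        # registrations keeps pushing the deadline out (the slow-startup
        # negative noted in the thread).
        return self.now() - self.last_change >= self.window
```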
