Hi Tony, My explanations are inline below with prefix [HC].
From: Tony Li [mailto:tony1ath...@gmail.com] On Behalf Of tony...@tony.li Sent: Monday, April 15, 2019 11:37 AM To: Huaimo Chen <huaimo.c...@huawei.com> Cc: lsr@ietf.org Subject: Re: [Lsr] Min Links for Multiple Failures on Flooding Topology Hi Huaimo, Consider the case that the multiple failures occur at almost the same time and there is no other failure following these multiple failures shortly. In this case, the backup paths will connect the live end nodes of the failures on the FT, and the FT partition is fixed. Note that we assume that the real network topology is not split. So there will be working backup paths that connect the FT partitions. I don’t think you can just assume that the network topology is not damaged in some way. After all, the FT partitioned because of real failures. Your algorithm for computing backup paths relies on old information about the topology of other partitions. That information may be out of date and incorrect. This can lead to the paths not being viable. [HC]: In general, using the backup paths for the failures to repair the FT partition works well to some extend even though the nodes in one partition have some information about the topology that the nodes in another partition do not have. At first, it can survive two or more failures. For example, it can repair the FT partition created by the failures of any two links on the FT. It can also repair the FT partition caused by the failures of any two links on the FT and the failure of any other link if the algorithm for computing backup paths is enhanced to consider using diverse links (i.e., the backup paths are computed in such a way that one backup path does not share any link with the other backup paths if possible). Secondly, it helps the convergence. The backup paths for a failure are created by the nodes close (or say local) to the failure and connect the two partitions. After this local repair of the FT partition, the link states in the two partitions are synchronized, and the topology is converged. Locally repairing the FT partition is faster than the repairing by the area leader. Moreover, if one extra backup path for a failure is computed and enabled for temporary flooding, the FT partition created by one more failure can be repaired. The network topology is damaged. This is considered in the algorithm for computing backup paths. Every node uses its LSDB for computing backup paths if needed. The FT is portioned, but the real network topology is not, and is used. For the information about a failure that is on one partition and not on another partition, if the part damaged by this failure is not used in any other backup paths, then there is no problem; otherwise (i.e., it is used in some backup paths), if the damaged part is used only in one backup path, the backup paths can survive the damaged part if that one backup path does not share any link with the other backup paths. For the case that the multiple failures occur at almost the same time and then other failures happen following the (old) multiple failures shortly, we can consider all these failures (i.e., the old multiple failures and the other failures including destination node failure) as new multiple failures. For these new multiple failures, the backup paths for all these failures will connect the live end nodes of the failures on the FT, and the FT partition is fixed. Again, how can you consider any new failures in the computation of the backup paths? You have no way of getting information since the FT partitioned. [HC]: The information about the new failures will flood through the backup paths for the old failures if the backup paths are created. Refer to the explanation above. It’s true that just adding one link may not heal the partition. This is exactly why we feel we need to iterate. Each iteration increases the size of the partition (or decreases the cut set, depending on how you want to look at it) and because the graph is finite, we are guaranteed to converge. The convergence in this way may take a long time. Agreed. However, the alternatives that we have on the table right now are (a) take several iterations, or (b) turn on many links and risk a cascade failure. Each iteration takes a time Ti. If we need n (n > 1) iterations, then the total time for convergence is n times Ti. For centralized flooding reduction, Ti is the time including three parts: 1) for the link states (i.e., LSPs or LSAs) representing some of the multiple failures to travel to the area leader in the network over the FT that is partitioned, 2) the leader to compute and send a new FT, and 3) the other nodes (i.e., non area leader nodes) receive and build the new FT. We agree that it is linear in the time taken to add new nodes to the topology. However, due to rate limiting, I suspect that the overall time is more constrained by the rate limiting. After a multiple failure, all nodes can evaluate the state of their adjacencies against the current, existing, and now partitioned FT. Information about the partition has been flooded using the current FT, but only within the partitions. Any node with an adjacency outside of its partition can enable temporary flooding to one neighbor and begin rate limiting. While rate limiting is in progress, a new area leader election and FT computation can happen within the partition. This may or may not yield LSDB updates. Similarly, the temporary addition to the flooding topology may or may not yield LSDB updates. Whether the area leader election and FT computation is faster than the rate limiting is unknown, since both are implementation dependent. [HC]: It seems that using the backup paths for the failures to repair the FT partition may make the convergence faster. Refer to the second point in the first explanation above. It may be better to use both the backup paths for the failures and the rate limiting. The former repairs the partitions locally, and the latter helps the former to work better. Thus the convergence will be faster. Best Regards, Huaimo Tony
_______________________________________________ Lsr mailing list Lsr@ietf.org https://www.ietf.org/mailman/listinfo/lsr