The real purpose behind the re-architecting was to fix that issue by not having TM request the CRConfig at all, so there'd be nothing to sync. We could also go the other way entirely and eliminate the monitoring.json config, so that Snapshots alone contain all the information needed to monitor and route a CDN. Then, if we really wanted to strip out MIDs so that TR doesn't pull more information than it needs, we could just add a TM API endpoint that serves up the full routing state. That could also help keep the health information in sync with the Snapshots by combining them into a single payload, although that alone still isn't enough to guarantee the two are in sync. I really think some inconsistency is just inevitable, since TR can always request information from TM before, e.g., TM finishes polling a cache whose threshold changed in the latest Snapshot in a way that would put it offline.
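To make that last race concrete, here's a rough Python sketch. None of these names exist in TM or TO: `Snapshot`, `Monitor`, `max_loadavg`, `polled_under`, and `routing_state` are all made up for illustration, and the load-average threshold is just a stand-in for whatever threshold actually changes. It only shows that a single combined payload can still carry a health result computed under the previous Snapshot's thresholds.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Snapshot:
    version: int
    # Hypothetical threshold: max load average before a cache is marked unavailable.
    max_loadavg: Dict[str, float]


@dataclass
class Monitor:
    """Toy stand-in for TM; not the real Traffic Monitor data model."""
    snapshot: Snapshot
    health: Dict[str, dict] = field(default_factory=dict)

    def poll(self, cache: str, loadavg: float) -> None:
        # Evaluate the cache against the thresholds of the Snapshot currently
        # loaded, and remember which Snapshot version produced the result.
        limit = self.snapshot.max_loadavg[cache]
        self.health[cache] = {
            "available": loadavg <= limit,
            "polled_under": self.snapshot.version,
        }

    def apply_snapshot(self, snapshot: Snapshot) -> None:
        # A new Snapshot arrives; existing health results are NOT recomputed
        # until the next poll of each cache.
        self.snapshot = snapshot

    def routing_state(self) -> dict:
        # The hypothetical combined payload: Snapshot version plus health.
        return {
            "snapshot_version": self.snapshot.version,
            "health": {name: dict(entry) for name, entry in self.health.items()},
        }


if __name__ == "__main__":
    tm = Monitor(Snapshot(1, {"edge-01": 10.0}))
    tm.poll("edge-01", 8.0)                            # fine under the old threshold
    tm.apply_snapshot(Snapshot(2, {"edge-01": 5.0}))   # threshold tightened
    print(tm.routing_state())                          # TR asks before the next poll
    tm.poll("edge-01", 8.0)                            # TM catches up
    print(tm.routing_state())
```

In the first payload the cache is still reported as available even though the payload is labelled with Snapshot version 2; only after the next poll does it go unavailable. So combining the two into one payload narrows the window but doesn't close it.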
On Wed, Feb 3, 2021 at 10:59 AM Rawlin Peters <[email protected]> wrote:

> Yeah, I thought I could assuage some of the worries about making this
> change, but I've begun to think that the value of this change isn't
> necessarily worth the effort. We basically just wanted to remove MID
> caches from the CRConfig because TR doesn't need that information, but
> I'm not sure that alone warrants re-architecting the CRConfig flow.
>
> But there still is the problem of TM requesting CRConfig and
> monitoring.json out of sync, which we should probably fix, even though
> I'm not sure it's caused us any significant issues yet.
>
> - Rawlin
>
> On Tue, Feb 2, 2021 at 10:35 AM Eric Friedrich
> <[email protected]> wrote:
> >
> > I know there was a bunch of discussion in the PR, but I think some of the
> > points raised around inconsistent state and race conditions have not been
> > satisfactorily addressed.
> >
> > I understand that there is the opportunity for similar types of
> > inconsistencies today, but all our improvement should work towards
> > increasing the reliability of the system. As written today, I think the
> > reliability could potentially stay the same, but would likely be
> > decreased as a result of this change
> >
> > On Wed, Jan 20, 2021 at 1:47 PM ocket 8888 <[email protected]> wrote:
> >
> > > Is anyone still poring over this? We've got one approval so far and I'd
> > > love to come to a decision about whether or not to adopt this
> > > blueprint.
> > >
> > > On Fri, Dec 11, 2020 at 11:14 AM ocket 8888 <[email protected]> wrote:
> > >
> > > > Hello everyone, I've written up a blueprint
> > > > (https://github.com/apache/trafficcontrol/pull/5367) for reworking
> > > > CDN Snapshots and Monitoring payloads, and I wanted it to be on
> > > > everyone's radar.
> > > >
> > > > The gist of it is that TM using two data sources is causing some
> > > > hard-to-pin-down race conditions, so it shouldn't do that. So the
> > > > plan is to separate out exactly what TM needs in the Monitoring
> > > > payloads, and then Traffic Router can fetch CDN Snapshots directly
> > > > from Traffic Ops instead of using TM as a sort of proxy.
