Also a +1 on defining a standard common language for this library to use,
instead of direct dependencies on TO data. It has been bandied about for
years that instead of a direct TOdata->ATS config step we could have
TOdata->cache-side library->generic CDN language->$LocalCacheSoftwareConfig.

Would be nice to keep that dream alive. You would then have one library to
handle converting TO data to the generic data, and separate libraries per
cache software to determine config from that data.
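To make the shape concrete, a rough sketch of that layering (all type and
function names here are hypothetical, just to illustrate):

    // Hypothetical sketch of the layered design: TO data is converted once
    // into a generic, cache-agnostic model ("generic CDN language"), and
    // each cache software gets its own generator that only sees that model.
    package cdnconfig

    // GenericCDNConfig is the cache-agnostic common language; the fields
    // are invented for illustration.
    type GenericCDNConfig struct {
        RemapRules []RemapRule // delivery-service remaps
        Parents    []Parent    // parent/upstream selection
        // ... caching rules, headers, logging, etc.
    }

    type RemapRule struct {
        From string // client-facing URL prefix
        To   string // origin URL prefix
    }

    type Parent struct {
        FQDN string
        Port int
    }

    // A Generator is implemented once per cache software (ATS, Grove,
    // Nginx, ...) and turns the generic model into concrete config files,
    // keyed by file name. A separate library would own the single
    // TOdata -> GenericCDNConfig conversion, so generators never see TO's
    // data shapes directly.
    type Generator interface {
        Generate(cfg GenericCDNConfig) (map[string]string, error)
    }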
On Thu, Aug 1, 2019 at 4:17 PM Rawlin Peters <[email protected]> wrote:
>
> It sounds like:
> (A) everyone is +1 on cache-side config generation
> (B) most people are -1 on caches connecting directly to the TO DB
> (C) most people are +1 on TO pushing data to ORT instead of the other
> way around
> (D) most people are -1 on using Kafka for cache configs
>
> For (A) I'm +1 on the approach (ORT sidecar), but I think if we can
> design the ATS config gen library in a way that it just takes in a
> well-defined set of input data and returns strings (the generated
> config files) as output, it shouldn't really matter whether the input
> data comes from direct DB queries, API calls from ORT to TO, or data
> pushed from TO to ORT. Whatever that library ends up looking like, it
> should just be a matter of getting that data from some source and
> converting it into the input format expected by the library. The
> library should not have any dependency on external data -- only what
> has been passed into the library's function. Then we get a lot of nice
> benefits in terms of testability and reusability.
>
> Testability:
> Given a set of input data, we can expect certain output in terms of
> the ATS config files, so it would be easy to write unit tests for.
> Right now that's a hard thing to do, because we have to mock out every
> single DB call for those unit tests.
>
> Reusability:
> The library could be shared between the TO API and this new
> ORT-sidecar thing; the only difference should be that the TO API runs
> a set of DB queries to populate the input data, whereas the ORT
> sidecar would run a set of TO API calls to populate the input data.
>
> I know it might be more difficult to come up with that well-defined
> input interface than to just make DB or API calls whenever you need
> data in the library, but I think it would be well worth the effort for
> the reasons above.
>
> - Rawlin
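Rawlin's "data in, strings out" shape is also what makes the common-language
idea workable. As a rough sketch of the function shape he's describing
(ConfigData and MakeATSConfigs are names I'm making up here, not the actual
atstccfg API):

    // Hypothetical sketch of a pure config-gen library: a well-defined
    // input struct in, generated config files (filename -> contents) out.
    // No DB handles, no HTTP clients, no external data.
    package atscfg

    type ConfigData struct {
        ServerHostName string
        Parameters     map[string]string // profile parameters, already fetched by the caller
    }

    func MakeATSConfigs(data ConfigData) (map[string]string, error) {
        return map[string]string{
            "hosting.config": "# DO NOT EDIT - generated for " + data.ServerHostName + "\n",
        }, nil
    }

    // Unit tests then reduce to comparing strings, no DB mocks:
    //   got, _ := MakeATSConfigs(ConfigData{ServerHostName: "edge-01"})
    //   // assert got["hosting.config"] mentions "edge-01"

Whether the caller filled ConfigData from DB queries (the TO API) or from TO
API calls (the ORT sidecar) is then invisible to the library, which is
exactly the reusability he's after.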
> On Thu, Aug 1, 2019 at 7:28 AM ocket 8888 <[email protected]> wrote:
> >
> > Well, in that spirit:
> >
> > - Cache-side processing: +1. I suppose, given that we wouldn't want
> > to rewrite the entire configuration generation logic at once, there's
> > no reason to prefer this being part of ORT immediately versus
> > separate. Either way, there's no real "extra step". Though I must
> > admit, I am sad to see this written in Go and not built into my baby:
> > ORT.py
> >
> > - Invert LoC for config update: +1, because this essentially lets you
> > do server configuration snapshots for free, in addition to the other
> > benefits, and that's a pretty requested feature, I think. For servers
> > that are unreachable from TO, there's a problem; after backing off
> > for some set number of retries they should probably just be marked
> > offline with a reason, e.g. "unable to contact for updates". This is
> > a bit off-topic, though, imo, because it's sort of independent from
> > what Rob's suggesting, and that utility isn't even really designed
> > with that in mind - at least not yet. A conversation for after it's
> > finished.
> >
> > - Invert LoC for data selection: +1 (I think). Letting the cache
> > server decide what it needs to know decouples the system from any
> > particular cache software, which would let us support things like
> > Grove or Nginx more easily. Or at least allow people to write their
> > own config generators (plugins?) for those servers. Though honestly,
> > it's probably always going to want the entire profile/parameter set
> > and information for assigned delivery services anyway; it'll just
> > decide then what's important and what's meaningless. This is somewhat
> > more related, imo, since I _think_ (without looking at any code) that
> > what Rob's thing does now is just request the information it thinks
> > it needs, and builds the configs with that. I'd be interested to hear
> > more about the "fragility when out-of-sync" thing. Or maybe I'm
> > misunderstanding the concept? If what you mean is something more like
> > "the cache server selects what specific parameters it needs" then I'm
> > -1, but you should be able to get all of the parameters and their
> > modified dates with one call to `/profiles?name={{name}}` and then
> > decide from there. So the server still tells you everything that just
> > changed. Stuff like CG assignment/params and DS assignment/params
> > would likewise still need to be checked normally. So: +1 for caches
> > deciding what API endpoints to call, -1 for big globs of "this is
> > what was updated" being pushed to the cache server, and -1 for cache
> > servers trying to guess what might have changed instead of checking
> > everything.
> >
> > - Direct Database Connection: -1
> > - Kafka: -1
> >
> > > "This is true, but you can also run the cache-config generator to
> > > visually inspect them as well"
> >
> > yeah, but then you need to either expose that configuration
> > generator's output to the internet, or you need to `ssh` into a cache
> > server or something to inspect it individually. I think his concern
> > is not being able to see it in Traffic Portal, which is arguably
> > safer and much easier than the other two options, respectively. My
> > response would be "but you can still see arbitrary configuration, so
> > maybe that just needs to be made easier to view and understand to
> > compensate. Like, instead of
> >
> > {
> >     "configFile": "records.config",
> >     "name": "CONFIG proxy.config.http.insert_response_via_str",
> >     "secure": false,
> >     "value": "INT 3"
> > },
> >
> > you see something like 'encoded Via header verbosity:
> > none/low/med/high'"
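That friendly-rendering idea is cheap to prototype. A toy version (the
mapping table is invented for illustration; the parameter itself is the real
ATS records.config knob from the example above):

    // Sketch: render a raw records.config parameter as a human-readable
    // description. The friendly-name table is hypothetical.
    package main

    import "fmt"

    var viaVerbosity = map[string]string{
        "INT 0": "encoded Via header verbosity: none",
        "INT 1": "encoded Via header verbosity: low",
        "INT 2": "encoded Via header verbosity: med",
        "INT 3": "encoded Via header verbosity: high",
    }

    func describe(name, value string) string {
        if name == "CONFIG proxy.config.http.insert_response_via_str" {
            if desc, ok := viaVerbosity[value]; ok {
                return desc
            }
        }
        return name + " = " + value // fall back to the raw parameter
    }

    func main() {
        // Prints "encoded Via header verbosity: high"
        fmt.Println(describe("CONFIG proxy.config.http.insert_response_via_str", "INT 3"))
    }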
> > On Wed, Jul 31, 2019 at 11:01 AM Chris Lemmons <[email protected]> wrote:
> > >
> > > This is true, but you can also run the cache-config generator to
> > > visually inspect them as well. That makes it easy to visually
> > > inspect them, as well as to pipe them to diff and inspect them
> > > mechanically. So we don't lose the ability entirely, we just move
> > > it from one place to another.
> > >
> > > On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey
> > > <[email protected]> wrote:
> > > >
> > > > A small point, but TO currently allows one to visually
> > > > inspect/validate the generated configuration files. I don't know
> > > > how critical that functionality is (I personally found it
> > > > invaluable when testing logging configuration changes), but it
> > > > seems like we either have the generation logic in two places (ORT
> > > > and TO), or we lose that ability in TO by moving all the logic to
> > > > the cache.
> > > >
> > > > - Geoff
> > > >
> > > > On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]> wrote:
> > > >
> > > > my feedback:
> > > >
> > > > 1. i like the idea of slimming down TO. It's gotten way too fat.
> > > > Basically deprecating these api endpoints at some point and
> > > > letting "something else" do the job of config generation:
> > > >
> > > > GET /api/$version/servers/#id/configfiles/ats
> > > > GET /api/$version/profiles/#id/configfiles/ats/#filename
> > > > GET /api/$version/servers/#id/configfiles/ats/#filename
> > > > GET /api/$version/cdns/#id/configfiles/ats/#filename
> > > >
> > > > 2. i don't really care if that "something else" is a sidecar to
> > > > ORT or actually ORT. will let you guys hash that out.
> > > >
> > > > 3. i like the idea of that "something else" eventually being able
> > > > to handle a push vs. a pull as rawlin suggested.
> > > >
> > > > 4. a bit curious how "cache snapshots" would work as rob suggested in
> > > > https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> > > > - would you look at a cache snapshot diff and then snapshot
> > > > (which would queue updates in the background)?
> > > >
> > > > otherwise, thanks for taking the initiative, rob. and looking
> > > > forward to seeing what comes of this that will make TC safer/more
> > > > efficient.
> > > >
> > > > jeremy
> > > >
> > > > On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan <[email protected]> wrote:
> > > > >
> > > > > Smaller, simpler pieces closer to the cache that do one job are
> > > > > far simpler to maintain, triage, and build. I'm not a fan of
> > > > > trying to inject a message bus in the middle of everything.
> > > > >
> > > > > Jonathan G
> > > > >
> > > > > On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> wrote:
> > > > >
> > > > > To throw a completely different idea out there . . . some time
> > > > > ago Matt Mills was talking about using Kafka as the
> > > > > configuration transport mechanism for Traffic Control. The idea
> > > > > is to use a Kafka compacted topic as the configuration source.
> > > > > TO would write database updates to Kafka, and the ORT
> > > > > equivalent would pull its configuration from Kafka.
> > > > >
> > > > > To explain compacted topics a bit: a standard Kafka message is
> > > > > a key and a payload; in a compacted topic, only the most recent
> > > > > message/payload with a particular key is kept. As a result,
> > > > > reading all the messages from a topic will give you the current
> > > > > state of what's basically a key-value store, with the benefit
> > > > > of not doing actual mutations of data. So a cache could get the
> > > > > full expected configuration by reading all the existing
> > > > > messages on the appropriate topic, as well as get new updates
> > > > > to configuration by listening for new Kafka messages.
> > > > >
> > > > > This leaves the load on the Kafka brokers, which, I can assure
> > > > > you given recent experience, is minimal. TO would only have the
> > > > > responsibility of writing database updates to Kafka; ORT would
> > > > > only need to read individual updates (and be smart enough to
> > > > > know how and when to apply them -- perhaps hints could be
> > > > > provided in the payload?). The result is TO "pushing" updates
> > > > > to the caches (via Kafka) as Rawlin was proposing, and ORT
> > > > > could still pull the full configuration whenever necessary with
> > > > > no hit to Postgres or TO.
> > > > >
> > > > > Now this is obviously a radical shift (and there are no doubt
> > > > > other ways to implement the basic idea), but it seemed worth
> > > > > bringing up.
> > > > >
> > > > > - Geoff
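For concreteness (and the -1s notwithstanding), the compacted-topic
bootstrap Geoff describes would look roughly like this with the
segmentio/kafka-go client; the topic name and key/value semantics are
assumptions for illustration, not an actual TC design:

    // Sketch: rebuild current config state by replaying a compacted topic.
    // "cache-config" and the key/value layout are made up for illustration.
    package main

    import (
        "context"
        "fmt"
        "time"

        kafka "github.com/segmentio/kafka-go"
    )

    func main() {
        r := kafka.NewReader(kafka.ReaderConfig{
            Brokers:   []string{"localhost:9092"},
            Topic:     "cache-config",
            Partition: 0,
        })
        defer r.Close()
        if err := r.SetOffset(kafka.FirstOffset); err != nil { // replay from the start
            panic(err)
        }

        // Compaction keeps (at most) the latest message per key, so replaying
        // the topic yields a key-value store; later messages for the same key
        // overwrite earlier ones in this map.
        state := map[string]string{}
        for {
            // A read timeout stands in for "caught up to the high watermark".
            ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
            m, err := r.ReadMessage(ctx)
            cancel()
            if err != nil {
                break // timed out: existing messages have been replayed
            }
            state[string(m.Key)] = string(m.Value)
        }
        fmt.Printf("bootstrapped %d config entries\n", len(state))
        // A real consumer would keep reading here to apply live updates.
    }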
> > > > > On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> wrote:
> > > > >
> > > > > +1 on this
> > > > >
> > > > > On Jul 30, 2019, at 6:01 PM, Rawlin Peters <[email protected]> wrote:
> > > > >
> > > > > I've been thinking for a while now that ORT's current
> > > > > pull-based model of checking for queued updates is not really
> > > > > ideal, and I was hoping with "ORT 2.0" that we would switch
> > > > > that paradigm around to where TO itself would push updates out
> > > > > to queued caches. That way TO would never get overloaded,
> > > > > because we could tune the level of concurrency for pushing out
> > > > > updates (based on server capacity/specs), and we would
> > > > > eliminate the "waiting period" between the time updates are
> > > > > queued and the time ORT actually updates the config on the
> > > > > cache.
> > > > >
> > > > > I think cache-side config generation is a good idea in terms of
> > > > > enabling canary deployments, but as CDNs continue to scale by
> > > > > adding more and more caches, we might want to get out ahead of
> > > > > the ORT load/waiting problem by flipping that paradigm from
> > > > > "pull" to "push" somehow. Then instead of 1000 caches all
> > > > > asking TO the same question and causing 1000 duplicated reads
> > > > > from the DB, TO would just read the one answer from the DB and
> > > > > send it to all the caches, further reducing load on the DB as
> > > > > well. The data in the "push" request from TO to ORT 2.0 would
> > > > > contain all the information ORT would request from the API
> > > > > itself, not the actual config files.
> > > > >
> > > > > With the API transition from Perl to Go, I think we're
> > > > > eliminating the Perl CPU bottleneck from TO, but the next
> > > > > bottleneck seems like it would be reading from the DB, due to
> > > > > the constantly growing number of concurrent ORT requests as a
> > > > > CDN scales up. We should keep that in mind for whatever "ORT
> > > > > 2.0"-type changes we're making, so that it won't make flipping
> > > > > that paradigm around even harder.
> > > > >
> > > > > - Rawlin
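The "tune the level of concurrency" knob in that push model is basically a
worker pool on the TO side; a minimal sketch, where pushUpdate is a
placeholder for whatever the real transport to an ORT agent would be:

    // Sketch: push an update to every queued cache with a tunable cap on
    // in-flight pushes. pushUpdate is a hypothetical stand-in transport.
    package main

    import (
        "fmt"
        "sync"
    )

    func pushUpdate(cache string, payload []byte) error {
        fmt.Println("pushing update to", cache) // e.g. an HTTP POST in reality
        return nil
    }

    func pushAll(caches []string, payload []byte, concurrency int) {
        sem := make(chan struct{}, concurrency) // at most `concurrency` in flight
        var wg sync.WaitGroup
        for _, cache := range caches {
            wg.Add(1)
            sem <- struct{}{} // acquire a slot
            go func(c string) {
                defer wg.Done()
                defer func() { <-sem }() // release the slot
                if err := pushUpdate(c, payload); err != nil {
                    fmt.Println("push failed for", c, "-", err)
                }
            }(cache)
        }
        wg.Wait()
    }

    func main() {
        pushAll([]string{"edge-01", "edge-02", "edge-03"}, []byte(`{"configVersion": 42}`), 2)
    }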
> > > > > On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <[email protected]> wrote:
> > > > > >
> > > > > >> I'm confused why this is separate from ORT.
> > > > > >
> > > > > > Because ORT does a lot more than just fetching config files.
> > > > > > Rewriting all of ORT in Go would be considerably more work.
> > > > > > Contrariwise, if we were to put the config generation in the
> > > > > > ORT script itself, we would have to write it all from scratch
> > > > > > in Perl (the old config gen used the database directly, so it
> > > > > > would still have to be rewritten) or Python. This was just
> > > > > > the easiest path forward.
> > > > > >
> > > > > >> I feel like this logic should just be replacing the config
> > > > > >> fetching logic of ORT
> > > > > >
> > > > > > That's exactly what it does: the PR changes ORT to call this
> > > > > > app instead of calling Traffic Ops over HTTP:
> > > > > >
> > > > > > https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
> > > > > >
> > > > > >> Is that the eventual plan? Or does our vision of the future
> > > > > >> include this *and* ORT?
> > > > > >
> > > > > > I reserve the right to develop a strong opinion about that in
> > > > > > the future.
> > > > > >
> > > > > > On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
> > > > > >
> > > > > >>> "I'm just looking for consensus that this is the right
> > > > > >>> approach."
> > > > > >>
> > > > > >> Umm... sort of. I think moving cache configuration to the
> > > > > >> cache itself is a great idea, but I'm confused why this is
> > > > > >> separate from ORT. Like, if this is going to be generating
> > > > > >> the configs and it's already right there on the server, I
> > > > > >> feel like this logic should just be replacing the config
> > > > > >> fetching logic of ORT (and personally I think a neat place
> > > > > >> to try it out would be in ORT.py).
> > > > > >>
> > > > > >> Is that the eventual plan? Or does our vision of the future
> > > > > >> include this *and* ORT?
> > > > > >>
> > > > > >>> On 7/30/19 2:15 PM, Robert Butts wrote:
> > > > > >>> Hi all! I've been working on moving the ATS config
> > > > > >>> generation from Traffic Ops to a standalone app alongside
> > > > > >>> ORT that queries the standard TO API to generate its data.
> > > > > >>> I just wanted to put it here and get some feedback, to make
> > > > > >>> sure the community agrees this is the right direction.
> > > > > >>>
> > > > > >>> There's a (very) brief spec here (I might put more detail
> > > > > >>> into it later; let me know if that's important to anyone):
> > > > > >>>
> > > > > >>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> > > > > >>>
> > > > > >>> And the draft PR is here:
> > > > > >>> https://github.com/apache/trafficcontrol/pull/3762
> > > > > >>>
> > > > > >>> This has a number of advantages:
> > > > > >>> 1. TO is a monolith; this moves a significant amount of
> > > > > >>> logic out of it, into a smaller per-cache app/library
> > > > > >>> that's easier to test, validate, rewrite, deploy, canary,
> > > > > >>> rollback, etc.
> > > > > >>> 2. Deploying cache config changes is much smaller and
> > > > > >>> safer. Instead of having to deploy (and potentially roll
> > > > > >>> back) TO, you can canary deploy on one cache at a time.
> > > > > >>> 3. This makes TC more cache-agnostic. It moves cache config
> > > > > >>> generation logic out of TO, and into an independent
> > > > > >>> app/library. The app (atstccfg) is actually very similar to
> > > > > >>> Grove's config generator (grovetccfg).
> > > > > >>> This makes it easier and more obvious how to write config
> > > > > >>> generators for other proxies.
> > > > > >>> 4. By using the API and putting the generator functions in
> > > > > >>> a library, this really gives a lot more flexibility to put
> > > > > >>> the config gen anywhere you want without too much work. You
> > > > > >>> could easily put it in an HTTP service, or even put it back
> > > > > >>> in TO via a plugin. That's not something that's really
> > > > > >>> possible with the existing system, generating directly from
> > > > > >>> the database.
> > > > > >>>
> > > > > >>> Right now, I'm just looking for consensus that this is the
> > > > > >>> right approach. Does the community agree this is the right
> > > > > >>> direction? Are there concerns? Would anyone like more
> > > > > >>> details about anything in particular?
> > > > > >>>
> > > > > >>> Thanks,
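Advantage 4 is the one the common-language idea builds on: once the
generators are pure library functions, wrapping them in any delivery
mechanism is a few lines. A toy HTTP wrapper, reusing the same hypothetical
names as the library sketch above (not the real atstccfg API):

    // Sketch: the hypothetical pure config-gen library exposed as an HTTP
    // service -- one of the "put it anywhere" options in advantage 4.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    type ConfigData struct {
        ServerHostName string            `json:"serverHostName"`
        Parameters     map[string]string `json:"parameters"`
    }

    // makeATSConfigs stands in for the pure library function:
    // data in, config files (filename -> contents) out.
    func makeATSConfigs(d ConfigData) map[string]string {
        return map[string]string{
            "hosting.config": fmt.Sprintf("# generated for %s\n", d.ServerHostName),
        }
    }

    func main() {
        http.HandleFunc("/configs", func(w http.ResponseWriter, r *http.Request) {
            var data ConfigData
            if err := json.NewDecoder(r.Body).Decode(&data); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            w.Header().Set("Content-Type", "application/json")
            json.NewEncoder(w).Encode(makeATSConfigs(data))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

The same functions could just as easily be linked back into TO as a plugin,
which is what makes the canary/rollback story per-cache instead of
per-TO-deploy.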
