> On Aug 1, 2019, at 18:10, Evan Zelkowitz <[email protected]> wrote:
>
> Also a +1 on defining a standard common language that this library will use instead of direct dependencies on TO data. I know it has been bandied about for years that instead of a direct TOdata->ATS config we would have TOdata->cache-side library->generic CDN language->$LocalCacheSoftwareConfig.
>
> Would be nice to keep that dream alive. So now you would have a library to handle TO data conversion to generic data, then separate libraries per cache software to determine config from that data.
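To make that concrete, here is a minimal Go sketch of what a generic, cache-agnostic config model and a per-proxy generator could look like. Every type and function name below is hypothetical -- nothing like this exists in Traffic Control today -- and it is also the "well-defined input in, config strings out" library shape Rawlin describes further down the thread:

// Hypothetical sketch only -- none of these type or function names exist in Traffic Control today.
// The idea: one library converts TO data into a generic, cache-agnostic model, and separate
// per-proxy libraries turn that model into concrete config text.
package gencdn

import (
    "fmt"
    "strings"
)

// RemapRule is a made-up example of "generic CDN language" data: it describes what to do,
// not how any particular cache software spells it.
type RemapRule struct {
    FromURL  string
    ToURL    string
    CacheTTL int // seconds; 0 means "honor origin cache headers"
}

// CDNConfig is the generic model a TO-data converter would produce for one cache server.
type CDNConfig struct {
    ServerHostname string
    Remaps         []RemapRule
}

// ToATSRemapConfig is a pure function: generic model in, ATS remap.config text out.
// Because it depends only on its argument, it is trivial to unit test.
func ToATSRemapConfig(cfg CDNConfig) string {
    b := &strings.Builder{}
    fmt.Fprintf(b, "# remap.config for %s\n", cfg.ServerHostname)
    for _, r := range cfg.Remaps {
        fmt.Fprintf(b, "map %s %s\n", r.FromURL, r.ToURL)
    }
    return b.String()
}

The same CDNConfig could then feed an ATS generator, a Grove generator, an Nginx generator, and so on, with only the last step knowing anything about a particular cache's config syntax.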
Maybe use https://github.com/apache/trafficserver/blob/voluspa/tools/voluspa/schema_v1.json as a start for this generic CDN language?

Rgds,
JvD

> On Thu, Aug 1, 2019 at 4:17 PM Rawlin Peters <[email protected]> wrote:
>>
>> It sounds like:
>> (A) everyone is +1 on cache-side config generation
>> (B) most people are -1 on caches connecting directly to the TO DB
>> (C) most people are +1 on TO pushing data to ORT instead of the other way around
>> (D) most people are -1 on using Kafka for cache configs
>>
>> For (A) I'm +1 on the approach (ORT sidecar), but I think if we can design the ats config gen library in a way that it just takes in a well-defined set of input data and returns strings (the generated config files) as output, it shouldn't really matter if the input data comes from direct DB queries, API calls from ORT to TO, or pushed data from TO to ORT. Whatever that library ends up looking like, it should just be a matter of getting that data from some source, and converting it into the input format expected by the library. The library should not have any dependency on external data -- only what has been passed into the library's function. Then we will get a lot of nice benefits in terms of testability and reusability.
>>
>> Testability:
>> Given a set of input data, we can expect certain output in terms of the ATS config files, so it would be easy to write unit tests for. Right now that's a hard thing to do because we have to mock out every single DB call for those unit tests.
>>
>> Reusability:
>> The library could be shared between the TO API and this new ORT-sidecar thing, and the only difference should be that the TO API runs a set of DB queries to populate the input data whereas the ORT-sidecar thing would run a set of TO API calls to populate the input data.
>>
>> I know it might be more difficult to come up with that well-defined input interface than to just make DB or API calls whenever you need data in the library, but I think it would be well worth the effort for those reasons above.
>>
>> - Rawlin
>>
>> On Thu, Aug 1, 2019 at 7:28 AM ocket 8888 <[email protected]> wrote:
>>>
>>> Well, in that spirit:
>>>
>>> - Cache-side processing: +1. I suppose given the fact that we wouldn't want to rewrite the entire configuration generation logic at once, there's no reason to prefer this being part of ORT immediately versus separate. Either way, there's no real "extra step". Though I must admit, I am sad to see this written in Go and not built into my baby: ORT.py
>>>
>>> - Invert LoC for config update: +1, because this essentially lets you do server configuration snapshots for free, in addition to the other benefits, and that's a pretty requested feature, I think. For servers that are unreachable from TO, there's a problem, and after backing off for some set number of retries they should probably just be marked offline with a reason, e.g. "unable to contact for updates". This is a bit off-topic, though, imo, because it's sort of independent from what Rob's suggesting, and that utility isn't even really designed with that in mind - at least just yet. A conversation for after it's finished.
>>>
>>> - Invert LoC for data selection: +1 (I think).
>>> Letting the cache server decide what it needs to know allows you to decouple TO from any particular cache server, which would let us support things like Grove or Nginx more easily. Or at least allow people to write their own config generators (plugins?) for those servers. Though honestly, it's probably always going to want the entire profile/parameter set and information for assigned delivery services anyway; it'll just then decide what's important and what's meaningless. This is somewhat more related, imo, since I _think_ (without looking at any code) that what Rob's thing does now is just request the information it thinks it needs, and builds the configs with that. I'd be interested to hear more about the "fragility when out-of-sync" thing. Or maybe I'm misunderstanding the concept? If what you mean is something more like "the cache server selects what specific parameters it needs" then I'm -1, but you should be able to get all of the parameters and their modified dates with one call to `/profiles?name={{name}}` and then decide from there. So the server still tells you everything that just changed. Stuff like CG assignment/params and DS assignment/params would likewise still need to be checked normally. So +1 for caches deciding what API endpoints to call, -1 for big globs of "this is what was updated" being pushed to the cache server, and -1 for cache servers trying to guess what might have changed instead of checking everything.
>>>
>>> - Direct Database Connection: -1
>>> - Kafka: -1
>>>
>>>> "This is true, but you can also run the cache-config generator to visually inspect them as well"
>>>
>>> yeah, but then you need to either expose that configuration generator's output to the internet or you need to `ssh` into a cache server or something to inspect it individually. I think his concern is not being able to see it in Traffic Portal, which is arguably safer and much easier than the other two options, respectively.
>>> My response would be "but you can still see arbitrary configuration, so maybe that just needs to be made easier to view and understand to compensate. Like, instead of
>>>
>>> {
>>>     "configFile": "records.config",
>>>     "name": "CONFIG proxy.config.http.insert_response_via_str",
>>>     "secure": false,
>>>     "value": "INT 3"
>>> },
>>>
>>> you see something like 'encoded Via header verbosity: none/low/med/high'"
>>>
>>> On Wed, Jul 31, 2019 at 11:01 AM Chris Lemmons <[email protected]> wrote:
>>>>
>>>> This is true, but you can also run the cache-config generator to visually inspect them as well. That makes it easy to visually inspect them as well as to pipe them to diff and mechanically inspect them. So we don't lose the ability entirely, we just move it from one place to another.
>>>>
>>>> On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey <[email protected]> wrote:
>>>>>
>>>>> A small point, but TO currently allows one to visually inspect/validate the generated configuration files. I don't know how critical that functionality is (I personally found it invaluable when testing logging configuration changes), but it seems like we either have the generation logic in two places (ORT and TO), or we lose that ability in TO by moving all the logic to the cache.
>>>>>
>>>>> - Geoff
>>>>>
>>>>> On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]> wrote:
>>>>>
>>>>> my feedback:
>>>>>
>>>>> 1. i like the idea of slimming down TO. It's gotten way too fat. Basically deprecating these api endpoints at some point and letting "something else" do the job of config generation:
>>>>>
>>>>> GET /api/$version/servers/#id/configfiles/ats
>>>>> GET /api/$version/profiles/#id/configfiles/ats/#filename
>>>>> GET /api/$version/servers/#id/configfiles/ats/#filename
>>>>> GET /api/$version/cdns/#id/configfiles/ats/#filename
>>>>>
>>>>> 2. i don't really care if that "something else" is a sidecar to ORT or actually ORT. will let you guys hash that out.
>>>>>
>>>>> 3. i like the idea of that "something else" eventually being able to handle a push vs. a pull as rawlin suggested.
>>>>>
>>>>> 4. a bit curious how "cache snapshots" would work as rob suggested in https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation - would you look at a cache snapshot diff and then snapshot (which would queue updates in the background)?
>>>>>
>>>>> otherwise, thanks for taking the initiative, rob. and looking forward to seeing what comes of this that will make TC safer/more efficient.
>>>>>
>>>>> jeremy
>>>>>
>>>>> On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan <[email protected]> wrote:
>>>>>
>>>>>> Smaller, simpler pieces closer to the cache that do one job are far simpler to maintain, triage, and build. I'm not a fan of trying to inject a message bus in the middle of everything.
>>>>>>
>>>>>> Jonathan G
>>>>>>
>>>>>> On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> wrote:
>>>>>>
>>>>>> To throw a completely different idea out there . . . some time ago Matt Mills was talking about using Kafka as the configuration transport mechanism for Traffic Control. The idea is to use a Kafka compacted topic as the configuration source. TO would write database updates to Kafka, and the ORT equivalent would pull its configuration from Kafka.
>>>>>>
>>>>>> To explain compacted topics a bit, a standard Kafka message is a key and a payload; in a compacted topic, only the most recent message/payload with a particular key is kept. As a result, reading all the messages from a topic will give you the current state of what's basically a key-value store, with the benefit of not doing actual mutations of data. So a cache could get the full expected configuration by reading all the existing messages on the appropriate topic, as well as get new updates to configuration by listening for new Kafka messages.
>>>>>>
>>>>>> This leaves the load on the Kafka brokers, which, I can assure you given recent experience, is minimal. TO would only have the responsibility of writing database updates to Kafka; ORT would only need to read individual updates (and be smart enough to know how and when to apply them -- perhaps hints could be provided in the payload?). The result is TO "pushing" updates to the caches (via Kafka) as Rawlin was proposing, and ORT could still pull the full configuration whenever necessary with no hit to Postgres or TO.
>>>>>>
>>>>>> Now this is obviously a radical shift (and there are no doubt other ways to implement the basic idea), but it seemed worth bringing up.
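To illustrate the compacted-topic idea described above, here is a minimal Go sketch of the "read it all back" step. The Consumer interface is a made-up stand-in for a real Kafka client, not any particular library's API; the point is the replay logic: keeping only the newest payload per key reconstructs the current configuration state.

// Minimal sketch of the compacted-topic idea above. The Consumer interface is a made-up
// stand-in for a real Kafka client; the point is the replay logic: keeping only the newest
// payload per key reconstructs the current configuration state.
package kafkaconfig

// Message is a key/payload pair, as described above.
type Message struct {
    Key     string
    Payload []byte
}

// Consumer is a hypothetical "read the topic from the beginning" source.
type Consumer interface {
    // Next returns the next message, or ok=false once the topic has been exhausted.
    Next() (msg Message, ok bool)
}

// CurrentState replays the topic and keeps only the latest payload per key -- the same
// state Kafka's log compaction converges to on the broker side.
func CurrentState(c Consumer) map[string][]byte {
    state := map[string][]byte{}
    for {
        msg, ok := c.Next()
        if !ok {
            return state
        }
        state[msg.Key] = msg.Payload // later messages overwrite earlier ones for the same key
    }
}

A running ORT-equivalent would then keep consuming and apply each new message as it arrives, rather than replaying the whole topic every time.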
>>>>>>
>>>>>> - Geoff
>>>>>>
>>>>>> On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> wrote:
>>>>>>
>>>>>> +1 on this
>>>>>>
>>>>>> On Jul 30, 2019, at 6:01 PM, Rawlin Peters <[email protected]> wrote:
>>>>>>
>>>>>> I've been thinking for a while now that ORT's current pull-based model of checking for queued updates is not really ideal, and I was hoping with "ORT 2.0" that we would switch that paradigm around to where TO itself would push updates out to queued caches. That way TO would never get overloaded, because we could tune the level of concurrency for pushing out updates (based on server capacity/specs), and we would eliminate the "waiting period" between the time updates are queued and the time ORT actually updates the config on the cache.
>>>>>>
>>>>>> I think cache-side config generation is a good idea in terms of enabling canary deployments, but as CDNs continue to scale by adding more and more caches, we might want to get out ahead of the ORT load/waiting problem by flipping that paradigm from "pull" to "push" somehow. Then instead of 1000 caches all asking TO the same question and causing 1000 duplicated reads from the DB, TO would just read the one answer from the DB and send it to all the caches, further reducing load on the DB as well. The data in the "push" request from TO to ORT 2.0 would contain all the information ORT would request from the API itself, not the actual config files.
>>>>>>
>>>>>> With the API transition from Perl to Go, I think we're eliminating the Perl CPU bottleneck from TO, but the next bottleneck seems like it would be reading from the DB due to the constantly growing number of concurrent ORT requests as a CDN scales up. We should keep that in mind for whatever "ORT 2.0"-type changes we're making, so that it won't make flipping that paradigm around even harder.
>>>>>>
>>>>>> - Rawlin
>>>>>>
>>>>>>> On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm confused why this is separate from ORT.
>>>>>>>
>>>>>>> Because ORT does a lot more than just fetching config files. Rewriting all of ORT in Go would be considerably more work. Contrariwise, if we were to put the config generation in the ORT script itself, we would have to write it all from scratch in Perl (the old config gen used the database directly; it'd still have to be rewritten) or Python. This was just the easiest path forward.
>>>>>>>
>>>>>>>> I feel like this logic should just be replacing the config fetching logic of ORT
>>>>>>>
>>>>>>> That's exactly what it does: the PR changes ORT to call this app instead of calling Traffic Ops over HTTP:
>>>>>>> https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
>>>>>>>
>>>>>>>> Is that the eventual plan? Or does our vision of the future include this *and* ORT?
>>>>>>>
>>>>>>> I reserve the right to develop a strong opinion about that in the future.
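As a rough illustration of the push-based model Rawlin describes above -- TO reads the config data once and fans it out to queued caches with a tunable concurrency limit -- here is a minimal Go sketch. The pushToCache endpoint and its URL are hypothetical, not an existing ORT interface:

// Minimal sketch of the push model described above: Traffic Ops reads the config data once
// and pushes it to every queued cache, at most maxConcurrent at a time. The pushToCache
// function and its URL path are made up for illustration.
package pushupdates

import (
    "bytes"
    "fmt"
    "net/http"
    "sync"
)

// pushToCache POSTs one pre-built data blob to a single cache's ORT agent.
// The endpoint path is hypothetical.
func pushToCache(cacheFQDN string, data []byte) error {
    url := fmt.Sprintf("https://%s/ort/config-data", cacheFQDN)
    resp, err := http.Post(url, "application/json", bytes.NewReader(data))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("cache %s returned %s", cacheFQDN, resp.Status)
    }
    return nil
}

// PushAll sends the same data to all queued caches with a bounded fan-out,
// so TO controls the concurrency instead of absorbing N simultaneous pulls.
func PushAll(caches []string, data []byte, maxConcurrent int) {
    sem := make(chan struct{}, maxConcurrent)
    wg := sync.WaitGroup{}
    for _, cache := range caches {
        wg.Add(1)
        sem <- struct{}{} // block until a slot is free
        go func(cache string) {
            defer wg.Done()
            defer func() { <-sem }()
            if err := pushToCache(cache, data); err != nil {
                fmt.Println("push failed:", err) // real code would retry or mark the cache
            }
        }(cache)
    }
    wg.Wait()
}

The concurrency cap is the knob Rawlin mentions: TO decides how many caches it updates at once, instead of being hit by however many caches happen to poll at the same time.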
>>>>>>>
>>>>>>> On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
>>>>>>>
>>>>>>>>> "I'm just looking for consensus that this is the right approach."
>>>>>>>>
>>>>>>>> Umm... sort of. I think moving cache configuration to the cache itself is a great idea, but I'm confused why this is separate from ORT. Like, if this is going to be generating the configs and it's already right there on the server, I feel like this logic should just be replacing the config fetching logic of ORT (and personally I think a neat place to try it out would be in ORT.py).
>>>>>>>>
>>>>>>>> Is that the eventual plan? Or does our vision of the future include this *and* ORT?
>>>>>>>>
>>>>>>>>> On 7/30/19 2:15 PM, Robert Butts wrote:
>>>>>>>>> Hi all! I've been working on moving the ATS config generation from Traffic Ops to a standalone app alongside ORT that queries the standard TO API to generate its data. I just wanted to put it here and get some feedback, to make sure the community agrees this is the right direction.
>>>>>>>>>
>>>>>>>>> There's a (very) brief spec here (I might put more detail into it later; let me know if that's important to anyone):
>>>>>>>>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
>>>>>>>>>
>>>>>>>>> And the Draft PR is here:
>>>>>>>>> https://github.com/apache/trafficcontrol/pull/3762
>>>>>>>>>
>>>>>>>>> This has a number of advantages:
>>>>>>>>> 1. TO is a monolith; this moves a significant amount of logic out of it, into a smaller per-cache app/library that's easier to test, validate, rewrite, deploy, canary, rollback, etc.
>>>>>>>>> 2. Deploying cache config changes becomes much smaller and safer. Instead of having to deploy (and potentially roll back) TO, you can canary deploy on one cache at a time.
>>>>>>>>> 3. This makes TC more cache-agnostic. It moves cache config generation logic out of TO and into an independent app/library. The app (atstccfg) is actually very similar to Grove's config generator (grovetccfg). This makes it easier and more obvious how to write config generators for other proxies.
>>>>>>>>> 4. By using the API and putting the generator functions in a library, this really gives a lot more flexibility to put the config gen anywhere you want without too much work. You could easily put it in an HTTP service, or even put it back in TO via a Plugin. That's not something that's really possible with the existing system, which generates directly from the database.
>>>>>>>>>
>>>>>>>>> Right now, I'm just looking for consensus that this is the right approach. Does the community agree this is the right direction? Are there concerns? Would anyone like more details about anything in particular?
>>>>>>>>>
>>>>>>>>> Thanks,
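As a closing illustration of point 4 above -- that once config generation is a pure library call, the same function can back a per-cache CLI (atstccfg-style), an HTTP service, or a TO plugin -- here is a minimal Go sketch. GenerateConfigs and TOData are hypothetical stand-ins for whatever that library ends up exposing, not the actual atstccfg code:

// Sketch of point 4 above: the same library call behind a CLI-style dump and an HTTP handler.
// GenerateConfigs and TOData are hypothetical stand-ins, not the real atstccfg interface.
package main

import (
    "fmt"
    "net/http"
)

// TOData is whatever well-defined input the generation library ends up requiring.
type TOData struct {
    ServerHostname string
}

// GenerateConfigs stands in for the library: data in, config file text out.
func GenerateConfigs(data TOData) map[string]string {
    return map[string]string{
        "remap.config": "# generated for " + data.ServerHostname + "\n",
    }
}

func main() {
    data := TOData{ServerHostname: "edge-cache-01.example.com"} // hypothetical cache hostname

    // CLI-style use: print every generated file to stdout for inspection or diffing.
    for name, contents := range GenerateConfigs(data) {
        fmt.Printf("### %s\n%s\n", name, contents)
    }

    // HTTP-service-style use: the exact same call behind a handler.
    http.HandleFunc("/configfiles/remap.config", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, GenerateConfigs(data)["remap.config"])
    })
    _ = http.ListenAndServe(":8080", nil) // blocks; shown only to make the service shape concrete
}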
