Also a +1 on defining a standard common language for this library to use,
instead of direct dependencies on TO data. It has been bandied about for
years that instead of a direct TOdata->ATS config step we could have
TOdata->cache-side library->generic CDN language->$LocalCacheSoftwareConfig.

Would be nice to keep that dream alive. You would then have one library to
handle converting TO data to the generic data, and separate libraries per
cache software to determine config from that data.
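To make the shape concrete, a rough sketch of that layering (all type and
function names here are hypothetical, just to illustrate):

    // Hypothetical sketch of the layered design: TO data is converted once
    // into a generic, cache-agnostic model ("generic CDN language"), and
    // each cache software gets its own generator that only sees that model.
    package cdnconfig

    // GenericCDNConfig is the cache-agnostic common language; the fields
    // are invented for illustration.
    type GenericCDNConfig struct {
        RemapRules []RemapRule // delivery-service remaps
        Parents    []Parent    // parent/upstream selection
        // ... caching rules, headers, logging, etc.
    }

    type RemapRule struct {
        From string // client-facing URL prefix
        To   string // origin URL prefix
    }

    type Parent struct {
        FQDN string
        Port int
    }

    // A Generator is implemented once per cache software (ATS, Grove,
    // Nginx, ...) and turns the generic model into concrete config files,
    // keyed by file name. A separate library would own the single
    // TOdata -> GenericCDNConfig conversion, so generators never see TO's
    // data shapes directly.
    type Generator interface {
        Generate(cfg GenericCDNConfig) (map[string]string, error)
    }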
On Thu, Aug 1, 2019 at 4:17 PM Rawlin Peters <[email protected]> wrote:
>
> It sounds like:
> (A) everyone is +1 on cache-side config generation
> (B) most people are -1 on caches connecting directly to the TO DB
> (C) most people are +1 on TO pushing data to ORT instead of the other
> way around
> (D) most people are -1 on using Kafka for cache configs
>
> For (A) I'm +1 on the approach (ORT sidecar), but I think if we can
> design the ATS config gen library in a way that it just takes in a
> well-defined set of input data and returns strings (the generated
> config files) as output, it shouldn't really matter whether the input
> data comes from direct DB queries, API calls from ORT to TO, or data
> pushed from TO to ORT. Whatever that library ends up looking like, it
> should just be a matter of getting that data from some source and
> converting it into the input format expected by the library. The
> library should not have any dependency on external data -- only what
> has been passed into the library's function. Then we get a lot of nice
> benefits in terms of testability and reusability.
>
> Testability:
> Given a set of input data, we can expect certain output in terms of
> the ATS config files, so it would be easy to write unit tests for.
> Right now that's a hard thing to do, because we have to mock out every
> single DB call for those unit tests.
>
> Reusability:
> The library could be shared between the TO API and this new
> ORT-sidecar thing; the only difference should be that the TO API runs
> a set of DB queries to populate the input data, whereas the ORT
> sidecar would run a set of TO API calls to populate the input data.
>
> I know it might be more difficult to come up with that well-defined
> input interface than to just make DB or API calls whenever you need
> data in the library, but I think it would be well worth the effort for
> the reasons above.
>
> - Rawlin
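Rawlin's "data in, strings out" shape is also what makes the common-language
idea workable. As a rough sketch of the function shape he's describing
(ConfigData and MakeATSConfigs are names I'm making up here, not the actual
atstccfg API):

    // Hypothetical sketch of a pure config-gen library: a well-defined
    // input struct in, generated config files (filename -> contents) out.
    // No DB handles, no HTTP clients, no external data.
    package atscfg

    type ConfigData struct {
        ServerHostName string
        Parameters     map[string]string // profile parameters, already fetched by the caller
    }

    func MakeATSConfigs(data ConfigData) (map[string]string, error) {
        return map[string]string{
            "hosting.config": "# DO NOT EDIT - generated for " + data.ServerHostName + "\n",
        }, nil
    }

    // Unit tests then reduce to comparing strings, no DB mocks:
    //   got, _ := MakeATSConfigs(ConfigData{ServerHostName: "edge-01"})
    //   // assert got["hosting.config"] mentions "edge-01"

Whether the caller filled ConfigData from DB queries (the TO API) or from TO
API calls (the ORT sidecar) is then invisible to the library, which is
exactly the reusability he's after.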
> On Thu, Aug 1, 2019 at 7:28 AM ocket 8888 <[email protected]> wrote:
> >
> > Well, in that spirit:
> >
> > - Cache-side processing: +1. I suppose, given that we wouldn't want
> > to rewrite the entire configuration generation logic at once, there's
> > no reason to prefer this being part of ORT immediately versus
> > separate. Either way, there's no real "extra step". Though I must
> > admit, I am sad to see this written in Go and not built into my baby:
> > ORT.py
> >
> > - Invert LoC for config update: +1, because this essentially lets you
> > do server configuration snapshots for free, in addition to the other
> > benefits, and that's a pretty requested feature, I think. For servers
> > that are unreachable from TO, there's a problem; after backing off
> > for some set number of retries they should probably just be marked
> > offline with a reason, e.g. "unable to contact for updates". This is
> > a bit off-topic, though, imo, because it's sort of independent from
> > what Rob's suggesting, and that utility isn't even really designed
> > with that in mind - at least not yet. A conversation for after it's
> > finished.
> >
> > - Invert LoC for data selection: +1 (I think). Letting the cache
> > server decide what it needs to know decouples the system from any
> > particular cache software, which would let us support things like
> > Grove or Nginx more easily. Or at least allow people to write their
> > own config generators (plugins?) for those servers. Though honestly,
> > it's probably always going to want the entire profile/parameter set
> > and information for assigned delivery services anyway; it'll just
> > decide then what's important and what's meaningless. This is somewhat
> > more related, imo, since I _think_ (without looking at any code) that
> > what Rob's thing does now is just request the information it thinks
> > it needs, and builds the configs with that. I'd be interested to hear
> > more about the "fragility when out-of-sync" thing. Or maybe I'm
> > misunderstanding the concept? If what you mean is something more like
> > "the cache server selects what specific parameters it needs" then I'm
> > -1, but you should be able to get all of the parameters and their
> > modified dates with one call to `/profiles?name={{name}}` and then
> > decide from there. So the server still tells you everything that just
> > changed. Stuff like CG assignment/params and DS assignment/params
> > would likewise still need to be checked normally. So: +1 for caches
> > deciding what API endpoints to call, -1 for big globs of "this is
> > what was updated" being pushed to the cache server, and -1 for cache
> > servers trying to guess what might have changed instead of checking
> > everything.
> >
> > - Direct Database Connection: -1
> > - Kafka: -1
> >
> > > "This is true, but you can also run the cache-config generator to
> > > visually inspect them as well"
> >
> > yeah, but then you need to either expose that configuration
> > generator's output to the internet, or you need to `ssh` into a cache
> > server or something to inspect it individually. I think his concern
> > is not being able to see it in Traffic Portal, which is arguably
> > safer and much easier than the other two options, respectively. My
> > response would be "but you can still see arbitrary configuration, so
> > maybe that just needs to be made easier to view and understand to
> > compensate. Like, instead of
> >
> > {
> >     "configFile": "records.config",
> >     "name": "CONFIG proxy.config.http.insert_response_via_str",
> >     "secure": false,
> >     "value": "INT 3"
> > },
> >
> > you see something like 'encoded Via header verbosity:
> > none/low/med/high'"
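That friendly-rendering idea is cheap to prototype. A toy version (the
mapping table is invented for illustration; the parameter itself is the real
ATS records.config knob from the example above):

    // Sketch: render a raw records.config parameter as a human-readable
    // description. The friendly-name table is hypothetical.
    package main

    import "fmt"

    var viaVerbosity = map[string]string{
        "INT 0": "encoded Via header verbosity: none",
        "INT 1": "encoded Via header verbosity: low",
        "INT 2": "encoded Via header verbosity: med",
        "INT 3": "encoded Via header verbosity: high",
    }

    func describe(name, value string) string {
        if name == "CONFIG proxy.config.http.insert_response_via_str" {
            if desc, ok := viaVerbosity[value]; ok {
                return desc
            }
        }
        return name + " = " + value // fall back to the raw parameter
    }

    func main() {
        // Prints "encoded Via header verbosity: high"
        fmt.Println(describe("CONFIG proxy.config.http.insert_response_via_str", "INT 3"))
    }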
> > On Wed, Jul 31, 2019 at 11:01 AM Chris Lemmons <[email protected]> wrote:
> > >
> > > This is true, but you can also run the cache-config generator to
> > > visually inspect them as well. That makes it easy to visually
> > > inspect them, as well as to pipe them to diff and inspect them
> > > mechanically. So we don't lose the ability entirely, we just move
> > > it from one place to another.
> > >
> > > On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey
> > > <[email protected]> wrote:
> > > >
> > > > A small point, but TO currently allows one to visually
> > > > inspect/validate the generated configuration files. I don't know
> > > > how critical that functionality is (I personally found it
> > > > invaluable when testing logging configuration changes), but it
> > > > seems like we either have the generation logic in two places (ORT
> > > > and TO), or we lose that ability in TO by moving all the logic to
> > > > the cache.
> > > >
> > > > - Geoff
> > > >
> > > > On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]> wrote:
> > > >
> > > > my feedback:
> > > >
> > > > 1. i like the idea of slimming down TO. It's gotten way too fat.
> > > > Basically deprecating these api endpoints at some point and
> > > > letting "something else" do the job of config generation:
> > > >
> > > > GET /api/$version/servers/#id/configfiles/ats
> > > > GET /api/$version/profiles/#id/configfiles/ats/#filename
> > > > GET /api/$version/servers/#id/configfiles/ats/#filename
> > > > GET /api/$version/cdns/#id/configfiles/ats/#filename
> > > >
> > > > 2. i don't really care if that "something else" is a sidecar to
> > > > ORT or actually ORT. will let you guys hash that out.
> > > >
> > > > 3. i like the idea of that "something else" eventually being able
> > > > to handle a push vs. a pull as rawlin suggested.
> > > >
> > > > 4. a bit curious how "cache snapshots" would work as rob suggested in
> > > > https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> > > > - would you look at a cache snapshot diff and then snapshot
> > > > (which would queue updates in the background)?
> > > >
> > > > otherwise, thanks for taking the initiative, rob. and looking
> > > > forward to seeing what comes of this that will make TC safer/more
> > > > efficient.
> > > >
> > > > jeremy
> > > >
> > > > On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan <[email protected]> wrote:
> > > > >
> > > > > Smaller, simpler pieces closer to the cache that do one job are
> > > > > far simpler to maintain, triage, and build. I'm not a fan of
> > > > > trying to inject a message bus in the middle of everything.
> > > > >
> > > > > Jonathan G
> > > > >
> > > > > On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> wrote:
> > > > >
> > > > > To throw a completely different idea out there . . . some time
> > > > > ago Matt Mills was talking about using Kafka as the
> > > > > configuration transport mechanism for Traffic Control. The idea
> > > > > is to use a Kafka compacted topic as the configuration source.
> > > > > TO would write database updates to Kafka, and the ORT
> > > > > equivalent would pull its configuration from Kafka.
> > > > >
> > > > > To explain compacted topics a bit: a standard Kafka message is
> > > > > a key and a payload; in a compacted topic, only the most recent
> > > > > message/payload with a particular key is kept. As a result,
> > > > > reading all the messages from a topic will give you the current
> > > > > state of what's basically a key-value store, with the benefit
> > > > > of not doing actual mutations of data. So a cache could get the
> > > > > full expected configuration by reading all the existing
> > > > > messages on the appropriate topic, as well as get new updates
> > > > > to configuration by listening for new Kafka messages.
> > > > >
> > > > > This leaves the load on the Kafka brokers, which, I can assure
> > > > > you given recent experience, is minimal. TO would only have the
> > > > > responsibility of writing database updates to Kafka; ORT would
> > > > > only need to read individual updates (and be smart enough to
> > > > > know how and when to apply them -- perhaps hints could be
> > > > > provided in the payload?). The result is TO "pushing" updates
> > > > > to the caches (via Kafka) as Rawlin was proposing, and ORT
> > > > > could still pull the full configuration whenever necessary with
> > > > > no hit to Postgres or TO.
> > > > >
> > > > > Now this is obviously a radical shift (and there are no doubt
> > > > > other ways to implement the basic idea), but it seemed worth
> > > > > bringing up.
> > > > >
> > > > > - Geoff
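For concreteness (and the -1s notwithstanding), the compacted-topic
bootstrap Geoff describes would look roughly like this with the
segmentio/kafka-go client; the topic name and key/value semantics are
assumptions for illustration, not an actual TC design:

    // Sketch: rebuild current config state by replaying a compacted topic.
    // "cache-config" and the key/value layout are made up for illustration.
    package main

    import (
        "context"
        "fmt"
        "time"

        kafka "github.com/segmentio/kafka-go"
    )

    func main() {
        r := kafka.NewReader(kafka.ReaderConfig{
            Brokers:   []string{"localhost:9092"},
            Topic:     "cache-config",
            Partition: 0,
        })
        defer r.Close()
        if err := r.SetOffset(kafka.FirstOffset); err != nil { // replay from the start
            panic(err)
        }

        // Compaction keeps (at most) the latest message per key, so replaying
        // the topic yields a key-value store; later messages for the same key
        // overwrite earlier ones in this map.
        state := map[string]string{}
        for {
            // A read timeout stands in for "caught up to the high watermark".
            ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
            m, err := r.ReadMessage(ctx)
            cancel()
            if err != nil {
                break // timed out: existing messages have been replayed
            }
            state[string(m.Key)] = string(m.Value)
        }
        fmt.Printf("bootstrapped %d config entries\n", len(state))
        // A real consumer would keep reading here to apply live updates.
    }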
> > > > > On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> wrote:
> > > > >
> > > > > +1 on this
> > > > >
> > > > > On Jul 30, 2019, at 6:01 PM, Rawlin Peters <[email protected]> wrote:
> > > > >
> > > > > I've been thinking for a while now that ORT's current
> > > > > pull-based model of checking for queued updates is not really
> > > > > ideal, and I was hoping with "ORT 2.0" that we would switch
> > > > > that paradigm around to where TO itself would push updates out
> > > > > to queued caches. That way TO would never get overloaded,
> > > > > because we could tune the level of concurrency for pushing out
> > > > > updates (based on server capacity/specs), and we would
> > > > > eliminate the "waiting period" between the time updates are
> > > > > queued and the time ORT actually updates the config on the
> > > > > cache.
> > > > >
> > > > > I think cache-side config generation is a good idea in terms of
> > > > > enabling canary deployments, but as CDNs continue to scale by
> > > > > adding more and more caches, we might want to get out ahead of
> > > > > the ORT load/waiting problem by flipping that paradigm from
> > > > > "pull" to "push" somehow. Then instead of 1000 caches all
> > > > > asking TO the same question and causing 1000 duplicated reads
> > > > > from the DB, TO would just read the one answer from the DB and
> > > > > send it to all the caches, further reducing load on the DB as
> > > > > well. The data in the "push" request from TO to ORT 2.0 would
> > > > > contain all the information ORT would request from the API
> > > > > itself, not the actual config files.
> > > > >
> > > > > With the API transition from Perl to Go, I think we're
> > > > > eliminating the Perl CPU bottleneck from TO, but the next
> > > > > bottleneck seems like it would be reading from the DB, due to
> > > > > the constantly growing number of concurrent ORT requests as a
> > > > > CDN scales up. We should keep that in mind for whatever "ORT
> > > > > 2.0"-type changes we're making, so that it won't make flipping
> > > > > that paradigm around even harder.
> > > > >
> > > > > - Rawlin
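The "tune the level of concurrency" knob in that push model is basically a
worker pool on the TO side; a minimal sketch, where pushUpdate is a
placeholder for whatever the real transport to an ORT agent would be:

    // Sketch: push an update to every queued cache with a tunable cap on
    // in-flight pushes. pushUpdate is a hypothetical stand-in transport.
    package main

    import (
        "fmt"
        "sync"
    )

    func pushUpdate(cache string, payload []byte) error {
        fmt.Println("pushing update to", cache) // e.g. an HTTP POST in reality
        return nil
    }

    func pushAll(caches []string, payload []byte, concurrency int) {
        sem := make(chan struct{}, concurrency) // at most `concurrency` in flight
        var wg sync.WaitGroup
        for _, cache := range caches {
            wg.Add(1)
            sem <- struct{}{} // acquire a slot
            go func(c string) {
                defer wg.Done()
                defer func() { <-sem }() // release the slot
                if err := pushUpdate(c, payload); err != nil {
                    fmt.Println("push failed for", c, "-", err)
                }
            }(cache)
        }
        wg.Wait()
    }

    func main() {
        pushAll([]string{"edge-01", "edge-02", "edge-03"}, []byte(`{"configVersion": 42}`), 2)
    }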
> > > > > On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <[email protected]> wrote:
> > > > > >
> > > > > >> I'm confused why this is separate from ORT.
> > > > > >
> > > > > > Because ORT does a lot more than just fetching config files.
> > > > > > Rewriting all of ORT in Go would be considerably more work.
> > > > > > Contrariwise, if we were to put the config generation in the
> > > > > > ORT script itself, we would have to write it all from scratch
> > > > > > in Perl (the old config gen used the database directly, so it
> > > > > > would still have to be rewritten) or Python. This was just
> > > > > > the easiest path forward.
> > > > > >
> > > > > >> I feel like this logic should just be replacing the config
> > > > > >> fetching logic of ORT
> > > > > >
> > > > > > That's exactly what it does: the PR changes ORT to call this
> > > > > > app instead of calling Traffic Ops over HTTP:
> > > > > >
> > > > > > https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
> > > > > >
> > > > > >> Is that the eventual plan? Or does our vision of the future
> > > > > >> include this *and* ORT?
> > > > > >
> > > > > > I reserve the right to develop a strong opinion about that in
> > > > > > the future.
> > > > > >
> > > > > > On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
> > > > > >
> > > > > >>> "I'm just looking for consensus that this is the right
> > > > > >>> approach."
> > > > > >>
> > > > > >> Umm... sort of. I think moving cache configuration to the
> > > > > >> cache itself is a great idea, but I'm confused why this is
> > > > > >> separate from ORT. Like, if this is going to be generating
> > > > > >> the configs and it's already right there on the server, I
> > > > > >> feel like this logic should just be replacing the config
> > > > > >> fetching logic of ORT (and personally I think a neat place
> > > > > >> to try it out would be in ORT.py).
> > > > > >>
> > > > > >> Is that the eventual plan? Or does our vision of the future
> > > > > >> include this *and* ORT?
> > > > > >>
> > > > > >>> On 7/30/19 2:15 PM, Robert Butts wrote:
> > > > > >>> Hi all! I've been working on moving the ATS config
> > > > > >>> generation from Traffic Ops to a standalone app alongside
> > > > > >>> ORT that queries the standard TO API to generate its data.
> > > > > >>> I just wanted to put it here and get some feedback, to make
> > > > > >>> sure the community agrees this is the right direction.
> > > > > >>>
> > > > > >>> There's a (very) brief spec here (I might put more detail
> > > > > >>> into it later; let me know if that's important to anyone):
> > > > > >>>
> > > > > >>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> > > > > >>>
> > > > > >>> And the draft PR is here:
> > > > > >>> https://github.com/apache/trafficcontrol/pull/3762
> > > > > >>>
> > > > > >>> This has a number of advantages:
> > > > > >>> 1. TO is a monolith; this moves a significant amount of
> > > > > >>> logic out of it, into a smaller per-cache app/library
> > > > > >>> that's easier to test, validate, rewrite, deploy, canary,
> > > > > >>> rollback, etc.
> > > > > >>> 2. Deploying cache config changes is much smaller and
> > > > > >>> safer. Instead of having to deploy (and potentially roll
> > > > > >>> back) TO, you can canary deploy on one cache at a time.
> > > > > >>> 3. This makes TC more cache-agnostic. It moves cache config
> > > > > >>> generation logic out of TO, and into an independent
> > > > > >>> app/library. The app (atstccfg) is actually very similar to
> > > > > >>> Grove's config generator (grovetccfg).
> > > > > >>> This makes it easier and more obvious how to write config
> > > > > >>> generators for other proxies.
> > > > > >>> 4. By using the API and putting the generator functions in
> > > > > >>> a library, this really gives a lot more flexibility to put
> > > > > >>> the config gen anywhere you want without too much work. You
> > > > > >>> could easily put it in an HTTP service, or even put it back
> > > > > >>> in TO via a plugin. That's not something that's really
> > > > > >>> possible with the existing system, generating directly from
> > > > > >>> the database.
> > > > > >>>
> > > > > >>> Right now, I'm just looking for consensus that this is the
> > > > > >>> right approach. Does the community agree this is the right
> > > > > >>> direction? Are there concerns? Would anyone like more
> > > > > >>> details about anything in particular?
> > > > > >>>
> > > > > >>> Thanks,
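Advantage 4 is the one the common-language idea builds on: once the
generators are pure library functions, wrapping them in any delivery
mechanism is a few lines. A toy HTTP wrapper, reusing the same hypothetical
names as the library sketch above (not the real atstccfg API):

    // Sketch: the hypothetical pure config-gen library exposed as an HTTP
    // service -- one of the "put it anywhere" options in advantage 4.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    type ConfigData struct {
        ServerHostName string            `json:"serverHostName"`
        Parameters     map[string]string `json:"parameters"`
    }

    // makeATSConfigs stands in for the pure library function:
    // data in, config files (filename -> contents) out.
    func makeATSConfigs(d ConfigData) map[string]string {
        return map[string]string{
            "hosting.config": fmt.Sprintf("# generated for %s\n", d.ServerHostName),
        }
    }

    func main() {
        http.HandleFunc("/configs", func(w http.ResponseWriter, r *http.Request) {
            var data ConfigData
            if err := json.NewDecoder(r.Body).Decode(&data); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            w.Header().Set("Content-Type", "application/json")
            json.NewEncoder(w).Encode(makeATSConfigs(data))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

The same functions could just as easily be linked back into TO as a plugin,
which is what makes the canary/rollback story per-cache instead of
per-TO-deploy.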
