This is true, but you can also run the cache-config generator and inspect
its output. That makes it easy to visually inspect the generated files, as
well as to pipe them to diff and mechanically inspect them. So we don't
lose the ability entirely; we just move it from one place to another.
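As a sketch of that diff-based check (the generator invocation itself is omitted here; its output is simulated with printf so the diff step is runnable, and the file names are made up):

```shell
# Simulate the cache's current config file and a freshly generated one.
# (In real use, the cache-config generator's output would replace these.)
printf 'CONFIG proxy.config.log.logging_enabled INT 1\n' > records.config.current
printf 'CONFIG proxy.config.log.logging_enabled INT 3\n' > records.config.generated

# Mechanically inspect the change before applying it.
diff -u records.config.current records.config.generated || true
```

The `|| true` keeps the nonzero exit status diff returns for differing files from aborting a script run with `set -e`.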

On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey
<[email protected]> wrote:
>
> A small point, but TO currently allows one to visually inspect/validate the 
> generated configuration files.  I don't know how critical that functionality 
> is (I personally found it invaluable when testing logging configuration 
> changes), but it seems like we either have the generation logic in two places 
> (ORT and TO), or we lose that ability in TO by moving all the logic to the 
> cache.
>
> - Geoff
>
> On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]> wrote:
>
>     my feedback:
>
>     1. i like the idea of slimming down TO. It's gotten way too fat. Basically
>     deprecating these api endpoints at some point and letting "something else"
>     do the job of config generation:
>
>     GET /api/$version/servers/#id/configfiles/ats
>     GET /api/$version/profiles/#id/configfiles/ats/#filename
>     GET /api/$version/servers/#id/configfiles/ats/#filename
>     GET /api/$version/cdns/#id/configfiles/ats/#filename
>
>     2.  i don't really care if that "something else" is a sidecar to ORT or
>     actually ORT. will let you guys hash that out.
>
>     3. i like the idea of that "something else" eventually being able to
>     handle a push vs. a pull as rawlin suggested.
>
>     4. a bit curious how "cache snapshots" would work as rob suggested in
>     https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
>     - would you look at a cache snapshot diff and then snapshot (which would
>     queue updates in the background)?
>
>     otherwise, thanks for taking the initiative, rob. and looking forward to
>     seeing what comes of this that will make TC safer/more efficient.
>
>     jeremy
>
>
>
>
>
>     On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan <[email protected]>
>     wrote:
>
>     > Smaller, simpler pieces closer to the cache that do one job are far
>     > simpler to maintain, triage, and build.  I'm not a fan of trying to
>     > inject a message bus in the middle of everything.
>     >
>     > Jonathan G
>     >
>     >
>     > On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> wrote:
>     >
>     >     To throw a completely different idea out there . . . some time ago
>     > Matt Mills was talking about using Kafka as the configuration transport
>     > mechanism for Traffic Control.  The idea is to use a Kafka compacted
>     > topic as the configuration source.  TO would write database updates to
>     > Kafka, and the ORT equivalent would pull its configuration from Kafka.
>     >
>     >     To explain compacted topics a bit, a standard Kafka message is a
>     > key and a payload; in a compacted topic, only the most recent
>     > message/payload with a particular key is kept.  As a result, reading
>     > all the messages from a topic will give you the current state of
>     > what's basically a key-value store, with the benefit of not doing
>     > actual mutations of data.  So a cache could get the full expected
>     > configuration by reading all the existing messages on the appropriate
>     > topic, as well as get new updates to configuration by listening for
>     > new Kafka messages.
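[The compaction semantics described above can be sketched in a few lines of Go; this simulates a compacted topic with an in-memory slice rather than a real Kafka client, and the config-file keys are made up for illustration:]

```go
package main

import "fmt"

// msg models a Kafka record: a key plus a payload.
type msg struct{ key, payload string }

// compact keeps only the most recent payload per key, which is the state a
// compacted topic converges to: reading the whole topic yields the current
// contents of what is effectively a key-value store.
func compact(log []msg) map[string]string {
	state := make(map[string]string)
	for _, m := range log {
		state[m.key] = m.payload // a later message supersedes earlier ones
	}
	return state
}

func main() {
	// Hypothetical config updates written by TO, oldest first.
	updates := []msg{
		{"remap.config", "v1"},
		{"records.config", "v1"},
		{"remap.config", "v2"}, // supersedes the earlier remap.config payload
	}
	state := compact(updates)
	fmt.Println(state["remap.config"], state["records.config"]) // v2 v1
}
```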
>     >
>     >     This leaves the load on the Kafka brokers, which I can assure you,
>     > given recent experience, is minimal.  TO would only have the
>     > responsibility of writing database updates to Kafka; ORT would only
>     > need to read individual updates (and be smart enough to know how and
>     > when to apply them -- perhaps hints could be provided in the payload?).
>     > The result is TO "pushing" updates to the caches (via Kafka) as Rawlin
>     > was proposing, and ORT could still pull the full configuration whenever
>     > necessary with no hit to Postgres or TO.
>     >
>     >     Now this is obviously a radical shift (and there are no doubt other
>     > ways to implement the basic idea), but it seemed worth bringing up.
>     >
>     >     - Geoff
>     >
>     >     On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> wrote:
>     >
>     >         +1 on this
>     >
>     >         On Jul 30, 2019, at 6:01 PM, Rawlin Peters <[email protected]> wrote:
>     >
>     >         I've been thinking for a while now that ORT's current
>     >         pull-based model of checking for queued updates is not really
>     >         ideal, and I was hoping with "ORT 2.0" that we would switch
>     >         that paradigm around to where TO itself would push updates out
>     >         to queued caches. That way TO would never get overloaded
>     >         because we could tune the level of concurrency for pushing out
>     >         updates (based on server capacity/specs), and we would
>     >         eliminate the "waiting period" between the time updates are
>     >         queued and the time ORT actually updates the config on the
>     >         cache.
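[The tunable-concurrency push described here could look something like the following Go sketch; `pushAll`, its parameters, and the cache names are illustrative, not actual Traffic Control code:]

```go
package main

import (
	"fmt"
	"sync"
)

// pushAll sends one pre-computed payload to every cache with at most
// maxInFlight pushes running concurrently. The single DB read happens once,
// up front, instead of once per cache asking TO the same question.
func pushAll(payload string, caches []string, maxInFlight int, push func(cache, payload string)) {
	sem := make(chan struct{}, maxInFlight) // counting semaphore
	var wg sync.WaitGroup
	for _, c := range caches {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight pushes are in progress
		go func(c string) {
			defer wg.Done()
			defer func() { <-sem }()
			push(c, payload)
		}(c)
	}
	wg.Wait()
}

func main() {
	caches := []string{"edge-1", "edge-2", "edge-3", "edge-4"}
	payload := "config metadata read from the DB exactly once"
	pushAll(payload, caches, 2, func(cache, p string) {
		fmt.Printf("pushed to %s: %s\n", cache, p)
	})
}
```

The semaphore capacity is the knob for tuning push concurrency to TO's capacity/specs.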
>     >
>     >         I think cache-side config generation is a good idea in terms
>     >         of enabling canary deployments, but as CDNs continue to scale
>     >         by adding more and more caches, we might want to get out ahead
>     >         of the ORT load/waiting problem by flipping that paradigm from
>     >         "pull" to "push" somehow. Then instead of 1000 caches all
>     >         asking TO the same question and causing 1000 duplicated reads
>     >         from the DB, TO would just read the one answer from the DB and
>     >         send it to all the caches, further reducing load on the DB as
>     >         well. The data in the "push" request from TO to ORT 2.0 would
>     >         contain all the information ORT would request from the API
>     >         itself, not the actual config files.
>     >
>     >         With the API transition from Perl to Go, I think we're
>     >         eliminating the Perl CPU bottleneck from TO, but the next
>     >         bottleneck seems like it would be reading from the DB due to
>     >         the constantly growing number of concurrent ORT requests as a
>     >         CDN scales up. We should keep that in mind for whatever "ORT
>     >         2.0"-type changes we're making so that it won't make flipping
>     >         that paradigm around even harder.
>     >
>     >         - Rawlin
>     >
>     >         > On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <[email protected]> wrote:
>     >         >
>     >         >> I'm confused why this is separate from ORT.
>     >         >
>     >         > Because ORT does a lot more than just fetching config files.
>     >         > Rewriting all of ORT in Go would be considerably more work.
>     >         > Contrariwise, if we were to put the config generation in the
>     >         > ORT script itself, we would have to write it all from scratch
>     >         > in Perl (the old config gen used the database directly, so
>     >         > it'd still have to be rewritten) or Python. This was just the
>     >         > easiest path forward.
>     >         >
>     >         >> I feel like this logic should just be replacing the config
>     >         >> fetching logic of ORT
>     >         >
>     >         > That's exactly what it does: the PR changes ORT to call this
>     >         > app instead of calling Traffic Ops over HTTP:
>     >         >
>     >         > https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
>     >         >
>     >         >> Is that the eventual plan? Or does our vision of the future
>     >         >> include this *and* ORT?
>     >         >
>     >         > I reserve the right to develop a strong opinion about that
>     >         > in the future.
>     >         >
>     >         >
>     >         > On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
>     >         >
>     >         >>> "I'm just looking for consensus that this is the right
>     >         >>> approach."
>     >         >>
>     >         >> Umm... sort of. I think moving cache configuration to the
>     >         >> cache itself is a great idea, but I'm confused why this is
>     >         >> separate from ORT. Like if this is going to be generating
>     >         >> the configs and it's already right there on the server, I
>     >         >> feel like this logic should just be replacing the config
>     >         >> fetching logic of ORT (and personally I think a neat place
>     >         >> to try it out would be in ORT.py).
>     >         >>
>     >         >> Is that the eventual plan? Or does our vision of the future
>     >         >> include this *and* ORT?
>     >         >>
>     >         >>> On 7/30/19 2:15 PM, Robert Butts wrote:
>     >         >>> Hi all! I've been working on moving the ATS config
>     >         >>> generation from Traffic Ops to a standalone app alongside
>     >         >>> ORT, which queries the standard TO API to generate its
>     >         >>> data. I just wanted to put it here and get some feedback,
>     >         >>> to make sure the community agrees this is the right
>     >         >>> direction.
>     >         >>>
>     >         >>> There's a (very) brief spec here (I might put more detail
>     >         >>> into it later; let me know if that's important to anyone):
>     >         >>>
>     >         >>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
>     >         >>>
>     >         >>> And the Draft PR is here:
>     >         >>> https://github.com/apache/trafficcontrol/pull/3762
>     >         >>>
>     >         >>> This has a number of advantages:
>     >         >>> 1. TO is a monolith; this moves a significant amount of
>     >         >>> logic out of it, into a smaller per-cache app/library
>     >         >>> that's easier to test, validate, rewrite, deploy, canary,
>     >         >>> rollback, etc.
>     >         >>> 2. Deploying cache config changes is much smaller and
>     >         >>> safer. Instead of having to deploy (and potentially roll
>     >         >>> back) TO, you can canary deploy on one cache at a time.
>     >         >>> 3. This makes TC more cache-agnostic. It moves cache config
>     >         >>> generation logic out of TO and into an independent
>     >         >>> app/library. The app (atstccfg) is actually very similar to
>     >         >>> Grove's config generator (grovetccfg). This makes it easier
>     >         >>> and more obvious how to write config generators for other
>     >         >>> proxies.
>     >         >>> 4. By using the API and putting the generator functions in
>     >         >>> a library, this really gives a lot more flexibility to put
>     >         >>> the config gen anywhere you want without too much work. You
>     >         >>> could easily put it in an HTTP service, or even put it back
>     >         >>> in TO via a Plugin. That's not something that's really
>     >         >>> possible with the existing system, which generates directly
>     >         >>> from the database.
>     >         >>>
>     >         >>> Right now, I'm just looking for consensus that this is the
>     >         >>> right approach. Does the community agree this is the right
>     >         >>> direction? Are there concerns? Would anyone like more
>     >         >>> details about anything in particular?
>     >         >>>
>     >         >>> Thanks,
>     >         >>>
>     >         >>
