JvD, you're spoiling the surprise ending! :D But I do think one of the goals is to transmit subsets of the data, not single enormous data objects. And we have to get from here to there in a sane, supportable, and testable way.
It's been a couple of days and I haven't heard any opposition to cache-side processing. Sounds like the community has consensus on this point. There are some unhandled objections outstanding on the other issues, particularly that of inverting LoC for updates. I think we can engineer consensus out of that issue eventually, though. But we should do first things first and start with cache-side processing. Maybe we'll learn something new in the process. On Thu, Aug 1, 2019 at 7:46 PM Jan van Doorn <[email protected]> wrote: > > > > > On Aug 1, 2019, at 18:10, Evan Zelkowitz <[email protected]> wrote: > > > > Also a +1 on defining a standard common language that this library > > will use instead of direct dependencies on TO data. I know it has been > > bandied about for years that instead of having a direct TOdata->ATS > > config we'd have TOdata->cache-side library->generic CDN > > language->$LocalCacheSoftwareConfig > > > > Would be nice to keep that dream alive. So now you would have a > > library to handle TO data conversion to generic data, then separate > > libraries per cache software to determine config from that data > > Maybe use > https://github.com/apache/trafficserver/blob/voluspa/tools/voluspa/schema_v1.json > as a start for this generic CDN language? > > Rgds, > JvD > > > > > On Thu, Aug 1, 2019 at 4:17 PM Rawlin Peters <[email protected]> > > wrote: > >> > >> It sounds like: > >> (A) everyone is +1 on cache-side config generation > >> (B) most people are -1 on caches connecting directly to the TO DB > >> (C) most people are +1 on TO pushing data to ORT instead of the other way > >> around > >> (D) most people are -1 on using Kafka for cache configs > >> > >> For (A) I'm +1 on the approach (ORT sidecar), but I think if we can > >> design the ATS config gen library in a way that it just takes in a > >> well-defined set of input data and returns strings (the generated > >> config files) as output, it shouldn't really matter whether the input data > >> comes from direct DB queries, API calls from ORT to TO, or pushed data > >> from TO to ORT. Whatever that library ends up looking like, it should > >> just be a matter of getting that data from some source and converting > >> it into the input format expected by the library. The library should > >> not have any dependency on external data -- only what has been passed > >> into the library's function. Then we will get a lot of nice benefits > >> in terms of testability and reusability. > >> > >> Testability: > >> Given a set of input data, we can expect certain output in terms of > >> the ATS config files, so it would be easy to write unit tests for. > >> Right now that's a hard thing to do because we have to mock out every > >> single DB call for those unit tests. > >> > >> Reusability: > >> The library could be shared between the TO API and this new > >> ORT-sidecar thing, and the only difference should be that the TO API > >> runs a set of DB queries to populate the input data whereas the > >> ORT-sidecar thing would run a set of TO API calls to populate the > >> input data. > >> > >> I know it might be more difficult to come up with that well-defined > >> input interface than to just make DB or API calls whenever you need > >> data in the library, but I think it would be well worth the effort for > >> those reasons above.
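To make that concrete, here's a minimal sketch (in Go, the language the new tooling is written in) of the kind of pure-function generator library Rawlin describes. Every name here -- the configgen package, ConfigData, GenerateConfigs -- is a hypothetical illustration, not the actual atstccfg API:

    // Package configgen sketches the "well-defined input in, config strings out"
    // idea. The library does no I/O; the caller (TO via DB queries, or the ORT
    // sidecar via TO API calls) is responsible for assembling ConfigData.
    package configgen

    // Server and Parameter are stand-ins for whatever the real input types would be.
    type Server struct {
        HostName string
        Profile  string
    }

    type Parameter struct {
        ConfigFile string
        Name       string
        Value      string
    }

    // ConfigData is everything the generator needs, passed in by the caller.
    type ConfigData struct {
        Server     Server
        Parameters []Parameter
        // ... delivery services, cache groups, etc.
    }

    // GenerateConfigs is a pure function: same input, same output, no external
    // dependencies. It returns a map of config file name -> file contents.
    func GenerateConfigs(data ConfigData) (map[string]string, error) {
        configs := map[string]string{}
        for _, p := range data.Parameters {
            configs[p.ConfigFile] += p.Name + " " + p.Value + "\n"
        }
        return configs, nil
    }

Because GenerateConfigs does no I/O, a unit test is just a literal ConfigData and an expected output string -- no DB mocking -- and the same function can sit behind the TO API, the ORT sidecar, or anything else that can assemble a ConfigData.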
> >> > >> - Rawlin > >> > >> On Thu, Aug 1, 2019 at 7:28 AM ocket 8888 <[email protected]> wrote: > >>> > >>> Well, in that spirit: > >>> > >>> - Cache-side processing: +1. I suppose given that we wouldn't > >>> want > >>> to rewrite the entire configuration generation logic at once, there's no > >>> reason to prefer this being part of ORT immediately versus separate. > >>> Either > >>> way, there's no real "extra step". Though I must admit, I am sad to see > >>> this written in Go and not built into my baby: ORT.py > >>> > >>> - Invert LoC for config update: +1 because this essentially lets you do > >>> server configuration snapshots for free, in addition to the other > >>> benefits, > >>> and that's a pretty heavily requested feature, I think. For servers that are > >>> unreachable from TO, there's a problem, and after backing off for some set > >>> number of retries they should probably just be marked offline with reason > >>> e.g. "unable to contact for updates". This is a bit off-topic, though, > >>> imo, > >>> because it's sort of independent from what Rob's suggesting, and that > >>> utility isn't even really designed with that in mind - at least just yet. > >>> A > >>> conversation for after it's finished. > >>> > >>> - Invert LoC for data selection: +1 (I think). Letting the cache server > >>> decide what it needs to know allows you to decouple Traffic Ops from any > >>> particular cache server, which would let us support things like Grove or NGINX more > >>> easily. Or at least allow people to write their own config generators > >>> (plugins?) for those servers. Though honestly, it's probably always going > >>> to want the entire profile/parameter set and information for assigned > >>> delivery services anyway; it'll just decide for itself what's important and > >>> what's meaningless. This is somewhat more related, imo, since I _think_ > >>> (without looking at any code) that what Rob's thing does now is just > >>> request the information it thinks it needs, and builds the configs with > >>> that. I'd be interested to hear more about the "fragility when out-of-sync" > >>> thing. Or maybe I'm misunderstanding the concept? If what you mean is > >>> something more like "the cache server selects what specific parameters it > >>> needs" then I'm -1, but you should be able to get all of the parameters > >>> and > >>> their modified dates with one call to `/profiles?name={{name}}` and then > >>> decide from there. So the server still tells you everything that just > >>> changed. Stuff like CG assignment/params and DS assignment/params would > >>> likewise still need to be checked normally. So +1 for caches deciding what > >>> API endpoints to call, -1 for big globs of "this is what was updated" > >>> being > >>> pushed to the cache server, and -1 for cache servers trying to guess what > >>> might have changed instead of checking everything. > >>> > >>> - Direct Database Connection: -1 > >>> - Kafka: -1 > >>> > >>> > >>>> "This is true, but you can also run the cache-config generator to > >>> visually inspect them as well" > >>> > >>> yeah, but then you either need to expose that configuration generator's > >>> output to the internet or `ssh` into a cache server or > >>> something to inspect it individually. I think his concern is not being > >>> able > >>> to see it in Traffic Portal, which is arguably safer and much easier than > >>> the other two options, respectively.
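As a rough sketch of that polling idea: one call to the profiles endpoint, then diff the modified dates locally. Note that the endpoint path, response shape, and field names below are assumptions for illustration only, and a real TO client needs an authenticated session, which is omitted here:

    // Poll a profile's parameters and report which ones changed since the last
    // poll, keyed on their lastUpdated timestamps. Hypothetical response shape;
    // not authoritative for any particular TO API version.
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    type param struct {
        Name        string `json:"name"`
        ConfigFile  string `json:"configFile"`
        Value       string `json:"value"`
        LastUpdated string `json:"lastUpdated"`
    }

    type profilesResponse struct {
        Response []struct {
            Name   string  `json:"name"`
            Params []param `json:"params"`
        } `json:"response"`
    }

    // changedParams returns parameters whose lastUpdated differs from the value
    // recorded on the previous poll. seen maps parameter name -> lastUpdated.
    func changedParams(toURL, profile string, seen map[string]string) ([]param, error) {
        resp, err := http.Get(fmt.Sprintf("%s/api/1.4/profiles?name=%s", toURL, profile))
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        var pr profilesResponse
        if err := json.NewDecoder(resp.Body).Decode(&pr); err != nil {
            return nil, err
        }
        var changed []param
        for _, prof := range pr.Response {
            for _, p := range prof.Params {
                if seen[p.Name] != p.LastUpdated {
                    changed = append(changed, p)
                }
            }
        }
        return changed, nil
    }

    func main() {
        seen := map[string]string{} // lastUpdated values from the previous poll
        changed, err := changedParams("https://to.example.test", "EDGE_ATS", seen)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%d parameters changed\n", len(changed))
    }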
> >>> My response would be "but you can still see arbitrary configuration, so > >>> maybe that just needs to be made easier to view and understand to > >>> compensate. Like, instead of > >>> > >>> { > >>> "configFile": "records.config", > >>> "name": "CONFIG proxy.config.http.insert_response_via_str", > >>> "secure": false, > >>> "value": "INT 3" > >>> }, > >>> > >>> you see something like 'encoded Via header verbosity: none/low/med/high'" > >>> > >>> > >>> On Wed, Jul 31, 2019 at 11:01 AM Chris Lemmons <[email protected]> wrote: > >>> > >>>> This is true, but you can also run the cache-config generator to > >>>> visually inspect them as well. That makes it easy to inspect > >>>> them by eye as well as to pipe them to diff and check them mechanically. So > >>>> we don't lose the ability entirely, we just move it from one place to > >>>> another. > >>>> > >>>> On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey > >>>> <[email protected]> wrote: > >>>>> > >>>>> A small point, but TO currently allows one to visually inspect/validate > >>>> the generated configuration files. I don't know how critical that > >>>> functionality is (I personally found it invaluable when testing logging > >>>> configuration changes), but it seems like we either have the generation > >>>> logic in two places (ORT and TO), or we lose that ability in TO by moving > >>>> all the logic to the cache. > >>>>> > >>>>> - Geoff > >>>>> > >>>>> On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]> wrote: > >>>>> > >>>>> my feedback: > >>>>> > >>>>> 1. i like the idea of slimming down TO. It's gotten way too fat. > >>>> Basically > >>>>> deprecating these api endpoints at some point and letting "something > >>>> else" > >>>>> do the job of config generation: > >>>>> > >>>>> GET /api/$version/servers/#id/configfiles/ats > >>>>> GET /api/$version/profiles/#id/configfiles/ats/#filename > >>>>> GET /api/$version/servers/#id/configfiles/ats/#filename > >>>>> GET /api/$version/cdns/#id/configfiles/ats/#filename > >>>>> > >>>>> 2. i don't really care if that "something else" is a sidecar to ORT > >>>> or > >>>>> actually ORT. will let you guys hash that out. > >>>>> > >>>>> 3. i like the idea of that "something else" eventually being able to > >>>> handle > >>>>> a push vs. a pull as rawlin suggested. > >>>>> > >>>>> 4. a bit curious how "cache snapshots" would work as rob suggested in > >>>>> > >>>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation > >>>> - > >>>>> would you look at a cache snapshot diff and then snapshot (which > >>>> would > >>>>> queue updates in the background)? > >>>>> > >>>>> otherwise, thanks for taking the initiative, rob. and looking > >>>> forward to > >>>>> seeing what comes of this that will make TC safer/more efficient. > >>>>> > >>>>> jeremy > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan < > >>>> [email protected]> > >>>>> wrote: > >>>>> > >>>>>> Smaller, simpler pieces closer to the cache that do one job are far > >>>>>> simpler to maintain, triage, and build. I'm not a fan of trying > >>>> to inject > >>>>>> a message bus in the middle of everything. > >>>>>> > >>>>>> Jonathan G > >>>>>> > >>>>>> > >>>>>> On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> > >>>> wrote: > >>>>>> > >>>>>> To throw a completely different idea out there . . . some time > >>>> ago > >>>>>> Matt Mills was talking about using Kafka as the configuration > >>>> transport > >>>>>> mechanism for Traffic Control.
The idea is to use a Kafka > >>>> compacted topic > >>>>>> as the configuration source. TO would write database updates to > >>>> Kafka, and > >>>>>> the ORT equivalent would pull its configuration from Kafka. > >>>>>> > >>>>>> To explain compacted topics a bit, a standard Kafka message is > >>>> a key > >>>>>> and a payload; in a compacted topic, only the most recent > >>>> message/payload > >>>>>> with a particular key is kept. As a result, reading all the > >>>> messages from > >>>>>> a topic will give you the current state of what's basically a key > >>>> value > >>>>>> store, with the benefit of not doing actual mutations of data. So > >>>> a cache > >>>>>> could get the full expected configuration by reading all the > >>>> existing > >>>>>> messages on the appropriate topic, as well as get new updates to > >>>>>> configuration by listening for new Kafka messages. > >>>>>> > >>>>>> This leaves the load on the Kafka brokers, which, I can assure > >>>> you > >>>>>> given recent experience, is minimal. TO would only have the > >>>> responsibility > >>>>>> of writing database updates to Kafka, and ORT would only need to read > >>>>>> individual updates (and be smart enough to know how and when to > >>>> apply them > >>>>>> -- perhaps hints could be provided in the payload?). The result > >>>> is that TO is > >>>>>> "pushing" updates to the caches (via Kafka) as Rawlin was > >>>> proposing, and > >>>>>> ORT could still pull the full configuration whenever necessary > >>>> with no hit > >>>>>> to Postgres or TO. > >>>>>> > >>>>>> Now this is obviously a radical shift (and there are no doubt > >>>> other > >>>>>> ways to implement the basic idea), but it seemed worth bringing up. > >>>>>> > >>>>>> - Geoff > >>>>>> > >>>>>> On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> > >>>> wrote: > >>>>>> > >>>>>> +1 on this > >>>>>> > >>>>>> On Jul 30, 2019, at 6:01 PM, Rawlin Peters < > >>>>>> [email protected]> wrote: > >>>>>> > >>>>>> I've been thinking for a while now that ORT's current > >>>> pull-based > >>>>>> model > >>>>>> of checking for queued updates is not really ideal, and I > >>>> was > >>>>>> hoping > >>>>>> with "ORT 2.0" that we would switch that paradigm around > >>>> to where > >>>>>> TO > >>>>>> itself would push updates out to queued caches. That way > >>>> TO would > >>>>>> never get overloaded because we could tune the level of > >>>> concurrency > >>>>>> for pushing out updates (based on server capacity/specs), > >>>> and we > >>>>>> would > >>>>>> eliminate the "waiting period" between the time updates > >>>> are queued > >>>>>> and > >>>>>> the time ORT actually updates the config on the cache. > >>>>>> > >>>>>> I think cache-side config generation is a good idea in > >>>> terms of > >>>>>> enabling canary deployments, but as CDNs continue to scale > >>>> by > >>>>>> adding > >>>>>> more and more caches, we might want to get out ahead of > >>>> the ORT > >>>>>> load/waiting problem by flipping that paradigm from "pull" > >>>> to > >>>>>> "push" > >>>>>> somehow. Then instead of 1000 caches all asking TO the same > >>>>>> question > >>>>>> and causing 1000 duplicated reads from the DB, TO would > >>>> just read > >>>>>> the > >>>>>> one answer from the DB and send it to all the caches, > >>>> further > >>>>>> reducing > >>>>>> load on the DB as well. The data in the "push" request > >>>> from TO to > >>>>>> ORT > >>>>>> 2.0 would contain all the information ORT would request > >>>> from the > >>>>>> API > >>>>>> itself, not the actual config files.
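For reference, the compacted-topic replay Geoff describes above might look something like this sketch, using the github.com/segmentio/kafka-go client. The topic name and message layout are made up for illustration:

    // Rebuild current config state by replaying a compacted topic from the
    // beginning, then keep consuming to pick up new updates as they arrive.
    package main

    import (
        "context"
        "log"

        "github.com/segmentio/kafka-go"
    )

    func main() {
        r := kafka.NewReader(kafka.ReaderConfig{
            Brokers:   []string{"kafka.example.test:9092"},
            Topic:     "cache-config", // hypothetical topic name
            Partition: 0,
        })
        defer r.Close()
        if err := r.SetOffset(kafka.FirstOffset); err != nil {
            log.Fatal(err)
        }

        // Compaction keeps only the latest message per key, so replaying the
        // topic yields (approximately) the current key-value state.
        state := map[string]string{}
        for {
            msg, err := r.ReadMessage(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            state[string(msg.Key)] = string(msg.Value)
            // A real ORT-equivalent would detect "caught up" by comparing
            // msg.Offset against the partition high-water mark, then apply
            // each subsequent message as an incremental update.
            log.Printf("%d keys in state after offset %d", len(state), msg.Offset)
        }
    }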
> >>>>>> > >>>>>> With the API transition from Perl to Go, I think we're > >>>> eliminating > >>>>>> the > >>>>>> Perl CPU bottleneck from TO, but the next bottleneck seems > >>>> like it > >>>>>> would be reading from the DB due to the constantly growing > >>>> number > >>>>>> of > >>>>>> concurrent ORT requests as a CDN scales up. We should keep > >>>> that in > >>>>>> mind for whatever "ORT 2.0"-type changes we're making so > >>>> that it > >>>>>> won't > >>>>>> make flipping that paradigm around even harder. > >>>>>> > >>>>>> - Rawlin > >>>>>> > >>>>>>> On Tue, Jul 30, 2019 at 4:23 PM Robert Butts < > >>>> [email protected]> > >>>>>> wrote: > >>>>>>> > >>>>>>>> I'm confused why this is separate from ORT. > >>>>>>> > >>>>>>> Because ORT does a lot more than just fetching config > >>>> files. > >>>>>> Rewriting all > >>>>>>> of ORT in Go would be considerably more work. > >>>> Contrariwise, if we > >>>>>> were to put > >>>>>>> the config generation in the ORT script itself, we would > >>>> have to > >>>>>> write it > >>>>>>> all from scratch in Perl (the old config gen used the > >>>> database > >>>>>> directly, so > >>>>>>> it'd still have to be rewritten) or Python. This was > >>>> just the > >>>>>> easiest path > >>>>>>> forward. > >>>>>>> > >>>>>>>> I feel like this logic should just be replacing the > >>>> config > >>>>>> fetching logic > >>>>>>> of ORT > >>>>>>> > >>>>>>> That's exactly what it does: the PR changes ORT to call > >>>> this app > >>>>>> instead of > >>>>>>> calling Traffic Ops over HTTP: > >>>>>>> > >>>>>> > >>>> https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485 > >>>>>>> > >>>>>>>> Is that the eventual plan? Or does our vision of the > >>>> future > >>>>>> include this > >>>>>>> *and* ORT? > >>>>>>> > >>>>>>> I reserve the right to develop a strong opinion about > >>>> that in > >>>>>> the future. > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Jul 30, 2019 at 3:17 PM ocket8888 < > >>>> [email protected]> > >>>>>> wrote: > >>>>>>> > >>>>>>>>> "I'm just looking for consensus that this is the right > >>>>>> approach." > >>>>>>>> > >>>>>>>> Umm... sort of. I think moving cache configuration to > >>>> the cache > >>>>>> itself > >>>>>>>> is a great idea, > >>>>>>>> > >>>>>>>> but I'm confused why this is separate from ORT. Like if > >>>> this is > >>>>>> going to > >>>>>>>> be generating the > >>>>>>>> > >>>>>>>> configs and it's already right there on the server, I > >>>> feel like > >>>>>> this > >>>>>>>> logic should just be > >>>>>>>> > >>>>>>>> replacing the config fetching logic of ORT (and > >>>> personally I > >>>>>> think a > >>>>>>>> neat place to try it > >>>>>>>> > >>>>>>>> out would be in ORT.py). > >>>>>>>> > >>>>>>>> > >>>>>>>> Is that the eventual plan? Or does our vision of the > >>>> future > >>>>>> include this > >>>>>>>> *and* ORT? > >>>>>>>> > >>>>>>>> > >>>>>>>>> On 7/30/19 2:15 PM, Robert Butts wrote: > >>>>>>>>> Hi all! I've been working on moving the ATS config > >>>> generation > >>>>>> from > >>>>>>>> Traffic > >>>>>>>>> Ops to a standalone app alongside ORT that queries the > >>>>>> standard TO API > >>>>>>>> to > >>>>>>>>> generate its data. I just wanted to put it here, and > >>>> get some > >>>>>> feedback, > >>>>>>>> to > >>>>>>>>> make sure the community agrees this is the right > >>>> direction.
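As a back-of-the-napkin sketch of the tunable-concurrency push Rawlin describes above: TO reads the one answer from the DB, then fans it out to every queued cache through a bounded worker pool. All names and the transport here are hypothetical:

    // Push one payload to many caches with a tunable concurrency limit,
    // so TO controls its own fan-out rate instead of being polled.
    package main

    import (
        "fmt"
        "sync"
    )

    func pushToCaches(caches []string, payload []byte, maxConcurrent int) {
        sem := make(chan struct{}, maxConcurrent) // the tunable knob
        var wg sync.WaitGroup
        for _, cache := range caches {
            wg.Add(1)
            sem <- struct{}{} // acquire a slot
            go func(cache string) {
                defer wg.Done()
                defer func() { <-sem }() // release the slot
                // In a real system this would be an authenticated request to
                // the cache's ORT-sidecar endpoint, with retry/backoff and
                // "mark offline after N failures" handling.
                fmt.Printf("pushing %d bytes to %s\n", len(payload), cache)
            }(cache)
        }
        wg.Wait()
    }

    func main() {
        caches := []string{"edge-1", "edge-2", "edge-3"}
        pushToCaches(caches, []byte(`{"configData":"..."}`), 2)
    }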
> >>>>>>>>> > >>>>>>>>> There's a (very) brief spec here: (I might put more > >>>> detail > >>>>>> into it later, > >>>>>>>>> let me know if that's important to anyone) > >>>>>>>>> > >>>>>>>> > >>>>>> > >>>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation > >>>>>>>>> > >>>>>>>>> And the Draft PR is here: > >>>>>>>>> https://github.com/apache/trafficcontrol/pull/3762 > >>>>>>>>> > >>>>>>>>> This has a number of advantages: > >>>>>>>>> 1. TO is a monolith; this moves a significant amount > >>>> of logic > >>>>>> out of it, > >>>>>>>>> into a smaller per-cache app/library that's easier to > >>>> test, > >>>>>> validate, > >>>>>>>>> rewrite, deploy, canary, rollback, etc. > >>>>>>>>> 2. Deploying cache config changes becomes much smaller and > >>>> safer. > >>>>>> Instead of > >>>>>>>>> having to deploy (and potentially roll back) TO, you > >>>> can > >>>>>> canary deploy on > >>>>>>>>> one cache at a time. > >>>>>>>>> 3. This makes TC more cache-agnostic. It moves cache > >>>> config > >>>>>> generation > >>>>>>>>> logic out of TO, and into an independent app/library. > >>>> The app > >>>>>> (atstccfg) > >>>>>>>> is > >>>>>>>>> actually very similar to Grove's config generator > >>>>>> (grovetccfg). This > >>>>>>>> makes > >>>>>>>>> it easier and more obvious how to write config > >>>> generators for > >>>>>> other > >>>>>>>> proxies. > >>>>>>>>> 4. Using the API and putting the generator > >>>> functions in a > >>>>>> library > >>>>>>>> > >>>>>>>>> gives a lot more flexibility to put the config > >>>> gen > >>>>>> anywhere you > >>>>>>>> want > >>>>>>>>> without too much work. You could easily put it in an > >>>> HTTP > >>>>>> service, or > >>>>>>>> even > >>>>>>>>> put it back in TO via a Plugin. That's not something > >>>> that's > >>>>>> really > >>>>>>>> possible > >>>>>>>>> with the existing system, which generates directly from the > >>>>>> database. > >>>>>>>>> > >>>>>>>>> Right now, I'm just looking for consensus that this is > >>>> the > >>>>>> right > >>>>>>>> approach. > >>>>>>>>> Does the community agree this is the right direction? > >>>> Are > >>>>>> there concerns? > >>>>>>>>> Would anyone like more details about anything in > >>>> particular? > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>> >
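Following on advantage 4 above: once generation is a pure library, exposing it as an HTTP service (or a TO plugin) is thin glue. This sketch reuses the hypothetical ConfigData/GenerateConfigs names from earlier in the thread; none of it is the actual atstccfg code:

    // A thin HTTP wrapper around a pure config-generation function:
    // POST a ConfigData, get back a JSON map of file name -> contents.
    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // ConfigData and GenerateConfigs stand in for the hypothetical pure
    // generator library sketched earlier in the thread.
    type ConfigData struct {
        Parameters []struct {
            ConfigFile string `json:"configFile"`
            Name       string `json:"name"`
            Value      string `json:"value"`
        } `json:"parameters"`
    }

    func GenerateConfigs(d ConfigData) (map[string]string, error) {
        out := map[string]string{}
        for _, p := range d.Parameters {
            out[p.ConfigFile] += p.Name + " " + p.Value + "\n"
        }
        return out, nil
    }

    func main() {
        http.HandleFunc("/configs", func(w http.ResponseWriter, r *http.Request) {
            var data ConfigData
            if err := json.NewDecoder(r.Body).Decode(&data); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            configs, err := GenerateConfigs(data)
            if err != nil {
                http.Error(w, err.Error(), http.StatusInternalServerError)
                return
            }
            json.NewEncoder(w).Encode(configs)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

The same GenerateConfigs call could just as easily sit behind a TO plugin or be linked straight into the ORT sidecar; only the glue changes.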
