Re: [EXTERNAL] Re: Cache-Side Config Generation

Dave Neuman Fri, 02 Aug 2019 15:29:14 -0700

The original intention of this thread was cache-side config generation; we
should take other conversations to other threads if we think now is the
time to have them.

First of all, thanks for putting the thought and time into this.  I do
think large changes like this are better discussed first, with the
requirements understood, before we write code.  I realize that it's hard to
actually accomplish that with so many different opinions, but I do think
it's the right thing to do.  Having discussions up front lets us think
through things as a group and, in my experience, often leads to a better
solution.  Maybe we can experiment with talking it out over a Google
Hangout and then posting our decisions on list?

Anyway, I think this is a good idea as long as we are sticking with the ORT
paradigm, and I like the upsides of it.  Since there is a PR already, some
things I would like to see added are:

A) tests - we should not be developing any new functionality without tests;
B) documentation - this changes the way the system works and we need to
   document how it changes things;
C) a way for normal end-users without ssh access to see our config files.
   This functionality is important and we should not take it away from our
   users.

I think if we address those three things this solution will be in a good
place.  Again, I think this is the best solution with the assumption that
we are sticking with ORT.  If we decide to go in a different direction,
like Voluspa, then a new conversation is warranted.


Thanks,
Dave


On Thu, Aug 1, 2019 at 10:37 PM Chris Lemmons <[email protected]> wrote:

> JvD, you're spoiling the surprise ending! :D
>
> But I do think one of the goals is to transmit subsets of the data,
> not single enormous data objects. And we have to get from here to
> there in a sane, supportable, and testable way.
>
> It's been a couple of days and I haven't heard any opposition to
> cache-side processing. Sounds like the community has consensus on this
> point.
>
> There're some un-handled objections outstanding on the other issues,
> particularly that of inverting LoC for updates. I think we can
> engineer consensus out of that issue, eventually, though. But we
> should do first things first and do cache-side processing first. Maybe
> we'll learn something new in the process.
>
> On Thu, Aug 1, 2019 at 7:46 PM Jan van Doorn <[email protected]> wrote:
> >
> >
> >
> > > On Aug 1, 2019, at 18:10, Evan Zelkowitz <[email protected]> wrote:
> > >
> > > Also a +1 on the defining a standard common language that this library
> > > will use instead of direct dependencies on TO data. I know it has been
> > > bandied about for years that instead of having a direct TOdata->ATS
> > > config instead we have TOdata->cache side library->generic CDN
> > > language->$LocalCacheSoftwareConfig
> > >
> > > Would be nice to keep that dream alive. So now you would have a
> > > library to handle TO data conversion to generic data, then separate
> > > libraries per cache software to determine config from that data
> >
> > > Maybe use
> > > https://github.com/apache/trafficserver/blob/voluspa/tools/voluspa/schema_v1.json
> > > as a start for this generic CDN language?
> >
> > Rgds,
> > JvD
> >
> > >
> > > On Thu, Aug 1, 2019 at 4:17 PM Rawlin Peters <[email protected]>
> wrote:
> > >>
> > >> It sounds like:
> > >> (A) everyone is +1 on cache-side config generation
> > >> (B) most people are -1 on caches connecting directly to the TO DB
> > >> (C) most people are +1 on TO pushing data to ORT instead of the other
> way around
> > >> (D) most people are -1 on using Kafka for cache configs
> > >>
> > >> For (A) I'm +1 on the approach (ORT sidecar), but I think if we can
> > >> design the ats config gen library in a way that it just takes in a
> > >> well-defined set of input data and returns strings (the generated
> > >> config files) as output, it shouldn't really matter if the input data
> > >> comes from direct DB queries, API calls from ORT to TO, or pushed data
> > >> from TO to ORT. Whatever that library ends up looking like, it should
> > >> just be a matter of getting that data from some source, and converting
> > >> it into the input format expected by the library. The library should
> > >> not have any dependency on external data -- only what has been passed
> > >> into the library's function. Then we will get a lot of nice benefits
> > >> in terms of testability and reusability.
> > >>
> > >> Testability:
> > >> Given a set of input data, we can expect certain output in terms of
> > >> the ATS config files, so it would be easy to write unit tests for.
> > >> Right now that's a hard thing to do because we have to mock out every
> > >> single DB call for those unit tests
> > >>
> > >> Reusability:
> > >> The library could be shared between the TO API and this new
> > >> ORT-sidecar thing, and the only difference should be that the TO API
> > >> runs a set of DB queries to populate the input data whereas the
> > >> ORT-sidecar thing would run a set of TO API calls to populate the
> > >> input data.
> > >>
> > >> I know it might be more difficult to come up with that well-defined
> > >> input interface than to just make DB or API calls whenever you need
> > >> data in the library, but I think it would be well worth the effort for
> > >> those reasons above.
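
The "well-defined input in, config strings out" library described above can be sketched roughly as follows. This is a minimal Go sketch under stated assumptions, not the code in the PR: `Param` and `GenerateRecordsConfig` are hypothetical names, and the fields are a simplified guess modeled on the parameter JSON quoted elsewhere in this thread. The point is only that the function touches nothing but its arguments, so the caller decides whether the data came from DB queries, TO API calls, or a push.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Param is a hypothetical, simplified stand-in for a Traffic Ops parameter.
// The real TO data model has more fields; this is only for illustration.
type Param struct {
	ConfigFile string
	Name       string
	Value      string
}

// GenerateRecordsConfig is a pure function: it reads only its arguments and
// returns the generated file as a string. It makes no DB or API calls.
func GenerateRecordsConfig(serverName string, params []Param) string {
	lines := []string{}
	for _, p := range params {
		// Ignore parameters that belong to other config files.
		if p.ConfigFile != "records.config" {
			continue
		}
		lines = append(lines, p.Name+" "+p.Value)
	}
	// Sort so the same input always produces byte-identical output,
	// which keeps diffs and unit tests stable.
	sort.Strings(lines)

	var b strings.Builder
	fmt.Fprintf(&b, "# generated for %s\n", serverName)
	for _, l := range lines {
		b.WriteString(l + "\n")
	}
	return b.String()
}
```

A unit test then needs no mocks at all: build a `[]Param` literal, call `GenerateRecordsConfig("edge-01", params)` from a test or a `main` function, and compare the returned string against the expected file contents.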
> > >>
> > >> - Rawlin
> > >>
> > >> On Thu, Aug 1, 2019 at 7:28 AM ocket 8888 <[email protected]>
> wrote:
> > >>>
> > >>> Well, in that spirit:
> > >>>
> > >>> - Cache-side processing: +1. I suppose given the fact that we
> wouldn't want
> > >>> to rewrite the entire configuration generation logic at once,
> there's no
> > >>> reason to prefer this being part of ORT immediately versus separate.
> Either
> > >>> way, there's no real "extra step". Though I must admit, I am sad to
> see
> > >>> this written in Go and not built into my baby: ORT.py
> > >>>
> > >>> - Invert LoC for config update: +1 because this essentially lets you
> do
> > >>> server configuration snapshots for free, in addition to the other
> benefits,
> > >>> and that's a pretty requested feature, I think. For servers that are
> > >>> unreachable from TO, there's a problem, and after backing off for
> some set
> > >>> number of retries they should probably just be marked offline with
> reason
> > >>> e.g. "unable to contact for updates". This is a bit off-topic,
> though, imo,
> > >>> because it's sort of independent from what Rob's suggesting, and that
> > >>> utility isn't even really designed with that in mind - at least just
> yet. A
> > >>> conversation for after it's finished.
> > >>>
> > >>> - Invert LoC for data selection: +1 (I think). Letting the cache
> server
> > >>> decide what it needs to know allows you to decouple it from any
> particular
> > >>> cache server, which would let us support things like Grove or NGinx
> more
> > >>> easily. Or at least allow people to write their own config generators
> > >>> (plugins?) for those servers. Though honestly, it's probably always
> going
> > >>> to want the entire profile/parameter set and information for assigned
> > >>> delivery services anyway, just then it'll decide what's important and
> > >>> what's meaningless. This is somewhat more related, imo, since I
> _think_
> > >>> (without looking at any code) that what Rob's thing does now is just
> > >>> request the information it thinks it needs, and builds the configs
> with
> > >>> that. I'd be interested to hear more about the "fragility when
> out-of-sync"
> > >>> thing. Or maybe I'm misunderstanding the concept? If what you mean is
> > >>> something more like "the cache server selects what specific
> parameters it
> > >>> needs" then I'm -1, but you should be able to get all of the
> parameters and
> > >>> their modified dates with one call to `/profiles?name={{name}}` and
> then
> > >>> decide from there. So the server still tells you everything that just
> > >>> changed. Stuff like CG assignment/params and DS assignment/params
> would
> > >>> likewise still need to be checked normally. So +1 for caches
> deciding what
> > >>> API endpoints to call, -1 for big globs of "this is what was
> updated" being
> > >>> pushed to the cache server, and -1 for cache servers trying to guess
> what
> > >>> might have changed instead of checking everything.
> > >>>
> > >>> - Direct Database Connection: -1
> > >>> - Kafka: -1
> > >>>
> > >>>
> > >>>> "This is true, but you can also run the cache-config generator to
> > >>> visually inspect them as well"
> > >>>
> > >>> yeah, but then you need to either expose that configuration
> generator's
> > >>> output to the internet or you need to `ssh` into a cache server or
> > >>> something to inspect it individually. I think his concern is not
> being able
> > >>> to see it in Traffic Portal, which is arguably safer and much easier
> than
> > >>> the other two options, respectively.
> > >>> My response would be "but you can still see arbitrary configuration,
> so
> > >>> maybe that just needs to be made easier to view and understand to
> > >>> compensate. Like, instead of
> > >>>
> > >>> {
> > >>> "configFile": "records.config",
> > >>> "name": "CONFIG proxy.config.http.insert_response_via_str",
> > >>> "secure": false,
> > >>> "value": "INT 3"
> > >>> },
> > >>>
> > >>> you see something like 'encoded Via header verbosity:
> none/low/med/high'"
> > >>>
> > >>>
> > >>> On Wed, Jul 31, 2019 at 11:01 AM Chris Lemmons <[email protected]>
> wrote:
> > >>>
> > >>>> This is true, but you can also run the cache-config generator to
> > >>>> visually inspect them as well. That makes it easy to visually
> inspect
> > >>>> them as well as to pipe them to diff and mechanically inspect them.
> So
> > >>>> we don't lose the ability entirely, we just move it from one place
> to
> > >>>> another.
> > >>>>
> > >>>> On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey
> > >>>> <[email protected]> wrote:
> > >>>>>
> > >>>>> A small point, but TO currently allows one to visually
> inspect/validate
> > >>>> the generated configuration files.  I don't know how critical that
> > >>>> functionality is (I personally found it invaluable when testing
> logging
> > >>>> configuration changes), but it seems like we either have the
> generation
> > >>>> logic in two places (ORT and TO), or we lose that ability in TO by
> moving
> > >>>> all the logic to the cache.
> > >>>>>
> > >>>>> - Geoff
> > >>>>>
> > >>>>> On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]>
> wrote:
> > >>>>>
> > >>>>>    my feedback:
> > >>>>>
> > >>>>>    1. i like the idea of slimming down TO. It's gotten way too fat.
> > >>>> Basically
> > >>>>>    deprecating these api endpoints at some point and letting
> "something
> > >>>> else"
> > >>>>>    do the job of config generation:
> > >>>>>
> > >>>>>    GET /api/$version/servers/#id/configfiles/ats
> > >>>>>    GET /api/$version/profiles/#id/configfiles/ats/#filename
> > >>>>>    GET /api/$version/servers/#id/configfiles/ats/#filename
> > >>>>>    GET /api/$version/cdns/#id/configfiles/ats/#filename
> > >>>>>
> > >>>>>    2.  i don't really care if that "something else" is a sidecar
> to ORT
> > >>>> or
> > >>>>>    actually ORT. will let you guys hash that out.
> > >>>>>
> > >>>>>    3. i like the idea of that "something else" eventually being
> able to
> > >>>> handle
> > >>>>>    a push vs. a pull as rawlin suggested.
> > >>>>>
> > >>>>>    4. a bit curious how "cache snapshots" would work as rob
> suggested in
> > >>>>>
> > >>>>
> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> > >>>> -
> > >>>>>    would you look at a cache snapshot diff and then snapshot (which
> > >>>> would
> > >>>>>    queue updates in the background)?
> > >>>>>
> > >>>>>    otherwise, thanks for taking the initiative, rob. and looking
> > >>>> forward to
> > >>>>>    seeing what comes of this that will make TC safer/more
> efficient.
> > >>>>>
> > >>>>>    jeremy
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>    On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan <
> > >>>> [email protected]>
> > >>>>>    wrote:
> > >>>>>
> > >>>>>> Smaller, simpler pieces closer to the cache that do one job are
> far
> > >>>>>> simpler to maintain, triage, and build.  I'm not a fan of trying
> > >>>> to inject
> > >>>>>> a message bus in the middle of everything.
> > >>>>>>
> > >>>>>> Jonathan G
> > >>>>>>
> > >>>>>>
> > >>>>>> On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>    To throw a completely different idea out there . . . some time
> > >>>> ago
> > >>>>>> Matt Mills was talking about using Kafka as the configuration
> > >>>> transport
> > >>>>>> mechanism for Traffic Control.  The idea is to use a Kafka
> > >>>> compacted topic
> > >>>>>> as the configuration source.  TO would write database updates to
> > >>>> Kafka, and
> > >>>>>> the ORT equivalent would pull its configuration from Kafka.
> > >>>>>>
> > >>>>>>    To explain compacted topics a bit, a standard Kafka message is
> > >>>> a key
> > >>>>>> and a payload; in a compacted topic, only the most recent
> > >>>> message/payload
> > >>>>>> with a particular key is kept.  As a result, reading all the
> > >>>> messages from
> > >>>>>> a topic will give you the current state of what's basically a key
> > >>>> value
> > >>>>>> store, with the benefit of not doing actual mutations of data.  So
> > >>>> a cache
> > >>>>>> could get the full expected configuration by reading all the
> > >>>> existing
> > >>>>>> messages on the appropriate topic, as well as get new updates to
> > >>>>>> configuration by listening for new Kafka messages.
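
The compaction semantics described above can be illustrated without a broker at all. The following is a minimal in-memory Go sketch, assuming a message is just a key and a payload; `Message`, `Compact`, and `Replay` are hypothetical names for illustration and are not part of any Kafka client library. It shows why replaying a compacted topic yields the same current state as replaying the full history.

```go
package main

// Message is a hypothetical key/payload pair standing in for a Kafka message.
type Message struct {
	Key     string
	Payload string
}

// Compact keeps only the most recent message for each key, preserving the
// relative order of the messages that survive.
func Compact(msgs []Message) []Message {
	// Record the index of the last message seen for every key.
	last := make(map[string]int, len(msgs))
	for i, m := range msgs {
		last[m.Key] = i
	}
	out := make([]Message, 0, len(last))
	for i, m := range msgs {
		if last[m.Key] == i {
			out = append(out, m)
		}
	}
	return out
}

// Replay reads messages in order and returns the resulting key/value state,
// which is what a cache would do to rebuild its expected configuration.
func Replay(msgs []Message) map[string]string {
	state := make(map[string]string)
	for _, m := range msgs {
		state[m.Key] = m.Payload
	}
	return state
}
```

Calling `Replay` on the full list and on `Compact` of that list gives the same map, so a cache reading a compacted topic from the start sees the current configuration without the superseded updates.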
> > >>>>>>
> > >>>>>>    This leaves the load on the Kafka brokers, which I can assure
> > >>>> you
> > >>>>>> given recent experience, is minimal.  TO would only have the
> > >>>> responsibility
> > >>>>>> of writing database updates to Kafka, ORT only would need to read
> > >>>>>> individual updates (and be smart enough to know how and when to
> > >>>> apply them
> > >>>>>> -- perhaps hints could be provided in the payload?).  The result
> > >>>> is TO is
> > >>>>>> "pushing" updates to the caches (via Kafka) as Rawlin was
> > >>>> proposing, and
> > >>>>>> ORT could still pull the full configuration whenever necessary
> > >>>> with no hit
> > >>>>>> to Postgres or TO.
> > >>>>>>
> > >>>>>>    Now this is obviously a radical shift (and there are no doubt
> > >>>> other
> > >>>>>> ways to implement the basic idea), but it seemed worth bringing
> up.
> > >>>>>>
> > >>>>>>    - Geoff
> > >>>>>>
> > >>>>>>    On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>        +1 on this
> > >>>>>>
> > >>>>>>        On Jul 30, 2019, at 6:01 PM, Rawlin Peters <
> > >>>>>> [email protected]> wrote:
> > >>>>>>
> > >>>>>>        I've been thinking for a while now that ORT's current
> > >>>> pull-based
> > >>>>>> model
> > >>>>>>        of checking for queued updates is not really ideal, and I
> > >>>> was
> > >>>>>> hoping
> > >>>>>>        with "ORT 2.0" that we would switch that paradigm around
> > >>>> to where
> > >>>>>> TO
> > >>>>>>        itself would push updates out to queued caches. That way
> > >>>> TO would
> > >>>>>>        never get overloaded because we could tune the level of
> > >>>> concurrency
> > >>>>>>        for pushing out updates (based on server capacity/specs),
> > >>>> and we
> > >>>>>> would
> > >>>>>>        eliminate the "waiting period" between the time updates
> > >>>> are queued
> > >>>>>> and
> > >>>>>>        the time ORT actually updates the config on the cache.
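
The "tunable level of concurrency" for pushes can be sketched with a buffered channel used as a semaphore. This is a minimal Go sketch under stated assumptions, not a proposed TO implementation: `PushAll`, `ErrUnreachable`, and the `push` callback are hypothetical, and how an update would actually be delivered to a cache is left entirely to the caller.

```go
package main

import (
	"errors"
	"sync"
)

// ErrUnreachable is a hypothetical error a push callback might return when a
// cache cannot be contacted.
var ErrUnreachable = errors.New("cache unreachable")

// PushAll calls push once for every cache, with at most limit pushes in
// flight at any moment. It returns the caches whose push failed, keyed by
// cache name.
func PushAll(caches []string, limit int, push func(cache string) error) map[string]error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // one slot per allowed in-flight push
	var wg sync.WaitGroup
	var mu sync.Mutex // guards failed
	failed := make(map[string]error)

	for _, c := range caches {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit pushes are already running
		go func(cache string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this push ends
			if err := push(cache); err != nil {
				mu.Lock()
				failed[cache] = err
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return failed
}
```

Raising or lowering `limit` is the tuning knob: the sender decides how much load it generates, instead of every cache polling on its own schedule. The returned map gives the caller a place to hang retry or "mark offline" handling for caches that could not be reached.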
> > >>>>>>
> > >>>>>>        I think cache-side config generation is a good idea in
> > >>>> terms of
> > >>>>>>        enabling canary deployments, but as CDNs continue to scale
> > >>>> by
> > >>>>>> adding
> > >>>>>>        more and more caches, we might want to get out ahead of
> > >>>> the ORT
> > >>>>>>        load/waiting problem by flipping that paradigm from "pull"
> > >>>> to
> > >>>>>> "push"
> > >>>>>>        somehow. Then instead of 1000 caches all asking TO the same
> > >>>>>> question
> > >>>>>>        and causing 1000 duplicated reads from the DB, TO would
> > >>>> just read
> > >>>>>> the
> > >>>>>>        one answer from the DB and send it to all the caches,
> > >>>> further
> > >>>>>> reducing
> > >>>>>>        load on the DB as well. The data in the "push" request
> > >>>> from TO to
> > >>>>>> ORT
> > >>>>>>        2.0 would contain all the information ORT would request
> > >>>> from the
> > >>>>>> API
> > >>>>>>        itself, not the actual config files.
> > >>>>>>
> > >>>>>>        With the API transition from Perl to Go, I think we're
> > >>>> eliminating
> > >>>>>> the
> > >>>>>>        Perl CPU bottleneck from TO, but the next bottleneck seems
> > >>>> like it
> > >>>>>>        would be reading from the DB due to the constantly growing
> > >>>> number
> > >>>>>> of
> > >>>>>>        concurrent ORT requests as a CDN scales up. We should keep
> > >>>> that in
> > >>>>>>        mind for whatever "ORT 2.0"-type changes we're making so
> > >>>> that it
> > >>>>>> won't
> > >>>>>>        make flipping that paradigm around even harder.
> > >>>>>>
> > >>>>>>        - Rawlin
> > >>>>>>
> > >>>>>>> On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <
> > >>>> [email protected]>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I'm confused why this is separate from ORT.
> > >>>>>>>
> > >>>>>>> Because ORT does a lot more than just fetching config
> > >>>> files.
> > >>>>>> Rewriting all
> > >>>>>>> of ORT in Go would be considerably more work.
> > >>>> Contrariwise, if we
> > >>>>>> were to put
> > >>>>>>> the config generation in the ORT script itself, we would
> > >>>> have to
> > >>>>>> write it
> > >>>>>>> all from scratch in Perl (the old config gen used the
> > >>>> database
> > >>>>>> directly,
> > >>>>>>> it'd still have to be rewritten) or Python. This was
> > >>>> just the
> > >>>>>> easiest path
> > >>>>>>> forward.
> > >>>>>>>
> > >>>>>>>> I feel like this logic should just be replacing the
> > >>>> config
> > >>>>>> fetching logic
> > >>>>>>> of ORT
> > >>>>>>>
> > >>>>>>> That's exactly what it does: the PR changes ORT to call
> > >>>> this app
> > >>>>>> instead of
> > >>>>>>> calling Traffic Ops over HTTP:
> > >>>>>>>
> > >>>>>>
> > >>>>
> https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
> > >>>>>>>
> > >>>>>>>> Is that the eventual plan? Or does our vision of the
> > >>>> future
> > >>>>>> include this
> > >>>>>>> *and* ORT?
> > >>>>>>>
> > >>>>>>> I reserve the right to develop a strong opinion about
> > >>>> that in
> > >>>>>> the future.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <
> > >>>> [email protected]>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>>>> "I'm just looking for consensus that this is the right
> > >>>>>> approach."
> > >>>>>>>>
> > >>>>>>>> Umm... sort of. I think moving cache configuration to
> > >>>> the cache
> > >>>>>> itself
> > >>>>>>>> is a great idea,
> > >>>>>>>>
> > >>>>>>>> but I'm confused why this is separate from ORT. Like if
> > >>>> this is
> > >>>>>> going to
> > >>>>>>>> be generating the
> > >>>>>>>>
> > >>>>>>>> configs and it's already right there on the server, I
> > >>>> feel like
> > >>>>>> this
> > >>>>>>>> logic should just be
> > >>>>>>>>
> > >>>>>>>> replacing the config fetching logic of ORT (and
> > >>>> personally I
> > >>>>>> think a
> > >>>>>>>> neat place to try it
> > >>>>>>>>
> > >>>>>>>> out would be in ORT.py).
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Is that the eventual plan? Or does our vision of the
> > >>>> future
> > >>>>>> include this
> > >>>>>>>> *and* ORT?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> On 7/30/19 2:15 PM, Robert Butts wrote:
> > >>>>>>>>> Hi all! I've been working on moving the ATS config
> > >>>> generation
> > >>>>>> from
> > >>>>>>>> Traffic
> > >>>>>>>>> Ops to a standalone app alongside ORT, that queries the
> > >>>>>> standard TO API
> > >>>>>>>> to
> > >>>>>>>>> generate its data. I just wanted to put it here, and
> > >>>> get some
> > >>>>>> feedback,
> > >>>>>>>> to
> > >>>>>>>>> make sure the community agrees this is the right
> > >>>> direction.
> > >>>>>>>>>
> > >>>>>>>>> There's a (very) brief spec here: (I might put more
> > >>>> detail
> > >>>>>> into it later,
> > >>>>>>>>> let me know if that's important to anyone)
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> > >>>>>>>>>
> > >>>>>>>>> And the Draft PR is here:
> > >>>>>>>>> https://github.com/apache/trafficcontrol/pull/3762
> > >>>>>>>>>
> > >>>>>>>>> This has a number of advantages:
> > >>>>>>>>> 1. TO is a monolith, this moves a significant amount
> > >>>> of logic
> > >>>>>> out of it,
> > >>>>>>>>> into a smaller per-cache app/library that's easier to
> > >>>> test,
> > >>>>>> validate,
> > >>>>>>>>> rewrite, deploy, canary, rollback, etc.
> > >>>>>>>>> 2. Deploying cache config changes is much smaller and
> > >>>> safer.
> > >>>>>> Instead of
> > >>>>>>>>> having to deploy (and potentially roll back) TO, you
> > >>>> can
> > >>>>>> canary deploy on
> > >>>>>>>>> one cache at a time.
> > >>>>>>>>> 3. This makes TC more cache-agnostic. It moves cache
> > >>>> config
> > >>>>>> generation
> > >>>>>>>>> logic out of TO, and into an independent app/library.
> > >>>> The app
> > >>>>>> (atstccfg)
> > >>>>>>>> is
> > >>>>>>>>> actually very similar to Grove's config generator
> > >>>>>> (grovetccfg). This
> > >>>>>>>> makes
> > >>>>>>>>> it easier and more obvious how to write config
> > >>>> generators for
> > >>>>>> other
> > >>>>>>>> proxies.
> > >>>>>>>>> 4. By using the API and putting the generator
> > >>>> functions in a
> > >>>>>> library,
> > >>>>>>>> this
> > >>>>>>>>> really gives a lot more flexibility to put the config
> > >>>> gen
> > >>>>>> anywhere you
> > >>>>>>>> want
> > >>>>>>>>> without too much work. You could easily put it in an
> > >>>> HTTP
> > >>>>>> service, or
> > >>>>>>>> even
> > >>>>>>>>> put it back in TO via a Plugin. That's not something
> > >>>> that's
> > >>>>>> really
> > >>>>>>>> possible
> > >>>>>>>>> with the existing system, generating directly from the
> > >>>>>> database.
> > >>>>>>>>>
> > >>>>>>>>> Right now, I'm just looking for consensus that this is
> > >>>> the
> > >>>>>> right
> > >>>>>>>> approach.
> > >>>>>>>>> Does the community agree this is the right direction?
> > >>>> Are
> > >>>>>> there concerns?
> > >>>>>>>>> Would anyone like more details about anything in
> > >>>> particular?
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>
> >
>
