> On Aug 1, 2019, at 18:10, Evan Zelkowitz <[email protected]> wrote:
>
> Also a +1 on defining a standard common language that this library will use instead of direct dependencies on TO data. I know it has been bandied about for years that instead of a direct TOdata->ATS config we would have TOdata->cache-side library->generic CDN language->$LocalCacheSoftwareConfig.
>
> Would be nice to keep that dream alive. So now you would have a library to handle TO data conversion to generic data, then separate libraries per cache software to determine config from that data.
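To make that concrete, here is a minimal Go sketch of what a generic, cache-agnostic config model and a per-proxy generator could look like. Every type and function name below is hypothetical -- nothing like this exists in Traffic Control today -- and it is also the "well-defined input in, config strings out" library shape Rawlin describes further down the thread:

// Hypothetical sketch only -- none of these type or function names exist in Traffic Control today.
// The idea: one library converts TO data into a generic, cache-agnostic model, and separate
// per-proxy libraries turn that model into concrete config text.
package gencdn

import (
    "fmt"
    "strings"
)

// RemapRule is a made-up example of "generic CDN language" data: it describes what to do,
// not how any particular cache software spells it.
type RemapRule struct {
    FromURL  string
    ToURL    string
    CacheTTL int // seconds; 0 means "honor origin cache headers"
}

// CDNConfig is the generic model a TO-data converter would produce for one cache server.
type CDNConfig struct {
    ServerHostname string
    Remaps         []RemapRule
}

// ToATSRemapConfig is a pure function: generic model in, ATS remap.config text out.
// Because it depends only on its argument, it is trivial to unit test.
func ToATSRemapConfig(cfg CDNConfig) string {
    b := &strings.Builder{}
    fmt.Fprintf(b, "# remap.config for %s\n", cfg.ServerHostname)
    for _, r := range cfg.Remaps {
        fmt.Fprintf(b, "map %s %s\n", r.FromURL, r.ToURL)
    }
    return b.String()
}

The same CDNConfig could then feed an ATS generator, a Grove generator, an Nginx generator, and so on, with only the last step knowing anything about a particular cache's config syntax.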
Maybe use https://github.com/apache/trafficserver/blob/voluspa/tools/voluspa/schema_v1.json as a start for this generic CDN language?

Rgds,
JvD

> On Thu, Aug 1, 2019 at 4:17 PM Rawlin Peters <[email protected]> wrote:
>>
>> It sounds like:
>> (A) everyone is +1 on cache-side config generation
>> (B) most people are -1 on caches connecting directly to the TO DB
>> (C) most people are +1 on TO pushing data to ORT instead of the other way around
>> (D) most people are -1 on using Kafka for cache configs
>>
>> For (A) I'm +1 on the approach (ORT sidecar), but I think if we can design the ats config gen library in a way that it just takes in a well-defined set of input data and returns strings (the generated config files) as output, it shouldn't really matter if the input data comes from direct DB queries, API calls from ORT to TO, or pushed data from TO to ORT. Whatever that library ends up looking like, it should just be a matter of getting that data from some source, and converting it into the input format expected by the library. The library should not have any dependency on external data -- only what has been passed into the library's function. Then we will get a lot of nice benefits in terms of testability and reusability.
>>
>> Testability:
>> Given a set of input data, we can expect certain output in terms of the ATS config files, so it would be easy to write unit tests for. Right now that's a hard thing to do because we have to mock out every single DB call for those unit tests.
>>
>> Reusability:
>> The library could be shared between the TO API and this new ORT-sidecar thing, and the only difference should be that the TO API runs a set of DB queries to populate the input data whereas the ORT-sidecar thing would run a set of TO API calls to populate the input data.
>>
>> I know it might be more difficult to come up with that well-defined input interface than to just make DB or API calls whenever you need data in the library, but I think it would be well worth the effort for those reasons above.
>>
>> - Rawlin
>>
>> On Thu, Aug 1, 2019 at 7:28 AM ocket 8888 <[email protected]> wrote:
>>>
>>> Well, in that spirit:
>>>
>>> - Cache-side processing: +1. I suppose given the fact that we wouldn't want to rewrite the entire configuration generation logic at once, there's no reason to prefer this being part of ORT immediately versus separate. Either way, there's no real "extra step". Though I must admit, I am sad to see this written in Go and not built into my baby: ORT.py
>>>
>>> - Invert LoC for config update: +1, because this essentially lets you do server configuration snapshots for free, in addition to the other benefits, and that's a pretty requested feature, I think. For servers that are unreachable from TO, there's a problem, and after backing off for some set number of retries they should probably just be marked offline with a reason, e.g. "unable to contact for updates". This is a bit off-topic, though, imo, because it's sort of independent from what Rob's suggesting, and that utility isn't even really designed with that in mind - at least just yet. A conversation for after it's finished.
>>>
>>> - Invert LoC for data selection: +1 (I think).
>>> Letting the cache server decide what it needs to know allows you to decouple TO from any particular cache server, which would let us support things like Grove or Nginx more easily. Or at least allow people to write their own config generators (plugins?) for those servers. Though honestly, it's probably always going to want the entire profile/parameter set and information for assigned delivery services anyway; it'll just then decide what's important and what's meaningless. This is somewhat more related, imo, since I _think_ (without looking at any code) that what Rob's thing does now is just request the information it thinks it needs, and builds the configs with that. I'd be interested to hear more about the "fragility when out-of-sync" thing. Or maybe I'm misunderstanding the concept? If what you mean is something more like "the cache server selects what specific parameters it needs" then I'm -1, but you should be able to get all of the parameters and their modified dates with one call to `/profiles?name={{name}}` and then decide from there. So the server still tells you everything that just changed. Stuff like CG assignment/params and DS assignment/params would likewise still need to be checked normally. So +1 for caches deciding what API endpoints to call, -1 for big globs of "this is what was updated" being pushed to the cache server, and -1 for cache servers trying to guess what might have changed instead of checking everything.
>>>
>>> - Direct Database Connection: -1
>>> - Kafka: -1
>>>
>>>> "This is true, but you can also run the cache-config generator to visually inspect them as well"
>>>
>>> yeah, but then you need to either expose that configuration generator's output to the internet or you need to `ssh` into a cache server or something to inspect it individually. I think his concern is not being able to see it in Traffic Portal, which is arguably safer and much easier than the other two options, respectively.
>>> My response would be "but you can still see arbitrary configuration, so maybe that just needs to be made easier to view and understand to compensate. Like, instead of
>>>
>>> {
>>>     "configFile": "records.config",
>>>     "name": "CONFIG proxy.config.http.insert_response_via_str",
>>>     "secure": false,
>>>     "value": "INT 3"
>>> },
>>>
>>> you see something like 'encoded Via header verbosity: none/low/med/high'"
>>>
>>> On Wed, Jul 31, 2019 at 11:01 AM Chris Lemmons <[email protected]> wrote:
>>>>
>>>> This is true, but you can also run the cache-config generator to visually inspect them as well. That makes it easy to visually inspect them as well as to pipe them to diff and mechanically inspect them. So we don't lose the ability entirely, we just move it from one place to another.
>>>>
>>>> On Wed, Jul 31, 2019 at 10:47 AM Genz, Geoffrey <[email protected]> wrote:
>>>>>
>>>>> A small point, but TO currently allows one to visually inspect/validate the generated configuration files. I don't know how critical that functionality is (I personally found it invaluable when testing logging configuration changes), but it seems like we either have the generation logic in two places (ORT and TO), or we lose that ability in TO by moving all the logic to the cache.
>>>>>
>>>>> - Geoff
>>>>>
>>>>> On 7/31/19, 10:33 AM, "Jeremy Mitchell" <[email protected]> wrote:
>>>>>
>>>>> my feedback:
>>>>>
>>>>> 1. i like the idea of slimming down TO. It's gotten way too fat. Basically deprecating these api endpoints at some point and letting "something else" do the job of config generation:
>>>>>
>>>>> GET /api/$version/servers/#id/configfiles/ats
>>>>> GET /api/$version/profiles/#id/configfiles/ats/#filename
>>>>> GET /api/$version/servers/#id/configfiles/ats/#filename
>>>>> GET /api/$version/cdns/#id/configfiles/ats/#filename
>>>>>
>>>>> 2. i don't really care if that "something else" is a sidecar to ORT or actually ORT. will let you guys hash that out.
>>>>>
>>>>> 3. i like the idea of that "something else" eventually being able to handle a push vs. a pull as rawlin suggested.
>>>>>
>>>>> 4. a bit curious how "cache snapshots" would work as rob suggested in https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation - would you look at a cache snapshot diff and then snapshot (which would queue updates in the background)?
>>>>>
>>>>> otherwise, thanks for taking the initiative, rob. and looking forward to seeing what comes of this that will make TC safer/more efficient.
>>>>>
>>>>> jeremy
>>>>>
>>>>> On Wed, Jul 31, 2019 at 9:20 AM Gray, Jonathan <[email protected]> wrote:
>>>>>
>>>>>> Smaller, simpler pieces closer to the cache that do one job are far simpler to maintain, triage, and build. I'm not a fan of trying to inject a message bus in the middle of everything.
>>>>>>
>>>>>> Jonathan G
>>>>>>
>>>>>> On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> wrote:
>>>>>>
>>>>>> To throw a completely different idea out there . . . some time ago Matt Mills was talking about using Kafka as the configuration transport mechanism for Traffic Control. The idea is to use a Kafka compacted topic as the configuration source. TO would write database updates to Kafka, and the ORT equivalent would pull its configuration from Kafka.
>>>>>>
>>>>>> To explain compacted topics a bit, a standard Kafka message is a key and a payload; in a compacted topic, only the most recent message/payload with a particular key is kept. As a result, reading all the messages from a topic will give you the current state of what's basically a key-value store, with the benefit of not doing actual mutations of data. So a cache could get the full expected configuration by reading all the existing messages on the appropriate topic, as well as get new updates to configuration by listening for new Kafka messages.
>>>>>>
>>>>>> This leaves the load on the Kafka brokers, which, I can assure you given recent experience, is minimal. TO would only have the responsibility of writing database updates to Kafka; ORT would only need to read individual updates (and be smart enough to know how and when to apply them -- perhaps hints could be provided in the payload?). The result is TO "pushing" updates to the caches (via Kafka) as Rawlin was proposing, and ORT could still pull the full configuration whenever necessary with no hit to Postgres or TO.
>>>>>>
>>>>>> Now this is obviously a radical shift (and there are no doubt other ways to implement the basic idea), but it seemed worth bringing up.
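To illustrate the compacted-topic idea described above, here is a minimal Go sketch of the "read it all back" step. The Consumer interface is a made-up stand-in for a real Kafka client, not any particular library's API; the point is the replay logic: keeping only the newest payload per key reconstructs the current configuration state.

// Minimal sketch of the compacted-topic idea above. The Consumer interface is a made-up
// stand-in for a real Kafka client; the point is the replay logic: keeping only the newest
// payload per key reconstructs the current configuration state.
package kafkaconfig

// Message is a key/payload pair, as described above.
type Message struct {
    Key     string
    Payload []byte
}

// Consumer is a hypothetical "read the topic from the beginning" source.
type Consumer interface {
    // Next returns the next message, or ok=false once the topic has been exhausted.
    Next() (msg Message, ok bool)
}

// CurrentState replays the topic and keeps only the latest payload per key -- the same
// state Kafka's log compaction converges to on the broker side.
func CurrentState(c Consumer) map[string][]byte {
    state := map[string][]byte{}
    for {
        msg, ok := c.Next()
        if !ok {
            return state
        }
        state[msg.Key] = msg.Payload // later messages overwrite earlier ones for the same key
    }
}

A running ORT-equivalent would then keep consuming and apply each new message as it arrives, rather than replaying the whole topic every time.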
>>>>>>
>>>>>> - Geoff
>>>>>>
>>>>>> On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> wrote:
>>>>>>
>>>>>> +1 on this
>>>>>>
>>>>>> On Jul 30, 2019, at 6:01 PM, Rawlin Peters <[email protected]> wrote:
>>>>>>
>>>>>> I've been thinking for a while now that ORT's current pull-based model of checking for queued updates is not really ideal, and I was hoping with "ORT 2.0" that we would switch that paradigm around to where TO itself would push updates out to queued caches. That way TO would never get overloaded, because we could tune the level of concurrency for pushing out updates (based on server capacity/specs), and we would eliminate the "waiting period" between the time updates are queued and the time ORT actually updates the config on the cache.
>>>>>>
>>>>>> I think cache-side config generation is a good idea in terms of enabling canary deployments, but as CDNs continue to scale by adding more and more caches, we might want to get out ahead of the ORT load/waiting problem by flipping that paradigm from "pull" to "push" somehow. Then instead of 1000 caches all asking TO the same question and causing 1000 duplicated reads from the DB, TO would just read the one answer from the DB and send it to all the caches, further reducing load on the DB as well. The data in the "push" request from TO to ORT 2.0 would contain all the information ORT would request from the API itself, not the actual config files.
>>>>>>
>>>>>> With the API transition from Perl to Go, I think we're eliminating the Perl CPU bottleneck from TO, but the next bottleneck seems like it would be reading from the DB due to the constantly growing number of concurrent ORT requests as a CDN scales up. We should keep that in mind for whatever "ORT 2.0"-type changes we're making, so that it won't make flipping that paradigm around even harder.
>>>>>>
>>>>>> - Rawlin
>>>>>>
>>>>>>> On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm confused why this is separate from ORT.
>>>>>>>
>>>>>>> Because ORT does a lot more than just fetching config files. Rewriting all of ORT in Go would be considerably more work. Contrariwise, if we were to put the config generation in the ORT script itself, we would have to write it all from scratch in Perl (the old config gen used the database directly; it'd still have to be rewritten) or Python. This was just the easiest path forward.
>>>>>>>
>>>>>>>> I feel like this logic should just be replacing the config fetching logic of ORT
>>>>>>>
>>>>>>> That's exactly what it does: the PR changes ORT to call this app instead of calling Traffic Ops over HTTP:
>>>>>>> https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
>>>>>>>
>>>>>>>> Is that the eventual plan? Or does our vision of the future include this *and* ORT?
>>>>>>>
>>>>>>> I reserve the right to develop a strong opinion about that in the future.
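As a rough illustration of the push-based model Rawlin describes above -- TO reads the config data once and fans it out to queued caches with a tunable concurrency limit -- here is a minimal Go sketch. The pushToCache endpoint and its URL are hypothetical, not an existing ORT interface:

// Minimal sketch of the push model described above: Traffic Ops reads the config data once
// and pushes it to every queued cache, at most maxConcurrent at a time. The pushToCache
// function and its URL path are made up for illustration.
package pushupdates

import (
    "bytes"
    "fmt"
    "net/http"
    "sync"
)

// pushToCache POSTs one pre-built data blob to a single cache's ORT agent.
// The endpoint path is hypothetical.
func pushToCache(cacheFQDN string, data []byte) error {
    url := fmt.Sprintf("https://%s/ort/config-data", cacheFQDN)
    resp, err := http.Post(url, "application/json", bytes.NewReader(data))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("cache %s returned %s", cacheFQDN, resp.Status)
    }
    return nil
}

// PushAll sends the same data to all queued caches with a bounded fan-out,
// so TO controls the concurrency instead of absorbing N simultaneous pulls.
func PushAll(caches []string, data []byte, maxConcurrent int) {
    sem := make(chan struct{}, maxConcurrent)
    wg := sync.WaitGroup{}
    for _, cache := range caches {
        wg.Add(1)
        sem <- struct{}{} // block until a slot is free
        go func(cache string) {
            defer wg.Done()
            defer func() { <-sem }()
            if err := pushToCache(cache, data); err != nil {
                fmt.Println("push failed:", err) // real code would retry or mark the cache
            }
        }(cache)
    }
    wg.Wait()
}

The concurrency cap is the knob Rawlin mentions: TO decides how many caches it updates at once, instead of being hit by however many caches happen to poll at the same time.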
>>>>>>>
>>>>>>> On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
>>>>>>>
>>>>>>>>> "I'm just looking for consensus that this is the right approach."
>>>>>>>>
>>>>>>>> Umm... sort of. I think moving cache configuration to the cache itself is a great idea, but I'm confused why this is separate from ORT. Like, if this is going to be generating the configs and it's already right there on the server, I feel like this logic should just be replacing the config fetching logic of ORT (and personally I think a neat place to try it out would be in ORT.py).
>>>>>>>>
>>>>>>>> Is that the eventual plan? Or does our vision of the future include this *and* ORT?
>>>>>>>>
>>>>>>>>> On 7/30/19 2:15 PM, Robert Butts wrote:
>>>>>>>>> Hi all! I've been working on moving the ATS config generation from Traffic Ops to a standalone app alongside ORT that queries the standard TO API to generate its data. I just wanted to put it here and get some feedback, to make sure the community agrees this is the right direction.
>>>>>>>>>
>>>>>>>>> There's a (very) brief spec here (I might put more detail into it later; let me know if that's important to anyone):
>>>>>>>>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
>>>>>>>>>
>>>>>>>>> And the Draft PR is here:
>>>>>>>>> https://github.com/apache/trafficcontrol/pull/3762
>>>>>>>>>
>>>>>>>>> This has a number of advantages:
>>>>>>>>> 1. TO is a monolith; this moves a significant amount of logic out of it, into a smaller per-cache app/library that's easier to test, validate, rewrite, deploy, canary, rollback, etc.
>>>>>>>>> 2. Deploying cache config changes becomes much smaller and safer. Instead of having to deploy (and potentially roll back) TO, you can canary deploy on one cache at a time.
>>>>>>>>> 3. This makes TC more cache-agnostic. It moves cache config generation logic out of TO and into an independent app/library. The app (atstccfg) is actually very similar to Grove's config generator (grovetccfg). This makes it easier and more obvious how to write config generators for other proxies.
>>>>>>>>> 4. By using the API and putting the generator functions in a library, this really gives a lot more flexibility to put the config gen anywhere you want without too much work. You could easily put it in an HTTP service, or even put it back in TO via a Plugin. That's not something that's really possible with the existing system, which generates directly from the database.
>>>>>>>>>
>>>>>>>>> Right now, I'm just looking for consensus that this is the right approach. Does the community agree this is the right direction? Are there concerns? Would anyone like more details about anything in particular?
>>>>>>>>>
>>>>>>>>> Thanks,
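As a closing illustration of point 4 above -- that once config generation is a pure library call, the same function can back a per-cache CLI (atstccfg-style), an HTTP service, or a TO plugin -- here is a minimal Go sketch. GenerateConfigs and TOData are hypothetical stand-ins for whatever that library ends up exposing, not the actual atstccfg code:

// Sketch of point 4 above: the same library call behind a CLI-style dump and an HTTP handler.
// GenerateConfigs and TOData are hypothetical stand-ins, not the real atstccfg interface.
package main

import (
    "fmt"
    "net/http"
)

// TOData is whatever well-defined input the generation library ends up requiring.
type TOData struct {
    ServerHostname string
}

// GenerateConfigs stands in for the library: data in, config file text out.
func GenerateConfigs(data TOData) map[string]string {
    return map[string]string{
        "remap.config": "# generated for " + data.ServerHostname + "\n",
    }
}

func main() {
    data := TOData{ServerHostname: "edge-cache-01.example.com"} // hypothetical cache hostname

    // CLI-style use: print every generated file to stdout for inspection or diffing.
    for name, contents := range GenerateConfigs(data) {
        fmt.Printf("### %s\n%s\n", name, contents)
    }

    // HTTP-service-style use: the exact same call behind a handler.
    http.HandleFunc("/configfiles/remap.config", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, GenerateConfigs(data)["remap.config"])
    })
    _ = http.ListenAndServe(":8080", nil) // blocks; shown only to make the service shape concrete
}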
