Re: Cache-Side Config Generation

Derek Gelinas Wed, 31 Jul 2019 07:17:16 -0700

Seems to me the client could download a matrix of what-goes-where just as
easily as Traffic Ops could use that info.


> On Jul 31, 2019, at 9:48 AM, Nir Sopher <[email protected]> wrote:
> 
> Hi,
> 
> Architecture-wise, I'm in favor of Traffic Ops sending the specific
> configuration to the cache.
> The main reason is features like "DS *individual* automatic deployment",
> where we would like to be able to control "which server gets which
> configuration and when" - e.g. edge cache "A" can be assigned a DS only
> after all its parents are aware of the DS.
> I believe that if the control of "what configuration is pulled" is in the
> hands of the cache, the complexity of the config distribution flow would
> increase and debugging would become very difficult.
> 
> Nir
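
The ordering rule described above (an edge cache gets a DS only after all of its parents know about it) can be sketched roughly as follows. This is a minimal illustration, not Traffic Ops code: the Topology and Awareness types, the CanAssign function, and the cache and DS names are all hypothetical.

```go
package main

import "fmt"

// Topology maps each cache name to the names of its parent caches.
// The type and every name used with it are hypothetical.
type Topology map[string][]string

// Awareness records, per cache, the set of delivery services (DS) whose
// configuration that cache has already received.
type Awareness map[string]map[string]bool

// CanAssign reports whether ds may be assigned to cache: every parent of
// the cache must already be aware of ds. A cache with no parents passes.
func CanAssign(cache, ds string, topo Topology, aware Awareness) bool {
	for _, parent := range topo[cache] {
		if !aware[parent][ds] {
			return false
		}
	}
	return true
}

func main() {
	topo := Topology{"edge-A": {"mid-1", "mid-2"}}
	aware := Awareness{"mid-1": {"ds-video": true}}

	// mid-2 does not know the DS yet, so edge-A has to wait.
	fmt.Println(CanAssign("edge-A", "ds-video", topo, aware))

	// Once every parent is aware of the DS, the assignment is allowed.
	aware["mid-2"] = map[string]bool{"ds-video": true}
	fmt.Println(CanAssign("edge-A", "ds-video", topo, aware))
}
```

Whether a check like this runs in Traffic Ops or on the cache is exactly the question under discussion; the sketch only shows the rule itself.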
> 
> 
> 
> 
> On Wed, Jul 31, 2019, 05:47 Robert Butts <[email protected]> wrote:
> 
>>> Sure, but I think that's missing the point a bit. There's still the extra
>>> step of fetching the configs from a local source, which is the redundancy
>>> that concerns me. Not in the short-term, but as a long-term solution.
>> 
>> I'm not sure I understand the concern. The "extra step" is just asking a
>> local app instead of HTTP. Are you concerned about performance? That should
>> be negligible. Likewise, what's the difference in calling a Perl/Python
>> function, and calling an app? Is there really much difference in two Python
>> files, versus a Python file calling a binary file?
>> 
>>> with the Go rewrite of Traffic Ops already more than two major versions
>>> and three years (I think?) old I'm dubious of adding another component
>>> that's supposed to "eventually" replace (and we're not even committed to
>>> that) another.
>> 
>> I also share that concern. But IMO it would be better to have a local app
>> generating a single config and proxying everything else, than to not have
>> it. IMO the ability to canary-deploy even a single config cache-side is
>> worth the app overhead. I'm also hopeful it will go quicker than TO --
>> there's far less config code than the entirety of TO, and we've already
>> written most of it. AFAIK there are only a few small config files left.
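
The "generate what we can locally, proxy everything else" idea above might look something like the sketch below. It assumes nothing about the actual app in the PR: Generator, Fetcher, and GetConfig are hypothetical names, the upstream function merely stands in for a Traffic Ops request, and the file contents are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// Generator builds the text of one config file locally (hypothetical type).
type Generator func() (string, error)

// Fetcher retrieves a config file from upstream. Here it stands in for an
// HTTP request to Traffic Ops; no real API call is made in this sketch.
type Fetcher func(name string) (string, error)

// GetConfig uses the local generator for name if one is registered, and
// otherwise proxies the request to upstream.
func GetConfig(name string, local map[string]Generator, upstream Fetcher) (string, error) {
	if gen, ok := local[name]; ok {
		return gen()
	}
	if upstream == nil {
		return "", errors.New("no local generator and no upstream for " + name)
	}
	return upstream(name)
}

func main() {
	local := map[string]Generator{
		"remap.config": func() (string, error) { return "# generated locally\n", nil },
	}
	upstream := func(name string) (string, error) {
		return "# proxied from upstream: " + name + "\n", nil
	}
	for _, name := range []string{"remap.config", "parent.config"} {
		txt, err := GetConfig(name, local, upstream)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Print(txt)
	}
}
```

With this shape, moving one more config file cache-side means registering one more generator; everything not yet moved keeps its existing path.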
>> 
>>> this adds the potential question "what if my config generator is version
>>> X and ORT is version Y?"
>> 
>> Ahh, I think I wasn't clear about how this will be deployed. It's part of
>> ORT-the-RPM. The binary app isn't part of ORT-the-script, but it's in the
>> RPM, and installed/upgraded by Yum. See
>> https://github.com/apache/trafficcontrol/pull/3762/files#diff-8ebb93342b2acfa55d6c9fc7df534518
>> So, it shouldn't ever be a different version than ORT-the-script, unless
>> someone manually copies a different binary or script file in, which Traffic
>> Control would not support any more than someone dropping a different
>> traffic_ctl in an ATS install. Is that any better?
>> 
>> 
>> On Tue, Jul 30, 2019 at 8:02 PM ocket8888 <[email protected]> wrote:
>> 
>>>>> is there any reason we can't hit the DB from ORT
>>> 
>>> pls no
>>> 
>>> 
>>>> ...the config generation in the ORT script itself, we would have to
>>>> write it all from scratch in Perl (the old config gen used the database
>>>> directly, it'd still have to be rewritten) or Python
>>> 
>>> but what if it _was_ in Python though? Something for me to work on this
>>> weekend, I suppose...
>>> 
>>>> That's exactly what it does: the PR changes ORT to call this app
>>>> instead of calling Traffic Ops over HTTP:
>>> 
>>> 
>>> Sure, but I think that's missing the point a bit. There's still the
>>> extra step of fetching the configs from a local source, which is the
>>> redundancy that concerns me. Not in the short-term, but as a long-term
>>> solution.
>>> 
>>> 
>>>> I reserve the right to develop a strong opinion about that [whether
>>>> ORT is to exist forever in concert with configuration generation] in the
>>>> future.
>>> 
>>> Of course you're entitled to that, but my concern is that we've
>>> basically added a component here. ORT already serves the purpose of
>>> creating on-disk configuration files from data stored in Traffic Ops,
>>> and this adds the potential question "what if my config generator is
>>> version X and ORT is version Y?" and I just think we have enough of that
>>> already. Sure, ORT does more than place the configuration file, but I'm
>>> not sure that it does "much more". It emplaces a status file, manages
>>> packages, and sets service status. Those are arguably complex, but much
>>> less so when you consider that only CentOS 6/7 is supported. I'd
>>> estimate a solid 80% of ORT is dealing with configuration files, and I
>>> understand that that's a dangerously huge rewrite (although ORT.py may
>>> have done quite a bit of that already!), but with the Go rewrite of
>>> Traffic Ops already more than two major versions and three years (I
>>> think?) old I'm dubious of adding another component that's supposed to
>>> "eventually" replace (and we're not even committed to that) another.
>>> 
>>> To be clear, though, I absolutely think that config generation on the
>>> cache server is a massive step in the right direction, and for that
>>> reason alone I wouldn't oppose this if it's what everyone else thinks is
>>> best.
>>> 
>>> On 7/30/19 6:06 PM, Robert Butts wrote:
>>>>> is there any reason we can't hit the DB from ORT
>>>> Technically, it's possible. But we really, really shouldn't. The API is a
>>>> guaranteed interface. The database has no such guarantees. TC users would
>>>> then be required to deploy ORT with TO, in order; or else implement some
>>>> sort of backwards compatibility in the DB. In other words, we'd end up
>>>> having to deal with all the versioning stuff TO already does for us (and
>>>> this is why it does it).
>>>> 
>>>>> I'm still not convinced that it would be that hard to modify it to use
>>>>> json data instead of sql queries
>>>> 
>>>> I was hoping the same. I did exactly that, in the process described in the
>>>> spec (transliterate -> use objects -> use http), to be as safe as possible.
>>>> It was more code than I'd hoped. I'd estimate the changes to the logic to
>>>> use the objects, plus the code to create those objects from the API, at
>>>> 20-30% of the entire config code.
>>>> 
>>>>> What I AM nervous about is someone rewriting all that code
>>>> I agree, there's some inevitable risk involved. FWIW I've gone to great
>>>> lengths to minimize the risk as much as possible -- see the spec,
>>>> transliterating as closely as possible, then changing as little as
>>>> possible. I also have a set of scripts (which I'm happy to share with
>>>> anyone who wants them) to pull and diff every single config file, on every
>>>> single server, edge and mid, and, for Profile endpoints, every single
>>>> profile, from our production database. I've done that for every single
>>>> config file we've rewritten, to ensure parity as much as possible.
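
The parity check described above (pull the old and new output for the same file and diff them) could be approximated by something like this. It is not the scripts mentioned in the message, which are not shown in the thread; DiffLines is a hypothetical helper doing a naive position-by-position comparison rather than a true diff, and the sample config lines are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// DiffLines compares two config texts line by line, by position, and
// returns one description per differing line. A line missing from the
// shorter text is treated as an empty line. This is a naive positional
// comparison, not a longest-common-subsequence diff.
func DiffLines(oldCfg, newCfg string) []string {
	a := strings.Split(oldCfg, "\n")
	b := strings.Split(newCfg, "\n")
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	var diffs []string
	for i := 0; i < n; i++ {
		var la, lb string
		if i < len(a) {
			la = a[i]
		}
		if i < len(b) {
			lb = b[i]
		}
		if la != lb {
			diffs = append(diffs, fmt.Sprintf("line %d: %q != %q", i+1, la, lb))
		}
	}
	return diffs
}

func main() {
	// Invented sample content standing in for the old (Perl) and new (Go) output.
	oldCfg := "dest_domain=example.test parent=mid-1\nround_robin=true"
	newCfg := "dest_domain=example.test parent=mid-1\nround_robin=false"
	for _, d := range DiffLines(oldCfg, newCfg) {
		fmt.Println(d)
	}
}
```

An empty result for every file, server, and profile is the parity signal; any returned line points at a behavioral difference to investigate before deploying.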
>>>> 
>>>> Also FWIW, the very act of putting config gen in ORT means we can canary
>>>> test one cache at a time, when deploying changes to prod, to ensure correct
>>>> behavior before deploying everywhere.
>>>> 
>>>> I'm also hopeful this will make the config files more stable, once it's
>>>> done. The Go essentially checks every single error condition it can
>>>> conceive of. Of course, it isn't possible to check every dynamic Parameter.
>>>> But it comes pretty close. Whereas, I can speak from experience, the Perl
>>>> checks pretty darn close to nothing for errors. The Go language lends
>>>> itself to this: you typically have to go out of your way to ignore errors,
>>>> where Perl arguably lends itself to ignoring errors and assuming errorable
>>>> calls worked.
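
As a small illustration of the error-handling point above: in Go, a fallible call returns an error value that the caller has to handle or visibly discard. The sketch below is not from the config generator; ParseTTL and the idea of a string-valued "ttl" Parameter are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// ParseTTL converts a string-valued Parameter (hypothetical) into seconds.
// Instead of silently defaulting on bad input, it returns an error the
// caller must deal with.
func ParseTTL(value string) (int, error) {
	ttl, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("parsing ttl parameter %q: %w", value, err)
	}
	if ttl < 0 {
		return 0, fmt.Errorf("ttl parameter %q must not be negative", value)
	}
	return ttl, nil
}

func main() {
	for _, v := range []string{"3600", "one hour", "-5"} {
		ttl, err := ParseTTL(v)
		if err != nil {
			// A malformed Parameter is reported rather than written into a config file.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("ttl seconds:", ttl)
	}
}
```

Ignoring the error here would require writing `ttl, _ := ParseTTL(v)` explicitly, which is the "going out of your way" the message refers to.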
>>>> 
>>>> 
>>>> On Tue, Jul 30, 2019 at 5:37 PM Derek Gelinas <[email protected]> wrote:
>>>> 
>>>>> This is probably a stupid question, but is there any reason we can't hit
>>>>> the DB from ORT, thus saving us the expense of writing any new scripting?
>>>>> My understanding is that the biggest hit on traffic ops isn't the DB so
>>>>> much as the perl processing for thousands of hosts at once. I assume that
>>>>> the DB requests themselves would be fairly cacheable, no?
>>>>> 
>>>>> To be honest I'm still not convinced that it would be that hard to modify
>>>>> it to use json data instead of sql queries. What I AM nervous about is
>>>>> someone rewriting all that code. It's pretty damn particular and there
>>>>> have been a few times where much more minor things have been rewritten that
>>>>> missed the point of certain results entirely and as such broke things.
>>>>> 
>>>>> Derek
>>>>> 
>>>>>> On Jul 30, 2019, at 6:22 PM, Robert Butts <[email protected]> wrote:
>>>>>> 
>>>>>>> I'm confused why this is separate from ORT.
>>>>>> Because ORT does a lot more than just fetching config files. Rewriting all
>>>>>> of ORT in Go would be considerably more work. Contrariwise, if we were to
>>>>>> put the config generation in the ORT script itself, we would have to write
>>>>>> it all from scratch in Perl (the old config gen used the database directly,
>>>>>> it'd still have to be rewritten) or Python. This was just the easiest path
>>>>>> forward.
>>>>>> 
>>>>>>> I feel like this logic should just be replacing the config fetching logic
>>>>>>> of ORT
>>>>>> 
>>>>>> That's exactly what it does: the PR changes ORT to call this app instead of
>>>>>> calling Traffic Ops over HTTP:
>>>>>> https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
>>>>>> 
>>>>>>> Is that the eventual plan? Or does our vision of the future include this
>>>>>>> *and* ORT?
>>>>>> 
>>>>>> I reserve the right to develop a strong opinion about that in the future.
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
>>>>>> 
>>>>>>>> "I'm just looking for consensus that this is the right approach."
>>>>>>> Umm... sort of. I think moving cache configuration to the cache itself
>>>>>>> is a great idea, but I'm confused why this is separate from ORT. Like if
>>>>>>> this is going to be generating the configs and it's already right there
>>>>>>> on the server, I feel like this logic should just be replacing the config
>>>>>>> fetching logic of ORT (and personally I think a neat place to try it out
>>>>>>> would be in ORT.py).
>>>>>>> 
>>>>>>> Is that the eventual plan? Or does our vision of the future include this
>>>>>>> *and* ORT?
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/30/19 2:15 PM, Robert Butts wrote:
>>>>>>>> Hi all! I've been working on moving the ATS config generation from Traffic
>>>>>>>> Ops to a standalone app alongside ORT, that queries the standard TO API to
>>>>>>>> generate its data. I just wanted to put it here, and get some feedback, to
>>>>>>>> make sure the community agrees this is the right direction.
>>>>>>>> 
>>>>>>>> There's a (very) brief spec here: (I might put more detail into it later,
>>>>>>>> let me know if that's important to anyone)
>>>>>>>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
>>>>>>>> 
>>>>>>>> And the Draft PR is here:
>>>>>>>> https://github.com/apache/trafficcontrol/pull/3762
>>>>>>>> 
>>>>>>>> This has a number of advantages:
>>>>>>>> 1. TO is a monolith, this moves a significant amount of logic out of it,
>>>>>>>> into a smaller per-cache app/library that's easier to test, validate,
>>>>>>>> rewrite, deploy, canary, rollback, etc.
>>>>>>>> 2. Deploying cache config changes is much smaller and safer. Instead of
>>>>>>>> having to deploy (and potentially roll back) TO, you can canary deploy on
>>>>>>>> one cache at a time.
>>>>>>>> 3. This makes TC more cache-agnostic. It moves cache config generation
>>>>>>>> logic out of TO, and into an independent app/library. The app (atstccfg) is
>>>>>>>> actually very similar to Grove's config generator (grovetccfg). This makes
>>>>>>>> it easier and more obvious how to write config generators for other proxies.
>>>>>>>> 4. By using the API and putting the generator functions in a library, this
>>>>>>>> really gives a lot more flexibility to put the config gen anywhere you want
>>>>>>>> without too much work. You could easily put it in an HTTP service, or even
>>>>>>>> put it back in TO via a Plugin. That's not something that's really possible
>>>>>>>> with the existing system, generating directly from the database.
>>>>>>>> 
>>>>>>>> Right now, I'm just looking for consensus that this is the right approach.
>>>>>>>> Does the community agree this is the right direction? Are there concerns?
>>>>>>>> Would anyone like more details about anything in particular?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> 
>>>>> 
>>> 
>>
