> Sure, but I think that's missing the point a bit. There's still the extra step of fetching the configs from a local source, which is the redundancy that concerns me. Not in the short-term, but as a long-term solution.

I'm not sure I understand the concern. The "extra step" is just asking a local app instead of HTTP. Are you concerned about performance? That should be negligible. Likewise, what's the difference between calling a Perl/Python function and calling an app? Is there really much difference between two Python files, versus a Python file calling a binary?
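To make that concrete, here's a rough sketch of the two paths. The names, paths, and flags below are made up for illustration -- they are not the real atstccfg interface:

    // Hypothetical sketch: ORT asking Traffic Ops over HTTP today, versus
    // asking a local binary after this change. Illustrative names only.
    package ortexample

    import (
        "fmt"
        "io"
        "net/http"
        "os/exec"
    )

    // fetchOverHTTP is roughly what ORT does today: a round trip to Traffic Ops.
    func fetchOverHTTP(toURL, fileName string) ([]byte, error) {
        resp, err := http.Get(toURL + "/configfiles/" + fileName) // illustrative path
        if err != nil {
            return nil, fmt.Errorf("requesting %v from TO: %v", fileName, err)
        }
        defer resp.Body.Close()
        return io.ReadAll(resp.Body)
    }

    // fetchFromLocalApp is the "extra step": the same request, answered by a
    // local process instead of the network. The flag is invented for this sketch.
    func fetchFromLocalApp(fileName string) ([]byte, error) {
        return exec.Command("/opt/ort/atstccfg", "--get-config="+fileName).Output()
    }

From ORT's perspective the call site barely changes; the only real difference is who answers.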
> with the Go rewrite of Traffic Ops already more than two major versions and three years (I think?) old I'm dubious of adding another component that's supposed to "eventually" replace (and we're not even committed to that) another.

I also share that concern. But IMO it would be better to have a local app generating a single config and proxying everything else than to not have it. IMO the ability to canary-deploy even a single config cache-side is worth the app overhead. I'm also hopeful it will go quicker than TO -- there's far less config code than the entirety of TO, and we've already written most of it; AFAIK there are only a few small config files left.

> this adds the potential question "what if my config generator is version X and ORT is version Y?"

Ahh, I think I wasn't clear about how this will be deployed. It's part of ORT-the-RPM. The binary app isn't part of ORT-the-script, but it is in the RPM, and installed/upgraded by Yum. See https://github.com/apache/trafficcontrol/pull/3762/files#diff-8ebb93342b2acfa55d6c9fc7df534518 . So it shouldn't ever be a different version than ORT-the-script, unless someone manually copies in a different binary or script file, which Traffic Control would not support any more than someone dropping a different traffic_ctl into an ATS install.

Is that any better?

On Tue, Jul 30, 2019 at 8:02 PM ocket8888 <[email protected]> wrote:

> >> is there any reason we can't hit the DB from ORT
>
> pls no
>
> > ...the config generation in the ORT script itself, we would have to write it all from scratch in Perl (the old config gen used the database directly, it'd still have to be rewritten) or Python
>
> but what if it _was_ in Python though? Something for me to work on this weekend, I suppose...
>
> > That's exactly what it does: the PR changes ORT to call this app instead of calling Traffic Ops over HTTP:
>
> Sure, but I think that's missing the point a bit. There's still the extra step of fetching the configs from a local source, which is the redundancy that concerns me. Not in the short-term, but as a long-term solution.
>
> > I reserve the right to develop a strong opinion about that [whether ORT is to exist forever in concert with configuration generation] in the future.
>
> Of course you're entitled to that, but my concern is that we've basically added a component here. ORT already serves the purpose of creating on-disk configuration files from data stored in Traffic Ops, and this adds the potential question "what if my config generator is version X and ORT is version Y?", and I just think we have enough of that already. Sure, ORT does more than place the configuration file, but I'm not sure that it does "much more". It emplaces a status file, manages packages, and sets service status. Those are arguably complex, but much less so when you consider that only CentOS 6/7 is supported. I'd estimate a solid 80% of ORT is dealing with configuration files, and I understand that that's a dangerously huge rewrite (although ORT.py may have done quite a bit of that already!), but with the Go rewrite of Traffic Ops already more than two major versions and three years (I think?) old, I'm dubious of adding another component that's supposed to "eventually" replace (and we're not even committed to that) another.
>
> To be clear, though, I absolutely think that config generation on the cache server is a massive step in the right direction, and for that reason alone I wouldn't oppose this if it's what everyone else thinks is best.
>
> On 7/30/19 6:06 PM, Robert Butts wrote:
> >> is there any reason we can't hit the DB from ORT
> > Technically, it's possible. But we really, really shouldn't. The API is a guaranteed interface. The database has no such guarantees. TC users would then be required to deploy ORT with TO, in order; or else implement some sort of backwards compatibility in the DB. In other words, we'd end up having to deal with all the Versioning stuff TO already does for us (and this is why it does it).
> >
> >> I'm still not convinced that it would be that hard to modify it to use json data instead of sql queries
> >
> > I was hoping the same. I did exactly that, in the process described in the spec (transliterate -> use objects -> use http), to be as safe as possible. It was more code than I'd hoped. I'd estimate the changes to the logic to use the objects, plus the code to create those objects from the API, at 20-30% of the entire config code.
> >
> >> What I AM nervous about is someone rewriting all that code
> > I agree, there's some inevitable risk involved. FWIW I've gone to great lengths to minimize the risk as much as possible -- see the spec: transliterating as closely as possible, then changing as little as possible. I also have a set of scripts (which I'm happy to share with anyone who wants them) to pull and diff every single config file, on every single server, edge and mid, and for the Profile endpoints every single profile, from our production database. I've done that for every single config file we've rewritten, to ensure parity as much as possible.
> >
> > Also FWIW, the very act of putting config gen in ORT means we can canary test one cache at a time when deploying changes to prod, to ensure correct behavior before deploying everywhere.
> >
> > I'm also hopeful this will make the config files more stable, once it's done. The Go essentially checks every single error condition it can conceive of. Of course, it isn't possible to check every dynamic Parameter, but it comes pretty close. Whereas, I can speak from experience, the Perl checks pretty darn close to nothing for errors. The Go language lends itself to this: you typically have to go out of your way to ignore errors, where Perl arguably lends itself to ignoring errors and assuming errorable calls worked.
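To illustrate that last point with a contrived sketch (this is not code from the PR, and the names are made up): in Go a bad Parameter value has to be consciously ignored, so the generator can refuse to emit a broken file instead of silently interpolating garbage.

    // Contrived example of the kind of checking the Go config gen can do.
    package cfgexample

    import (
        "fmt"
        "strconv"
    )

    // parentPort stands in for "read a port out of a Parameter value".
    // Returning an error forces the caller to decide what to do about bad data,
    // rather than writing a half-right parent.config and moving on.
    func parentPort(paramValue string) (int, error) {
        port, err := strconv.Atoi(paramValue)
        if err != nil {
            return 0, fmt.Errorf("parent port %q is not a number: %v", paramValue, err)
        }
        if port < 1 || port > 65535 {
            return 0, fmt.Errorf("parent port %v is out of range", port)
        }
        return port, nil
    }

The equivalent Perl tends to just interpolate whatever the Parameter contains and carry on.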
> >
> > On Tue, Jul 30, 2019 at 5:37 PM Derek Gelinas <[email protected]> wrote:
> >
> >> This is probably a stupid question, but is there any reason we can't hit the DB from ORT, thus saving us the expense of writing any new scripting? My understanding is that the biggest hit on Traffic Ops isn't the DB so much as the Perl processing for thousands of hosts at once. I assume that the DB requests themselves would be fairly cacheable, no?
> >>
> >> To be honest I'm still not convinced that it would be that hard to modify it to use json data instead of sql queries. What I AM nervous about is someone rewriting all that code. It's pretty damn particular, and there have been a few times where much more minor things have been rewritten that missed the point of certain results entirely and as such broke things.
> >>
> >> Derek
> >>
> >>> On Jul 30, 2019, at 6:22 PM, Robert Butts <[email protected]> wrote:
> >>>
> >>>> I'm confused why this is separate from ORT.
> >>> Because ORT does a lot more than just fetching config files. Rewriting all of ORT in Go would be considerably more work. Contrariwise, if we were to put the config generation in the ORT script itself, we would have to write it all from scratch in Perl (the old config gen used the database directly, it'd still have to be rewritten) or Python. This was just the easiest path forward.
> >>>
> >>>> I feel like this logic should just be replacing the config fetching logic of ORT
> >>>
> >>> That's exactly what it does: the PR changes ORT to call this app instead of calling Traffic Ops over HTTP:
> >>> https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
> >>>
> >>>> Is that the eventual plan? Or does our vision of the future include this *and* ORT?
> >>>
> >>> I reserve the right to develop a strong opinion about that in the future.
> >>>
> >>>
> >>> On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
> >>>
> >>>>> "I'm just looking for consensus that this is the right approach."
> >>>> Umm... sort of. I think moving cache configuration to the cache itself is a great idea, but I'm confused why this is separate from ORT. Like if this is going to be generating the configs and it's already right there on the server, I feel like this logic should just be replacing the config fetching logic of ORT (and personally I think a neat place to try it out would be in ORT.py).
> >>>>
> >>>> Is that the eventual plan? Or does our vision of the future include this *and* ORT?
> >>>>
> >>>>
> >>>> On 7/30/19 2:15 PM, Robert Butts wrote:
> >>>>> Hi all! I've been working on moving the ATS config generation from Traffic Ops to a standalone app alongside ORT, that queries the standard TO API to generate its data. I just wanted to put it here, and get some feedback, to make sure the community agrees this is the right direction.
> >>>>>
> >>>>> There's a (very) brief spec here (I might put more detail into it later; let me know if that's important to anyone):
> >>>>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
> >>>>>
> >>>>> And the draft PR is here:
> >>>>> https://github.com/apache/trafficcontrol/pull/3762
> >>>>>
> >>>>> This has a number of advantages:
> >>>>> 1. TO is a monolith; this moves a significant amount of logic out of it, into a smaller per-cache app/library that's easier to test, validate, rewrite, deploy, canary, rollback, etc.
> >>>>> 2. Deploying cache config changes is much smaller and safer. Instead of having to deploy (and potentially roll back) TO, you can canary deploy on one cache at a time.
> >>>>> 3. This makes TC more cache-agnostic. It moves cache config generation logic out of TO, and into an independent app/library. The app (atstccfg) is actually very similar to Grove's config generator (grovetccfg). This makes it easier and more obvious how to write config generators for other proxies.
> >>>>> 4. By using the API and putting the generator functions in a library, this really gives a lot more flexibility to put the config gen anywhere you want without too much work. You could easily put it in an HTTP service, or even put it back in TO via a Plugin. That's not something that's really possible with the existing system, generating directly from the database.
> >>>>>
> >>>>> Right now, I'm just looking for consensus that this is the right approach. Does the community agree this is the right direction? Are there concerns? Would anyone like more details about anything in particular?
> >>>>>
> >>>>> Thanks,
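P.S. To give a feel for what point 4 above means in practice: the generators are just functions over API objects, so the same code can sit behind a CLI, an HTTP endpoint, or a TO plugin. This is only a rough sketch -- the names here are invented, the real functions live in the PR:

    // Invented names, for illustration of the "library" idea only.
    package main

    import (
        "fmt"
        "net/http"
    )

    // TOData stands in for the objects fetched from the Traffic Ops API.
    type TOData struct{ ServerHostName string }

    // MakeRemapDotConfig stands in for a generator function in the library:
    // pure data in, config file text out.
    func MakeRemapDotConfig(data TOData) string {
        return "# remap.config for " + data.ServerHostName + "\n"
    }

    func main() {
        data := TOData{ServerHostName: "edge-cache-01"}

        // CLI-style use: print the file for ORT to place on disk.
        fmt.Print(MakeRemapDotConfig(data))

        // HTTP-service-style use: the same function behind an endpoint.
        http.HandleFunc("/remap.config", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprint(w, MakeRemapDotConfig(data))
        })
        // http.ListenAndServe(":8080", nil) // not started here; shown for shape only
    }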
