Re: Cache-Side Config Generation

ocket8888 Wed, 31 Jul 2019 09:07:54 -0700


I've accidentally been replying to Rob directly :P


-------- Forwarded Message --------
Subject:        Re: Cache-Side Config Generation
Date:   Wed, 31 Jul 2019 09:56:37 -0600
From:   ocket8888 <[email protected]>
To:     Robert Butts <[email protected]>

> The "extra step" is just asking a local app instead of HTTP. Are you
concerned about performance? That should be negligible. Likewise, what's
the difference in calling a Perl/Python function, and calling an app? Is
there really much difference in two Python files, versus a Python file
calling a binary file?

Well... when you import a Python module, its namespace is immediately
brought into the execution context. On the first run, that means
compiling it first, but after that Python will check the source file's
modification time and size against those stored in a .pyc (or .pyo on
older versions, depending on runtime flags to the interpreter) to see if
it needs to again, much like `make` does - and as of 3.7 it can instead
use a hash of the source file. So the difference is calling out to
`fork` and `exec*`, which involves two more context switches than just
opening and reading a file.

But none of that matters because: no, I'm not concerned about
performance. Depending on how the data is going to be parsed once ORT
gets it back, you probably even make up for the context switches with
the speed of native execution. I just like talking about Python.
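
For illustration only, the two call styles being compared (a function
call inside the same interpreter versus fork/exec of a separate program)
might look roughly like this. The function names are hypothetical, and
the child process here is just another Python interpreter standing in
for a real binary; it is not how ORT or the config generator is
actually invoked.

```python
import subprocess
import sys


def generate_in_process(name: str) -> str:
    """Hypothetical in-process generator: an ordinary function call."""
    return f"# config for {name}\n"


def generate_via_subprocess(name: str) -> str:
    """Produce the same text from a child process (fork + exec).

    The child is a Python one-liner standing in for a real binary.
    """
    child_code = "import sys; sys.stdout.write('# config for ' + sys.argv[1] + '\\n')"
    result = subprocess.run(
        [sys.executable, "-c", child_code, name],
        capture_output=True,  # collect stdout instead of inheriting it
        text=True,            # decode bytes to str
        check=True,           # raise if the child exits non-zero
    )
    return result.stdout
```

Either way the caller ends up with the same string; only the mechanism
(and its fixed per-call overhead) differs.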

The "redundancy" I was talking about was maintaining the config
generation code in two places. Obviously we'd be doing that anyway if it
was built into ORT, but we'd be doing it much longer if the plan was to
eventually build it into ORT. That's why I was talking about this as an
"extra step".

> It's part of ORT-the-RPM... Is that any better?


Yes, much.

On 7/30/19 8:46 PM, Robert Butts wrote:

> Sure, but I think that's missing the point a bit. There's still the extra
step of fetching the configs from a local source, which is the redundancy
that concerns me. Not in the short-term, but as a long-term solution.

I'm not sure I understand the concern. The "extra step" is just asking a
local app instead of HTTP. Are you concerned about performance? That should
be negligible. Likewise, what's the difference in calling a Perl/Python
function, and calling an app? Is there really much difference in two Python
files, versus a Python file calling a binary file?

> with the Go rewrite of Traffic Ops already more than two major versions
and three years (I think?) old I'm dubious of adding another component
that's supposed to "eventually" replace (and we're not even committed to
that) another.

I also share that concern. But IMO it would be better to have a local app
generating a single config and proxying everything else, than to not have
it. IMO the ability to canary-deploy even a single config cache-side is
worth the app overhead. I'm also hopeful it will go quicker than TO --
there's far less config code than the entirety of TO, and we've already
written most of it, AFAIK there are only a few small config files left.

> this adds the potential question "what if my config generator is version
X and ORT is version Y?"

Ahh, I think I wasn't clear about how this will be deployed. It's part of
ORT-the-RPM. The binary app isn't part of ORT-the-script, but it's in the
RPM, and installed/upgraded by Yum. See
https://github.com/apache/trafficcontrol/pull/3762/files#diff-8ebb93342b2acfa55d6c9fc7df534518
. So, it shouldn't ever be a different version than ORT-the-script, unless
someone manually copies a different binary or script file in, which Traffic
Control would not support any more than someone dropping a different
traffic_ctl in an ATS install. Is that any better?


On Tue, Jul 30, 2019 at 8:02 PM ocket8888 <[email protected]> wrote:

>> is there any reason we can't hit the DB from ORT

pls no

> ...the config generation in the ORT script itself, we would have to
write it all from scratch in Perl (the old config gen used the database
directly, it'd still have to be rewritten) or Python

but what if it _was_ in Python though? Something for me to work on this
weekend, I suppose...

> That's exactly what it does: the PR changes ORT to call this app
instead of calling Traffic Ops over HTTP:

Sure, but I think that's missing the point a bit. There's still the
extra step of fetching the configs from a local source, which is the
redundancy that concerns me. Not in the short-term, but as a long-term
solution.

> I reserve the right to develop a strong opinion about that [whether
ORT is to exist forever in concert with configuration generation] in the
future.

Of course you're entitled to that, but my concern is that we've
basically added a component here. ORT already serves the purpose of
creating on-disk configuration files from data stored in Traffic Ops,
and this adds the potential question "what if my config generator is
version X and ORT is version Y?" and I just think we have enough of that
already. Sure, ORT does more than place the configuration file, but I'm
not sure that it does "much more". It emplaces a status file, manages
packages, and sets service status. Those are arguably complex, but much
less so when you consider that only CentOS 6/7 is supported. I'd
estimate a solid 80% of ORT is dealing with configuration files, and I
understand that that's a dangerously huge rewrite (although ORT.py may
have done quite a bit of that already!), but with the Go rewrite of
Traffic Ops already more than two major versions and three years (I
think?) old I'm dubious of adding another component that's supposed to
"eventually" replace (and we're not even committed to that) another.

To be clear, though, I absolutely think that config generation on the
cache server is a massive step in the right direction, and for that
reason alone I wouldn't oppose this if it's what everyone else thinks is
best.

On 7/30/19 6:06 PM, Robert Butts wrote:

> is there any reason we can't hit the DB from ORT

Technically, it's possible. But we really, really shouldn't. The API is a
guaranteed interface. The database has no such guarantees. TC users would
then be required to deploy ORT with TO, in order; or else implement some
sort of backwards compatibility in the DB. In other words, we'd end up
having to deal with all the Versioning stuff TO already does for us (and
this is why it does it).

> I'm still not convinced that it would be that hard to modify it to use
json data instead of sql queries

I was hoping the same. I did exactly that, in the process described in the
spec (transliterate -> use objects -> use http), to be as safe as possible.
It was more code than I'd hoped. I'd estimate the changes to the logic to
use the objects, plus the code to create those objects from the API, at
20-30% of the entire config code.

> What I AM nervous about is someone rewriting all that code

I agree, there's some inevitable risk involved. FWIW I've gone to great
lengths to minimize the risk as much as possible -- see the spec,
transliterating as closely as possible, then changing as little as
possible. I also have a set of scripts (which I'm happy to share with
anyone who wants them) to pull and diff every single config file, on every
single server, edge and mid (or, for Profile endpoints, every single
profile), from our production database. I've done that for every single
config file we've rewritten, to ensure parity as much as possible.
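
As a rough idea of what such a parity check could look like (this is not
the actual set of scripts mentioned above, which aren't shown in this
thread), here is a minimal sketch. It assumes the old and new renderings
of each config file have already been fetched into two dicts keyed by
file name; the function names and the "perl/" and "go/" labels are
hypothetical.

```python
import difflib
from typing import Dict, List


def diff_configs(old_text: str, new_text: str, name: str) -> List[str]:
    """Return unified-diff lines between two renderings of one config file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"perl/{name}",
            tofile=f"go/{name}",
            lineterm="",
        )
    )


def find_mismatches(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each differing config file name to its diff.

    A file missing on one side is treated as empty text on that side.
    """
    mismatches = {}
    for name in sorted(old.keys() | new.keys()):
        diff = diff_configs(old.get(name, ""), new.get(name, ""), name)
        if diff:  # identical files produce no diff lines
            mismatches[name] = diff
    return mismatches
```

An empty result from `find_mismatches` would mean every file compared
came out identical from both generators.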

Also FWIW, the very act of putting config gen in ORT means we can canary
test one cache at a time, when deploying changes to prod, to ensure correct
behavior before deploying everywhere.

I'm also hopeful this will make the config files more stable, once it's
done. The Go essentially checks every single error condition it can
conceive of. Of course, it isn't possible to check every dynamic Parameter.
But it comes pretty close. Where, I can speak from experience, the Perl
checks pretty darn close to nothing for errors. The Go language lends
itself to this, you typically have to go out of your way to ignore errors;
where Perl arguably lends itself to ignoring errors and assuming errorable
calls worked.


On Tue, Jul 30, 2019 at 5:37 PM Derek Gelinas <[email protected]> wrote:

This is probably a stupid question, but is there any reason we can't hit
the DB from ORT, thus saving us the expense of writing any new scripting?
My understanding is that the biggest hit on traffic ops isn't the DB so
much as the perl processing for thousands of hosts at once. I assume that
the DB requests themselves would be fairly cacheable, no?

To be honest I'm still not convinced that it would be that hard to modify
it to use json data instead of sql queries. What I AM nervous about is
someone rewriting all that code. It's pretty damn particular and there
have been a few times where much more minor things have been rewritten that
missed the point of certain results entirely and as such broke things.

Derek

On Jul 30, 2019, at 6:22 PM, Robert Butts <[email protected]> wrote:

> I'm confused why this is separate from ORT.

Because ORT does a lot more than just fetching config files. Rewriting all
of ORT in Go would be considerably more work. Contrariwise, if we were to
put the config generation in the ORT script itself, we would have to write
it all from scratch in Perl (the old config gen used the database directly,
it'd still have to be rewritten) or Python. This was just the easiest path
forward.

> I feel like this logic should just be replacing the config fetching logic
of ORT

That's exactly what it does: the PR changes ORT to call this app instead of
calling Traffic Ops over HTTP:
https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485

> Is that the eventual plan? Or does our vision of the future include this
*and* ORT?

I reserve the right to develop a strong opinion about that in the future.


On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:

"I'm just looking for consensus that this is the right approach."
Umm... sort of. I think moving cache configuration to the cache itself
is a great idea, but I'm confused why this is separate from ORT. Like if
this is going to be generating the configs and it's already right there on
the server, I feel like this logic should just be replacing the config
fetching logic of ORT (and personally I think a neat place to try it out
would be in ORT.py).


Is that the eventual plan? Or does our vision of the future include this
*and* ORT?


On 7/30/19 2:15 PM, Robert Butts wrote:

Hi all! I've been working on moving the ATS config generation from Traffic
Ops to a standalone app alongside ORT, that queries the standard TO API to
generate its data. I just wanted to put it here, and get some feedback, to
make sure the community agrees this is the right direction.

There's a (very) brief spec here: (I might put more detail into it later,
let me know if that's important to anyone)
https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation

And the Draft PR is here:
https://github.com/apache/trafficcontrol/pull/3762

This has a number of advantages:
1. TO is a monolith, this moves a significant amount of logic out of it,
into a smaller per-cache app/library that's easier to test, validate,
rewrite, deploy, canary, rollback, etc.
2. Deploying cache config changes is much smaller and safer. Instead of
having to deploy (and potentially roll back) TO, you can canary deploy on
one cache at a time.
3. This makes TC more cache-agnostic. It moves cache config generation
logic out of TO, and into an independent app/library. The app (atstccfg) is
actually very similar to Grove's config generator (grovetccfg). This makes
it easier and more obvious how to write config generators for other
proxies.
4. By using the API and putting the generator functions in a library, this
really gives a lot more flexibility to put the config gen anywhere you want
without too much work. You could easily put it in an HTTP service, or even
put it back in TO via a Plugin. That's not something that's really possible
with the existing system, generating directly from the database.
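
To make point 4 a little more concrete, here is a minimal sketch of the
idea (in Python purely for illustration; it is not the actual atstccfg
code). It assumes a generator is a pure function from API-shaped data to
file text, so the same function can sit behind a command-line wrapper or
an HTTP service. Every name below, the input JSON shape, and the output
format are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def make_example_config(server_name: str, params: List[Dict[str, Any]]) -> str:
    """Hypothetical generator: pure function from API-shaped data to file text."""
    lines = [f"# generated for {server_name}"]
    for p in sorted(params, key=lambda p: p["name"]):
        lines.append(f"{p['name']} {p['value']}")
    return "\n".join(lines) + "\n"


def cli_entry(json_text: str) -> str:
    """Thin wrapper a command-line app could use (input already read as JSON text)."""
    data = json.loads(json_text)
    return make_example_config(data["server"], data["params"])


def wsgi_app(environ: Dict[str, Any], start_response: Callable) -> Iterable[bytes]:
    """Thin wrapper exposing the same function as a WSGI HTTP service."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = json.loads(environ["wsgi.input"].read(length))
    body = make_example_config(data["server"], data["params"]).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Because the wrappers contain no generation logic of their own, moving the
generator between a local app and a service would only mean swapping the
wrapper.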

Right now, I'm just looking for consensus that this is the right approach.
Does the community agree this is the right direction? Are there concerns?
Would anyone like more details about anything in particular?

Thanks,
