Smaller, simpler pieces closer to the cache that do one job are far easier to 
maintain, triage, and build.  I'm not a fan of trying to inject a message bus 
in the middle of everything.

Jonathan G


On 7/31/19, 8:48 AM, "Genz, Geoffrey" <[email protected]> wrote:

    To throw a completely different idea out there . . . some time ago Matt Mills
    was talking about using Kafka as the configuration transport mechanism for
    Traffic Control.  The idea is to use a Kafka compacted topic as the
    configuration source.  TO would write database updates to Kafka, and the ORT
    equivalent would pull its configuration from Kafka.
    
    To explain compacted topics a bit: a standard Kafka message is a key and a
    payload; in a compacted topic, only the most recent payload for a particular
    key is retained.  As a result, reading all the messages from the topic gives
    you the current state of what is essentially a key-value store, with the
    benefit of never mutating data in place.  So a cache could get its full
    expected configuration by reading all the existing messages on the appropriate
    topic, and get new configuration updates by listening for new Kafka messages.
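
    To make the idea a bit more concrete, here's a very rough sketch of what the
    cache-side consumer could look like.  The broker address, the "cdn-config"
    topic name, and the choice of the github.com/segmentio/kafka-go client are
    just placeholders for illustration, not a worked-out design:

    package main

    import (
        "context"
        "log"

        kafka "github.com/segmentio/kafka-go"
    )

    func main() {
        // Hypothetical single-partition compacted topic holding the CDN config.
        r := kafka.NewReader(kafka.ReaderConfig{
            Brokers:   []string{"kafka.example.net:9092"},
            Topic:     "cdn-config",
            Partition: 0,
        })
        defer r.Close()

        // Replay from the beginning; thanks to compaction this is only the most
        // recent payload per key, i.e. the current configuration state.
        if err := r.SetOffset(kafka.FirstOffset); err != nil {
            log.Fatal(err)
        }

        config := map[string][]byte{} // key -> latest payload

        for {
            m, err := r.ReadMessage(context.Background())
            if err != nil {
                log.Fatal(err)
            }
            config[string(m.Key)] = m.Value
            // Once caught up, this same loop keeps receiving new updates as TO
            // writes them, so the cache never has to poll TO.
        }
    }

    The nice property is that the same code path handles both the initial full
    sync and the incremental updates.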
    
    This shifts the load onto the Kafka brokers, which, I can assure you from
    recent experience, is minimal.  TO would only have the responsibility of
    writing database updates to Kafka, and ORT would only need to read individual
    updates (and be smart enough to know how and when to apply them -- perhaps
    hints could be provided in the payload?).  The result is that TO is "pushing"
    updates to the caches (via Kafka) as Rawlin was proposing, and ORT could still
    pull the full configuration whenever necessary with no hit to Postgres or TO.
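
    The TO side would be equally small: each database change becomes one keyed
    produce, and the "hint" about how to apply it could travel as a message
    header.  Again, the topic name, key scheme, and header name below are purely
    hypothetical:

    package main

    import (
        "context"
        "log"

        kafka "github.com/segmentio/kafka-go"
    )

    func main() {
        w := &kafka.Writer{
            Addr:     kafka.TCP("kafka.example.net:9092"),
            Topic:    "cdn-config",
            Balancer: &kafka.Hash{}, // same key always lands on the same partition
        }
        defer w.Close()

        // Hypothetical key and payload for a delivery service update; the hint
        // telling ORT how/when to apply the change rides along as a header.
        err := w.WriteMessages(context.Background(), kafka.Message{
            Key:   []byte("deliveryservice/demo1"),
            Value: []byte(`{"xml_id":"demo1","active":true}`),
            Headers: []kafka.Header{
                {Key: "change-type", Value: []byte("ds-update")},
            },
        })
        if err != nil {
            log.Fatal(err)
        }
    }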
    
    Now this is obviously a radical shift (and there are no doubt other ways to
    implement the basic idea), but it seemed worth bringing up.
    
    - Geoff
    
    On 7/31/19, 8:30 AM, "Lavanya Bathina" <[email protected]> wrote:
    
        +1 on this 
        
        On Jul 30, 2019, at 6:01 PM, Rawlin Peters <[email protected]> wrote:
        
        I've been thinking for a while now that ORT's current pull-based model
        of checking for queued updates is not really ideal, and I was hoping
        with "ORT 2.0" that we would switch that paradigm around to where TO
        itself would push updates out to queued caches. That way TO would
        never get overloaded because we could tune the level of concurrency
        for pushing out updates (based on server capacity/specs), and we would
        eliminate the "waiting period" between the time updates are queued and
        the time ORT actually updates the config on the cache.
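
        Just to illustrate the tuning knob, the push could be as simple as a
        bounded worker pool on the TO side; the cache list, payload, and
        /ort/update endpoint below are all made up for the example, not an
        existing API:

        package main

        import (
            "bytes"
            "log"
            "net/http"
            "sync"
        )

        // pushUpdates sends the queued-update payload to every cache, with at
        // most maxConcurrent requests in flight (tuned to TO's capacity/specs).
        func pushUpdates(caches []string, payload []byte, maxConcurrent int) {
            sem := make(chan struct{}, maxConcurrent)
            var wg sync.WaitGroup
            for _, cache := range caches {
                wg.Add(1)
                sem <- struct{}{} // acquire a slot
                go func(cache string) {
                    defer wg.Done()
                    defer func() { <-sem }() // release the slot
                    // Hypothetical endpoint that an "ORT 2.0" agent would serve.
                    resp, err := http.Post("https://"+cache+"/ort/update",
                        "application/json", bytes.NewReader(payload))
                    if err != nil {
                        log.Printf("push to %s failed: %v", cache, err)
                        return
                    }
                    resp.Body.Close()
                }(cache)
            }
            wg.Wait()
        }

        func main() {
            pushUpdates([]string{"edge1.example.net", "edge2.example.net"},
                []byte(`{"queued": true}`), 10)
        }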
        
        I think cache-side config generation is a good idea in terms of
        enabling canary deployments, but as CDNs continue to scale by adding
        more and more caches, we might want to get out ahead of the ORT
        load/waiting problem by flipping that paradigm from "pull" to "push"
        somehow. Then instead of 1000 caches all asking TO the same question
        and causing 1000 duplicated reads from the DB, TO would just read the
        one answer from the DB and send it to all the caches, further reducing
        load on the DB as well. The data in the "push" request from TO to ORT
        2.0 would contain all the information ORT would request from the API
        itself, not the actual config files.
        
        With the API transition from Perl to Go, I think we're eliminating the
        Perl CPU bottleneck from TO, but the next bottleneck seems like it
        would be reading from the DB due to the constantly growing number of
        concurrent ORT requests as a CDN scales up. We should keep that in
        mind for whatever "ORT 2.0"-type changes we're making so that it won't
        make flipping that paradigm around even harder.
        
        - Rawlin
        
        > On Tue, Jul 30, 2019 at 4:23 PM Robert Butts <[email protected]> wrote:
        > 
        >> I'm confused why this is separate from ORT.
        > 
        > Because ORT does a lot more than just fetching config files. Rewriting all
        > of ORT in Go would be considerably more work. Contrariwise, if we were to
        > put the config generation in the ORT script itself, we would have to write
        > it all from scratch in Perl (the old config gen used the database directly,
        > it'd still have to be rewritten) or Python. This was just the easiest path
        > forward.
        > 
        >> I feel like this logic should just be replacing the config fetching logic
        >> of ORT
        > 
        > That's exactly what it does: the PR changes ORT to call this app instead of
        > calling Traffic Ops over HTTP:
        > https://github.com/apache/trafficcontrol/pull/3762/files#diff-fe8a3eac71ee592a7170f2bdc7e65624R1485
        > 
        >> Is that the eventual plan? Or does our vision of the future include this
        >> *and* ORT?
        > 
        > I reserve the right to develop a strong opinion about that in the future.
        > 
        > 
        > On Tue, Jul 30, 2019 at 3:17 PM ocket8888 <[email protected]> wrote:
        > 
        >>> "I'm just looking for consensus that this is the right approach."
        >> 
        >> Umm... sort of. I think moving cache configuration to the cache itself
        >> is a great idea, but I'm confused why this is separate from ORT. Like if
        >> this is going to be generating the configs and it's already right there
        >> on the server, I feel like this logic should just be replacing the config
        >> fetching logic of ORT (and personally I think a neat place to try it out
        >> would be in ORT.py).
        >> 
        >> Is that the eventual plan? Or does our vision of the future include this
        >> *and* ORT?
        >> 
        >> 
        >>> On 7/30/19 2:15 PM, Robert Butts wrote:
        >>> Hi all! I've been working on moving the ATS config generation from Traffic
        >>> Ops to a standalone app alongside ORT, that queries the standard TO API to
        >>> generate its data. I just wanted to put it here, and get some feedback, to
        >>> make sure the community agrees this is the right direction.
        >>> 
        >>> There's a (very) brief spec here: (I might put more detail into it later,
        >>> let me know if that's important to anyone)
        >>> 
        >>> https://cwiki.apache.org/confluence/display/TC/Cache-Side+Config+Generation
        >>> 
        >>> And the Draft PR is here:
        >>> https://github.com/apache/trafficcontrol/pull/3762
        >>> 
        >>> This has a number of advantages:
        >>> 1. TO is a monolith, this moves a significant amount of logic out of it,
        >>> into a smaller per-cache app/library that's easier to test, validate,
        >>> rewrite, deploy, canary, rollback, etc.
        >>> 2. Deploying cache config changes is much smaller and safer. Instead of
        >>> having to deploy (and potentially roll back) TO, you can canary deploy on
        >>> one cache at a time.
        >>> 3. This makes TC more cache-agnostic. It moves cache config generation
        >>> logic out of TO, and into an independent app/library. The app (atstccfg) is
        >>> actually very similar to Grove's config generator (grovetccfg). This makes
        >>> it easier and more obvious how to write config generators for other
        >>> proxies.
        >>> 4. By using the API and putting the generator functions in a library, this
        >>> really gives a lot more flexibility to put the config gen anywhere you want
        >>> without too much work. You could easily put it in an HTTP service, or even
        >>> put it back in TO via a Plugin. That's not something that's really possible
        >>> with the existing system, generating directly from the database.
        >>> 
        >>> Right now, I'm just looking for consensus that this is the right approach.
        >>> Does the community agree this is the right direction? Are there concerns?
        >>> Would anyone like more details about anything in particular?
        >>> 
        >>> Thanks,
        >>> 
        >> 
        
    
    
