> Communicating between 11 different processes via stdin/stdout and exit
> codes, even if the processes themselves are relatively simple, is fairly
> complex as a whole.

> I don't really see a problem with implementing it as a single
> well-designed binary

It's not that there's a "problem." The question is, if we're doing a rewrite, which design has more pros and fewer cons? Personally, I think the UNIX style has more pros: small, self-contained apps are easier to develop, easier to test, easier to compose, more powerful, and more flexible.

A self-contained executable for each logical "thing" makes each of those things much easier to integration-test on its own. Those tests would otherwise be unit tests, which are more artificial and exercise less of the complete system. At the same time, integration tests of the full pipeline aren't any harder: the test framework can call the Aggregator or the pipeline, just like it would call the monolith, and verify the output.

> stdin/stdout

The input and output can be tested just like functions would be in a monolith. In fact, functions can be very difficult to test if they have side effects, which are very easy and natural to write in most OO/imperative languages. UNIX-style apps make that impossible, or at least very difficult; the more natural way of writing them is to just return output and not change random things on the system, guiding developers toward more "pure" functions/apps. I can't prove that, of course, but I really do think small UNIX-style apps guide developers toward fewer side effects, which makes testing and verifying correctness much easier. I think that's another big advantage of the UNIX Philosophy.

> exit codes

Exit codes should always be 0 unless there's an error. The tool (the Aggregator) or the operator calling the pipeline should return the error if any component fails (a script can do this with "set -o pipefail"). That isn't any more difficult to use than a monolith that checks "if err != nil" after every step; in fact, it's arguably easier, because you don't have to write a manual error check for every call, the failure is automatically propagated up. If any stage of a pipeline fails, its error text is returned for the whole pipeline, and in a test the user is given the exact failure text to go track down.

UNIX-style executables are easier to develop, too. They can be written in different languages, if one language is better suited to a particular task. They can also be worked on separately by different developers, with fewer conflicts: it's clearer what each "one thing" is, whereas in a single large app it's very easy to blur lines and write confusing code around individual things or concepts. More developers can work on ORT concurrently without conflict, because each person only has to keep their app's input and output stable while everyone else works on the other apps in parallel.

They're also easier to compose. Operators can call and pipe whatever they need, whereas with a monolith only what's exposed is available. For example, ORT today _could_ expose a diff argument, but it doesn't, so right now operators have no way to run ORT's custom diff on their own input. Or suppose we didn't expose any kind of caching, but some TC operator needed to cache for a minute to reduce load. With the monolith that's simply impossible, unless devs add it or the operator dives into and modifies the large and complex ORT codebase. But with UNIX-style apps, you can write a quick shell script that calls the ort-get-to-data app, saves the output to a file, skips the call if the file is young enough, and then calls the rest of the pipeline.
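To make that concrete, here's a rough sketch of the kind of wrapper an operator could write. It's only illustrative: ort-get-to-data, ort-make-configs, and ort-diff are names from the proposal, ort-apply is a placeholder for whatever applies the configs, and the real flags and data formats would depend on the final design.

    #!/usr/bin/env bash
    # Any stage failing fails the whole script (the exit-code point above).
    set -o errexit -o pipefail

    CACHE=/var/tmp/ort-to-data.json   # placeholder path
    MAX_AGE_SECONDS=60

    # Only re-fetch from Traffic Ops if the cached copy is missing or too old.
    if [ ! -f "$CACHE" ] || [ "$(( $(date +%s) - $(stat -c %Y "$CACHE") ))" -ge "$MAX_AGE_SECONDS" ]; then
        ort-get-to-data > "$CACHE"
    fi

    # Feed the cached data into the rest of the pipeline.
    ort-make-configs < "$CACHE" | ort-diff | ort-apply

That's a dozen lines an operator can write and own themselves, without touching the ORT codebase at all.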
There are innumerable things like that which are quick and easy with small apps, and difficult or impossible with a monolith.

> I would also like to bring up the idea that we really need to change
> ORT's "pull" paradigm, or at least make the "pull" more efficient so
> that we don't have thousands of ORT instances all making the same
> requests to TO, with TO having to hit the DB for every request even
> though nothing has actually changed. Since we control ORT we have
> nearly 100% of control over all TO API requests made, yet we have a
> design that self-DDOSes itself by default right now. Do we want to
> tackle that problem as part of this redesign, or is that out of scope?

I would vote out-of-scope, and that we discuss it in a different thread as a different project.

Personally, IMO the advantages of "pull," the HTTP client-server model, far outweigh "push." The entire internet is built on client-server, for good reason. Push has a ton of issues: it's stateful, it needs to "register" clients, it needs "brokers," and it adds more points of failure. The ORT/TO scalability problem exists because we don't implement the well-known HTTP standards for checking for updates, namely If-Modified-Since et al. (I hope the irony isn't lost on anyone). IMO the solution is to implement IMS on Traffic Ops, and then make the "Traffic Ops Requestor" do proper IMS requests. Both the network and database costs of an IMS request are tiny, and it solves the issue without all the disadvantages and costs of "push." But I think that project is orthogonal to and independent of this one.
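Just to illustrate what I mean from the client side (not a design; the URL and paths below are made up, not the real TO API):

    #!/usr/bin/env bash
    # Placeholder endpoint and paths, purely to illustrate If-Modified-Since.
    TO_URL="https://to.example.net/config-data-for-this-cache"
    DATA=/var/tmp/ort-to-data.json

    # Send If-Modified-Since only if we already have a previous copy;
    # curl -z uses the file's mtime for the header.
    cond=()
    if [ -f "$DATA" ]; then
        cond=(-z "$DATA")
    fi

    tmp=$(mktemp)
    code=$(curl -s "${cond[@]}" -o "$tmp" -w '%{http_code}' "$TO_URL")

    if [ "$code" = "200" ]; then
        mv "$tmp" "$DATA"   # something changed: run the rest of the pipeline
    else
        rm -f "$tmp"        # 304 Not Modified: nothing to do, and TO did almost no work
    fi

The client-server model stays, TO just answers "nothing changed" cheaply, and none of the statefulness of push is needed.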
On Mon, Apr 13, 2020 at 7:06 PM ocket 8888 <[email protected]> wrote:

> For what it's worth, I'd be +1 on re-examining "push" vs "pull" for ORT.
>
> On Mon, Apr 13, 2020, 16:46 Rawlin Peters <[email protected]> wrote:
>
> > I'm generally +1 on redesigning ORT with the removal of the features
> > you mentioned, but the one thing that worries me is the number of
> > unique binaries/executables involved (potentially 11). Communicating
> > between 11 different processes via stdin/stdout and exit codes, even
> > if the processes themselves are relatively simple, is fairly complex
> > as a whole. IMO I don't really see a problem with implementing it as a
> > single well-designed binary -- if it's Go, each proposed binary could
> > just be its own package instead, with each package only exporting one
> > high-level function. The main func would then be the "Aggregator" that
> > simply calls each package's public function in turn, passing the
> > output of one into the input of the next, checking for errors at each
> > step. I think that would make it much easier to debug and test as a
> > whole.
> >
> > I would also like to bring up the idea that we really need to change
> > ORT's "pull" paradigm, or at least make the "pull" more efficient so
> > that we don't have thousands of ORT instances all making the same
> > requests to TO, with TO having to hit the DB for every request even
> > though nothing has actually changed. Since we control ORT we have
> > nearly 100% of control over all TO API requests made, yet we have a
> > design that self-DDOSes itself by default right now. Do we want to
> > tackle that problem as part of this redesign, or is that out of scope?
> >
> > - Rawlin
> >
> > On Thu, Apr 9, 2020 at 4:57 PM Robert O Butts <[email protected]> wrote:
> > >
> > > I've made a Blueprint proposing to rewrite ORT:
> > > https://github.com/apache/trafficcontrol/pull/4628
> > >
> > > If you have opinions on ORT, please read and provide feedback.
> > >
> > > In a nutshell, it's proposing to rewrite ORT in Go, in the "UNIX
> > > Philosophy" of small, "do one thing" apps.
> > >
> > > Importantly, the proposal **removes** the following ORT features:
> > >
> > > chkconfig - CentOS 7+ and SystemD don't use chkconfig, and moreover our
> > > default Profile runlevel is wrong and broken. But my knowledge of
> > > CentOS, SystemD, chkconfig, runlevels isn't perfect; if I'm mistaken about
> > > this and you're using ORT to set chkconfig, please let us know ASAP.
> > >
> > > ntpd - ORT today has code to set ntpd config and restart the ntpd service.
> > > I have no idea why it was ever in charge of this, but this clearly seems to
> > > be the system's job, not ORT or TC's.
> > >
> > > interactive mode - I asked around, and couldn't find anyone using this.
> > > Does anyone use it? And feel it's essential to keep in ORT? And also feel
> > > that the way this proposal breaks up the app so that it's easy to request
> > > and compare files before applying them isn't sufficient?
> > >
> > > reval mode - This was put in because ORT was slow. ORT in master now takes
> > > 10-20s on our large CDN. Moreover, "reval" mode is no longer significantly
> > > faster than just applying everything. Does anyone feel otherwise?
> > >
> > > report mode - The functionality here is valuable. But the intention here is
> > > to replace "ORT report mode" with a pipelined set of app calls or a script
> > > to do the same thing. I.e. because it's "UNIX-Style" you can just
> > > "ort-to-get | ort-make-configs | ort-diff".
> > >
> > > package installation - This is the biggest feature the proposal removes,
> > > and probably the most controversial. The thought is: this isn't something
> > > ORT or Traffic Control should be doing. The same thing that manages the
> > > physical machine and/or operating system -- whether that's Ansible, Puppet,
> > > Chef, or a human System Administrator -- should be installing the OS
> > > packages for ATS and its plugins, just like it manages all the other
> > > packages on your system. ORT and TC should deploy configuration, not
> > > install things.
> > >
> > > So yeah, feedback welcome. Feel free to post it on the list here or the
> > > blueprint PR on github.
> > >
> > > Thanks,
> > >
