Replies inline.

On Wed, Jun 27, 2018 at 10:59 AM Eric Friedrich (efriedri) <[email protected]> wrote:
>
> Regardless of adding a new DS type, we’ll always have incompatibility between
> features. In this case, we could create new DS types for the _BYPASS, but we
> would then still need to block enabling MSO on those delivery services.
<<<RP>>> True, we should probably prohibit MSO on every type of DS that bypasses the mid tier because it won't work.

> A few more comments inline
>
> On Jun 27, 2018, at 12:08 PM, Rawlin Peters <[email protected]> wrote:
> >
> > Alright, I'm going to try to walk through this from the dev
> > perspective a bit too. Say we add a go_direct column (non-null,
> > default false) to the deliveryservice table. We can't add a new
> > required field to the API without breaking it, so the API backend has
> > to do a check like this before inserting the DS into the DB:
> >
> > if go_direct is undefined:
> >     if ds_type in (HTTP, DNS, HTTP_LIVE_NATNL, DNS_LIVE_NATNL, etc...):
> >         go_direct = false
> >     else:
> >         go_direct = true
> EF>The API field would of course be optional like everything else going into
> DS table.
>
> I’d prefer to see this written as:
> If go_direct is undefined:
>     go_direct = ds_type.getGoDirectDefault()
>
> Much safer and if you add a new DS Type, the Go interface would enforce
> implementing that default
> (Maybe we don’t have a DSType interface… we should)

<<<RP>>> I agree, the code would be better that way. The example was more just to show that we would need _something_ like that in TO/TP if we didn't have new types. We do have a DSType type in Go TO, but we'd have to add a "GoDirectDefaulter" interface that every DSType would have to implement.

> > That means we would need to have some form of that code in the DS API
> > as well as Traffic Portal so that someone doesn't accidentally shoot
> > themselves in the foot and create a DS that bypasses the mid tier
> > without them realizing it. With new DS types, we don't need
> > conditional checks like that or a new column in the DS table. Since
> > the types would still be in the DNS/HTTP families, we wouldn't have to
> > change CRConfig or everywhere else that discriminates between
> > DNS/HTTP.
> > We'd really only have to change the code that generates
> > parent.config to check for the new types and set go_direct
> > accordingly.
> EF> If we added new DS Types I would think we would want to update the
> CRConfig as well. It would be pretty confusing to me to have all but 4 of the
> DS types represented in the CRConfig. If we wanted to change this convention
> separately, I’d suggest following Rob’s earlier advice and adding individual
> fields for specific behaviors the TR cares about. I really don’t want to try
> and troubleshoot Traffic Ops and wonder why my HTTP_LIVE_NATNL_BYPASS
> delivery service is showing up as HTTP_LIVE_NATNL in the CRConfig. If I saw
> that, I’d think I had the DS type set wrong. We’d also need to update the DS
> type documentation with a mapping from TO DS Type to the type as it would
> appear in the CRConfig. This mapping could get complicated (i.e. why does TR
> find out which DSes are live and vod today or which DSes are
> national/regional, but not which are bypass).

<<<RP>>> What I meant by not having to change CRConfig was not having to change CRConfig _generation_. Right now all the DNS-type deliveryservices are put into the DNS-routed bucket, and every other type of DS is put into the HTTP-routed bucket (ANY_MAP is ignored, so this includes all the HTTP_* and STEERING types). So by adding new HTTP_ or DNS_ types, they will get placed into the proper routing protocol buckets without changing CRConfig generation code. TR does not distinguish between _LIVE or whatever, because those differences only affect ATS edge/mid config.

> > By adding new type-conflicting fields like go_direct, it also makes it
> > slightly more difficult to add new DS types in the future (which we
> > may have to do for anycast-routed DSes) because we'd have to maintain
> > all the instances of the code snippet above and make sure the new
> > types are handled properly with all the type-conflicting fields.
> >
> > We'd also have to add an asterisk to the DS types in the documentation like:
> > HTTP_LIVE_NATNL: same as HTTP_LIVE except the mid-tier is NOT bypassed*
> > *unless it IS bypassed by setting go_direct=true on the Delivery Service
> EF> We should add that today for MSO :-) Also, I don’t think we need to
> asterisk the DS table in this case because it’s a failure recovery behavior.
> We don’t say that an HTTP_LIVE_NATNL has an * with the text “unless all
> edges are down then requests are 302’d to the Bypass FQDN”
> >
> > Which would make choosing a DS type way more confusing IMO. It's like
> > this should obsolete the _LIVE types because you could just choose
> > _LIVE_NATNL but set go_direct=true to make it _LIVE?
> EF> Nope, it’s only on failure that this happens; it’s not a steady-state
> behavior.
> >
> > I think part of the problem really is the fact that you can't easily
> > change the DS type like you said, which actually is only prevented in
> > the UI (probably for safety reasons) but not the API. If we had types
> > like HTTP and HTTP_BYPASS, the UI should allow switching between those
> > two types because that would be relatively safe. If you know what
> > you're doing and understand the consequences of changing the DS type,
> > I think you should be able to do that. That said, you can switch types
> > from HTTP_LIVE_NATNL to HTTP_LIVE today just to bypass the mid tier in
> > the scenario where all your mids are down. Maybe we should just
> > support that transition for more types?
> EF> “If you know what you're doing and understand the consequences of
> changing the DS type,”
> Most people who operate a Traffic Control should not need to have that
> knowledge. It should be the responsibility of the developer community to
> provide and enforce appropriate safeguards.
<<<RP>>> We can put safeguards in place but still allow the safeguards to be overridden where needed, like how we can "override" the UI by using the API directly to change the DS type, but maybe it should really just be a pop-up modal that asks you "are you sure?". The system should be as flexible as possible while still being easy to operate.

> Supporting that transition for more types wouldn’t help us here, because we
> need a dynamic recovery that activates on massive mid-failure. We can’t have
> a system outage while waiting for operators to triage the problem, manually
> make a DS change and then push that change out to 1000 caches.

<<<RP>>> If this field is going to be "set it and forget it" then to me it makes even more sense as a DS-type that's not easily switchable as opposed to a true/false boolean field. If you know your origins can handle all the traffic if the mids go down, then that's something you know when considering how your DS should handle a mid tier (i.e., choosing between the DS types).
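For reference, here is a rough Go sketch of the "GoDirectDefaulter" idea discussed above, combined with the MSO check. It is only an illustration under assumptions, not the actual Traffic Ops code: the DSType constants, the DeliveryService struct, the GoDirectDefault method name, and the Normalize helper are all hypothetical names made up for this example. The set of types that default to false simply mirrors the pseudocode earlier in the thread, whose list ended in "etc...", so it is incomplete.

```go
package main

import (
	"errors"
	"fmt"
)

// DSType is a hypothetical stand-in for the delivery service type in Go TO.
type DSType string

// Only a handful of types are listed; the real set is larger.
const (
	HTTP          DSType = "HTTP"
	HTTPLive      DSType = "HTTP_LIVE"
	HTTPLiveNatnl DSType = "HTTP_LIVE_NATNL"
	DNS           DSType = "DNS"
	DNSLive       DSType = "DNS_LIVE"
	DNSLiveNatnl  DSType = "DNS_LIVE_NATNL"
)

// GoDirectDefaulter is the interface named in the thread.
// The method name is an assumption.
type GoDirectDefaulter interface {
	GoDirectDefault() bool
}

// GoDirectDefault mirrors the pseudocode from the thread: the listed types
// use the mid tier (false), and every other type defaults to true.
func (t DSType) GoDirectDefault() bool {
	switch t {
	case HTTP, DNS, HTTPLiveNatnl, DNSLiveNatnl:
		return false
	default:
		return true
	}
}

// Compile-time check that DSType satisfies the interface.
var _ GoDirectDefaulter = DSType("")

// DeliveryService is a hypothetical, heavily trimmed DS record.
type DeliveryService struct {
	Type     DSType
	GoDirect *bool // nil means the optional API field was omitted
	MSO      bool
}

// Normalize fills in go_direct when it was omitted and rejects MSO on a
// delivery service that bypasses the mid tier.
func Normalize(ds *DeliveryService) error {
	if ds.GoDirect == nil {
		def := ds.Type.GoDirectDefault()
		ds.GoDirect = &def
	}
	if *ds.GoDirect && ds.MSO {
		return errors.New("MSO cannot be enabled on a delivery service that bypasses the mid tier")
	}
	return nil
}

func main() {
	ds := DeliveryService{Type: HTTPLiveNatnl}
	if err := Normalize(&ds); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(ds.Type, "go_direct =", *ds.GoDirect) // prints: HTTP_LIVE_NATNL go_direct = false
}
```

With something like this, the conditional lives in one place next to the type definitions, but it is still code that has to be kept in sync in both the API and Traffic Portal whenever a new type is added, which is the maintenance cost raised above.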
