Alright, I'm trying to sum up this discussion so far since it seems like everyone went on vacation and didn't really get a chance to wrap this one up: - duplicate origins cause undefined behavior - we need a way to migrate to a future that is free of duplicate origins in Traffic Control - we need a visual and easy way to determine if Traffic Ops currently contains duplicate origins, so that operators are incentivized to fix them rather than let it slide indefinitely - operators should have a fair amount of time to fix their duplicate origins
I believe this is what we've mostly agreed upon but haven't clearly voted on: In release N we will: 1. Prohibit creating new delivery services that would share an existing origin and prohibit updating a delivery service to a shared origin 2. Add some kind of visual indicator that duplicate origins are a problem that need to be fixed before release N+1; otherwise, an upgrade to N+1 will be prohibited. In release N+1 we will: 3. Include a DB migration that adds a uniqueness constraint on origin FQDN, removing the API-level checks for that. 4. Prevent an upgrade to N+1 if duplicate origins are found (this might occur as a byproduct of step 3). I am +1 on this plan and believe this would hit on all the summarized points above. Please provide a clear vote on this plan so that we can dive deeper in the details (i.e. what release 'N' is, the best visual indicator for step 2, and a friendly way to handle step 4). Thanks, Rawlin On Wed, Dec 19, 2018 at 3:39 PM Jeremy Mitchell <mitchell...@gmail.com> wrote: > > Not sure TP is the right place for a warning like "clean up this > 'duplicate' origin or your next upgrade will fail". Most users of our > system will be like "not my problem". > > On Wed, Dec 19, 2018 at 11:58 AM Fieck, Brennan <brennan_fi...@comcast.com> > wrote: > > > Probably. It would impact load times by a bit, but the page for an > > individual object is not our bottleneck. > > ________________________________________ > > From: Robert Butts <r...@apache.org> > > Sent: Wednesday, December 19, 2018 11:50 AM > > To: dev@trafficcontrol.apache.org > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery Services > > produces indeterminate parent.config > > > > > - Including a warning on startup and an API constraint preventing adding > > more bad data in the next 3.0.0 Release Candidate > > > - Adding a database constraint immediately into master that won't be > > cherry-picked into 3.0.0 but should be included in 3.1.0 > > > > +1 > > > > I understand Jonathan's objection, but at some point, we have to be able to > > move forward. This is a good compromise: deprecate, then remove. That gives > > people a full major version to fix their data. > > > > I would be ideal if it were more than just a logged warning, though. Can we > > add a big red banner in Traffic Portal, on the Delivery Service page for > > any DS with a duplicate origin, telling users to fix it, and that they > > won't be able to upgrade to the next major version until it's fixed? > > > > > > On Wed, Dec 19, 2018 at 10:57 AM Fieck, Brennan <brennan_fi...@comcast.com > > > > > wrote: > > > > > So it seems like nobody has a problem with the "how" - disallowing > > > duplicate origin FQDNs on Delivery Services - but we never reached a > > > consensus on "when". > > > > > > I stand by my previous proposal: > > > - Including a warning on startup and an API constraint preventing adding > > > more bad data in the next 3.0.0 Release Candidate > > > - Adding a database constraint immediately into master that won't be > > > cherry-picked into 3.0.0 but should be included in 3.1.0 > > > ________________________________________ > > > From: Rawlin Peters <rawlin.pet...@gmail.com> > > > Sent: Tuesday, December 18, 2018 4:59 PM > > > To: dev@trafficcontrol.apache.org > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery Services > > > produces indeterminate parent.config > > > > > > Also, building more around DS types will make it even harder to get > > > away from DS types in the future too, which I know is something we've > > > discussed on this mailing list before. It also adds to the overhead of > > > Delivery Service Topologies, since a lot of the DS types won't > > > carryover into that world. > > > > > > - Rawlin > > > > > > On Tue, Dec 18, 2018 at 2:42 PM Fieck, Brennan > > > <brennan_fi...@comcast.com> wrote: > > > > > > > > +1. > > > > If there's a simple way to work around duplicate origins being > > > prohibited, > > > > then we should rely on that instead of "enumerating all those possible > > > conflicting > > > > settings, which are not only highly complex and confusing, but also > > > further > > > > entrench us in only supporting ATS as a caching proxy (hurting efforts > > to > > > > integrate e.g. Grove, nginx etc.) > > > > ________________________________________ > > > > From: Rawlin Peters <rawlin.pet...@gmail.com> > > > > Sent: Tuesday, December 18, 2018 2:20 PM > > > > To: dev@trafficcontrol.apache.org > > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery > > > Services produces indeterminate parent.config > > > > > > > > There are a number of different DS settings at play that can > > > > potentially cause conflicts. The question is: do we want to get into > > > > the business of enumerating all those possible conflicting settings or > > > > just simply prohibit duplicate origins altogether? I think we can dig > > > > in and get that "sufficiently advanced sql query" to check for > > > > conflicting origins, but is that something we want to carry along for > > > > the foreseeable future? Aren't CNAMEs relatively cheaper than > > > > developing and maintaining that code and the mental overhead required > > > > in understanding why you're getting an error that says your requested > > > > DS would cause an origin conflict? I think at the point you've > > > > requested a DS that would create a conflict, you've chosen those > > > > settings for a reason and would probably prefer to just create/use a > > > > CNAME in your new DS and keep the rest of your settings the same. > > > > > > > > Thinking in terms of errors, I'm imagining: > > > > "cannot create delivery service: origin fqdn 'foo.example.com' already > > > in use" > > > > vs > > > > "cannot create delivery service: origin fqdn 'foo.example.com' already > > > > in use as type DNS_LIVE_NATNL, which is incompatible with your chosen > > > > type of HTTP_NO_CACHE" > > > > > > > > At that point you'd probably say to yourself, "uh, I need > > > > HTTP_NO_CACHE, so what am I supposed to do now?" > > > > > > > > As a lazy developer I'm +1 on prohibiting duplicate origin fqdns > > > > because the resulting code will be simpler, but I think eliminating > > > > the mental overhead for operators could be worthwhile too. If we can > > > > agree on an end state of prohibiting duplicate origins altogether, we > > > > can start working on a design to smoothly transition us to that point. > > > > Are we willing to live with "just CNAME your origin fqdn" as the > > > > standard solution to duplicates? > > > > > > > > - Rawlin > > > > > > > > > > > > On Tue, Dec 18, 2018 at 1:27 PM Gelinas, Derek > > > > <derek_geli...@comcast.com> wrote: > > > > > > > > > > The only situation in which they can share origins is if a) the > > > origins are shared in an MSO configuration but still have different > > defined > > > origin fields in the delivery service, or if they're assigned to > > completely > > > different cachegroups. It's when two delivery services share the same > > > edges that there's an issue, because you end up with parent.config > > issues. > > > Actually you could even get away with it in mids as long as you weren't > > > doing anything like MSO to it. > > > > > > > > > > Could get messy real fast, though. Best to just create a second > > FQDN. > > > > > > > > > > Derek > > > > > > > > > > On 12/18/18, 3:23 PM, "Fieck, Brennan" <brennan_fi...@comcast.com> > > > wrote: > > > > > > > > > > So no two Delivery Services may share an origin *regardless of > > > cache hierarchy* ? I've been told that DNS Delivery Services can have the > > > same origin as HTTP Delivery Services because they obey the same cache > > > hierarchy. You're saying that would still produce invalid output and/or > > is > > > explicitly disallowed by ATS? > > > > > ________________________________________ > > > > > From: Robert Butts <r...@apache.org> > > > > > Sent: Tuesday, December 18, 2018 1:09 PM > > > > > To: dev@trafficcontrol.apache.org > > > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery > > > Services produces indeterminate parent.config > > > > > > > > > > >can you give an example of what parent.config looks like when 2 > > > ds's share > > > > > an origin and have different a different topology? > > > > > > > > > > Answering because I encountered this directly, when rewriting > > > parent.config. > > > > > > > > > > For example: Suppose you have one Delivery Service: > > > > > XML_ID: foo > > > > > Type: HTPT_LIVE_NATL > > > > > Query String Handling: 1 - ignore in cache key, and pass up > > > > > Origin Server Base URL: http://foo.example.net > > > > > > > > > > And another Delivery Service: > > > > > XML_ID: bar > > > > > Type: HTPT_LIVE_NATL > > > > > Query String Handling: 1 - ignore in cache key, and pass up > > > > > Origin Server Base URL: http://foo.example.net > > > > > > > > > > ATS only supports unique `dest_domain` entries in parent.config. > > > Therefore, > > > > > the parent.config generated for a server assigned to both of > > these > > > Delivery > > > > > Services with either be: > > > > > > > > > > dest_domain=foo.example.net port=80 go_direct=true > > > > > > > > > > Or > > > > > > > > > > dest_domain=foo.example.net port=80 go_direct=false > > > qstring=consider > > > > > > > > > > Right now, it's arbitrary which Perl Traffic Ops inserts, and > > Perl > > > provides > > > > > no warning or error of any kind (the pending Go parent.config PR > > > logs an > > > > > error). > > > > > > > > > > Whichever is arbitrarily inserted, the resulting remaps for the > > > other > > > > > delivery service will be wrong. Either `foo` requests will drop > > > the query > > > > > string when they shouldn't, and go to the mid when they > > shouldn't; > > > or `bar` > > > > > requests will use the query string and skip the mid when it > > > shouldn't. > > > > > > > > > > > > > > > Does that make sense? The only correct solution, is to somehow > > > prevent > > > > > different DSes having the same origin, and tell tenants they must > > > use > > > > > CNAMEs if they need. > > > > > > > > > > This isn't a bug in Traffic Control. ATS fundamentally doesn't > > > support > > > > > multiple remap rules with the same parent FQDN with different > > > > > configurations. Hence, Traffic Control needs to prohibit that. > > > > > > > > > > > > > > > On Tue, Dec 18, 2018 at 12:24 PM Jeremy Mitchell < > > > mitchell...@gmail.com> > > > > > wrote: > > > > > > > > > > > brennan, > > > > > > > > > > > > can you give an example of what parent.config looks like when 2 > > > ds's share > > > > > > an origin and have different a different topology? > > > > > > > > > > > > jeremy > > > > > > > > > > > > On Tue, Dec 18, 2018 at 11:39 AM Fieck, Brennan < > > > brennan_fi...@comcast.com > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > To be clear, the "Warning" I'm talking about would happen at > > > startup, but > > > > > > > I'd like a UI-only constraint to come with that to disallow > > > using the API > > > > > > > to bind the same origin to multiple Delivery Services with > > > varying > > > > > > > topography requirements. It wouldn't change the existing > > data, > > > but > > > > > > prevent > > > > > > > users from creating more bad data. > > > > > > > > > > > > > > "warning" doesn't really sufficiently describe that, my bad. > > > > > > > ________________________________________ > > > > > > > From: Fieck, Brennan <brennan_fi...@comcast.com> > > > > > > > Sent: Tuesday, December 18, 2018 11:24 AM > > > > > > > To: dev@trafficcontrol.apache.org > > > > > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe > > > Delivery Services > > > > > > > produces indeterminate parent.config > > > > > > > > > > > > > > Well the cost of fixing this bug is a constraint on the data. > > > Unless we > > > > > > > make it a UI-only constraint - which I'm personally against - > > > there must > > > > > > be > > > > > > > some point in the future where ATC cannot reasonably be > > > expected to work > > > > > > > with data that violates that constraint. The question is when > > > that should > > > > > > > occur, which should likely happen at a minor version release. > > > Minor not > > > > > > > major because it doesn't involve a change in data structures, > > > merely > > > > > > > relationships between them - in my opinion that's a minor > > > version change > > > > > > > but that's definitely up for debate. With several release > > > candidates for > > > > > > > 3.0.0 that _doesn't_ include this restriction already in the > > > wild, I > > > > > > > wouldn't recommend putting it in there. That means to fix the > > > bug as soon > > > > > > > as possible it should go in 3.1.0 which should be the target > > > of "master" > > > > > > > after the 3.0.0 release is cut from it. > > > > > > > > > > > > > > So I'd recommend immediately implementing the constraint in > > > master with a > > > > > > > refusal to upgrade with bad data, and backport a warning > > about > > > the future > > > > > > > behavior into 3.0.0 or as part of a 3.0.1 provided we had > > more > > > changes > > > > > > that > > > > > > > would warrant a micro version bump. > > > > > > > ________________________________________ > > > > > > > From: Gray, Jonathan <jonathan_g...@comcast.com> > > > > > > > Sent: Tuesday, December 18, 2018 9:34 AM > > > > > > > To: dev@trafficcontrol.apache.org > > > > > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe > > > Delivery Services > > > > > > > produces indeterminate parent.config > > > > > > > > > > > > > > -1 Holding an ATC upgrade hostage to data cleanup seems like > > a > > > bad idea. > > > > > > > The issue isn't great, but it's also not new. We should > > allow > > > teams to > > > > > > fix > > > > > > > their data at their normal paces if it doesn't create > > > significant > > > > > > overhead > > > > > > > or an inherant blocker for new functionality or correction of > > > other major > > > > > > > problems imho. > > > > > > > > > > > > > > Jonathan G > > > > > > > > > > > > > > > > > > > > > On 12/18/18, 9:28 AM, "Fieck, Brennan" < > > > brennan_fi...@comcast.com> > > > > > > wrote: > > > > > > > > > > > > > > Another option is we could detect collisions at startup > > > and simply > > > > > > > refuse to continue with the upgrade until the data is fixed. > > > That would > > > > > > > allow people using the now-unsupported data format to > > continue > > > to use > > > > > > their > > > > > > > old versions of Traffic Ops without wrecking their database, > > > but also > > > > > > > provide an incentive to clean up the data. > > > > > > > ________________________________________ > > > > > > > From: Gray, Jonathan <jonathan_g...@comcast.com> > > > > > > > Sent: Tuesday, December 18, 2018 5:12 AM > > > > > > > To: dev@trafficcontrol.apache.org > > > > > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe > > > Delivery > > > > > > > Services produces indeterminate parent.config > > > > > > > > > > > > > > I'm generally a fan of constrain your data in your > > > database, but not > > > > > > > necessarily exclusively. I see this as a one-way > > > cleanup/conversion so > > > > > > it > > > > > > > doesn't need to be configurable; otherwise you have to ask > > the > > > question > > > > > > > what happens if someone turns it off. That said, something > > in > > > the UI > > > > > > layer > > > > > > > would be nice to prevent spending significant quantities of > > > time > > > > > > building a > > > > > > > complex DS only to have it fail to post for reasons that > > could > > > have been > > > > > > > known earlier. > > > > > > > > > > > > > > The way my brain works in this case: > > > > > > > If !unique_constraint_exists_query() > > > > > > > If has_duplicates_query() > > > > > > > show_warning() > > > > > > > else > > > > > > > add_unique_constraint() > > > > > > > > > > > > > > to which the API and UI configuration could also make use > > > of > > > > > > > unique_constraint_exists_query() to drive additional layer > > > constraints if > > > > > > > desired. > > > > > > > > > > > > > > Jonathan G > > > > > > > > > > > > > > On 12/17/18, 1:11 PM, "Rawlin Peters" < > > > rawlin.pet...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > That is an interesting idea...detect at TO startup > > > whether or not > > > > > > > there are duplicate origins and operate in a "prevent > > > duplicate > > > > > > > origins" state if no duplicates are found or "prevent > > > conflicting > > > > > > > DS > > > > > > > topologies" state if duplicates are found? So once > > > operators have > > > > > > > replaced all the duplicate origins with CNAMEs, TO > > will > > > > > > essentially > > > > > > > operate in a "prohibit all duplicate origins" state. > > > That would > > > > > > > probably make for a simpler transition, but I'd want > > > to remove > > > > > > that > > > > > > > logic in a following release that strictly prohibits > > > duplicate > > > > > > > origins > > > > > > > (assuming that the community agrees we should > > prohibit > > > duplicate > > > > > > > origins altogether). > > > > > > > > > > > > > > As for DB constraints vs UI, I was thinking those > > > DS-type > > > > > > > constraints > > > > > > > I pointed out would live in the API. It would > > > basically be added > > > > > > > validation in the deliveryservices POST/PUT endpoint > > > that checks > > > > > > > the > > > > > > > DB for existing DSes that conflict with the requested > > > DS. > > > > > > > > > > > > > > - Rawlin > > > > > > > > > > > > > > On Mon, Dec 17, 2018 at 12:35 PM Gray, Jonathan > > > > > > > <jonathan_g...@comcast.com> wrote: > > > > > > > > > > > > > > > > These kinds of conditions should be detectable > > with a > > > > > > > sufficiently advanced SQL query. Is it possible to add the > > > constraint if > > > > > > > it passes and emit a warning during TO startup otherwise? > > > That would let > > > > > > > you know the condition exists at startup but not getting in > > > your way and > > > > > > > keep you out of trouble once you've cleaned up. We made a > > > mistake early > > > > > > > on, but this would acknowledge it was bad and encourage it to > > > be fixed at > > > > > > > the speed of operations teams. Also this puts the constraint > > > in the > > > > > > > database rather than the UI which is really where the > > > contention is for > > > > > > > usability. > > > > > > > > > > > > > > > > Jonathan G > > > > > > > > > > > > > > > > > > > > > > > > On 12/17/18, 11:38 AM, "Rawlin Peters" < > > > > > > rawlin.pet...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > We occasionally discuss this issue but haven't > > > tackled it > > > > > > > yet. I think > > > > > > > > the main issue is just that duplicate origins > > > have been > > > > > > > allowed since > > > > > > > > the beginning, and now everyone's Traffic Ops > > > could be > > > > > > > littered with > > > > > > > > duplicate origins. Also, depending on the > > config > > > of the > > > > > > > duplicate > > > > > > > > delivery services, the origins might not be in > > > conflict at > > > > > > > all (if > > > > > > > > they don't have different topology > > constraints). > > > I would > > > > > > > love for us > > > > > > > > to just add a uniqueness constraint, but there > > > would need > > > > > > to > > > > > > > be a fair > > > > > > > > amount of warning to the community before doing > > > so and > > > > > > might > > > > > > > > invalidate a significant amount of valid use > > > cases. > > > > > > > Operators would > > > > > > > > need time to make DNS CNAME records for the > > > duplicate > > > > > > > origins and > > > > > > > > update their DSes to use the different CNAMEs. > > > > > > > > > > > > > > > > I think as a good first step to eliminating the > > > use of > > > > > > > duplicate > > > > > > > > origins altogether, we should identify which > > > "topology > > > > > > > constraints" > > > > > > > > actually cause conflicting config when used > > with > > > duplicate > > > > > > > origins and > > > > > > > > prevent creating DSes with duplicate origins > > _if > > > it would > > > > > > > cause a > > > > > > > > conflict with an existing DS that uses the same > > > origin_. > > > > > > > > > > > > > > > > For instance, I believe an HTTP and DNS-type DS > > > can live > > > > > > > happily > > > > > > > > side-by-side using the same origin (probably > > > need different > > > > > > > > routing_names?), but scenarios like HTTP and > > > HTTP_LIVE, or > > > > > > > DNS and > > > > > > > > HTTP_NO_CACHE sharing the same origin will > > cause > > > conflicts > > > > > > > for sure. > > > > > > > > So maybe we can start by making sure the DS > > > types "match" > > > > > > > when using > > > > > > > > the same origin: > > > > > > > > HTTP + DNS: possibly good, if they have > > > different routing > > > > > > > names? > > > > > > > > HTTP_LIVE + HTTP_LIVE_NATNL: bad > > > > > > > > HTTP_NO_CACHE + [any other type]: bad > > > > > > > > HTTP_LIVE + HTTP: bad > > > > > > > > etc. > > > > > > > > > > > > > > > > There are most likely other conflict scenarios > > > that don't > > > > > > > involve the > > > > > > > > DS types, but I think this would be a good > > > start. In the > > > > > > > future with > > > > > > > > Delivery Service Topologies (aka Flexible > > > Cachegroups aka > > > > > > > Bring Your > > > > > > > > Own Topology), we might be able to prohibit > > > assigning a DS > > > > > > > to a > > > > > > > > Topology if the DS's origin is already used by > > > another DS > > > > > > in > > > > > > > a > > > > > > > > different Topology. > > > > > > > > > > > > > > > > - Rawlin > > > > > > > > > > > > > > > > On Mon, Dec 17, 2018 at 10:52 AM Fieck, Brennan > > > > > > > > <brennan_fi...@comcast.com> wrote: > > > > > > > > > > > > > > > > > > As some of you may be aware, `parent.config` > > > files > > > > > > > generated by Traffic Ops can vary wildly when an origin is > > > assigned to > > > > > > > multiple Delivery Services. This results in undefined > > > behavior. I'm told > > > > > > > that the conflict only happens when two Delivery Services > > with > > > different > > > > > > > "topology requirements" use the same origin, whatever that > > > means (content > > > > > > > routing type?). Regardless, the issue should be addressed. > > The > > > obvious > > > > > > > solution is to put in place a database constraint that > > > prevents an origin > > > > > > > from being assigned to more that one Delivery Service with > > API > > > checks in > > > > > > > place that would provide helpful error messages when an > > > attempt is made > > > > > > to > > > > > > > violate the constraint. However, would that mess with things > > > like > > > > > > > Multi-Site Origin? Or is it just not viable for some other > > > reason? If it > > > > > > is > > > > > > > a good solution, I'm prepared to work on a fix that utilizes > > > it. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >