Alright, I'm trying to sum up this discussion so far since it seems
like everyone went on vacation and didn't really get a chance to wrap
this one up:
- duplicate origins cause undefined behavior
- we need a way to migrate to a future that is free of duplicate
origins in Traffic Control
- we need a visual and easy way to determine if Traffic Ops currently
contains duplicate origins, so that operators are incentivized to fix
them rather than let it slide indefinitely
- operators should have a fair amount of time to fix their duplicate origins

I believe this is what we've mostly agreed upon but haven't clearly voted on:

In release N we will:
1. Prohibit creating new delivery services that would share an
existing origin and prohibit updating a delivery service to a shared
origin
2. Add some kind of visual indicator that duplicate origins are a
problem that need to be fixed before release N+1; otherwise, an
upgrade to N+1 will be prohibited.

In release N+1 we will:
3. Include a DB migration that adds a uniqueness constraint on origin
FQDN, removing the API-level checks for that.
4. Prevent an upgrade to N+1 if duplicate origins are found (this
might occur as a byproduct of step 3).

I am +1 on this plan and believe this would hit on all the summarized
points above. Please provide a clear vote on this plan so that we can
dive deeper in the details (i.e. what release 'N' is, the best visual
indicator for step 2, and a friendly way to handle step 4).

Thanks,
Rawlin

On Wed, Dec 19, 2018 at 3:39 PM Jeremy Mitchell <mitchell...@gmail.com> wrote:
>
> Not sure TP is the right place for a warning like "clean up this
> 'duplicate' origin or your next upgrade will fail". Most users of our
> system will be like "not my problem".
>
> On Wed, Dec 19, 2018 at 11:58 AM Fieck, Brennan <brennan_fi...@comcast.com>
> wrote:
>
> > Probably. It would impact load times by a bit, but the page for an
> > individual object is not our bottleneck.
> > ________________________________________
> > From: Robert Butts <r...@apache.org>
> > Sent: Wednesday, December 19, 2018 11:50 AM
> > To: dev@trafficcontrol.apache.org
> > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery Services
> > produces indeterminate parent.config
> >
> > > - Including a warning on startup and an API constraint preventing adding
> > more bad data in the next 3.0.0 Release Candidate
> > > - Adding a database constraint immediately into master that won't be
> > cherry-picked into 3.0.0 but should be included in 3.1.0
> >
> > +1
> >
> > I understand Jonathan's objection, but at some point, we have to be able to
> > move forward. This is a good compromise: deprecate, then remove. That gives
> > people a full major version to fix their data.
> >
> > I would be ideal if it were more than just a logged warning, though. Can we
> > add a big red banner in Traffic Portal, on the Delivery Service page for
> > any DS with a duplicate origin, telling users to fix it, and that they
> > won't be able to upgrade to the next major version until it's fixed?
> >
> >
> > On Wed, Dec 19, 2018 at 10:57 AM Fieck, Brennan <brennan_fi...@comcast.com
> > >
> > wrote:
> >
> > > So it seems like nobody has a problem with the "how" - disallowing
> > > duplicate origin FQDNs on Delivery Services - but we never reached a
> > > consensus on "when".
> > >
> > > I stand by my previous proposal:
> > > - Including a warning on startup and an API constraint preventing adding
> > > more bad data in the next 3.0.0 Release Candidate
> > > - Adding a database constraint immediately into master that won't be
> > > cherry-picked into 3.0.0 but should be included in 3.1.0
> > > ________________________________________
> > > From: Rawlin Peters <rawlin.pet...@gmail.com>
> > > Sent: Tuesday, December 18, 2018 4:59 PM
> > > To: dev@trafficcontrol.apache.org
> > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery Services
> > > produces indeterminate parent.config
> > >
> > > Also, building more around DS types will make it even harder to get
> > > away from DS types in the future too, which I know is something we've
> > > discussed on this mailing list before. It also adds to the overhead of
> > > Delivery Service Topologies, since a lot of the DS types won't
> > > carryover into that world.
> > >
> > > - Rawlin
> > >
> > > On Tue, Dec 18, 2018 at 2:42 PM Fieck, Brennan
> > > <brennan_fi...@comcast.com> wrote:
> > > >
> > > > +1.
> > > > If there's a simple way to work around duplicate origins being
> > > prohibited,
> > > > then we should rely on that instead of "enumerating all those possible
> > > conflicting
> > > > settings, which are not only highly complex and confusing, but also
> > > further
> > > > entrench us in only supporting ATS as a caching proxy (hurting efforts
> > to
> > > > integrate e.g. Grove, nginx etc.)
> > > > ________________________________________
> > > > From: Rawlin Peters <rawlin.pet...@gmail.com>
> > > > Sent: Tuesday, December 18, 2018 2:20 PM
> > > > To: dev@trafficcontrol.apache.org
> > > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery
> > > Services produces indeterminate parent.config
> > > >
> > > > There are a number of different DS settings at play that can
> > > > potentially cause conflicts. The question is: do we want to get into
> > > > the business of enumerating all those possible conflicting settings or
> > > > just simply prohibit duplicate origins altogether? I think we can dig
> > > > in and get that "sufficiently advanced sql query" to check for
> > > > conflicting origins, but is that something we want to carry along for
> > > > the foreseeable future? Aren't CNAMEs relatively cheaper than
> > > > developing and maintaining that code and the mental overhead required
> > > > in understanding why you're getting an error that says your requested
> > > > DS would cause an origin conflict? I think at the point you've
> > > > requested a DS that would create a conflict, you've chosen those
> > > > settings for a reason and would probably prefer to just create/use a
> > > > CNAME in your new DS and keep the rest of your settings the same.
> > > >
> > > > Thinking in terms of errors, I'm imagining:
> > > > "cannot create delivery service: origin fqdn 'foo.example.com' already
> > > in use"
> > > > vs
> > > > "cannot create delivery service: origin fqdn 'foo.example.com' already
> > > > in use as type DNS_LIVE_NATNL, which is incompatible with your chosen
> > > > type of HTTP_NO_CACHE"
> > > >
> > > > At that point you'd probably say to yourself, "uh, I need
> > > > HTTP_NO_CACHE, so what am I supposed to do now?"
> > > >
> > > > As a lazy developer I'm +1 on prohibiting duplicate origin fqdns
> > > > because the resulting code will be simpler, but I think eliminating
> > > > the mental overhead for operators could be worthwhile too. If we can
> > > > agree on an end state of prohibiting duplicate origins altogether, we
> > > > can start working on a design to smoothly transition us to that point.
> > > > Are we willing to live with "just CNAME your origin fqdn" as the
> > > > standard solution to duplicates?
> > > >
> > > > - Rawlin
> > > >
> > > >
> > > > On Tue, Dec 18, 2018 at 1:27 PM Gelinas, Derek
> > > > <derek_geli...@comcast.com> wrote:
> > > > >
> > > > > The only situation in which they can share origins is if a) the
> > > origins are shared in an MSO configuration but still have different
> > defined
> > > origin fields in the delivery service, or if they're assigned to
> > completely
> > > different cachegroups.  It's when two delivery services share the same
> > > edges that there's an issue, because you end up with parent.config
> > issues.
> > > Actually you could even get away with it in mids as long as you weren't
> > > doing anything like MSO to it.
> > > > >
> > > > > Could get messy real fast, though.  Best to just create a second
> > FQDN.
> > > > >
> > > > > Derek
> > > > >
> > > > > On 12/18/18, 3:23 PM, "Fieck, Brennan" <brennan_fi...@comcast.com>
> > > wrote:
> > > > >
> > > > >     So no two Delivery Services may share an origin *regardless of
> > > cache hierarchy* ? I've been told that DNS Delivery Services can have the
> > > same origin as HTTP Delivery Services because they obey the same cache
> > > hierarchy. You're saying that would still produce invalid output and/or
> > is
> > > explicitly disallowed by ATS?
> > > > >     ________________________________________
> > > > >     From: Robert Butts <r...@apache.org>
> > > > >     Sent: Tuesday, December 18, 2018 1:09 PM
> > > > >     To: dev@trafficcontrol.apache.org
> > > > >     Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe Delivery
> > > Services produces indeterminate parent.config
> > > > >
> > > > >     >can you give an example of what parent.config looks like when 2
> > > ds's share
> > > > >     an origin and have different a different topology?
> > > > >
> > > > >     Answering because I encountered this directly, when rewriting
> > > parent.config.
> > > > >
> > > > >     For example: Suppose you have one Delivery Service:
> > > > >     XML_ID: foo
> > > > >     Type: HTPT_LIVE_NATL
> > > > >     Query String Handling: 1 - ignore in cache key, and pass up
> > > > >     Origin Server Base URL: http://foo.example.net
> > > > >
> > > > >     And another Delivery Service:
> > > > >     XML_ID: bar
> > > > >     Type: HTPT_LIVE_NATL
> > > > >     Query String Handling: 1 - ignore in cache key, and pass up
> > > > >     Origin Server Base URL: http://foo.example.net
> > > > >
> > > > >     ATS only supports unique `dest_domain` entries in parent.config.
> > > Therefore,
> > > > >     the parent.config generated for a server assigned to both of
> > these
> > > Delivery
> > > > >     Services with either be:
> > > > >
> > > > >     dest_domain=foo.example.net port=80 go_direct=true
> > > > >
> > > > >     Or
> > > > >
> > > > >     dest_domain=foo.example.net port=80 go_direct=false
> > > qstring=consider
> > > > >
> > > > >     Right now, it's arbitrary which Perl Traffic Ops inserts, and
> > Perl
> > > provides
> > > > >     no warning or error of any kind (the pending Go parent.config PR
> > > logs an
> > > > >     error).
> > > > >
> > > > >     Whichever is arbitrarily inserted, the resulting remaps for the
> > > other
> > > > >     delivery service will be wrong. Either `foo` requests will drop
> > > the query
> > > > >     string when they shouldn't, and go to the mid when they
> > shouldn't;
> > > or `bar`
> > > > >     requests will use the query string and skip the mid when it
> > > shouldn't.
> > > > >
> > > > >
> > > > >     Does that make sense? The only correct solution, is to somehow
> > > prevent
> > > > >     different DSes having the same origin, and tell tenants they must
> > > use
> > > > >     CNAMEs if they need.
> > > > >
> > > > >     This isn't a bug in Traffic Control. ATS fundamentally doesn't
> > > support
> > > > >     multiple remap rules with the same parent FQDN with different
> > > > >     configurations. Hence, Traffic Control needs to prohibit that.
> > > > >
> > > > >
> > > > >     On Tue, Dec 18, 2018 at 12:24 PM Jeremy Mitchell <
> > > mitchell...@gmail.com>
> > > > >     wrote:
> > > > >
> > > > >     > brennan,
> > > > >     >
> > > > >     > can you give an example of what parent.config looks like when 2
> > > ds's share
> > > > >     > an origin and have different a different topology?
> > > > >     >
> > > > >     > jeremy
> > > > >     >
> > > > >     > On Tue, Dec 18, 2018 at 11:39 AM Fieck, Brennan <
> > > brennan_fi...@comcast.com
> > > > >     > >
> > > > >     > wrote:
> > > > >     >
> > > > >     > > To be clear, the "Warning" I'm talking about would happen at
> > > startup, but
> > > > >     > > I'd like a UI-only constraint to come with that to disallow
> > > using the API
> > > > >     > > to bind the same origin to multiple Delivery Services with
> > > varying
> > > > >     > > topography requirements. It wouldn't change the existing
> > data,
> > > but
> > > > >     > prevent
> > > > >     > > users from creating more bad data.
> > > > >     > >
> > > > >     > > "warning" doesn't really sufficiently describe that, my bad.
> > > > >     > > ________________________________________
> > > > >     > > From: Fieck, Brennan <brennan_fi...@comcast.com>
> > > > >     > > Sent: Tuesday, December 18, 2018 11:24 AM
> > > > >     > > To: dev@trafficcontrol.apache.org
> > > > >     > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe
> > > Delivery Services
> > > > >     > > produces indeterminate parent.config
> > > > >     > >
> > > > >     > > Well the cost of fixing this bug is a constraint on the data.
> > > Unless we
> > > > >     > > make it a UI-only constraint - which I'm personally against -
> > > there must
> > > > >     > be
> > > > >     > > some point in the future where ATC cannot reasonably be
> > > expected to work
> > > > >     > > with data that violates that constraint. The question is when
> > > that should
> > > > >     > > occur, which should likely happen at a minor version release.
> > > Minor not
> > > > >     > > major because it doesn't involve a change in data structures,
> > > merely
> > > > >     > > relationships between them - in my opinion that's a minor
> > > version change
> > > > >     > > but that's definitely up for debate. With several release
> > > candidates for
> > > > >     > > 3.0.0 that _doesn't_ include this restriction already in the
> > > wild, I
> > > > >     > > wouldn't recommend putting it in there. That means to fix the
> > > bug as soon
> > > > >     > > as possible it should go in 3.1.0 which should be the target
> > > of "master"
> > > > >     > > after the 3.0.0 release is cut from it.
> > > > >     > >
> > > > >     > > So I'd recommend immediately implementing the constraint in
> > > master with a
> > > > >     > > refusal to upgrade with bad data, and backport a warning
> > about
> > > the future
> > > > >     > > behavior into 3.0.0 or as part of a 3.0.1 provided we had
> > more
> > > changes
> > > > >     > that
> > > > >     > > would warrant a micro version bump.
> > > > >     > > ________________________________________
> > > > >     > > From: Gray, Jonathan <jonathan_g...@comcast.com>
> > > > >     > > Sent: Tuesday, December 18, 2018 9:34 AM
> > > > >     > > To: dev@trafficcontrol.apache.org
> > > > >     > > Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe
> > > Delivery Services
> > > > >     > > produces indeterminate parent.config
> > > > >     > >
> > > > >     > > -1 Holding an ATC upgrade hostage to data cleanup seems like
> > a
> > > bad idea.
> > > > >     > > The issue isn't great, but it's also not new.  We should
> > allow
> > > teams to
> > > > >     > fix
> > > > >     > > their data at their normal paces if it doesn't create
> > > significant
> > > > >     > overhead
> > > > >     > > or an inherant blocker for new functionality or correction of
> > > other major
> > > > >     > > problems imho.
> > > > >     > >
> > > > >     > > Jonathan G
> > > > >     > >
> > > > >     > >
> > > > >     > > On 12/18/18, 9:28 AM, "Fieck, Brennan" <
> > > brennan_fi...@comcast.com>
> > > > >     > wrote:
> > > > >     > >
> > > > >     > >     Another option is we could detect collisions at startup
> > > and simply
> > > > >     > > refuse to continue with the upgrade until the data is fixed.
> > > That would
> > > > >     > > allow people using the now-unsupported data format to
> > continue
> > > to use
> > > > >     > their
> > > > >     > > old versions of Traffic Ops without wrecking their database,
> > > but also
> > > > >     > > provide an incentive to clean up the data.
> > > > >     > >     ________________________________________
> > > > >     > >     From: Gray, Jonathan <jonathan_g...@comcast.com>
> > > > >     > >     Sent: Tuesday, December 18, 2018 5:12 AM
> > > > >     > >     To: dev@trafficcontrol.apache.org
> > > > >     > >     Subject: Re: [EXTERNAL] Re: Origins assigned to Multipe
> > > Delivery
> > > > >     > > Services produces indeterminate parent.config
> > > > >     > >
> > > > >     > >     I'm generally a fan of constrain your data in your
> > > database, but not
> > > > >     > > necessarily exclusively.  I see this as a one-way
> > > cleanup/conversion so
> > > > >     > it
> > > > >     > > doesn't need to be configurable; otherwise you have to ask
> > the
> > > question
> > > > >     > > what happens if someone turns it off.  That said, something
> > in
> > > the UI
> > > > >     > layer
> > > > >     > > would be nice to prevent spending significant quantities of
> > > time
> > > > >     > building a
> > > > >     > > complex DS only to have it fail to post for reasons that
> > could
> > > have been
> > > > >     > > known earlier.
> > > > >     > >
> > > > >     > >     The way my brain works in this case:
> > > > >     > >     If !unique_constraint_exists_query()
> > > > >     > >             If has_duplicates_query()
> > > > >     > >                     show_warning()
> > > > >     > >             else
> > > > >     > >                     add_unique_constraint()
> > > > >     > >
> > > > >     > >     to which the API and UI configuration could also make use
> > > of
> > > > >     > > unique_constraint_exists_query() to drive additional layer
> > > constraints if
> > > > >     > > desired.
> > > > >     > >
> > > > >     > >     Jonathan G
> > > > >     > >
> > > > >     > >     On 12/17/18, 1:11 PM, "Rawlin Peters" <
> > > rawlin.pet...@gmail.com>
> > > > >     > wrote:
> > > > >     > >
> > > > >     > >         That is an interesting idea...detect at TO startup
> > > whether or not
> > > > >     > >         there are duplicate origins and operate in a "prevent
> > > duplicate
> > > > >     > >         origins" state if no duplicates are found or "prevent
> > > conflicting
> > > > >     > > DS
> > > > >     > >         topologies" state if duplicates are found? So once
> > > operators have
> > > > >     > >         replaced all the duplicate origins with CNAMEs, TO
> > will
> > > > >     > essentially
> > > > >     > >         operate in a "prohibit all duplicate origins" state.
> > > That would
> > > > >     > >         probably make for a simpler transition, but I'd want
> > > to remove
> > > > >     > that
> > > > >     > >         logic in a following release that strictly prohibits
> > > duplicate
> > > > >     > > origins
> > > > >     > >         (assuming that the community agrees we should
> > prohibit
> > > duplicate
> > > > >     > >         origins altogether).
> > > > >     > >
> > > > >     > >         As for DB constraints vs UI, I was thinking those
> > > DS-type
> > > > >     > > constraints
> > > > >     > >         I pointed out would live in the API. It would
> > > basically be added
> > > > >     > >         validation in the deliveryservices POST/PUT endpoint
> > > that checks
> > > > >     > > the
> > > > >     > >         DB for existing DSes that conflict with the requested
> > > DS.
> > > > >     > >
> > > > >     > >         - Rawlin
> > > > >     > >
> > > > >     > >         On Mon, Dec 17, 2018 at 12:35 PM Gray, Jonathan
> > > > >     > >         <jonathan_g...@comcast.com> wrote:
> > > > >     > >         >
> > > > >     > >         > These kinds of conditions should be detectable
> > with a
> > > > >     > > sufficiently advanced SQL query.  Is it possible to add the
> > > constraint if
> > > > >     > > it passes and emit a warning during TO startup otherwise?
> > > That would let
> > > > >     > > you know the condition exists at startup but not getting in
> > > your way and
> > > > >     > > keep you out of trouble once you've cleaned up.  We made a
> > > mistake early
> > > > >     > > on, but this would acknowledge it was bad and encourage it to
> > > be fixed at
> > > > >     > > the speed of operations teams.  Also this puts the constraint
> > > in the
> > > > >     > > database rather than the UI which is really where the
> > > contention is for
> > > > >     > > usability.
> > > > >     > >         >
> > > > >     > >         > Jonathan G
> > > > >     > >         >
> > > > >     > >         >
> > > > >     > >         > On 12/17/18, 11:38 AM, "Rawlin Peters" <
> > > > >     > rawlin.pet...@gmail.com>
> > > > >     > > wrote:
> > > > >     > >         >
> > > > >     > >         >     We occasionally discuss this issue but haven't
> > > tackled it
> > > > >     > > yet. I think
> > > > >     > >         >     the main issue is just that duplicate origins
> > > have been
> > > > >     > > allowed since
> > > > >     > >         >     the beginning, and now everyone's Traffic Ops
> > > could be
> > > > >     > > littered with
> > > > >     > >         >     duplicate origins. Also, depending on the
> > config
> > > of the
> > > > >     > > duplicate
> > > > >     > >         >     delivery services, the origins might not be in
> > > conflict at
> > > > >     > > all (if
> > > > >     > >         >     they don't have different topology
> > constraints).
> > > I would
> > > > >     > > love for us
> > > > >     > >         >     to just add a uniqueness constraint, but there
> > > would need
> > > > >     > to
> > > > >     > > be a fair
> > > > >     > >         >     amount of warning to the community before doing
> > > so and
> > > > >     > might
> > > > >     > >         >     invalidate a significant amount of valid use
> > > cases.
> > > > >     > > Operators would
> > > > >     > >         >     need time to make DNS CNAME records for the
> > > duplicate
> > > > >     > > origins and
> > > > >     > >         >     update their DSes to use the different CNAMEs.
> > > > >     > >         >
> > > > >     > >         >     I think as a good first step to eliminating the
> > > use of
> > > > >     > > duplicate
> > > > >     > >         >     origins altogether, we should identify which
> > > "topology
> > > > >     > > constraints"
> > > > >     > >         >     actually cause conflicting config when used
> > with
> > > duplicate
> > > > >     > > origins and
> > > > >     > >         >     prevent creating DSes with duplicate origins
> > _if
> > > it would
> > > > >     > > cause a
> > > > >     > >         >     conflict with an existing DS that uses the same
> > > origin_.
> > > > >     > >         >
> > > > >     > >         >     For instance, I believe an HTTP and DNS-type DS
> > > can live
> > > > >     > > happily
> > > > >     > >         >     side-by-side using the same origin (probably
> > > need different
> > > > >     > >         >     routing_names?), but scenarios like HTTP and
> > > HTTP_LIVE, or
> > > > >     > > DNS and
> > > > >     > >         >     HTTP_NO_CACHE sharing the same origin will
> > cause
> > > conflicts
> > > > >     > > for sure.
> > > > >     > >         >     So maybe we can start by making sure the DS
> > > types "match"
> > > > >     > > when using
> > > > >     > >         >     the same origin:
> > > > >     > >         >     HTTP + DNS: possibly good, if they have
> > > different routing
> > > > >     > > names?
> > > > >     > >         >     HTTP_LIVE + HTTP_LIVE_NATNL: bad
> > > > >     > >         >     HTTP_NO_CACHE + [any other type]: bad
> > > > >     > >         >     HTTP_LIVE + HTTP: bad
> > > > >     > >         >     etc.
> > > > >     > >         >
> > > > >     > >         >     There are most likely other conflict scenarios
> > > that don't
> > > > >     > > involve the
> > > > >     > >         >     DS types, but I think this would be a good
> > > start. In the
> > > > >     > > future with
> > > > >     > >         >     Delivery Service Topologies (aka Flexible
> > > Cachegroups aka
> > > > >     > > Bring Your
> > > > >     > >         >     Own Topology), we might be able to prohibit
> > > assigning a DS
> > > > >     > > to a
> > > > >     > >         >     Topology if the DS's origin is already used by
> > > another DS
> > > > >     > in
> > > > >     > > a
> > > > >     > >         >     different Topology.
> > > > >     > >         >
> > > > >     > >         >     - Rawlin
> > > > >     > >         >
> > > > >     > >         >     On Mon, Dec 17, 2018 at 10:52 AM Fieck, Brennan
> > > > >     > >         >     <brennan_fi...@comcast.com> wrote:
> > > > >     > >         >     >
> > > > >     > >         >     > As some of you may be aware, `parent.config`
> > > files
> > > > >     > > generated by Traffic Ops can vary wildly when an origin is
> > > assigned to
> > > > >     > > multiple Delivery Services. This results in undefined
> > > behavior. I'm told
> > > > >     > > that the conflict only happens when two Delivery Services
> > with
> > > different
> > > > >     > > "topology requirements" use the same origin, whatever that
> > > means (content
> > > > >     > > routing type?). Regardless, the issue should be addressed.
> > The
> > > obvious
> > > > >     > > solution is to put in place a database constraint that
> > > prevents an origin
> > > > >     > > from being assigned to more that one Delivery Service with
> > API
> > > checks in
> > > > >     > > place that would provide helpful error messages when an
> > > attempt is made
> > > > >     > to
> > > > >     > > violate the constraint. However, would that mess with things
> > > like
> > > > >     > > Multi-Site Origin? Or is it just not viable for some other
> > > reason? If it
> > > > >     > is
> > > > >     > > a good solution, I'm prepared to work on a fix that utilizes
> > > it.
> > > > >     > >         >
> > > > >     > >         >
> > > > >     > >
> > > > >     > >
> > > > >     > >
> > > > >     > >
> > > > >     > >
> > > > >     >
> > > > >
> > > > >
> > >
> >

Reply via email to