Eric, I agree with this as well. Maybe we build separate APIs for
"loading" CZF formats (or any other types of external data) into the same
area of the data model (however that ultimately looks). If we keep the
CZF data centralized, it'll be easier to build relationships if needed.


-Dew

On Mon, Mar 12, 2018 at 7:14 AM, Eric Friedrich (efriedri) <
efrie...@cisco.com> wrote:

> Good point. I think it makes sense to move both the backupList and
> coordinates into the CG API. By moving coordinates into the API, I mean
> that we consolidate into one set of coordinates per CG. The existing CG
> coordinates would then be used for both backup edge CG selection and
> initial edge CG selection when doing client geolocation. This can
> certainly happen in a different PR than the one committing the
> backupList.
>
> It looks as though the CZ file gives us more flexibility, but that
> flexibility is unneeded. Today it only causes more complexity and more
> operational overhead.
>
> —Eric
>
>
> > On Mar 9, 2018, at 5:21 PM, Rawlin Peters <rawlin.pet...@gmail.com>
> > wrote:
> >
> > So in your CZF example, we can't actually have two CZs using the same
> > name ("a"): when that JSON gets parsed, one of the CZs will be
> > overwritten so that there is only one "a" key in the JSON. The names
> > would have to be "a1" and "a2", for instance, with backupLists
> > [a, b, c] and [a, c, b], but I see your point in how that might be
> > useful. At that point, though, maybe it's just better to create two
> > empty cachegroups "a1" and "a2" in the API that first fall back to
> > cachegroup "a", as sketched below.
> >
> > If I'd been around at the time coordinates were added to the CZF, I
> > think I would've -1'd the idea. The Client Source and Cache Group
> > Destination coordinates should typically be the same, because you
> > should know, based on your network topology, which cachegroup to route
> > clients to. With the ability to specify backups, the coordinates in
> > the CZF become even less relevant. And FWIW, at Comcast we generate
> > the CZF using the coordinates from the Cache Group API anyway, so they
> > always match.
> >
> > Another major concern is the scalability of having the config in the
> > CZF alone. If we can change these backupZone relations using the API,
> > we can add them to Traffic Portal and have a much better UX, rather
> > than hard-coding the relations in whatever script we're using to
> > generate the CZF. We'd get benefits like validation and typo safety,
> > and who knows, maybe in the future we could have a map in TP to
> > visualize the relationships between cache groups for troubleshooting.
> >
> > - Rawlin
> >
> > On Fri, Mar 9, 2018 at 2:22 PM, Eric Friedrich (efriedri)
> > <efrie...@cisco.com> wrote:
> >> "I can't imagine why we'd ever want the two sets of coordinates to
> differ for the same Cache Group. “
> >> Maybe someone else can chime in about why coordinates were added to the
> CZF in the first place, but I’ve also thought of them like this:
> >> CG API Coordinates - Where the cache servers are. To be used as a
> destination location routing traffic towards
> >> CZF Coordinates - Where the clients are. To be used as the source
> location routing traffic from these coordinates to the CG API coordinates.
> >>
> >> I could see cases where:
> >> a) you might set the CG coordinates of DenCG to Denver's lat/long
> >> b) you might set the CZF coordinates of that coverageZone to
> >> somewhere other than Denver (because the coverageZone is not a
> >> perfect circle centered on Denver, like the caches might be), as
> >> sketched below
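> >>
> >> A sketch of (a) and (b) side by side (the key names and numbers are
> >> illustrative, not from a real deployment):
> >>
> >> CG API (DenCG):  { "latitude": 39.74, "longitude": -104.99 }
> >> CZF ("DenCG"):   { "coordinates": { "latitude": 40.02, "longitude": -105.27 } }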
> >>
> >> So I think there are valid reasons for two sets of coordinates to
> >> exist, but I'm not sure if people set them differently in practice?
> >> If they are always the same for every CG, it seems like they should
> >> get consolidated. (I don't think we're using CZF coordinates
> >> currently)
> >>
> >> Second Point:
> >> By having the backupList in the CZF, we have more granularity over
> >> how clients are routed during failures. For example, could we create
> >> a CZF that looks like:
> >>
> >> "coverageZones": {
> >>   "a": {
> >>     "backupList": ["b", "c"],
> >>     "networks": [1, 2, 3]
> >>   },
> >>   "a": {
> >>     "backupList": ["c", "b"],
> >>     "networks": [4, 5, 6]
> >>   }
> >> }
> >>
> >> Here all clients are part of the "a" CacheGroup/Coverage Zone, but
> >> depending on their specific subnet they have different backup
> >> policies.
> >>
> >> Our particular requirement for this feature is a backup at the
> >> CacheGroup level, not the CZ level as I've shown here, so perhaps
> >> we're overbuilding it.
> >>
> >> —Eric
> >>
> >>
> >>
> >>
> >>
> >>> On Mar 9, 2018, at 4:05 PM, Rawlin Peters <rawlin.pet...@gmail.com>
> >>> wrote:
> >>>
> >>> Ok, so the coordinates in the CZF are only used when no available
> >>> cache is found in the matched cachegroup. Rather than using the
> >>> coordinates of the matched cachegroup that it already knows about from
> >>> the API, Traffic Router uses the coordinates from the CZF cachegroup
> >>> instead. That seems...not great?
> >>>
> >>> That means we basically have two sets of coordinates for the same
> >>> cachegroup, one set in the API and one in the CZF. The API coordinates
> >>> are used when caches are available, but the CZF coordinates are used
> >>> when no caches are available in the matched CG. To me it seems like
> >>> the CG coordinates from the API should always take precedence over the
> >>> CZF. In fact the CZF coordinates are optional, but TR currently won't
> >>> use the coordinates from the API even if the CZF has no coordinates.
> >>> That sounds like a bug to me.
> >>>
> >>> Should we update TR to let the API coordinates take precedence over
> >>> the CZF coordinates when available? I can't imagine why we'd ever want
> >>> the two sets of coordinates to differ for the same Cache Group. Then
> >>> the coordinates backup logic will truly live in the Cache Group API,
> >>> and we could put the new backup config there as well. After that,
> >>> coordinates would only be needed in the CZF for coverage zones that
> >>> don't map to cache groups.
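> >>>
> >>> As a minimal sketch of that precedence (type and method names here
> >>> are approximate, not the actual TR code):
> >>>
> >>> // Prefer the Cache Group API coordinates; only fall back to the
> >>> // CZF's coordinates when the API has none.
> >>> Geolocation resolveFallbackLocation(CacheLocation apiCg, NetworkNode czNode) {
> >>>     if (apiCg != null && apiCg.getGeolocation() != null) {
> >>>         return apiCg.getGeolocation(); // API coordinates take precedence
> >>>     }
> >>>     return czNode.getGeolocation(); // CZF coordinates as a last resort
> >>> }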
> >>>
> >>> - Rawlin
> >>>
> >>> On Fri, Mar 9, 2018 at 12:46 PM, Eric Friedrich (efriedri)
> >>> <efrie...@cisco.com> wrote:
> >>>> I think the original reason for putting it in the CZF was to stay
> >>>> consistent with the coordinates backup logic, which is also in the CZF.
> >>>>
> >>>> Unless you have multiple "coordinates" for different networks in
> >>>> the same zone, would it also make sense to add the coordinates to
> >>>> the Cache Group API as well?
> >>>>
> >>>> I think all of our zones are also Cachegroups.
> >>>>
> >>>> —Eric
> >>>>
> >>>>> On Mar 9, 2018, at 1:31 PM, Rawlin Peters <rawlin.pet...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> Hey Eric (and others),
> >>>>>
> >>>>> I'm resurrecting this thread because the PR [1] implementing this
> >>>>> proposed functionality is just about ready to be merged. The full
> >>>>> mailing list discussion can be read here [2] if interested.
> >>>>>
> >>>>> I've discussed this PR a bit more with my colleagues here at Comcast,
> >>>>> and while it provides the functionality we need, we think in the
> >>>>> long-term this configuration should live in the Cache Group API in
> >>>>> Traffic Ops rather than just the Coverage Zone File.
> >>>>>
> >>>>> However, after reading your initial proposal below, it sounds like
> >>>>> you might have Coverage Zones in your CZF that don't necessarily
> >>>>> map back to Cache Groups in TO. Might that be the case? That
> >>>>> scenario seems to be allowed by Traffic Router but might not
> >>>>> necessarily be "supported" given the CZF docs [3] that state:
> >>>>>> "The Coverage Zone File (CZF) should contain a cachegroup name to
> >>>>>> network prefix mapping in the form:"
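> >>>>>
> >>>>> For reference, that documented form looks roughly like this
> >>>>> (trimmed from Eric's original example at the bottom of this
> >>>>> thread):
> >>>>>
> >>>>> "coverageZones": {
> >>>>>   "cache-group-01": {
> >>>>>     "network6": ["1234:5678::/64"],
> >>>>>     "network": ["192.168.8.0/24"]
> >>>>>   }
> >>>>> }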
> >>>>>
> >>>>> If we do indeed "support" this scenario, that would mean that having
> >>>>> the backupZone config only in TO wouldn't solve all your use cases if
> >>>>> your CZF heavily uses Coverage Zones that don't directly map to a
> >>>>> Cache Group in TO.
> >>>>>
> >>>>> If we should officially support this scenario, then maybe we merge
> >>>>> the PR [1] as is, and later we can augment the feature so that the
> >>>>> Cache Group API can provide the backupZone config as well as the
> >>>>> CZF. If the config were provided in both the API and the CZF, then
> >>>>> the API would take precedence.
> >>>>>
> >>>>> If this scenario should NOT officially be supported, then I think we
> >>>>> should update the PR [1] to have Traffic Router parse the config from
> >>>>> CRConfig.json rather than the CZF and augment the Cache Group API to
> >>>>> support the backupZone config. I think this would be the ideal
> >>>>> solution, but I also don't want to sign up our contributors for
> >>>>> extra work that they weren't planning on doing. I'd be happy to
> >>>>> help augment this feature on the TO side.
> >>>>>
> >>>>> What do you all think of this proposal? TO-only or both TO and CZF?
> >>>>>
> >>>>> - Rawlin
> >>>>>
> >>>>> [1] https://github.com/apache/incubator-trafficcontrol/pull/1908
> >>>>> [2] https://lists.apache.org/thread.html/b033b3943c22a606370ad3981fa05fb0e7039161b88bbc035bc49b25@%3Cdev.trafficcontrol.apache.org%3E
> >>>>> [3] http://traffic-control-cdn.readthedocs.io/en/latest/admin/traffic_ops/using.html#the-coverage-zone-file-and-asn-table
> >>>>>
> >>>>> On 2016/12/22 19:28:17, Eric Friedrich (efriedri) <
> >>>>> efrie...@cisco.com> wrote:
> >>>>>> The current behavior of cache group selection works as follows:
> >>>>>> 1) Look for a subnet match in the CZF.
> >>>>>> 2) Use MaxMind/Neustar for geolocation based on the client IP.
> >>>>>> Choose the closest cache group.
> >>>>>> 3) Use the Delivery Service Geo-Miss lat/long. Choose the closest
> >>>>>> cache group.
> >>>>>>
> >>>>>>
> >>>>>> For deployments where IP addressing is primarily private (say,
> >>>>>> RFC-1918 addresses), client IP geolocation (#2) is not useful. A
> >>>>>> rough sketch of the selection order follows.
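> >>>>>>
> >>>>>> In Java-flavored pseudocode (names here are made up, not the
> >>>>>> actual TR implementation):
> >>>>>>
> >>>>>> CacheLocation selectCacheGroup(String clientIp, DeliveryService ds) {
> >>>>>>     // 1) Subnet match in the Coverage Zone File
> >>>>>>     CacheLocation cg = czf.matchSubnet(clientIp);
> >>>>>>     if (cg == null) {
> >>>>>>         // 2) Geolocate the client IP (MaxMind/Neustar); choose the closest CG
> >>>>>>         cg = closestCacheGroup(geoDb.locate(clientIp));
> >>>>>>     }
> >>>>>>     if (cg == null) {
> >>>>>>         // 3) Fall back to the Delivery Service Geo-Miss lat/long
> >>>>>>         cg = closestCacheGroup(ds.getGeoMissLocation());
> >>>>>>     }
> >>>>>>     return cg;
> >>>>>> }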
> >>>>>>
> >>>>>>
> >>>>>> We are considering adding another field to the Coverage Zone File
> >>>>>> that configures an ordered list of backup cache groups to try if
> >>>>>> the primary cache group does not have any available caches.
> >>>>>>
> >>>>>> Example:
> >>>>>>
> >>>>>> "coverageZones": {
> >>>>>>   "cache-group-01": {
> >>>>>>     "backupList": ["cache-group-02", "cache-group-03"],
> >>>>>>     "network6": [
> >>>>>>       "1234:5678::\/64",
> >>>>>>       "1234:5679::\/64"],
> >>>>>>     "network": [
> >>>>>>       "192.168.8.0\/24",
> >>>>>>       "192.168.9.0\/24"]
> >>>>>>   }
> >>>>>> }
> >>>>>>
> >>>>>> This configuration could also be part of the per-cache-group
> >>>>>> configuration, but that would give less control over which clients
> >>>>>> prefer which cache groups. For example, you may have cache groups
> >>>>>> in LA, Chicago, and NY. If the Chicago cache group fails, you may
> >>>>>> want some of the Chicago clients to go to LA and some to go to NY.
> >>>>>> If the backup CG configuration is per-CG, we would not be able to
> >>>>>> control where clients are allocated. A sketch of that split is
> >>>>>> below.
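> >>>>>>
> >>>>>> For instance (zone names and subnets here are hypothetical), two
> >>>>>> Chicago coverage zones could share the same caches but order their
> >>>>>> backups differently:
> >>>>>>
> >>>>>> "coverageZones": {
> >>>>>>   "chicago-west": {
> >>>>>>     "backupList": ["la", "ny"],
> >>>>>>     "network": ["192.168.10.0\/24"]
> >>>>>>   },
> >>>>>>   "chicago-east": {
> >>>>>>     "backupList": ["ny", "la"],
> >>>>>>     "network": ["192.168.11.0\/24"]
> >>>>>>   }
> >>>>>> }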
> >>>>>>
> >>>>>> Looking for opinions and comments on the above proposal; this is
> >>>>>> still in the idea stage.
> >>>>>>
> >>>>>> Thanks All!
> >>>>>> Eric
> >>>>
> >>
>
>
