Re: Question about the poll model of the Traffic Monitor

Zhilin Huang (zhilhuan) Wed, 28 Mar 2018 21:42:07 -0700

Hi Guys,

Thanks a lot for the discussion. I should have shared the design for review
earlier, and I am sorry for the delay. Here is the link to the design doc:
https://docs.google.com/document/d/1vgq-pGNoLLYf7Y3cu5hWu67TUKpN5hucrp-ZS9nSsd4/edit?usp=sharing


Short summary for the feature design:
---
There is a feature request from the market to support secondary IPs on edge
cache servers, and to allow a delivery service to be assigned to a secondary IP
of an edge cache.

This feature requires Traffic Ops to support secondary IP configuration for
edge caches, and delivery service assignment to a secondary IP.

Traffic Monitor should also monitor the connectivity of the configured
secondary IPs, and Traffic Router needs to resolve the streamer FQDN to the
secondary IP assigned in a delivery service.

Traffic Server should record the IP that serves each client request, and should
reject requests to an IP that is not assigned to the delivery service.

This design takes compatibility into account: if no secondary IP is configured,
or if some parts of the system have not been upgraded to a version that
supports this feature, traffic will be served by primary IPs as before.
---
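
As an illustration only, the compatibility rule in the summary above could be
sketched in Go as follows. The `EdgeCache` type, its fields, and the
`ResolveIP` and `Accepts` helpers are hypothetical names invented for this
sketch; they are not taken from the design doc or from Traffic Control code.

```go
package main

import "fmt"

// EdgeCache is a hypothetical model of an edge cache with one primary IP
// and optional secondary IPs assigned per delivery service (DS).
type EdgeCache struct {
	PrimaryIP string
	// SecondaryIPByDS maps a delivery service name to its assigned secondary IP.
	SecondaryIPByDS map[string]string
}

// ResolveIP returns the IP that should serve the given delivery service.
// It falls back to the primary IP when no secondary IP is assigned.
func (c EdgeCache) ResolveIP(ds string) string {
	if ip, ok := c.SecondaryIPByDS[ds]; ok && ip != "" {
		return ip
	}
	return c.PrimaryIP
}

// Accepts reports whether a request for ds arriving on ip should be served.
// Requests on an IP that is not the one resolved for the DS are rejected.
func (c EdgeCache) Accepts(ds, ip string) bool {
	return ip == c.ResolveIP(ds)
}

func main() {
	cache := EdgeCache{
		PrimaryIP:       "192.0.2.10",
		SecondaryIPByDS: map[string]string{"mobile-ds": "192.0.2.11"},
	}
	fmt.Println(cache.ResolveIP("mobile-ds"))
	fmt.Println(cache.ResolveIP("pc-ds"))
	fmt.Println(cache.Accepts("mobile-ds", "192.0.2.10"))
}
```

With no entries in `SecondaryIPByDS`, every delivery service resolves to the
primary IP, which mirrors the "as before" behavior described in the summary.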

Replies to Robert's comments are embedded in the email thread below. Any
further comments are welcome and much appreciated.

Thanks,
Zhilin




On 29/03/2018, 10:19 AM, "Neil Hao (nbaoping)" <nbaop...@cisco.com> wrote:

    Hi Robert/Nir,
    
    Thanks very much for the quick and detailed reply, and sorry that I didn’t
    explain the whole feature clearly. This is actually our Secondary IP
    feature, a big feature that will bring changes to all the components of
    Traffic Control. I thought our teammate had reviewed the design with you
    before, but it seems not. After some discussion, we will start a review of
    the whole feature design with you soon; I think it will be better to
    continue this discussion after that.
    
    Thanks,
    Neil
    
    On 3/29/18, 1:16 AM, "Robert Butts" <robert.o.bu...@gmail.com> wrote:
    
        I agree with Nir, it's not as simple as changing a structure to `[]URL`,
        it's a bigger architectural design question.
        
        How do you plan to mark caches Unavailable if they're unhealthy on one
        interface, but healthy on another?
        
        Right now, Traffic Router needs a boolean for each cache, it doesn't
        know anything about multiple network interfaces, IPv4 vs IPv6, etc. It
        only knows the FQDN, which is all the clients it's giving DNS records
        to will know when they request the cache.
        
        Questions:
        Is a cache marked Unavailable when any interface is unreachable? Or all
        of them?
ZH> Actually, we care about the availability of each IP rather than of each
interface. Please take a look at 3.1.2 of the design doc.

        What if an interface is reachable, but one interface reports different
        stats than another interface? For example, what if someone configures a
        different caching proxy (ATS) on each interface?
ZH> We will use only one ATS instance to serve traffic from all configured IPs.

        How are stats aggregated? Should the monitor aggregate all stats from
        different polls and interfaces together, and consider them the same
        "server"? If not, how do we reconcile the different stats with what the
        Monitor reports on `CrStates` and `CacheStats`? If so, again, what
        happens if different interfaces have different ATS instances, so e.g.
        the byte count on one is 100, and the other is 1000, then 101, then
        1001. It simply won't work. Do we handle that? Or just ignore it, and
        document "all interfaces must report the same stats"? Do we try to
        detect that and give a useful error or warning?
ZH> The bandwidth of all interfaces will be aggregated. We will have only one
ATS instance serving traffic from all interfaces. The connectivity check is
IP-based, and stats collection is interface-based. Please take a look at 3.1.2
of the design doc for details.
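
A minimal Go sketch of the split described in this reply, with connectivity
tracked per IP and stats collected per interface, might look like the
following. It assumes that "aggregated" means summing bandwidth across
interfaces. `InterfaceStats`, `CacheHealth`, and their fields are hypothetical
names for illustration, not Traffic Monitor types.

```go
package main

import (
	"fmt"
	"sort"
)

// InterfaceStats is a hypothetical per-interface stats sample.
type InterfaceStats struct {
	Name          string
	BandwidthKbps int64
}

// CacheHealth is a hypothetical per-cache record: stats are kept per
// interface, while connectivity is kept per IP address.
type CacheHealth struct {
	Interfaces  []InterfaceStats
	IPReachable map[string]bool
}

// TotalBandwidthKbps sums the bandwidth of all interfaces of one cache
// (assumption: aggregation is a plain sum).
func (h CacheHealth) TotalBandwidthKbps() int64 {
	var total int64
	for _, s := range h.Interfaces {
		total += s.BandwidthKbps
	}
	return total
}

// UnreachableIPs returns the IPs that failed their connectivity check,
// sorted so the result is deterministic.
func (h CacheHealth) UnreachableIPs() []string {
	var down []string
	for ip, ok := range h.IPReachable {
		if !ok {
			down = append(down, ip)
		}
	}
	sort.Strings(down)
	return down
}

func main() {
	h := CacheHealth{
		Interfaces: []InterfaceStats{
			{Name: "eth0", BandwidthKbps: 100},
			{Name: "eth1", BandwidthKbps: 250},
		},
		IPReachable: map[string]bool{"192.0.2.10": true, "192.0.2.11": false},
	}
	fmt.Println(h.TotalBandwidthKbps(), h.UnreachableIPs())
}
```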

        In Traffic Ops, servers have specific data used for polling. Traffic
        Monitor gets the stats URI path from Parameters, and the URI IP from the
        Servers table. It doesn't use the FQDN, Server Host or Server Domain.
        Where would these other interfaces come from? Parameters? Or another
        table linked to the servers table? (I'd really, really rather we didn't
        put more data in unsafe Parameters, which can not exist, not be properly
        formatted, need safety checks in all code that ever uses them, and are
        confusing and opaque to new users) Would these other interfaces be in
        addition to using the IP from the Server table? Or replace it?
        
        Do we have config options for all of these? Only some of them? In the
        config file, or Traffic Ops fields?
ZH> Please take a look at 3.1.1 of the design doc. Basically, we will add new
APIs or new fields to existing APIs, so implementing this feature will not
affect existing functionality.
        
        I'd like to hear the use case too, and e.g. why it isn't better to
        simply make each different interface a different server in Traffic Ops?
        How is the
ZH> We discussed this solution too, but the main issue is that running the ORT
script for one server will overwrite the ATS configuration for another server.
The use case is that our customer wants different clients to be served by
different IPs. For example, a mobile client will be served by a different IP
than a PC client.
        Traffic Router routing to them, anyway? Are you setting up the same DNS
        record to point to the IPs of all interfaces? How is that configured in
ZH> For each edge, each DS will be assigned to a single IP. If no secondary IP
is specified, it will behave just as it does today. Please take a look at 3.1.3
of the design doc.
        Traffic Ops then? I.e. which interfaces are configured as the Server IP
        and IP6? Are we certain there aren't other issues in other Traffic
        Control components, with a Server IP and IP6 not having a one-to-one
        relationship with the FQDN A/AAAA record?
ZH> Please check 3.1.1 of the design doc. There will be new pages for secondary
IP configuration; the current functionality should not be affected.
        
        Do we need to take the bigger step, of having a Traffic Ops Server have
        an array of IPs? That's a lot more work (especially making sure it works
        everywhere, e.g. Traffic Router), but it solves a lot of questions and
        hackery, gives us a lot more flexibility, and matches the physical
        reality better.
ZH> When making this design, we tried to avoid any impact on current
functionality and to stay compatible with earlier versions, so we add extra
tables or fields for secondary IPs.
        
        I'm not opposed to the idea, but we need to think through the
        architecture, we need to be sure the added complexity is worth it over
        existing solutions, we need to make all the options (e.g. Unavailable
        if any vs all) configurable, and we need to make sure the common simple
        case of a single Server IP and IP6 still works without additional
        configuration complexity.
ZH> Yes, I agree with you. We are trying not to impact the existing solution.
Please take a look at the design doc for more details.
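
The "Unavailable if any vs all" option that Robert asks to be configurable
could be sketched in Go as below. This is an assumption-laden illustration:
the `Policy` type, its constants, and `Available` are hypothetical, and the
thread does not say which policy (if either) the design doc adopts.

```go
package main

import "fmt"

// Policy is a hypothetical setting for how per-IP health maps to the single
// per-cache boolean that Traffic Router consumes.
type Policy int

const (
	// UnavailableIfAny marks the cache unavailable when any IP is unhealthy.
	UnavailableIfAny Policy = iota
	// UnavailableIfAll marks the cache unavailable only when every IP is unhealthy.
	UnavailableIfAll
)

// Available collapses per-IP health into one boolean under the given policy.
// A cache with no known IPs is treated as unavailable.
func Available(ipHealthy map[string]bool, p Policy) bool {
	if len(ipHealthy) == 0 {
		return false
	}
	healthy := 0
	for _, ok := range ipHealthy {
		if ok {
			healthy++
		}
	}
	switch p {
	case UnavailableIfAny:
		return healthy == len(ipHealthy)
	case UnavailableIfAll:
		return healthy > 0
	}
	return false
}

func main() {
	health := map[string]bool{"192.0.2.10": true, "192.0.2.11": false}
	fmt.Println(Available(health, UnavailableIfAny))
	fmt.Println(Available(health, UnavailableIfAll))
}
```

In the common case of a single Server IP, both policies give the same answer,
so no additional configuration would be needed there.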
        

        
        On Wed, Mar 28, 2018 at 10:19 AM, Nir Sopher <n...@qwilt.com> wrote:
        
        > Hi Eric/Neil,
        > Isn't the question of supporting multi interfaces per server a much
        > wider question? Architectural wise.
        > What would be the desired behavior if the monitoring shows that only
        > one of the interfaces is down? Will the router send traffic to the
        > healthy interfaces? How?
        > Nir
        >
        > On Wed, Mar 28, 2018, 19:10 Eric Friedrich (efriedri)
        > <efrie...@cisco.com> wrote:
        >
        > > The use case behind this question probably deserves a longer dev@
        > > email.
        > >
        > > I will oversimplify: we are extending TC to support multiple IPv4
        > > (or multiple IPv6) addresses per edge cache (across 1 or more NICs).
        > >
        > > Assume all addresses are reachable from the TM.
        > >
        > > —Eric
        > >
        > >
        > > > On Mar 28, 2018, at 11:37 AM, Robert Butts
        > > > <robert.o.bu...@gmail.com> wrote:
        > > >
        > > > When you say different interfaces, do you mean IPv4 versus IPv6?
        > > > Or something else?
        > > >
        > > > If you mean IPv4 vs IPv6, we have a PR for that from Dylan Volz
        > > > https://github.com/apache/incubator-trafficcontrol/pull/1627
        > > >
        > > > I'm hoping to get to it early next week, just haven't found the
        > > > time to review and test it yet.
        > > >
        > > > Or did you mean something else by "interface"? Linux network
        > > > interfaces? Ports?
        > > >
        > > >
        > > > On Wed, Mar 28, 2018 at 12:02 AM, Neil Hao (nbaoping)
        > > > <nbaop...@cisco.com> wrote:
        > > >
        > > >> Hi,
        > > >>
        > > >> Currently, we poll exactly one URL per cache server, for a single
        > > >> interface. We would now like to support multiple interfaces, so we
        > > >> need multiple requests to query each interface of the cache
        > > >> server. I checked the Traffic Monitor code, and it seems we don’t
        > > >> support this kind of polling, right?
        > > >>
        > > >> I can think of two ways to support this:
        > > >> 1) Change the ‘Urls’ field in the HttpPollerConfig from
        > > >> ‘map[string]PollConfig’ to ‘map[string][]PollConfig’, so that we
        > > >> can have multiple polling configs to query the multiple
        > > >> interfaces.
        > > >>
        > > >> 2) Change the ‘URL’ field in the PollConfig from ‘string’ to
        > > >> ‘[]string’.
        > > >>
        > > >> Either way, it seems to require a fairly big change to the current
        > > >> polling model. I’m not sure I’m heading in the right direction. Do
        > > >> you have any suggestions?
        > > >>
        > > >> Thanks,
        > > >> Neil
        > > >>
        > >
        > >
        >
