Re: [Dyncast] CAN BoF issues #7 #17 #32

Joel Halpern Fri, 17 Jun 2022 06:36:17 -0700

If CAN does instance selection and DSCP marking, then it can influence routing and select appropriate instances. It is an understandable, deployable, and probably scalable solution with the selection and marking deployed at an appropriate place. If that place is the PE, and we want to use a different IP address, then it probably uses a tunnel to deliver the packets. If that place is the end host, then it can do what it wants.
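
As a rough illustration of the end-host case only, the sketch below picks an instance and marks outgoing packets with a DSCP value. The instance addresses, load figures, and DSCP values are hypothetical and do not come from this thread, and setting IP_TOS this way assumes a Unix-like host where that socket option is available.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One service instance as seen by the selecting end host (hypothetical fields)."""
    address: str   # IP address of the instance (documentation range used below)
    load: float    # reported compute load, 0.0 (idle) to 1.0 (saturated)
    dscp: int      # DSCP code point to mark traffic toward this instance


def select_instance(instances):
    """Pick the least-loaded instance; a stand-in for any compute-aware policy."""
    return min(instances, key=lambda inst: inst.load)


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the former TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_socket(dscp):
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


# Hypothetical instances of one service.
instances = [
    Instance("192.0.2.10", load=0.70, dscp=10),
    Instance("198.51.100.10", load=0.20, dscp=46),
]
chosen = select_instance(instances)
```

A caller would then send to `chosen.address` over `open_marked_socket(chosen.dscp)`; the routing system only ever sees an ordinary destination address and a DSCP value.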

However, if you expect the routing system to be including and respecting information about compute end point capabilities, and want the routing system to manage suitable server stickiness and all the other needed properties, then I think the version of CAN you are describing is a bad idea that will harm the infrastructure by mixing functionality in inappropriate places.



Yours,

Joel

PS: I retain rtgwg on the copy for now, but as far as I can tell this discussion belongs exclusively on the dyncast list.



On 6/17/2022 4:13 AM, [email protected] wrote:

Hi Dirk,

For mode 1, CAN is only aware of computing information, because the basic routing could select the 'best' path naturally.

For mode 2, CAN could also know more about the network path when the computing node selection is done, for instance, SR policy, network slicing, detnet, etc., and then utilize them. I don't think it will influence the underlay routing; some apps could require a specific routing policy/strategy even when there is no CAN service.

CAN aims to provide a joint optimization service to specific applications. The difference is whether to select the 'best' resource all the time, or just select the 'appropriate' one based on more awareness and decision making.


Regards,
Peng
------------------------------------------------------------------------
[email protected]

    *From:* Dirk Trossen <mailto:[email protected]>
    *Date:* 2022-06-17 15:01
    *To:* Linda Dunbar <mailto:[email protected]>;
    [email protected]; dyncast <mailto:[email protected]>
    *CC:* rtgwg <mailto:[email protected]>; David R. Oran
    <mailto:[email protected]>; jefftant.ietf
    <mailto:[email protected]>
    *Subject:* RE: [Dyncast] CAN BoF issues #7 #17 #32

    Hi Linda, Peng, all,

    Let us tease apart what “include the path selection” may mean,
    since the nature of this inclusion may differ significantly.

    For this, let us assume a service instance S_1 as one of possibly
    several ones for service S. S_1 may be reachable over a number of
    network paths, the selection of some of which would significantly
    impact any compute-aware selection of S_1 over the other available
    service instances for S. I can see two modes of ‘including path
    selection’:

    1. S_1 exposes two (or more) IP addresses, where each IP address
    reflects a path from the client to the exposed address. IP
    addresses may be exposed across more than one network operator,
    multi-homing the service instance. Now here, ‘path selection’ is
    indirectly done by picking one IP address over all others,
    including the IP addresses of other service instances, and indeed,
    such indirect path selection may well be done through a metric
    that measures against (at least one) crucial path-related metric.
    But ultimately, the CAN provider still selects one of possibly many IP
    addresses, right? More importantly, it remains the task of the
    underlay routing infrastructure (again, which could include more
    than one network operator) to determine what it deems as the
    ‘best’ path to each of the IP addresses (including the multi-homed
    S_1 addresses).

    2. Let’s stick with one IP address to S_1 now, though there are
    still at least two possible paths to it, where the selection of
    one over any of the other possible ones could well impact the
    compute-aware suitability of S_1 over any of the other service
    instances. The problem here is that ‘including the path selection’
    would mean influencing the routing to the single S_1 IP address such
    that the routing decision takes compute-awareness into
    account. The path selection here is not indirect but direct,
    together with the IP address (i.e., service instance endpoint)
    selection. What is required here is that the CAN provider and the
    underlay somehow work together in selecting one path over another (to
    the same IP address). This in turn would affect the overall routing
    decision for S_1’s IP address, and hence the underlay routing
    infrastructure, since the resulting (compute-aware) path
    configuration, in the form of suitable forwarding entries, needs
    distribution in the underlay infrastructure.
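
To make the first mode concrete, here is a minimal sketch of indirect path selection: the selector only ever picks an IP address, and the underlay remains free to route to that address however it sees fit. The addresses, metric values, and the weighting between path latency and compute load are hypothetical assumptions, not something proposed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One exposed IP address of a service instance (hypothetical fields)."""
    instance: str            # service instance the address belongs to, e.g. "S_1"
    ip: str                  # exposed IP address (documentation ranges used below)
    path_latency_ms: float   # path-related metric observed toward this address
    compute_load: float      # compute metric of the instance, 0.0 to 1.0


def score(candidate, load_weight_ms=100.0):
    """Combine path and compute metrics; lower is better. The weight is an assumption."""
    return candidate.path_latency_ms + load_weight_ms * candidate.compute_load


def select_address(candidates, load_weight_ms=100.0):
    """Mode 1: choose one IP address over all others, nothing more."""
    return min(candidates, key=lambda c: score(c, load_weight_ms))


candidates = [
    Candidate("S_1", "192.0.2.10", path_latency_ms=12.0, compute_load=0.30),
    Candidate("S_1", "198.51.100.10", path_latency_ms=25.0, compute_load=0.30),
    Candidate("S_2", "203.0.113.20", path_latency_ms=8.0, compute_load=0.80),
]
chosen = select_address(candidates)
print(chosen.instance, chosen.ip)
```

Nothing in this sketch installs forwarding state. The second mode would additionally have to choose among paths to the same address, which is exactly the part that reaches into the underlay.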

    I think we have to be clear about which of the two options we see in
    the CAN scope, and also whether I have missed any options here. As we
    can see already, those two options have a significant impact on the
    architecture we may envision for CAN and on the adoption of its
    solutions. From my side, I have seen CAN mainly as an
    endpoint selection problem, so understood ‘path selection’ as an
    indirect one in the manner described in item 1. I just want to
    throw the options out here to solicit feedback from the community
    on this so that we get a good understanding moving forward.

    Best,

    Dirk

    *From:* Dyncast [mailto:[email protected]] *On Behalf Of*
    Linda Dunbar
    *Sent:* 15 June 2022 23:07
    *To:* [email protected]; dyncast <[email protected]>
    *Cc:* rtgwg <[email protected]>; David R. Oran
    <[email protected]>; jefftant.ietf <[email protected]>
    *Subject:* Re: [Dyncast] CAN BoF issues #7 #17 #32

    Peng,

    For Issue #32, you said: “CAN does not compute path, it selects
    endpoints.”

    If CAN means Computing Aware Networking, it should include the
    path selection. Maybe CAN is about selecting (or computing) the
    optimal paths based on the combination of network conditions and
    the computing resources available at the end points?

    My two cents,

    Linda

    *From:* Dyncast <[email protected]> *On Behalf Of*
    [email protected]
    *Sent:* Monday, June 13, 2022 10:00 PM
    *To:* dyncast <[email protected]>
    *Cc:* rtgwg <[email protected]>; David R. Oran
    <[email protected]>; jefftant.ietf <[email protected]>
    *Subject:* [Dyncast] CAN BoF issues #7 #17 #32

    Dear All,

    Here are the responses to issues #7 #17 #32, any comments are
    welcome!  The issues and responses are also copied to the
    questioner (https://datatracker.ietf.org/doc/minutes-113-can/),
    hope for further suggestions and confirmation. Thanks!

    #7 This seems to assume conventional non-distributed applications
    just running at the edge. What about modern frameworks like
    Sapphire and Ray?
    <https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/7>

    It would be good to understand the multi-site requirements of such
    frameworks, which seem to mainly run in single DCs.

    _#17 Whether the interests of the organization deploying the
    application and the organization providing the network
    connectivity are aligned. Google doesn't worry about this because
    they are both._
    <https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/17>

    The question is more what the scope and semantics are of the
    information that will need to cross organizational boundaries. This
    needs further study, in particular when assuming stakeholder division
    between service and network provider.

    _#32 How to effectively compute paths? Shall we put CPUs into
    account?_
    <https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/32>

    CAN does not compute path, it selects endpoints. Path selection
    (to a given endpoint) is subject to the routing at the IP
    underlay. For selecting endpoints, CPU information may be taken
    into account to achieve the 'compute-awareness' that CAN strives for.

    You can also add your comments to any of them
    (https://github.com/CAN-IETF/CAN-BoF-ietf113/issues).


    Regards,

    Peng

    ------------------------------------------------------------------------

    [email protected]

            *From:* Linda Dunbar <mailto:[email protected]>

            *Date:* 2022-05-11 06:11

            *To:* [email protected] <mailto:[email protected]>

            *Subject:* [Dyncast] Categories of the CAN BoF issues

            CAN BoF proponents:

            Many thanks for creating the CAN BoF issues tracking in
            GitHub:
            https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/created_by/CAN-IETF?page=1&q=is%3Aopen+is%3Aissue+author%3ACAN-IETF

            I went through the issues captured in GitHub and
            categorized them into groups. Some issues can be lumped
            together for the discussion. There are quite a few issues
            related to the requirements, which need to be clarified.

            Best Regards, Linda

            *Issues associated with Applications vs. Underlay networks:*

            · Consider not to load underlay network with application
            details. #35

            · We have multiple upper-layer applications. Do we have
            additional needs for routing (e.g. WG?) or we are using
            those applications and won't need such new WG? #30

            · It needs application information too, so it can't just
            make a decision at the network layer. #23

            · This is not perceived as a routing problem; it's all
            service discovery that can be done in higher layers. #21

            · *3GPP and URSP solve this based on UPF selection. It uses
            both endpoint + application. #20*

            · One overlay plane per application. Resources/metric
            specific to the plane. #19

            · How does the application layer or the transport layer
            learn the network status to steer traffic? #16

            *Need clearer requirements for CAN* (to be addressed by
            draft-liu-dyncast-ps-usecases):

            · Need to understand if there are requirements to avoid
            extra messages or 1ms of latency #36

            · Regarding the flow affinity, is it from network
            perspective or from application/computation perspective? #33

            · How to effectively compute paths? Shall we put CPUs into
            account? #32

            · *What happens when the user moves? If so we also need to
            move application context. #25*

            · It can only move the services around as fast as it can
            update the routing plane, which comes back to the point
            about service discovery (waiting for
            convergence/distribution as opposed to just updating the
            SD server) #24

            · Whether the interests of the organization deploying the
            application and the organization providing the network
            connectivity are aligned. Google doesn't worry about this
            because they are both. #17

                o The question is more what the scope and semantics are
                of the information that will need to cross organizational
                boundaries. This needs further study, in particular when
                assuming stakeholder division between service and network
                provider.

            · It seems impossible to satisfy that requirement
            simultaneously with the latency requirement. #15

            · It wasn't clear how hard a requirement session
            persistence is. #13

                o A session usually creates ephemeral state. If execution
                changes from one (e.g., virtualized) service instance to
                another, state/context needs to be transferred to the
                other. Such required transfer of state/context makes it
                desirable to have session persistence (or instance
                affinity) as the default, removing the need for explicit
                context transfer, while also supporting an explicit
                state/context transfer (e.g., when metrics change
                significantly).

            · *Should it select UPF based on the application? Steering
            is done per user? or per application? #9*

            · This seems to assume conventional non-distributed
            applications just running at the edge. What about modern
            frameworks like Sapphire and Ray? #7

                o It would be good to understand the multi-site
                requirements of such frameworks, which I have understood
                to mainly run in single DCs.

            · *Relation to 3GPP UPF #6*

            · *Relation to ALTO #5*

            · Are the mobility issues and associated protocols also
            in scope? There are scenarios where routing alone would
            not be sufficient. #4

            · What is the position in the edge location regarding
            UPF? #3

            · Is there some sort of authorization model so that an edge
            can indicate whether or not it will provide compute
            services? #2

            · *What is CNC and the relationship with CAN #1*

            *Measurement of the Computing Resources* (to be addressed
            by draft-du-computing-resource-representation):

            · It is hard to use existing work to measure the
            computation, but we can optimize the latency through the
            performance monitoring. We have performance/measurement
            matrix over there. #34

            · Clarifications on the computing resource, its
            requirements and characteristics would be helpful. #27

            · Each application may have a different definition of
            "resources"; these then have to be boiled down into a
            single topology. Network Aware Computing (NAC! :) does
            scale. #14

            · Is computing resource measurable? #10

                o It is, and how to use the measurement would be solution
                related. See IFIP Networking 2022 paper on how to simply
                expose “computing capability” and achieve better steering
                with such simple measure.

            · Why is the compute resource different from other resources? #8

            *Load Balance based solutions:*

            · The point is that we need a standardized LB protocol #18

            · The LB as part of the application itself is superior
            (part of the distributed application itself is to obtain
            and keep updating the "best" unicast location to use). #22

            · Is there anything missing from current LBs that would
            prevent their use as-is, other than there being, for market
            reasons, no interop standard between different LBs? #12

            · For the load balance, should it learn the network’s
            status? #11

            *Dyncast based Solution issues:*

            · For Dyncast, when the time is short, is it possible for
            the router to decide the routing? It is too fast. #31

            · Is dyncast proposed to encapsulate? #29

            · Will CAN dyncast impact each and every router? How to
            avoid loops? #28

            · What's the assumed scale of a D-router? 10^6 sessions?
            100^8? What's the assumed update rate? 1Gb? 1Tb? #26


_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg

